Proceedings of the International scientific and practical conference “Education and Scientific Progress” (April 24-26, 2026) / Publisher website: www.naukainfo.com. – Manchester, United Kingdom, 2026. - 218 p.
UDC 004.891.2:519.237.8

Sozanskyi Nazarii Tarasovych
master's student
Lviv Polytechnic National University
Department of Information Systems and Networks
Lviv, Ukraine
ORCID: 0009-0003-0170-9054

A COMPARATIVE ANALYSIS OF CLUSTERING METHODS FOR MULTIMODAL DATASETS

Abstract. The paper presents a comparative analysis of four clustering methods (k-means, DBSCAN, Gaussian Mixture Models (GMM) and Spectral Clustering) applied to multimodal datasets containing numerical, textual and categorical features. The theoretical foundations of each method are discussed, and procedures for preprocessing heterogeneous data and selecting appropriate similarity metrics are described. Computational experiments evaluate partition quality using the Calinski-Harabasz index, the Davies-Bouldin index and the Silhouette Coefficient. It is established that GMM and Spectral Clustering demonstrate consistently higher quality on multimodal datasets, while k-means retains an advantage in computational speed. Practical recommendations for method selection based on dataset structure are proposed.

Keywords: clustering, multimodal data, k-means, DBSCAN, Gaussian Mixture Models, spectral clustering, silhouette index.

Clustering is a key task in unsupervised learning: objects are grouped so as to maximize intra-cluster similarity and minimize inter-cluster similarity [1, p. 341]. Over the past decade, the scale and variability of data in industrial systems have increased significantly: modern datasets increasingly