Proceedings of the International scientific and practical conference “Education and Scientific Progress” (April 24-26, 2026) / Publisher website: www.naukainfo.com. – Manchester, United Kingdom, 2026. – 218 p.
spectral approach [10, p. 1410]. The GMM algorithm ranked second (SC = 0.40). The low performance of k-means (SC = 0.31) and DBSCAN (SC = 0.29) confirms the limitations of these methods when working with sparse text spaces.

On the DS-3 dataset (medical data with mixed features), the GMM model achieved the highest score (SC = 0.49), and spectral clustering was comparable (SC = 0.48). Notably, the Davies-Bouldin (DB) index for DBSCAN on this dataset was 2.14, the highest value among the methods studied. Because lower DB values correspond to better-separated clusters, this indicates low clustering quality.

Table 3
Mean training time (seconds) over 10 runs

Dataset            k-means   DBSCAN   GMM   Spectral
DS-1 (n = 4338)    0.8       4.2      3.1   18.7
DS-2 (n = 3000)    1.2       5.8      4.0   22.4
DS-3 (n = 1025)    0.3       1.1      0.9   5.3

The data in Table 3 confirm the significant speed advantage of k-means: processing the largest dataset, DS-1, took 0.8 s, whereas spectral clustering took 18.7 s. Constructing the Laplacian matrix has quadratic computational complexity, O(n²), which is a significant limitation of spectral methods [10, p. 1412]. For effective scaling to large datasets, it is advisable to use approximation methods, in particular the Nyström method [14, p. 2152].

Based on this analysis, the following recommendations are made for selecting a clustering method for multimodal datasets:

k-means: the optimal choice when numerical features predominate, the clusters are spherical, and computational cost must be minimized or data must be processed in real time.

DBSCAN: recommended when the dataset contains a significant amount of noise, the clusters have arbitrary geometric shapes, and there is no prior information about their number.
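
The silhouette coefficient (SC) and Davies-Bouldin (DB) index reported above can be illustrated with a minimal pure-Python sketch. It assumes Euclidean distance, which may differ from the metric used in the study, and the function names and toy data are illustrative only. Higher SC and lower DB both indicate better-separated clusters.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_coefficient(points, labels):
    """Mean silhouette value over all points (singleton clusters score 0)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # s(i) = 0 by convention for a singleton cluster
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def davies_bouldin_index(points, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance.

    Assumes no two cluster centroids coincide (otherwise a division by zero occurs).
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    centroids = {
        lab: [sum(coord) / len(members) for coord in zip(*members)]
        for lab, members in clusters.items()
    }
    # scatter: mean distance of a cluster's points to its centroid
    scatter = {
        lab: sum(euclidean(p, centroids[lab]) for p in members) / len(members)
        for lab, members in clusters.items()
    }
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / euclidean(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Toy data (hypothetical): two tight, well-separated groups of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_labels = [0, 0, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1]   # deliberately mixes the two groups

print("SC good:", silhouette_coefficient(points, good_labels))
print("SC bad: ", silhouette_coefficient(points, bad_labels))
print("DB good:", davies_bouldin_index(points, good_labels))
print("DB bad: ", davies_bouldin_index(points, bad_labels))
```

On the toy data, the labeling that matches the visible grouping yields a higher SC and a lower DB than the deliberately mixed labeling. This is the direction of the comparison used to rank the methods above.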