Proceedings of the International scientific and practical conference "Education and Scientific Progress" (April 24-26, 2026) / Publisher website: www.naukainfo.com. – Manchester, United Kingdom, 2026. - 218 p.

• GMM: the recommended option for normally distributed data, for tasks requiring probabilistic interpretation (soft clustering), and for analyzing medium-sized mixed-type datasets.
• Spectral clustering: performs best on non-linearly structured data (particularly text data), when clusters are poorly linearly separable in the original space, and in scenarios where result quality takes priority over execution speed.
Ensemble approaches are also effective. In particular, determining the number of clusters with GMM and then applying the k-means algorithm combines the advantages of both methods [9, p. 211]. Similarly, combining DBSCAN (to identify noise components) with spectral clustering (to perform the main partitioning) improves the overall quality of modeling [15, p. 9].
In this study, four clustering methods (k-means, DBSCAN, GMM, and spectral clustering) were systematically compared on three multimodal datasets. Methods capable of modeling more complex structures (GMM and spectral clustering) provide higher segmentation quality according to the silhouette coefficient (12-40% higher than k-means). At the same time, k-means has a significant advantage in training time, which is 20-25 times shorter than that of the spectral method. Integrating the text modality significantly decreases the effectiveness of all the methods considered; under such conditions, however, the relative advantage of spectral clustering increases. The Gower metric is an effective tool for unifying the feature space, but its use requires careful normalization of the numerical components.
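
The role of normalization in the feature-space metric mentioned above can be illustrated with a minimal sketch. It assumes the metric in question is Gower's distance for mixed-type data in its common unweighted form: each numerical feature contributes an absolute difference divided by that feature's range, each categorical feature contributes 0 for a match and 1 for a mismatch, and the contributions are averaged. The toy records, the feature layout, and the function names are hypothetical and are not taken from the study.

```python
def numeric_ranges(rows, numeric_idx):
    """Range (max - min) of every numerical column; used to normalize differences."""
    ranges = {}
    for k in numeric_idx:
        column = [row[k] for row in rows]
        ranges[k] = max(column) - min(column)
    return ranges


def gower_distance(a, b, numeric_idx, ranges):
    """Unweighted Gower distance between two mixed-type records (assumed form)."""
    total = 0.0
    for k, (x, y) in enumerate(zip(a, b)):
        if k in numeric_idx:
            # Range-normalized absolute difference, always in [0, 1].
            # A constant column (range 0) contributes nothing.
            if ranges[k] > 0:
                total += abs(x - y) / ranges[k]
        else:
            # Categorical feature: simple mismatch indicator.
            total += 0.0 if x == y else 1.0
    return total / len(a)


# Hypothetical toy records: (age, income, segment label).
rows = [
    (25, 30000.0, "A"),
    (30, 40000.0, "B"),
    (45, 70000.0, "A"),
]
numeric_idx = {0, 1}
ranges = numeric_ranges(rows, numeric_idx)

d01 = gower_distance(rows[0], rows[1], numeric_idx, ranges)
d02 = gower_distance(rows[0], rows[2], numeric_idx, ranges)
print(round(d01, 4), round(d02, 4))
```

Without the division by the range, the income column (tens of thousands) would dominate both the age column and the 0/1 categorical term, which is why the numerical components need careful normalization. Range scaling is also sensitive to outliers, so a robust alternative may be preferable on real data.
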
Prospects for further research include developing adaptive kernel functions that automatically account for heterogeneous feature types; investigating scalable approximations of spectral algorithms for samples containing more than 10⁵ objects; and integrating dimensionality reduction methods (UMAP, t-SNE) as a preprocessing step prior to clustering.
