Proceedings of the International scientific and practical conference "Education and Scientific Progress" (April 24-26, 2026) / Publisher website: www.naukainfo.com. – Manchester, United Kingdom, 2026. – 218 p.

efficient clustering. A significant advantage of this index is its low computational complexity.

The Davies-Bouldin (DB) index averages, over all clusters, the largest ratio of the sum of two clusters' intra-cluster scatters to the distance between their centroids; lower values indicate higher clustering quality. This metric is sensitive to cluster configuration, which keeps it informative even when clusters have complex geometric shapes [12, p. 224].

Three types of datasets were created for the comparative analysis (Table 1). Numerical features were normalized using z-scores, categorical variables were one-hot encoded, and text data were vectorized using TF-IDF, limited to the 300 most frequent tokens. For the DBSCAN, GMM, and spectral clustering algorithms, distances between objects in the mixed feature space were calculated using the Gower metric.

Hyperparameters were optimized by grid search: for k-means and GMM, the number of clusters K ∈ {2, ..., 10} was varied; for DBSCAN, the parameters ε ∈ {0.3, 0.5, 0.8, 1.2} and MinPts ∈ {3, 5, 10}; for spectral clustering, the number of clusters K ∈ {2, ..., 10} and the Gaussian kernel parameter γ ∈ {0.1, 1.0, 5.0}. The optimal configuration was the one that maximized the silhouette coefficient (SC) on a validation set comprising 30% of the data.

The computations were performed on an Intel Core i7-12700H processor (2.3 GHz) with 32 GB of RAM. The algorithms were implemented in Python 3.11 using the scikit-learn 1.3, pandas 2.1, and numpy 1.26 libraries [13]. To make the results statistically more reliable, each algorithm was run 10 times and the obtained values were averaged.
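The preprocessing and model-selection procedure described above can be illustrated with a minimal sketch for the k-means case only. It is not the authors' code. The toy data, the column names (x1, x2, cat, text), and the helper functions make_toy_frame, build_preprocessor, evaluate_k, and select_k are hypothetical. The sketch also assumes that the model is fitted on 70% of the data and that SC and DB are computed on the remaining 30%, which the text does not spell out.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def make_toy_frame(n_per_group=60, seed=0):
    """Hypothetical mixed-type data with two well-separated groups."""
    rng = np.random.default_rng(seed)
    groups = [(0.0, "a", "red apple fruit"), (8.0, "b", "blue car engine")]
    frames = [
        pd.DataFrame({
            "x1": rng.normal(center, 1.0, n_per_group),
            "x2": rng.normal(center, 1.0, n_per_group),
            "cat": category,
            "text": text,
        })
        for center, category, text in groups
    ]
    return pd.concat(frames, ignore_index=True)


def build_preprocessor():
    """z-scores for numeric, one-hot for categorical, TF-IDF (300 tokens) for text."""
    return ColumnTransformer(
        [
            ("num", StandardScaler(), ["x1", "x2"]),
            ("cat", OneHotEncoder(handle_unknown="ignore"), ["cat"]),
            ("txt", TfidfVectorizer(max_features=300), "text"),
        ],
        sparse_threshold=0.0,  # always return a dense array
    )


def evaluate_k(X_fit, X_val, k, n_runs=10):
    """Average SC and DB on the validation split over n_runs random seeds."""
    sc, db = [], []
    for seed in range(n_runs):
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_fit)
        labels = model.predict(X_val)
        if len(np.unique(labels)) < 2:  # both metrics need at least 2 clusters
            continue
        sc.append(silhouette_score(X_val, labels))
        db.append(davies_bouldin_score(X_val, labels))
    if not sc:
        return None
    return {"k": k, "sc": float(np.mean(sc)), "db": float(np.mean(db))}


def select_k(df, k_values=range(2, 11), seed=0):
    """Grid search over K; the best K maximizes the mean validation SC."""
    # Assumption: 70% of the rows are used for fitting, 30% for validation.
    df_fit, df_val = train_test_split(df, test_size=0.3, random_state=seed)
    pre = build_preprocessor()
    X_fit = pre.fit_transform(df_fit)
    X_val = pre.transform(df_val)
    results = [r for k in k_values if (r := evaluate_k(X_fit, X_val, k)) is not None]
    best = max(results, key=lambda r: r["sc"])
    return best, results


if __name__ == "__main__":
    best, results = select_k(make_toy_frame())
    for row in results:
        print(row)
    print("selected:", best)
```

The same loop structure would extend to the other algorithms by swapping the estimator and the parameter grid. DBSCAN and spectral clustering in scikit-learn have no predict method, so for them the metrics would have to be computed on the labels of the data they were fitted on.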
