Proceedings of the International scientific and practical conference “Education and Scientific Progress” (April 24-26, 2026) / Publisher website: www.naukainfo.com. – Manchester, United Kingdom, 2026. – 218 p.
The DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm performs clustering based on an analysis of local data density: a point is treated as a core point of a cluster if its ε-neighborhood contains at least MinPts points, where MinPts is a user-specified threshold [7, p. 226]. Key advantages of this method, which are crucial for analyzing real-world data, include the ability to determine the number of clusters automatically and to identify noise effectively. At the same time, the effectiveness of DBSCAN depends on the choice of the parameters ε and MinPts, which requires prior knowledge of the density of the data distribution. The algorithm performs poorly on datasets with non-uniform density, because a single pair of values (ε, MinPts) cannot suit both dense and sparse regions. Furthermore, when working with multimodal data, a common similarity metric for heterogeneous features must be defined in advance, and this choice significantly affects the accuracy of the results [4, p. 527].
The Gaussian mixture model (GMM) represents the data distribution as a mixture of K Gaussians: $p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$, where $\pi_k$ is the weight of the k-th component, $\mu_k$ is its mean vector, and $\Sigma_k$ is its covariance matrix [8, p. 430]. The parameters are estimated using the expectation-maximization (EM) algorithm. GMM yields the probability that each object belongs to each cluster, which reflects the uncertainty of the assignment. Key drawbacks of GMM include sensitivity to deviations from a Gaussian data distribution, as well as high computational complexity in high-dimensional feature spaces, due to the need to invert the covariance matrices [9, p. 204]. Using a diagonal approximation of the covariance matrices $\Sigma_k$ partially mitigates this cost.
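The mixture density and the cluster membership probabilities described above can be illustrated with a minimal sketch. It assumes the univariate case (scalar variances instead of covariance matrices) and uses made-up parameter values for K = 2 components; the function names are hypothetical and are not taken from the cited sources. The membership probabilities computed here correspond to the E-step of the EM algorithm; the M-step, which re-estimates the parameters, is omitted.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution N(x | mean, var)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_density(x, weights, means, variances):
    """Mixture density p(x) = sum_k pi_k * N(x | mu_k, var_k)."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

def responsibilities(x, weights, means, variances):
    """Posterior probability of each component given x (E-step of EM)."""
    joint = [w * gaussian_pdf(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    total = sum(joint)  # equals mixture_density(x, ...)
    return [j / total for j in joint]

# Illustrative (assumed) parameters, not estimated from any dataset.
weights = [0.5, 0.5]      # pi_k, must sum to 1
means = [0.0, 4.0]        # mu_k
variances = [1.0, 1.0]    # scalar variances stand in for the covariance matrices

resp_mid = responsibilities(2.0, weights, means, variances)   # between the two means
resp_left = responsibilities(0.0, weights, means, variances)  # at the first mean
```

A point halfway between the two means receives equal membership probabilities, while a point located at the first mean is assigned to the first component almost with certainty. This soft assignment is what distinguishes GMM from methods that return only a hard cluster label.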
Spectral clustering is based on constructing a similarity graph W among data points, computing the Laplacian matrix L = D - W (where D is the diagonal matrix of vertex degrees), and subsequently clustering in the eigenspace of matrix L [10, p. 1403]. This approach is effective for detecting clusters of arbitrary geometric shapes