separate_and_agglomerate
- one_pass_fitting.separate_and_agglomerate(xy_arr: ndarray, distance_threshold: float = 0.5, n_divisions: int = 3, **kwargs) → ndarray
Separates and agglomerates data points into clusters based on distance in a multi-step process.
This function takes an array of 2D data points, divides them into smaller blocks to reduce memory usage, and then clusters the points within each block. It returns an array of labels indicating the cluster assignment of each data point.
Parameters:
- xy_arr : numpy.ndarray
An array of data points with two columns (x and y coordinates).
- distance_threshold : float, optional
The distance threshold for clustering data points (default is 0.5).
- n_divisions : int, optional
The number of divisions to reduce memory usage (default is 3). A higher value creates smaller blocks.
- **kwargs : keyword arguments
Additional keyword arguments to be passed to the AgglomerativeClustering object.
Returns:
- numpy.ndarray
An array of labels indicating the cluster assignment of each data point.
Notes:
The ‘separate_and_agglomerate’ function is a multi-step clustering process. It first divides the input data points into smaller blocks to reduce memory usage. These blocks are processed separately and then combined into a final clustering result.
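The division step can be pictured with a minimal sketch. It assumes the blocks form a regular n_divisions × n_divisions grid over the bounding box of the data. That layout and the helper name `assign_blocks` are illustrative assumptions, not taken from the library source.

```python
import numpy as np

def assign_blocks(xy_arr, n_divisions=3):
    """Hypothetical helper: map each point to a cell of an n_divisions x n_divisions grid.

    The grid layout is an assumption made for illustration only.
    """
    xy_arr = np.asarray(xy_arr, dtype=float)
    block_ids = np.zeros(len(xy_arr), dtype=int)
    for axis in range(2):
        coords = xy_arr[:, axis]
        # Use interior edges only, so the minimum and maximum both land in valid bins.
        edges = np.linspace(coords.min(), coords.max(), n_divisions + 1)[1:-1]
        # Combine the per-axis bin indices into a single block id per point.
        block_ids = block_ids * n_divisions + np.digitize(coords, edges)
    return block_ids

rng = np.random.default_rng(0)
points = rng.uniform(0, 10, size=(1000, 2))
block_ids = assign_blocks(points, n_divisions=3)
print(np.bincount(block_ids, minlength=9))  # number of points in each of the 9 blocks
```

Under this assumption, raising n_divisions produces more blocks with fewer points each, which is what keeps the per-block memory cost down.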
This function prints informative messages about its progress, such as the number of data points and blocks being processed. The progress is displayed using the tqdm library.
For each block, the function clusters the data points using AgglomerativeClustering with the provided distance threshold and any additional keyword arguments.
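The per-block step might look roughly like the sketch below, which assumes AgglomerativeClustering is the scikit-learn class of that name. The helper `cluster_blocks`, the label-offset scheme, and the handling of single-point blocks are assumptions made for illustration; the library's actual implementation may differ, for example in how it treats clusters that straddle block boundaries.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_blocks(xy_arr, block_ids, distance_threshold=0.5, **kwargs):
    """Hypothetical helper: cluster each block separately and keep labels unique."""
    xy_arr = np.asarray(xy_arr, dtype=float)
    block_ids = np.asarray(block_ids)
    labels = np.full(len(xy_arr), -1, dtype=int)
    next_label = 0
    for block in np.unique(block_ids):
        idx = np.flatnonzero(block_ids == block)
        if len(idx) == 1:
            # AgglomerativeClustering needs at least two samples.
            block_labels = np.zeros(1, dtype=int)
        else:
            # n_clusters must be None when distance_threshold is given.
            model = AgglomerativeClustering(
                n_clusters=None,
                distance_threshold=distance_threshold,
                **kwargs,
            )
            block_labels = model.fit_predict(xy_arr[idx])
        # Offset the labels so clusters from different blocks never collide.
        labels[idx] = block_labels + next_label
        next_label += block_labels.max() + 1
    return labels

points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [9.0, 0.0]])
block_ids = np.array([0, 0, 1, 1, 2])
labels = cluster_blocks(points, block_ids, distance_threshold=0.5, linkage="single")
print(labels)
```

Here `linkage="single"` stands in for the extra keyword arguments that are forwarded to the clustering object. In this sketch, two nearby points that fall in different blocks end up in different clusters.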
The function returns an array of labels, with each label indicating the cluster assignment of the corresponding data point.