AI Glossary
The complete glossary of Artificial Intelligence
CF Tree (Clustering Feature Tree)
Tree data structure at the core of BIRCH, storing statistical summaries (Clustering Features) in its nodes to compactly represent subclusters.
Clustering Feature (CF)
A triplet (N, LS, SS) that statistically summarizes a subcluster, where N is the number of points, LS the linear (vector) sum of the points, and SS the sum of the squared norms of the points.
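As a minimal sketch of the triplet described above, the following Python class stores (N, LS, SS), merges two summaries by component-wise addition, and derives the centroid and radius from the summary alone. The class name `CF` and its method names are illustrative assumptions, not part of any library.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class CF:
    """Illustrative Clustering Feature (hypothetical helper, not a library type)."""
    n: int      # N: number of points
    ls: list    # LS: per-dimension linear sum of the points
    ss: float   # SS: sum of the squared norms of the points

    @classmethod
    def from_point(cls, point):
        # A single point is summarized as (1, point, ||point||^2).
        return cls(1, list(point), sum(x * x for x in point))

    def __add__(self, other):
        # Additivity: the CF of two disjoint subclusters is the component-wise sum.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius(self):
        # Root-mean-square distance of the points to the centroid:
        # sqrt(SS/N - ||LS/N||^2), clamped at 0 against rounding error.
        c2 = sum(c * c for c in self.centroid())
        return sqrt(max(self.ss / self.n - c2, 0.0))


cf = CF.from_point((0.0, 0.0)) + CF.from_point((2.0, 0.0)) + CF.from_point((1.0, 3.0))
print(cf.n, cf.ls, cf.ss, cf.centroid(), cf.radius())
```

The original points are never stored: every quantity printed here comes from the three summary values.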
Threshold
BIRCH parameter defining the maximum diameter of a subcluster in a leaf of the CF tree, controlling the granularity of the clustering summary.
Branching Factor
Parameter limiting the number of entries (children) per node in the CF tree; a smaller value yields a taller, narrower tree, while a larger value yields a shallower tree with more comparisons per node.
Micro-clustering
Initial phase of BIRCH where data points are organized into micro-clusters, represented by the entries in the leaf nodes of the CF tree.
Macro-clustering
Final phase of BIRCH applying a clustering algorithm (like K-Means or agglomerative clustering) on the micro-clusters (the leaf entries of the CF tree) to generate the final clusters.
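A rough sketch of the macro-clustering idea, operating only on micro-cluster summaries. Note that it swaps in a simple agglomerative merge (repeatedly joining the two closest centroids) rather than K-Means, to keep the example short and deterministic. Each micro-cluster is assumed to be a plain tuple `(n, ls, ss)`; the function names are hypothetical.

```python
def merge(a, b):
    # CFs are additive, so merging two micro-clusters is a component-wise sum.
    return (a[0] + b[0], [x + y for x, y in zip(a[1], b[1])], a[2] + b[2])


def centroid(cf):
    n, ls, _ss = cf
    return [x / n for x in ls]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def macro_cluster(micro_clusters, k):
    """Merge the closest pair of summaries until only k remain."""
    clusters = list(micro_clusters)
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = merge(clusters[i], clusters[j])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


# Four made-up micro-clusters: two near the origin, two near (10, 10).
micro = [
    (2, [0.0, 2.0], 2.0),
    (1, [1.0, 1.0], 2.0),
    (2, [20.0, 20.0], 400.0),
    (1, [11.0, 10.0], 221.0),
]
final = macro_cluster(micro, k=2)
print(final)
```

Because the input is a handful of summaries rather than the raw dataset, even this quadratic pairwise search stays cheap.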
Incremental summarization
Ability of BIRCH to update the CF tree with new data points without needing a complete recalculation from the beginning, ideal for data streams.
CF Additive Distance
Distance metric used in BIRCH to measure the proximity between two Clustering Features, directly calculable from their statistical summaries without accessing the original points.
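To illustrate how a distance can be computed from summaries alone, the sketch below implements two measures commonly associated with BIRCH: the Euclidean distance between centroids and the average inter-cluster distance. Whether either corresponds exactly to the entry named here is an assumption; CFs are represented as plain tuples `(n, ls, ss)` and the function names are hypothetical.

```python
from math import sqrt


def centroid_distance(a, b):
    """Euclidean distance between the centroids LS/N of two CFs."""
    ca = [x / a[0] for x in a[1]]
    cb = [x / b[0] for x in b[1]]
    return sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))


def average_inter_cluster_distance(a, b):
    """Root mean squared distance over all cross-cluster point pairs.

    The sum of ||x - y||^2 over x in A and y in B equals
    N_B * SS_A + N_A * SS_B - 2 * (LS_A . LS_B), so no raw points are needed.
    """
    (na, lsa, ssa), (nb, lsb, ssb) = a, b
    dot = sum(x * y for x, y in zip(lsa, lsb))
    total = nb * ssa + na * ssb - 2.0 * dot
    return sqrt(max(total / (na * nb), 0.0))


# CF of points (0,0),(2,0) and CF of points (4,0),(6,0).
cf_a = (2, [2.0, 0.0], 4.0)
cf_b = (2, [10.0, 0.0], 52.0)
print(centroid_distance(cf_a, cf_b))
print(average_inter_cluster_distance(cf_a, cf_b))
```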
Leaf Entry
Element of a leaf of the CF tree representing a micro-cluster and holding its Clustering Feature; the pointers of the leaf linked list belong to the leaf node that contains the entries, not to the individual entries.
Leaf Linked List
Structure in the CF tree linking all leaves for efficient sequential scanning during the macro-clustering phase.
Point Absorption
Process in BIRCH where a new data point is integrated into the nearest micro-cluster if the addition does not exceed the diameter threshold.
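A minimal sketch of the absorption test, assuming the threshold is applied to the subcluster diameter and that a CF is a plain tuple `(n, ls, ss)`. The function names are hypothetical. The candidate CF is formed by addition, and its diameter is computed from the summary before deciding whether to accept the point.

```python
from math import sqrt


def diameter(cf):
    """Root mean squared pairwise distance inside a subcluster, from its CF.

    The sum of ||x_i - x_j||^2 over all ordered pairs equals
    2 * N * SS - 2 * ||LS||^2.
    """
    n, ls, ss = cf
    if n < 2:
        return 0.0
    ls_sq = sum(x * x for x in ls)
    return sqrt(max((2.0 * n * ss - 2.0 * ls_sq) / (n * (n - 1)), 0.0))


def try_absorb(cf, point, threshold):
    """Return the updated CF if the point fits under the threshold, else None."""
    n, ls, ss = cf
    candidate = (n + 1,
                 [a + b for a, b in zip(ls, point)],
                 ss + sum(x * x for x in point))
    return candidate if diameter(candidate) <= threshold else None


micro = (2, [1.0, 0.0], 1.0)                   # summary of points (0,0) and (1,0)
near = try_absorb(micro, (0.5, 0.0), 1.0)      # fits: diameter stays below 1.0
far = try_absorb(micro, (10.0, 0.0), 1.0)      # rejected: diameter would exceed 1.0
print(near, far)
```

When the test fails, the point would instead start a new leaf entry of its own.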
Node Splitting
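Mechanism triggered in BIRCH when a point cannot be absorbed without exceeding the diameter threshold and the new entry it creates would push a node past its branching factor; the node's entries are divided between two nodes, and the split can propagate up toward the root.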
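One common way to divide an overfull node, sketched here under stated assumptions, is to pick the two entries whose centroids are farthest apart as seeds and assign every other entry to the nearer seed. CFs are plain tuples `(n, ls, ss)`, and `split_entries` is a hypothetical helper that ignores parent updates and tie-handling details.

```python
from itertools import combinations


def centroid(cf):
    n, ls, _ss = cf
    return [x / n for x in ls]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split_entries(entries):
    """Split a list of CF entries into two groups around the farthest pair."""
    cents = [centroid(e) for e in entries]
    # Seeds: the pair of entries with the largest centroid distance.
    seed_a, seed_b = max(combinations(range(len(entries)), 2),
                         key=lambda p: sq_dist(cents[p[0]], cents[p[1]]))
    left, right = [], []
    for c, entry in zip(cents, entries):
        if sq_dist(c, cents[seed_a]) <= sq_dist(c, cents[seed_b]):
            left.append(entry)
        else:
            right.append(entry)
    return left, right


entries = [
    (1, [0.0, 0.0], 0.0),
    (1, [1.0, 0.0], 1.0),
    (1, [9.0, 0.0], 81.0),
    (1, [10.0, 0.0], 100.0),
]
left, right = split_entries(entries)
print(left, right)
```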
Rebuilding Phase
Step in BIRCH where the CF tree is rebuilt with a larger diameter threshold by reinserting the existing leaf entries, producing a smaller, coarser tree when memory runs out or, optionally, condensing the tree before the final phase.
Incremental computational cost
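Key advantage of BIRCH: inserting a data point only requires descending one path of the CF tree, so its cost grows with the tree height (roughly logarithmic in the number of leaf entries) rather than with the number of points already seen, making the algorithm very scalable.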
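As a rough back-of-the-envelope sketch, the helper below estimates the minimum number of tree levels needed to hold a given number of leaf entries, under the simplifying assumption that every node (leaf or not) holds up to the same number of entries. The function name and this uniform-capacity assumption are illustrative only.

```python
def min_tree_levels(num_leaf_entries, branching_factor):
    """Smallest number of levels h such that branching_factor ** h >= num_leaf_entries."""
    levels, capacity = 1, branching_factor
    while capacity < num_leaf_entries:
        capacity *= branching_factor
        levels += 1
    return levels


levels = min_tree_levels(1_000_000, 50)
# An insertion visits one node per level and compares against at most
# branching_factor entries in each, so this bounds the comparisons per insert.
comparisons = levels * 50
print(levels, comparisons)
```

Integer multiplication is used instead of a floating-point logarithm to avoid rounding surprises at exact powers.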
Cluster Summary
Fundamental concept of BIRCH where a group of points is represented by a statistical summary (the CF) rather than by individual points, reducing memory space.