AI Glossary
The complete dictionary of Artificial Intelligence
Leaf-wise Growth
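Tree-growth strategy that always splits the leaf whose best split yields the largest reduction in loss, rather than expanding every node of a level as level-wise growth does. For the same number of leaves it usually reaches a lower loss, but it tends to produce deeper, unbalanced trees.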
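The selection rule can be sketched on a toy 1-D regression problem. This is a minimal illustration, not LightGBM's implementation: `best_split` and `grow_leaf_wise` are hypothetical helper names, and the loss is assumed to be squared error.

```python
import heapq

def best_split(points):
    """Return (gain, left, right) for the best threshold split of (x, y) points.

    gain is the reduction in squared error; left/right are None if no split helps.
    """
    def sse(ys):
        if not ys:
            return 0.0
        mean = sum(ys) / len(ys)
        return sum((y - mean) ** 2 for y in ys)

    points = sorted(points)
    parent = sse([y for _, y in points])
    best = (0.0, None, None)
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # no threshold fits between equal x values
        left, right = points[:i], points[i:]
        gain = parent - sse([y for _, y in left]) - sse([y for _, y in right])
        if gain > best[0]:
            best = (gain, left, right)
    return best

def grow_leaf_wise(points, num_leaves):
    """Repeatedly split the leaf with the largest gain until num_leaves is reached."""
    finished = []  # leaves that cannot be split any further
    heap = []      # max-heap via negated gain; the counter breaks ties
    counter = 0
    gain, left, right = best_split(points)
    heapq.heappush(heap, (-gain, counter, points, left, right))
    while heap and len(heap) + len(finished) < num_leaves:
        _, _, leaf, left, right = heapq.heappop(heap)
        if left is None:  # the best remaining leaf has no useful split: stop
            finished.append(leaf)
            break
        for child in (left, right):
            counter += 1
            g, l, r = best_split(child)
            heapq.heappush(heap, (-g, counter, child, l, r))
    finished.extend(item[2] for item in heap)
    return finished
```

A level-wise grower would instead split every leaf of the current depth before moving on, regardless of how much each split reduces the loss.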
Feature Binning
Technique for discretizing continuous features into discrete intervals (bins) to speed up the calculation of split points and reduce memory footprint, at the cost of a slight loss in precision.
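As a rough sketch of the idea, the following assigns each value to one of at most `max_bin` roughly equal-frequency bins. The helper names are hypothetical and the edge-selection rule is a simplification; LightGBM's own bin construction is more elaborate.

```python
from bisect import bisect_right

def make_bin_edges(values, max_bin):
    """Return sorted thresholds that define at most max_bin bins."""
    distinct = sorted(set(values))
    if len(distinct) <= max_bin:
        # Few distinct values: one bin per value, thresholds at the midpoints
        return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    ordered = sorted(values)
    edges = []
    for k in range(1, max_bin):
        edge = ordered[k * len(ordered) // max_bin]  # approximate k-th quantile
        if not edges or edge > edges[-1]:
            edges.append(edge)
    return edges

def to_bin(value, edges):
    """Map a raw value to its bin index (0 .. len(edges))."""
    return bisect_right(edges, value)
```

After binning, split search only needs to consider the bin boundaries instead of every distinct raw value, which is where the speed-up and the small loss of precision both come from.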
Gradient-Based One-Side Sampling (GOSS)
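Sampling method introduced by LightGBM that keeps all instances with large gradients and randomly samples the instances with small gradients, up-weighting the sampled small-gradient instances so that the estimated split gain stays approximately unbiased. This speeds up training without a significant loss of accuracy.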
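A minimal sketch of the sampling step, assuming a top fraction `top_rate` and a sampled fraction `other_rate` as described in the LightGBM paper; the function name `goss_sample` and its return format are hypothetical.

```python
import random

def goss_sample(gradients, top_rate, other_rate, seed=0):
    """Return (indices, weights) selected by a GOSS-style rule.

    Assumes 0 < other_rate and top_rate + other_rate <= 1.
    """
    rng = random.Random(seed)
    n = len(gradients)
    # Rank instances by absolute gradient, largest first
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top = order[:n_top]    # always kept
    rest = order[n_top:]   # randomly subsampled
    sampled = rng.sample(rest, min(n_other, len(rest)))
    # Up-weight the sampled small-gradient instances to compensate for the ones dropped
    amplify = (1.0 - top_rate) / other_rate
    indices = top + sampled
    weights = [1.0] * len(top) + [amplify] * len(sampled)
    return indices, weights
```

The returned weights would multiply each instance's gradient and hessian when split gains are accumulated.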
Exclusive Feature Bundling (EFB)
Dimensionality reduction algorithm that identifies and groups mutually exclusive features (rarely non-zero at the same time) into a single composite feature, thus reducing the number of features.
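The grouping and merging steps can be sketched as follows. This is a simplified greedy version with hypothetical helper names; it assumes each column already holds non-negative integer bin indices with 0 as the "empty" value.

```python
def bundle_features(columns, max_conflicts=0):
    """Greedily group columns whose non-zero rows overlap at most max_conflicts times."""
    nonzero = [{r for r, v in enumerate(col) if v != 0} for col in columns]
    # Visit denser features first
    order = sorted(range(len(columns)), key=lambda j: len(nonzero[j]), reverse=True)
    bundles = []
    for j in order:
        for bundle in bundles:
            conflicts = len(bundle["rows"] & nonzero[j])
            if bundle["conflicts"] + conflicts <= max_conflicts:
                bundle["features"].append(j)
                bundle["rows"] |= nonzero[j]
                bundle["conflicts"] += conflicts
                break
        else:
            # No existing bundle can take this feature: start a new one
            bundles.append({"features": [j], "rows": set(nonzero[j]), "conflicts": 0})
    return [b["features"] for b in bundles]

def merge_bundle(columns, bundle):
    """Merge the bundled columns into one by shifting each feature's values by an offset."""
    merged = [0] * len(columns[0])
    offset = 0
    for j in bundle:
        for r, v in enumerate(columns[j]):
            if v != 0 and merged[r] == 0:  # on a conflict, the earlier feature wins
                merged[r] = v + offset
        offset += max(columns[j])  # reserve a distinct value range per feature
    return merged
```

Because each original feature occupies its own value range in the merged column, a split on the bundle can still separate values that came from different features.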
Gradient Histogram
Data structure used by LightGBM to store gradients and hessians in bins, allowing for fast calculation of statistics for each potential split point during tree construction.
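A short sketch of how such a histogram might be built and scanned, assuming features are already binned and using the standard second-order gain G²/(H + λ) for a leaf. The function names are hypothetical and this is not LightGBM's actual code.

```python
def build_histogram(bins, grads, hess, n_bins):
    """Accumulate gradient and hessian sums per bin for one feature."""
    hist_g = [0.0] * n_bins
    hist_h = [0.0] * n_bins
    for b, g, h in zip(bins, grads, hess):
        hist_g[b] += g
        hist_h[b] += h
    return hist_g, hist_h

def best_split_from_histogram(hist_g, hist_h, lambda_l2=0.0):
    """Scan bin boundaries left to right and return (best_bin, best_gain).

    A split at bin b sends bins 0..b to the left child and the rest to the right.
    """
    total_g, total_h = sum(hist_g), sum(hist_h)

    def score(g, h):
        return g * g / (h + lambda_l2) if h + lambda_l2 > 0 else 0.0

    best_bin, best_gain = None, 0.0
    left_g = left_h = 0.0
    for b in range(len(hist_g) - 1):
        left_g += hist_g[b]
        left_h += hist_h[b]
        gain = (score(left_g, left_h)
                + score(total_g - left_g, total_h - left_h)
                - score(total_g, total_h))
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, best_gain
```

The scan touches each bin once, so its cost depends on the number of bins rather than the number of samples.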
Num Leaves
Main parameter of LightGBM that controls the maximum number of leaves in each tree, directly influencing model complexity and the bias-variance tradeoff; more important than `max_depth` for leaf-wise growth.
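To see why `num_leaves` and `max_depth` are not interchangeable, the small helpers below (hypothetical names, plain arithmetic on binary trees) compare the two limits.

```python
def max_leaves_for_depth(max_depth):
    """Upper bound on the leaves of a binary tree with the given depth."""
    return 2 ** max_depth

def balanced_depth(num_leaves):
    """Smallest depth at which a balanced binary tree can hold num_leaves leaves."""
    depth = 0
    while 2 ** depth < num_leaves:
        depth += 1
    return depth

def max_leaf_wise_depth(num_leaves):
    """Worst-case depth of a leaf-wise tree: every split extends the same branch."""
    return num_leaves - 1
```

With 31 leaves, a balanced tree needs depth 5, while an unconstrained leaf-wise tree could in the worst case reach depth 30, which is why a depth cap is often set alongside `num_leaves`.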
L1 and L2 Regularization
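Regularization parameters (`lambda_l1`, `lambda_l2`) applied to leaf weights to control model complexity and prevent overfitting. L1 penalizes the absolute value of the weights, which can shrink small leaf weights to exactly zero; L2 penalizes their squared magnitude, which shrinks all weights smoothly toward zero.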
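The effect of both penalties on a single leaf can be sketched with the closed-form leaf weight commonly used in second-order gradient boosting. The function name is hypothetical, and LightGBM's internals may differ in detail (for example, additional smoothing or clipping options).

```python
def leaf_weight(sum_grad, sum_hess, lambda_l1=0.0, lambda_l2=0.0):
    """Optimal leaf weight given the summed gradients and hessians in the leaf."""
    # L1 acts as a soft threshold on the gradient sum
    if sum_grad > lambda_l1:
        shrunk = sum_grad - lambda_l1
    elif sum_grad < -lambda_l1:
        shrunk = sum_grad + lambda_l1
    else:
        shrunk = 0.0
    # L2 enlarges the denominator, shrinking the weight toward zero
    return -shrunk / (sum_hess + lambda_l2)
```

For a leaf with gradient sum -4 and hessian sum 2, the unregularized weight is 2.0; `lambda_l1=1` reduces it to 1.5, `lambda_l2=2` reduces it to 1.0, and `lambda_l1=5` sets it to zero.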
Min Data in Leaf
Minimum number of samples required in a leaf, a key parameter to avoid creating overly specific leaves and combat overfitting in LightGBM models. The related parameter `min_sum_hessian_in_leaf` sets a minimum total hessian weight per leaf instead of a sample count.
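Categorical Feature Handling
LightGBM's ability to natively handle categorical features that are supplied as integer codes. Instead of one-hot encoding, it orders the categories by their accumulated gradient statistics and searches for the best partition of categories into two groups, which avoids manual encoding and improves efficiency.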
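A simplified sketch of a gradient-statistics-based category split: categories are sorted by the ratio of summed gradients to summed hessians, then scanned like an ordered feature. The function name is hypothetical, hessians are assumed positive, and LightGBM's real procedure adds smoothing and limits not shown here.

```python
from collections import defaultdict

def best_category_partition(categories, grads, hess):
    """Return (left_categories, gain) for the best two-way grouping of categories."""
    stats = defaultdict(lambda: [0.0, 0.0])  # category -> [sum_grad, sum_hess]
    for c, g, h in zip(categories, grads, hess):
        stats[c][0] += g
        stats[c][1] += h
    # Order categories by their gradient/hessian ratio
    ordered = sorted(stats, key=lambda c: stats[c][0] / stats[c][1])
    total_g = sum(s[0] for s in stats.values())
    total_h = sum(s[1] for s in stats.values())

    def score(g, h):
        return g * g / h if h > 0 else 0.0

    best_left, best_gain = set(), 0.0
    left_g = left_h = 0.0
    for i, c in enumerate(ordered[:-1]):
        left_g += stats[c][0]
        left_h += stats[c][1]
        gain = (score(left_g, left_h)
                + score(total_g - left_g, total_h - left_h)
                - score(total_g, total_h))
        if gain > best_gain:
            best_left, best_gain = set(ordered[: i + 1]), gain
    return best_left, best_gain
```

Note that the resulting groups need not be contiguous in the integer codes: codes 0 and 2 can end up on one side and 1 and 3 on the other.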
Leaf-wise Growth Overfitting
Specific risk of leaf-wise growth where the model can overfit by creating very deep trees with highly specialized leaves. It is controlled with stronger constraints, e.g., a lower `num_leaves`, a higher `min_data_in_leaf`, or a limit on `max_depth`.
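As an illustration, a parameter dictionary that tightens these controls might look like the sketch below. The parameter names are LightGBM's; the values are arbitrary examples, not recommendations, and should be tuned per dataset.

```python
# Illustrative values only: tighter tree-shape limits plus leaf-weight penalties
params = {
    "objective": "regression",
    "num_leaves": 15,        # fewer leaves per tree than the default of 31
    "max_depth": 6,          # cap depth so leaf-wise growth cannot form long chains
    "min_data_in_leaf": 50,  # each leaf must cover at least 50 samples
    "lambda_l1": 0.1,        # L1 penalty on leaf weights
    "lambda_l2": 1.0,        # L2 penalty on leaf weights
}

# Sanity check: the leaf budget should fit within the depth cap
assert params["num_leaves"] <= 2 ** params["max_depth"]
```

A dictionary like this would typically be passed as the `params` argument of `lightgbm.train`.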
DART (Dropouts meet Multiple Additive Regression Trees)
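Boosting variant implemented in LightGBM that applies dropout to the ensemble: when fitting a new tree, a random subset of the previously built trees is temporarily dropped, and the new and dropped trees are rescaled afterwards. This reduces over-reliance on the earliest trees and can improve regularization and performance on certain datasets.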
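The drop-and-rescale loop can be sketched with a deliberately trivial weak learner. Here each "tree" is just a constant fitted to the mean residual under squared loss, and the scaling factors 1/(k+1) and k/(k+1) follow the normalization described in the original DART paper. `dart_fit` is a hypothetical name, and LightGBM's implementation has further options not shown.

```python
import random

def dart_fit(y, n_rounds, drop_rate, seed=0):
    """Fit a toy DART ensemble of constant predictors; returns the scaled tree values."""
    rng = random.Random(seed)
    trees = []  # each entry is a constant prediction with its scale already applied
    for _ in range(n_rounds):
        # Randomly choose which existing trees to drop for this round
        dropped = [i for i in range(len(trees)) if rng.random() < drop_rate]
        kept_sum = sum(t for i, t in enumerate(trees) if i not in dropped)
        # Fit the new tree to the residuals of the kept trees only
        residuals = [yi - kept_sum for yi in y]
        new_tree = sum(residuals) / len(residuals)
        k = len(dropped)
        # Rescale so the dropped trees and the new tree do not overshoot together
        for i in dropped:
            trees[i] *= k / (k + 1)
        trees.append(new_tree / (k + 1))
    return trees
```

With no dropped trees (k = 0) a round reduces to an ordinary boosting step, so setting `drop_rate=0` recovers plain gradient boosting in this sketch.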