White Box Models - AI Glossary

📖

Terms

Linear Regression

Statistical model that fits a linear relationship between a dependent variable and one or more independent variables by minimizing the sum of squared residuals (ordinary least squares). It is considered a white box because each coefficient can be read directly as the change in the predicted value for a one-unit increase in that variable, holding the other variables constant.
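
As an illustration of how a fitted coefficient is read, here is a minimal sketch of the closed-form least-squares fit for a single predictor. The function name `fit_simple_ols` and the floor-area data are hypothetical and chosen only for this example.

```python
def fit_simple_ols(xs, ys):
    """Fit y = intercept + slope * x by minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares solution for one predictor.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: floor area (square meters) and price (thousands).
areas = [50, 60, 70, 80, 90]
prices = [150, 180, 210, 240, 270]
intercept, slope = fit_simple_ols(areas, prices)
print(f"each extra square meter adds about {slope:.1f} thousand to the prediction")
```

The slope is the whole explanation of the model: it states how much the prediction moves per unit of the input.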

K-Nearest Neighbors (KNN)

Supervised learning algorithm that classifies a new sample by the majority class of its k nearest neighbors in the feature space; for regression, it averages their target values. Individual predictions are easy to explain because the neighbors that drove the decision can be shown explicitly, although the model gives no global summary of how each feature affects the output.
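
The idea of explaining a prediction by its neighbors can be sketched as follows. This is a minimal example assuming Euclidean distance and a simple majority vote; `knn_predict` and the toy points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return (predicted_label, neighbors) for a query point.

    train is a list of (features, label) pairs. The returned neighbors
    are the k closest training pairs, which serve as the explanation.
    """
    ranked = sorted(train, key=lambda pair: math.dist(pair[0], query))
    neighbors = ranked[:k]
    votes = Counter(label for _, label in neighbors)
    predicted = votes.most_common(1)[0][0]
    return predicted, neighbors


train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"),
]
label, neighbors = knn_predict(train, (1.1, 1.0), k=3)
print(label, neighbors)
```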

Association Rules

Method for discovering relationships between variables in large databases, typically expressed as IF-THEN rules and scored by support (how often the items in the rule occur together) and confidence (how often the THEN part holds when the IF part does). These rules are inherently interpretable because they directly state understandable logical relationships between attributes.
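
Support and confidence can be computed directly from a list of transactions. The sketch below assumes each transaction is a set of items; the function names and the shopping baskets are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) for IF antecedent THEN consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]
rule_support = support(baskets, {"bread", "butter"})
rule_confidence = confidence(baskets, {"bread"}, {"butter"})
print(f"IF bread THEN butter: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```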

Generalized Linear Model (GLM)

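Extension of linear regression that allows response distributions other than the normal (for example binomial or Poisson) and relates the mean of the response to a linear combination of the predictors through a link function. GLMs remain interpretable because the model is additive on the link scale and coefficients can be transformed back into a meaningful effect size: exponentiating a logistic regression coefficient, for instance, gives an odds ratio.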

Generalized Additive Model (GAM)

Extension of GLMs where the prediction is a sum of smooth functions of individual variables rather than linear terms. GAMs offer high interpretability because they allow visualization of the separate effect of each variable on the prediction while capturing nonlinear relationships.

Linear Discriminant Analysis (LDA)

Classification method that finds linear combinations of features that best separate two or more classes by maximizing the ratio of between-class variance to within-class variance. Interpretability comes from the discriminant directions (eigenvectors of the inverse within-class scatter matrix multiplied by the between-class scatter matrix), whose coefficients show which features contribute most to separating the classes.
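
For two classes, the discriminant direction reduces to the inverse within-class scatter matrix applied to the difference of class means. The sketch below covers only that two-class, two-feature special case, with hand-written 2x2 algebra; all names and data points are hypothetical.

```python
def mean_vector(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]


def scatter_matrix(points, mean):
    """Sum of outer products of deviations from the class mean (2x2)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mean[0], p[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class_a, class_b):
    """Two-class discriminant direction: inverse(S_w) @ (mean_b - mean_a)."""
    ma, mb = mean_vector(class_a), mean_vector(class_b)
    sa, sb = scatter_matrix(class_a, ma), scatter_matrix(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mb[0] - ma[0], mb[1] - ma[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


# The two classes differ only along the first feature.
class_a = [(1, 2), (2, 3), (3, 2), (2, 1)]
class_b = [(6, 2), (7, 3), (8, 2), (7, 1)]
direction = fisher_direction(class_a, class_b)
print(direction)
```

A direction with a large first component and a near-zero second component says that only the first feature separates the classes in this toy data.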

CART Trees

Decision tree construction algorithm (Classification and Regression Trees) that chooses, at each node, the binary split that most reduces Gini impurity for classification or mean squared error for regression. The binary structure of CART trees makes decision paths and the rules extracted from them easy to follow.
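
The split selection at a single node can be sketched as below for one numeric feature. This is a simplified illustration: it tries the observed values as thresholds rather than midpoints between them, and `gini` and `best_binary_split` are hypothetical names.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_binary_split(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    best = (None, float("inf"))
    # Skip the largest value: splitting there would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


ages = [22, 25, 30, 41, 47, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_binary_split(ages, bought)
print(f"IF age <= {threshold} THEN no ELSE yes (weighted Gini {impurity:.2f})")
```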

ID3 Algorithm

Early decision tree construction algorithm that selects splitting attributes by information gain, that is, the reduction in entropy after the split. ID3 works on categorical attributes and creates one branch per attribute value, producing highly interpretable trees in which each root-to-leaf path is a clear decision rule.
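
Information gain for a candidate attribute can be computed as in the following sketch. The weather-style attributes and labels are hypothetical toy data, as are the function names.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(attribute_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting."""
    groups = defaultdict(list)
    for value, label in zip(attribute_values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


outlook = ["sunny", "sunny", "overcast", "rain", "rain", "overcast"]
windy = ["no", "yes", "no", "no", "yes", "yes"]
play = ["no", "no", "yes", "yes", "no", "yes"]
gain_outlook = information_gain(outlook, play)
gain_windy = information_gain(windy, play)
print(f"outlook: {gain_outlook:.3f}, windy: {gain_windy:.3f}")
```

An ID3-style learner would split on the attribute with the larger gain and repeat the procedure inside each branch.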

C4.5 Algorithm

Improvement of the ID3 algorithm that uses the information gain ratio to avoid bias towards attributes with many values, handles continuous attributes and missing values, and prunes the tree after it is grown. C4.5 generates more compact decision trees while keeping every decision path readable as a rule.
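
The effect of the gain ratio can be seen by comparing an attribute with a unique value per row against a genuinely useful grouping. This sketch assumes the usual definition (information gain divided by the entropy of the attribute's own value distribution); the names and data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(items):
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(attribute_values, labels):
    """Information gain divided by split information."""
    groups = defaultdict(list)
    for value, label in zip(attribute_values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(attribute_values)  # penalizes many-valued attributes
    return gain / split_info if split_info > 0 else 0.0


labels = ["no", "no", "yes", "yes"]
row_id = ["r1", "r2", "r3", "r4"]  # unique per row: perfect gain, but meaningless
group = ["a", "a", "b", "b"]       # two values that match the labels

ratio_row_id = gain_ratio(row_id, labels)
ratio_group = gain_ratio(group, labels)
print(ratio_row_id, ratio_group)
```

Both attributes have the same raw information gain here, but the gain ratio ranks the two-valued attribute higher.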

CHAID Algorithm

Decision tree construction algorithm (Chi-squared Automatic Interaction Detection) that selects splits with chi-square tests when the target is categorical and F-tests when the target is continuous, and that allows multi-way splits rather than only binary ones. CHAID produces trees that are particularly easy to read for survey and marketing data.
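
The statistic behind a chi-square split test can be sketched as below. This shows only the Pearson statistic for a predictor-by-target contingency table; the category merging and p-value adjustment that a full CHAID implementation performs are left out, and the tables are hypothetical.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Rows are predictor categories, columns are target classes (yes, no).
by_region = [[30, 10], [10, 30], [20, 20]]
by_gender = [[30, 30], [30, 30]]

chi_region = chi_square_statistic(by_region)
chi_gender = chi_square_statistic(by_gender)
print(chi_region, chi_gender)
```

A larger statistic indicates a stronger association with the target, so the first predictor would be the better candidate for a three-way split.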

Decision List

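Classification structure represented as an ordered sequence of IF-THEN rules, where the rules are tested in order and the first one whose condition is satisfied determines the prediction; a default class applies when no rule fires. Decision lists are often easier to read than trees because they present a single linear decision flow rather than a branching structure, although each rule implicitly assumes that all earlier rules did not match.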
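
The sequential evaluation can be sketched as follows. The credit-style rules, field names, and thresholds are hypothetical and serve only to show that rule order matters.

```python
def predict_with_decision_list(rules, default, sample):
    """Return (label, index of the rule that fired, or None for the default)."""
    for index, (condition, label) in enumerate(rules):
        if condition(sample):
            return label, index
    return default, None


# Ordered rules: the first matching rule wins.
rules = [
    (lambda s: s["income"] < 20000, "reject"),
    (lambda s: s["late_payments"] > 2, "reject"),
    (lambda s: s["income"] >= 50000, "approve"),
]

applicant = {"income": 60000, "late_payments": 3}
decision, fired = predict_with_decision_list(rules, "manual review", applicant)
print(decision, fired)
```

The applicant also satisfies the third rule, but the second rule is reached first, so the explanation for the prediction is that single rule.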

Rule-based Model

Classification or regression system that uses a set of logical rules to make predictions, often organized as a covering set or decision list. These models are among the most interpretable because each prediction can be explained by one or more explicit rules understandable by non-experts.

Simple Perceptron

Binary linear classification algorithm that learns a separating hyperplane by iteratively adjusting weights whenever a training sample is misclassified. Because the model is a single weighted sum, it is interpretable: the sign and magnitude of each weight show the direction and strength of that feature's influence, provided the features are on comparable scales.
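
The update rule can be sketched as below, assuming labels encoded as +1 and -1 and a fixed learning rate. The function names and the AND-gate data are hypothetical choices for the example.

```python
def train_perceptron(samples, labels, epochs=50, learning_rate=1.0):
    """Classic perceptron rule; labels must be +1 or -1. Returns (weights, bias)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:  # misclassified or exactly on the boundary
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias


def predict(weights, bias, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1


# Logical AND, which is linearly separable.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, -1, -1, 1]
weights, bias = train_perceptron(samples, labels)
print(weights, bias)
```

Both learned weights are positive, which matches the intuition that each input pushes the output towards the positive class.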

Poisson Regression

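Regression model for count data that assumes the response variable follows a Poisson distribution and uses a log link, so that the logarithm of the expected count is a linear function of the predictors. The exponentiated coefficients can be interpreted directly as rate ratios: the factor by which the expected event rate is multiplied for a one-unit increase in a predictor.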
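
The rate-ratio reading can be checked numerically. The sketch below does not fit a model; it assumes already-estimated, hypothetical coefficient values and only shows how the log link turns a coefficient into a multiplier.

```python
import math


def expected_count(intercept, coefficients, features):
    """Poisson regression mean under a log link: exp(b0 + sum(b_i * x_i))."""
    linear_predictor = intercept + sum(b * x for b, x in zip(coefficients, features))
    return math.exp(linear_predictor)


# Hypothetical fitted values, for illustration only.
intercept = 0.5
coefficients = [0.2]

base = expected_count(intercept, coefficients, [3.0])
plus_one = expected_count(intercept, coefficients, [4.0])
rate_ratio = plus_one / base
print(f"one more unit multiplies the expected count by {rate_ratio:.3f}")
```

The ratio equals `exp(0.2)` regardless of the starting value of the predictor, which is what makes the exponentiated coefficient a convenient summary.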

Stochastic Gradient Boosting (SGB)

Ensemble method that combines simple interpretable models (often shallow trees) by sequentially fitting each new model to the remaining errors of the current ensemble; the stochastic variant fits each model on a random subsample of the training data. Although it is less transparent than a single tree, SGB with shallow trees retains some interpretability because the contribution of each individual tree can be inspected.
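
The sequential fitting on random subsamples can be sketched as below for squared-error regression on one feature, using depth-one trees (stumps). This is a simplified illustration, not a reference implementation; every name, parameter default, and data value is hypothetical, and the feature values are assumed to be distinct.

```python
import random


def fit_stump(xs, residuals):
    """Best single-threshold stump by squared error: (threshold, left_mean, right_mean)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def boost(xs, ys, n_trees=50, learning_rate=0.1, subsample=0.8, seed=0):
    """Fit stumps to current residuals, each on a random subsample of the rows."""
    rng = random.Random(seed)
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    sample_size = max(2, int(subsample * len(xs)))
    for _ in range(n_trees):
        chosen = rng.sample(range(len(xs)), sample_size)
        sub_x = [xs[i] for i in chosen]
        sub_residuals = [ys[i] - predictions[i] for i in chosen]
        threshold, left_mean, right_mean = fit_stump(sub_x, sub_residuals)
        stumps.append((threshold, left_mean, right_mean))
        # Shrink each stump's contribution by the learning rate.
        predictions = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                       for p, x in zip(predictions, xs)]
    return base, stumps, predictions


def mse(targets, predictions):
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


xs = list(range(10))
ys = [1.0] * 5 + [5.0] * 5
base, stumps, predictions = boost(xs, ys)
baseline_mse = mse(ys, [base] * len(ys))
final_mse = mse(ys, predictions)
print(f"baseline MSE {baseline_mse:.3f}, boosted MSE {final_mse:.3f}")
```

Each stored stump is a one-line rule with two output values, so the ensemble can be inspected tree by tree even though the final prediction is their scaled sum.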
