AI Terminology
A Complete Dictionary of Artificial Intelligence
Feature Selection
Process of automatically selecting the most relevant features for a supervised model by eliminating redundant or non-informative variables, with the aim of improving performance and reducing complexity.
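One simple, filter-style way to select features can be sketched in plain Python. It drops near-constant columns, then keeps the `k` columns whose absolute Pearson correlation with the target is highest. The function names and the `min_variance` threshold are illustrative assumptions. Correlation only detects linear relationships, so this is one heuristic among many, not a general recipe.

```python
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_features(columns, target, k, min_variance=1e-12):
    """Return the names of the k columns most correlated (in absolute value) with the target.

    `columns` maps feature name -> list of values. Near-constant columns are dropped first.
    (Illustrative helper; the name and threshold are assumptions.)
    """
    informative = {name: col for name, col in columns.items() if pvariance(col) > min_variance}
    ranked = sorted(informative, key=lambda name: abs(pearson(informative[name], target)), reverse=True)
    return ranked[:k]


columns = {"size": [1, 2, 3, 4, 5], "noise": [3, 1, 4, 1, 5], "constant": [7, 7, 7, 7, 7]}
target = [2.1, 3.9, 6.2, 8.0, 9.9]
print(select_features(columns, target, k=1))  # ['size']
```

The constant column is removed before ranking because it cannot carry any information about the target.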
Label Encoding
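Transformation of a categorical variable into integers, where each unique category receives a distinct numerical identifier, for algorithms that require numerical inputs. Because the integers impose an arbitrary order on the categories, it suits tree-based models and genuinely ordinal variables better than linear or distance-based models.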
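A minimal label encoder can be written as a dictionary built from the training values. Sorting the categories before numbering them is an assumption made here so the mapping is reproducible; any consistent assignment would work. The helper names are hypothetical.

```python
def fit_label_encoder(values):
    """Assign each distinct category an integer id, in sorted order so the mapping is reproducible."""
    return {category: index for index, category in enumerate(sorted(set(values)))}


def label_encode(values, mapping):
    """Encode values with a fitted mapping; an unseen category raises KeyError."""
    return [mapping[v] for v in values]


colors = ["red", "green", "blue", "green"]
mapping = fit_label_encoder(colors)
print(mapping)                        # {'blue': 0, 'green': 1, 'red': 2}
print(label_encode(colors, mapping))  # [2, 1, 0, 1]
```

The resulting ids suggest blue < green < red, an ordering that means nothing for colors, which illustrates why this encoding is better suited to models that do not interpret numeric distances.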
Feature Scaling
Normalization (for example, min-max rescaling to [0, 1]) or standardization (zero mean, unit variance) of numerical features to bring them into a comparable range, essential for algorithms sensitive to feature scale such as SVMs, k-nearest neighbors, and neural networks.
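The two common scalings can be sketched as follows. Min-max scaling maps the column onto [0, 1]; standardization subtracts the mean and divides by the (population) standard deviation. Mapping constant columns to 0.0 is an assumption made here to avoid division by zero. In practice the minimum, maximum, mean, and standard deviation would be computed on training data only and reused at prediction time.

```python
from statistics import mean, pstdev


def min_max_scale(xs):
    """Rescale values linearly so the minimum becomes 0.0 and the maximum 1.0."""
    lo, hi = min(xs), max(xs)
    span = hi - lo
    return [0.0 if span == 0 else (x - lo) / span for x in xs]


def standardize(xs):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(xs), pstdev(xs)
    return [0.0 if sigma == 0 else (x - mu) / sigma for x in xs]


print(min_max_scale([10, 20, 30]))  # [0.0, 0.5, 1.0]
print(standardize([1, 2, 3]))
```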
Polynomial Features
Generation of new features from polynomial combinations of existing variables (squares, higher powers, and cross-products), allowing linear models to capture non-linear relationships between the features and the target variable.
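As a sketch, a degree-2 expansion of one row can be generated by taking every multiset of feature indices of size 1 up to the chosen degree and multiplying the corresponding values. The function name is hypothetical, and the constant (bias) term is deliberately left out.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_features(row, degree=2):
    """Return all monomials of the row's values up to `degree`, excluding the constant term."""
    features = []
    for d in range(1, degree + 1):
        # Each index combination, e.g. (0, 1), is one monomial such as x0 * x1.
        for combo in combinations_with_replacement(range(len(row)), d):
            features.append(prod(row[i] for i in combo))
    return features


print(polynomial_features([2, 3]))  # [2, 3, 4, 6, 9]  ->  x0, x1, x0^2, x0*x1, x1^2
```

A linear model fit on these expanded columns can represent a quadratic surface in the original features, which is the point of the transformation.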
Interaction Features
Creation of new variables representing interactions between existing features, typically through multiplication or another combination, to capture effects where the influence of one feature on the target depends on the value of another.
Recursive Feature Elimination
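Iterative selection algorithm that fits a model, eliminates the least important features according to a ranking criterion (such as coefficient magnitude or feature importance), and repeats the process until the requested number of features remains; cross-validated variants choose that number automatically.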
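The elimination loop can be sketched independently of any particular model. Here `fit_and_rank` is a hypothetical, caller-supplied function assumed to train a model on the given subset and return an importance score per feature; the toy version below simply returns fixed weights to stand in for fitted coefficient magnitudes.

```python
def recursive_feature_elimination(features, fit_and_rank, n_features_to_select, step=1):
    """Repeatedly refit on the surviving features and drop the `step` least important ones.

    `fit_and_rank(subset)` (hypothetical) trains a model on `subset` and returns
    a dict mapping each feature in `subset` to an importance score.
    """
    remaining = list(features)
    while len(remaining) > n_features_to_select:
        importances = fit_and_rank(remaining)  # refit on the current subset every round
        n_drop = min(step, len(remaining) - n_features_to_select)
        least_important = sorted(remaining, key=lambda f: importances[f])[:n_drop]
        remaining = [f for f in remaining if f not in least_important]
    return remaining


# Toy stand-in for "train a model and read its coefficient magnitudes" (assumed values).
TOY_WEIGHTS = {"a": 3.0, "b": 0.5, "c": 2.0, "d": 0.1}


def toy_fit_and_rank(subset):
    return {f: abs(TOY_WEIGHTS[f]) for f in subset}


print(recursive_feature_elimination(["a", "b", "c", "d"], toy_fit_and_rank, 2))  # ['a', 'c']
```

Refitting inside the loop matters with real models: once a feature is removed, the importances of the remaining correlated features can change.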
Target Encoding
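Technique that replaces each category of a categorical variable with a statistic of the target variable (typically the mean, sometimes the median) computed over the rows of that category, directly encoding the category's relationship with the target. Because the statistic is derived from the target, it is usually smoothed toward the global mean or estimated out-of-fold to limit target leakage and overfitting.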
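A mean-based target encoder with additive smoothing can be sketched as below. Each category's mean target is blended with the global mean, weighted by a pseudo-count `smoothing`; this particular smoothing formula is one common choice, assumed here rather than prescribed. Unseen categories fall back to the global mean. The helper names are hypothetical, and the sketch omits out-of-fold estimation.

```python
from collections import defaultdict


def fit_target_encoder(categories, targets, smoothing=1.0):
    """Return (category -> smoothed mean target, global mean target)."""
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    # (sum_c + m * global) / (n_c + m): rare categories are pulled toward the global mean.
    encoding = {c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing) for c in counts}
    return encoding, global_mean


def target_encode(categories, encoding, global_mean):
    """Encode values, using the global mean for categories not seen during fitting."""
    return [encoding.get(c, global_mean) for c in categories]


cities = ["A", "A", "B", "B", "B", "C"]
churned = [1, 0, 1, 1, 1, 0]
raw, _ = fit_target_encoder(cities, churned, smoothing=0.0)
print(raw)  # {'A': 0.5, 'B': 1.0, 'C': 0.0}
smoothed, prior = fit_target_encoder(cities, churned, smoothing=1.0)
print(target_encode(["C", "Z"], smoothed, prior))
```

With `smoothing=1.0`, city C, seen only once, moves from 0.0 toward the global rate instead of claiming a certain outcome.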
Feature Importance
Quantitative measure of the impact of each feature on the supervised model's predictions, calculated by methods such as permutation importance, SHAP values, or model coefficients.
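Of the methods listed, permutation importance is the easiest to show without a library: shuffle one feature column, re-score the already-trained model, and record how much the metric drops. In this sketch the "model" is just a function, the metric is assumed to be higher-is-better (negative mean squared error), and the function names are illustrative.

```python
import random


def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Mean drop in `metric` (higher is better) when each feature column is shuffled.

    `predict` maps one row (a list of feature values) to a prediction; the model is not refit.
    """
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - metric(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


def neg_mse(y_true, y_pred):
    return -sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# A toy "model" that uses only feature 0; feature 1 is ignored.
X = [[i, (i * 7) % 5] for i in range(10)]
y = [2 * i for i in range(10)]
print(permutation_importance(lambda row: 2 * row[0], X, y, neg_mse))
```

Feature 1 gets an importance of exactly zero because the model never reads it, while shuffling feature 0 destroys the predictions.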
Principal Component Analysis
Linear dimensionality reduction technique that projects features onto uncorrelated, orthogonal principal components ordered by the variance they explain, so that keeping only the first few components preserves as much variance as possible in fewer dimensions.
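To make the idea concrete, the first principal component can be found by centering the data, forming the sample covariance matrix, and running power iteration to obtain its dominant eigenvector. This is a teaching sketch under stated assumptions: library implementations typically use a singular value decomposition instead, and power iteration assumes the starting vector (chosen arbitrarily here) is not orthogonal to the top eigenvector.

```python
def first_principal_component(X, n_iter=100):
    """Return (unit direction of maximum variance, projection of each centered row onto it)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)] for a in range(d)]
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary, non-uniform starting vector (assumption)
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            raise ValueError("data has zero variance along the current direction")
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, scores


X = [[1, 2], [2, 4], [3, 6], [4, 8]]  # points on the line y = 2x
direction, scores = first_principal_component(X)
print(direction)  # close to [1, 2] / sqrt(5)
```

Projecting onto `direction` reduces the two columns to one score per row without losing any variance, since these points lie exactly on a line.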
Binning/Discretization
Process of converting continuous variables into discrete categories (bins) to simplify relationships, reduce the influence of outliers, and improve the performance of some supervised algorithms.
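Equal-width binning, the simplest discretization, splits the observed range into `n_bins` intervals of the same width and assigns each value the index of its interval. The sketch below places the maximum value in the last bin. Equal-width bins are themselves sensitive to extreme values, so quantile-based bins are a common alternative; the function name is illustrative.

```python
def equal_width_bins(xs, n_bins):
    """Assign each value a bin index in [0, n_bins - 1] using equal-width intervals."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins
    bins = []
    for x in xs:
        if width == 0:
            bins.append(0)  # constant column: everything in one bin
        else:
            # Clamp so the maximum value lands in the last bin rather than a new one.
            bins.append(min(int((x - lo) / width), n_bins - 1))
    return bins


print(equal_width_bins([0, 1, 5, 9, 10], 2))  # [0, 0, 1, 1, 1]
```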
Feature Hashing
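Dimensionality reduction technique (also called the hashing trick) that applies a hash function to features to map them into a fixed-dimensional vector, useful for high-dimensional data with many categories. Distinct features can collide in the same index, trading some accuracy for bounded memory and no need to store a vocabulary.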
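A signed feature-hashing sketch is shown below. It uses `hashlib.md5` rather than Python's built-in `hash()`, because `hash()` on strings is randomized per process and would give different vectors across runs. The choice of MD5, of the low bits for the index, and of the top bit for the sign are assumptions for illustration; the random sign lets colliding features partially cancel instead of always adding up.

```python
import hashlib


def hash_features(tokens, n_features=16):
    """Map string tokens into a fixed-length vector using signed feature hashing."""
    vector = [0.0] * n_features
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "little")  # 64-bit integer from the digest
        index = h % n_features                     # bucket for this token
        sign = -1.0 if (h >> 63) & 1 else 1.0      # top bit picks the sign
        vector[index] += sign
    return vector


print(hash_features(["city=Paris", "device=mobile", "city=Paris"], n_features=8))
```

The vector length stays at `n_features` no matter how many distinct tokens appear, which is what makes the technique attractive for high-cardinality inputs.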
Missing Value Imputation
Set of statistical or algorithmic strategies (such as mean, median, or mode substitution, or model-based estimates) for replacing missing feature values with appropriate estimates, essential because many supervised algorithms cannot handle missing inputs.
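The simplest strategies can be sketched as a fit step that computes a fill value from the observed training values and a transform step that substitutes it. Missing values are represented as `None` here, an assumption for this sketch; the helper names are hypothetical.

```python
from statistics import mean, median


def fit_imputer(column, strategy="median"):
    """Compute a fill value from the non-missing (non-None) entries of a training column."""
    observed = [x for x in column if x is not None]
    if strategy == "mean":
        return mean(observed)
    if strategy == "median":
        return median(observed)
    raise ValueError(f"unknown strategy: {strategy}")


def impute(column, fill_value):
    """Replace each None with the fitted fill value."""
    return [fill_value if x is None else x for x in column]


incomes = [30, None, 50, 100]
print(impute(incomes, fit_imputer(incomes)))          # [30, 50, 50, 100]
print(impute(incomes, fit_imputer(incomes, "mean")))  # [30, 60, 50, 100]
```

The median is less affected than the mean by the large value 100, which is a common reason to prefer it for skewed features.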
Feature Crosses
Combination of two or more features, typically categorical ones, into a new feature whose values represent specific joint values of its inputs, particularly effective in linear models to capture non-additive relationships.
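A cross of two categorical features can be sketched as the Cartesian product of their values, with each combination becoming its own one-hot column. The feature values and helper names below are hypothetical. Because each combination gets a separate column, a linear model can learn a distinct weight for, say, weekend evenings, rather than only adding a weekend effect to an evening effect.

```python
from itertools import product


def cross_vocabulary(values_a, values_b):
    """Give every (a, b) combination its own column index."""
    return {f"{a} x {b}": i for i, (a, b) in enumerate(product(values_a, values_b))}


def one_hot_cross(a, b, vocabulary):
    """One-hot vector with a single 1 at the column for the pair (a, b)."""
    vector = [0] * len(vocabulary)
    vector[vocabulary[f"{a} x {b}"]] = 1
    return vector


vocab = cross_vocabulary(["weekday", "weekend"], ["morning", "evening"])
print(one_hot_cross("weekend", "evening", vocab))  # [0, 0, 0, 1]
```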
Feature Engineering Pipeline
Automated and reproducible sequence of transformations applied to features, integrating cleaning, creation, selection, and scaling, with parameters learned on training data only so that identical transformations are applied at training and prediction time.
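The key property, learning parameters once on training data and reusing them at prediction time, can be sketched with a tiny single-column pipeline. The classes below are hypothetical and deliberately minimal; libraries such as scikit-learn provide a `Pipeline` class that plays this role for real models.

```python
from statistics import mean, median, pstdev


class MedianImputer:
    """Learns the median of the observed training values and fills None with it."""

    def fit(self, column):
        self.fill_value = median([x for x in column if x is not None])
        return self

    def transform(self, column):
        return [self.fill_value if x is None else x for x in column]


class Standardizer:
    """Learns the training mean and standard deviation, then rescales to zero mean, unit variance."""

    def fit(self, column):
        self.mu = mean(column)
        self.sigma = pstdev(column) or 1.0  # avoid division by zero on constant columns
        return self

    def transform(self, column):
        return [(x - self.mu) / self.sigma for x in column]


class FeaturePipeline:
    """Applies steps in order; parameters are learned only in fit_transform (training data)."""

    def __init__(self, steps):
        self.steps = steps

    def fit_transform(self, column):
        for step in self.steps:
            column = step.fit(column).transform(column)
        return column

    def transform(self, column):
        for step in self.steps:
            column = step.transform(column)
        return column


pipeline = FeaturePipeline([MedianImputer(), Standardizer()])
train_scaled = pipeline.fit_transform([1.0, None, 3.0, 5.0])
serving_scaled = pipeline.transform([None, 7.0])  # reuses the training median, mean, and std
print(serving_scaled)
```

At serving time the missing value is filled with the training median (3.0), not a statistic of the new batch, which keeps training and prediction consistent.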
Domain-Specific Feature Creation
Development of features based on business expertise and domain knowledge, creating informative variables that capture specific non-obvious patterns in raw data.
Temporal Feature Engineering
Creation of features specific to time-series data, such as lag features, rolling-window statistics, calendar components (day of week, month), and seasonal indicators, to improve supervised predictions on chronologically ordered data.
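Lag features, rolling statistics, and calendar components can be sketched as follows. The rolling mean here uses only values strictly before each position, an assumption chosen so that no feature looks at the value being predicted; positions without enough history are marked `None`. The helper names are illustrative.

```python
from datetime import date


def lag_feature(series, lag):
    """Value from `lag` steps earlier; None where no history exists."""
    if lag < 1:
        raise ValueError("lag must be >= 1")
    n = len(series)
    k = min(lag, n)
    return [None] * k + list(series[: n - k])


def rolling_mean(series, window):
    """Mean of the `window` values strictly before each position (no look-ahead)."""
    out = []
    for i in range(len(series)):
        if i < window:
            out.append(None)
        else:
            out.append(sum(series[i - window:i]) / window)
    return out


def calendar_features(d):
    """Simple calendar components of a date."""
    return {"day_of_week": d.weekday(), "month": d.month, "is_weekend": d.weekday() >= 5}


sales = [10, 12, 14, 16, 18]
print(lag_feature(sales, 1))   # [None, 10, 12, 14, 16]
print(rolling_mean(sales, 2))  # [None, None, 11.0, 13.0, 15.0]
print(calendar_features(date(2024, 1, 6)))
```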