AI Glossary
The Complete Dictionary of Artificial Intelligence
Mean Imputation
Imputation technique that replaces each missing value with the mean of the observed values of the same variable. This simple method preserves the variable's mean but shrinks its variance and weakens its correlations with other variables.
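A minimal sketch of the idea, assuming missing entries are encoded as `None` in a plain Python list (the function name and sample data are illustrative):

```python
from statistics import mean, pvariance

def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

ages = [20, None, 30, 40, None]              # hypothetical data
observed = [v for v in ages if v is not None]
filled = mean_impute(ages)

print(filled)                                # [20, 30, 30, 40, 30]
print(mean(observed), mean(filled))          # the mean is unchanged
print(pvariance(observed), pvariance(filled))  # the variance shrinks
```

The last line shows the side effect described above: every imputed point sits exactly at the mean, so the spread of the filled variable is smaller than that of the observed values.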
Median Imputation
Robust method that substitutes missing values with the median of observed values, particularly suitable for skewed distributions. This approach minimizes the influence of outliers compared to mean imputation.
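As an illustrative sketch (again assuming `None` marks a missing value, with made-up sample data), the example below contrasts the median fill with the mean on a variable that contains one extreme value:

```python
from statistics import mean, median

def median_impute(values):
    """Replace None entries with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

incomes = [30, 32, 35, None, 400]   # hypothetical data with one outlier
filled = median_impute(incomes)

print(filled[3])                                      # 33.5
print(mean(v for v in incomes if v is not None))      # 124.25, pulled up by 400
```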
KNN Imputation
Algorithm that imputes each missing value from the k nearest neighbors in the feature space, typically as a plain or distance-weighted average of the neighbors' values. This method preserves local relationships between variables but can be computationally expensive on large datasets, since distances to many rows must be computed.
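The following is a simplified sketch, not a reference implementation. It assumes a single target column with missing entries (`None`), complete numeric values in all other columns, Euclidean distance, and inverse-distance weights. All names are illustrative.

```python
import math

def knn_impute(rows, target, k=2):
    """Fill rows[i][target] where it is None from the k nearest donor rows."""
    def distance(a, b):
        # Euclidean distance over every column except the target column
        return math.sqrt(sum((x - y) ** 2
                             for j, (x, y) in enumerate(zip(a, b))
                             if j != target))

    donors = [r for r in rows if r[target] is not None]
    filled = []
    for row in rows:
        row = list(row)
        if row[target] is None:
            nearest = sorted(donors, key=lambda d: distance(row, d))[:k]
            # Inverse-distance weights; the small constant avoids division by zero
            weights = [1.0 / (distance(row, d) + 1e-9) for d in nearest]
            row[target] = (sum(w * d[target] for w, d in zip(weights, nearest))
                           / sum(weights))
        filled.append(row)
    return filled

data = [[1.0, 10.0], [2.0, None], [3.0, 30.0], [10.0, 100.0]]
print(knn_impute(data, target=1, k=2)[1])   # second value is approximately 20.0
```

In practice the features are usually scaled first, because an unscaled column with a large range would dominate the distance.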
Multiple Imputation
Advanced statistical approach that creates several completed copies of the dataset, each with different plausible values drawn for the missing entries, so that the uncertainty of imputation is reflected. Each copy is analyzed separately, and the results are then combined to produce more robust estimates and valid confidence intervals.
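The combination step is commonly done with Rubin's rules. The sketch below shows only that pooling step and assumes the per-dataset estimates and their variances (squared standard errors) have already been computed. The function name and numbers are illustrative.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m per-dataset estimates and their variances with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance across the m estimates
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical results from m = 3 imputed datasets
pooled, total = pool_rubin([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
print(pooled)   # 11.0
print(total)    # 1 + (4/3) * 1, about 2.33
```

The between-imputation term is what a single imputation leaves out: it widens the total variance to account for not knowing the missing values.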
Regression Imputation
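Technique that predicts missing values with a regression model fitted on the other available variables. This method captures linear relationships between variables, but because every imputed value falls exactly on the fitted line, it understates variance and overstates correlations unless random residual noise is added (stochastic regression imputation).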
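A minimal sketch with a single fully observed predictor and ordinary least squares, written without external libraries. The names and data are illustrative, and the deterministic variant is shown (no residual noise is added).

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def regression_impute(xs, ys):
    """Fill None entries of ys with predictions from the line fitted on complete pairs."""
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    slope, intercept = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
    return [intercept + slope * x if y is None else y for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, None, 8.1, 9.9]        # hypothetical data
print(regression_impute(xs, ys)[2])    # approximately 6.0
```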
EM Imputation
Expectation-Maximization algorithm that alternates between computing the expected values of the missing data given the current model parameters (E-step) and re-estimating the parameters by maximum likelihood (M-step), repeating until convergence. This statistical approach is particularly effective when data are missing at random (MAR).
Hot-deck Imputation
Method that replaces each missing value with an observed value from a randomly selected similar donor in the same dataset. Because every imputed value is a real observed value, imputations stay plausible, and the technique tends to preserve the original data distribution and the correlations between variables.
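One simple way to define "similar" is to require that donor and recipient share the value of a grouping variable. The sketch below makes that assumption, along with records stored as dictionaries and at least one donor per group. The field names are hypothetical.

```python
import random

def hot_deck_impute(records, key, target, seed=0):
    """Fill record[target] from a random donor with the same record[key]."""
    rng = random.Random(seed)
    donors = {}
    for r in records:
        if r[target] is not None:
            donors.setdefault(r[key], []).append(r[target])

    filled = []
    for r in records:
        r = dict(r)                      # copy so the input is left untouched
        if r[target] is None:
            # Raises KeyError if the group has no donor at all
            r[target] = rng.choice(donors[r[key]])
        filled.append(r)
    return filled

people = [
    {"region": "north", "income": 40},
    {"region": "north", "income": 44},
    {"region": "north", "income": None},
    {"region": "south", "income": 70},
    {"region": "south", "income": None},
]
result = hot_deck_impute(people, key="region", target="income")
print(result[2]["income"])   # either 40 or 44, taken from a "north" donor
print(result[4]["income"])   # 70, the only "south" donor
```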
Interpolation Imputation
Technique primarily used for time series that estimates missing values based on adjacent temporal values (linear, spline, polynomial). This method maintains temporal continuity and underlying trends.
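A sketch of the linear variant, assuming evenly spaced observations and `None` for missing points. Leading and trailing gaps are left unfilled here, since they have no neighbor on one side.

```python
def linear_interpolate(series):
    """Fill interior None gaps on a straight line between the nearest known points."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled

temps = [10.0, None, None, 16.0, None]   # hypothetical readings
print(linear_interpolate(temps))         # [10.0, 12.0, 14.0, 16.0, None]
```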
MICE Imputation
Multiple Imputation by Chained Equations, a method that imputes each variable with a specific model adapted to its nature, iterating until convergence. This flexible approach handles different types of variables and complex relationships.
Matrix Completion Imputation
Technique that decomposes the data matrix into low-rank matrices to predict missing values, using methods like SVD (Singular Value Decomposition). This approach captures latent structures in multidimensional data.
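One simple scheme in this family is iterative SVD imputation: fill the gaps with a starting guess, compute a truncated SVD, overwrite only the missing cells with the low-rank reconstruction, and repeat. The sketch below assumes NumPy, missing cells encoded as `NaN`, a known target rank, and a fixed number of iterations. It is meant as an illustration rather than a tuned algorithm.

```python
import numpy as np

def svd_complete(X, rank=1, n_iter=100):
    """Fill NaN cells of X with an iteratively refined rank-`rank` reconstruction."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Starting guess: column means of the observed entries
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        filled[missing] = low_rank[missing]   # observed cells are never changed
    return filled

# Hypothetical rank-1 matrix with one hidden cell (true value 1.0)
truth = np.outer([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
X = truth.copy()
X[0, 0] = np.nan
completed = svd_complete(X, rank=1)
print(round(completed[0, 0], 3))
```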
Autoencoder Imputation
Deep learning approach that trains a neural network to compress and then reconstruct data, thus learning to predict missing values. This method captures complex non-linear relationships in high-dimensional data.
Bayesian Imputation
Method that uses prior distributions and Bayes' theorem to estimate missing values, generating posterior distributions for each imputation. This approach naturally quantifies uncertainty and incorporates domain knowledge.
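A deliberately simple sketch for one variable, assuming a normal likelihood with known noise variance and a normal prior on the mean (a conjugate pair, so the posterior has a closed form). Imputed values are drawn from the posterior predictive distribution. All names and numbers are illustrative.

```python
import random

def posterior_params(observed, prior_mean, prior_var, noise_var):
    """Posterior mean and variance of the unknown mean (normal-normal model)."""
    n = len(observed)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + sum(observed) / noise_var)
    return post_mean, post_var

def bayesian_impute(values, prior_mean, prior_var, noise_var, seed=0):
    """Replace None entries with draws from the posterior predictive distribution."""
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    mu, var = posterior_params(observed, prior_mean, prior_var, noise_var)
    # Predictive spread = uncertainty about the mean + observation noise
    sd = (var + noise_var) ** 0.5
    return [rng.gauss(mu, sd) if v is None else v for v in values]

mu, var = posterior_params([4.0, 6.0], prior_mean=0.0, prior_var=1.0, noise_var=1.0)
print(mu, var)   # about 3.33 and 0.33: the prior pulls the estimate toward 0
print(bayesian_impute([4.0, None, 6.0], 0.0, 1.0, 1.0))
```

Repeating the draw with different seeds gives several plausible values per gap, which is how this approach feeds into multiple imputation.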
MissForest Imputation
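Non-parametric algorithm based on random forests. After an initial fill, it repeatedly trains a random forest for each variable on the rows where that variable is observed and uses it to predict the missing entries, until the imputations stabilize. This method effectively handles non-linear interactions and mixed numerical and categorical variables.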
Clustering Imputation
Technique that groups similar observations then imputes missing values using statistics (mean, median) from the corresponding cluster. This approach preserves underlying structures in multi-modal data.
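The sketch below covers only the imputation step and assumes cluster labels have already been assigned by some other procedure (for example, a clustering run on the fully observed columns). The data and labels are made up.

```python
from statistics import mean

def cluster_mean_impute(values, labels):
    """Replace None entries with the mean of observed values in the same cluster."""
    by_cluster = {}
    for v, c in zip(values, labels):
        if v is not None:
            by_cluster.setdefault(c, []).append(v)
    centers = {c: mean(vs) for c, vs in by_cluster.items()}
    return [centers[c] if v is None else v for v, c in zip(values, labels)]

values = [1.0, 3.0, None, 100.0, 104.0, None]   # two well-separated modes
labels = [0, 0, 0, 1, 1, 1]                      # hypothetical cluster labels
print(cluster_mean_impute(values, labels))
# [1.0, 3.0, 2.0, 100.0, 104.0, 102.0]
```

A global mean would have filled both gaps with 52.0, a value that belongs to neither mode.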
Markov Chain Imputation
Method that models transitions between data states to predict missing values based on previous or subsequent states in a sequence. This technique is particularly suited for sequential and temporal data.
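A first-order sketch for categorical sequences: transition counts are estimated from adjacent observed pairs, and each gap is filled with the most frequent successor of the preceding state. This simplified version looks only at the previous state. A fuller treatment would also condition on the following state. Names and data are illustrative.

```python
from collections import Counter, defaultdict

def fit_transitions(sequence):
    """Count observed state-to-state transitions, skipping pairs with a gap."""
    counts = defaultdict(Counter)
    for prev, cur in zip(sequence, sequence[1:]):
        if prev is not None and cur is not None:
            counts[prev][cur] += 1
    return counts

def markov_impute(sequence):
    """Fill None entries with the most likely successor of the previous state."""
    counts = fit_transitions(sequence)
    filled = list(sequence)
    for i, state in enumerate(filled):
        # A gap at the start, or after an unseen state, is left as None
        if state is None and i > 0 and filled[i - 1] in counts:
            filled[i] = counts[filled[i - 1]].most_common(1)[0][0]
    return filled

weather = ["sun", "sun", "rain", "rain", "sun", "sun", None, "rain", "rain", None]
filled = markov_impute(weather)
print(filled[6], filled[9])   # sun rain
```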
Decision Tree Imputation
Approach that uses decision trees to predict missing values based on segmentation rules learned from complete observations. This method automatically captures non-linear interactions between variables.
PCA Imputation
Technique based on Principal Component Analysis that projects data into a reduced-dimensional space and then reconstructs missing values. This method is effective for multivariate data with strong correlation structure.
Constant Value Imputation
Simple strategy that replaces all missing values with a predefined constant (often 0, -1, or a domain-specific value). This method is fast but can introduce significant bias if the constant is not chosen carefully.
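A minimal sketch, assuming `None` marks a missing value. It also returns a missing-value indicator, a common companion that lets a downstream model tell a filled-in constant from a real observation. This pairing is an illustrative design choice, not a required part of the method.

```python
def constant_impute(values, fill_value=0):
    """Return the filled values and a flag list marking which entries were missing."""
    filled = [fill_value if v is None else v for v in values]
    was_missing = [v is None for v in values]
    return filled, was_missing

filled, was_missing = constant_impute([5, None, 7], fill_value=-1)
print(filled)        # [5, -1, 7]
print(was_missing)   # [False, True, False]
```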