Handling Missing Values - AI Glossary

📖

Term

Mean Imputation

Imputation technique that replaces each missing value with the mean of the observed values of the same variable. This simple method preserves the variable's mean but shrinks its variance and weakens its correlations with other variables.
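
A minimal sketch of the idea, assuming missing entries are represented as None in a plain Python list (the function name and the sample data are illustrative only). It also shows the variance shrinking after imputation.

```python
from statistics import mean, pvariance

def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

ages = [20, None, 30, 40, None]          # made-up sample data
imputed = mean_impute(ages)              # [20, 30, 30, 40, 30]

# The mean is unchanged, but the spread is smaller than in the observed data.
observed_var = pvariance([20, 30, 40])
imputed_var = pvariance(imputed)
```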

Median Imputation

Robust method that substitutes missing values with the median of the observed values, particularly suitable for skewed distributions. Because the median is barely affected by extreme values, this approach is less sensitive to outliers than mean imputation.
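
A small sketch, again assuming None marks a missing entry and using made-up numbers, that contrasts the median fill with what the mean would have been when one value is extreme.

```python
from statistics import mean, median

def median_impute(values):
    """Replace None entries with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

incomes = [30, 32, 35, None, 400]        # 400 is an outlier
imputed = median_impute(incomes)

median_fill = imputed[3]                 # 33.5, close to the typical values
mean_fill = mean(v for v in incomes if v is not None)  # 124.25, pulled up by 400
```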

KNN Imputation

Algorithm that imputes each missing value from the k nearest neighbors in the feature space, typically using a plain or distance-weighted average of the neighbors' values. This method preserves local relationships between variables but can be computationally expensive on large datasets, since distances to many rows must be computed.
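
The following is a rough pure-Python sketch rather than a production implementation. It assumes None marks missing cells, measures distance only over columns observed in both rows, and uses inverse-distance weights; all of these are illustrative choices. It also assumes every incomplete row shares at least one observed column with its donors.

```python
import math

def knn_impute(rows, k=2):
    """Fill None cells using the k nearest rows that have that column observed."""
    def distance(a, b):
        # Root-mean-square difference over the columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    result = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )[:k]
            # Inverse-distance weights; the small constant avoids division by zero.
            weights = [1.0 / (d + 1e-9) for d, _ in donors]
            result[i][j] = sum(w * v for w, (_, v) in zip(weights, donors)) / sum(weights)
    return result

data = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, None],   # the two nearest rows are [2.0, 20.0] and [4.0, 40.0]
    [4.0, 40.0],
    [9.0, 90.0],
]
filled = knn_impute(data, k=2)
```

In this example the two nearest donors are equally distant, so the imputed value is their plain average.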

Multiple Imputation

Advanced statistical approach that generates several plausible imputed values for each missing data point, producing multiple completed datasets that reflect the uncertainty of imputation. Each completed dataset is analyzed separately, and the results are then pooled to produce more robust estimates and valid confidence intervals.
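
The pooling step is commonly done with Rubin's rules. The sketch below assumes the per-dataset estimates and their sampling variances have already been computed; the numbers are invented for illustration.

```python
from statistics import mean, variance

def pool_rubin(estimates, within_variances):
    """Combine per-dataset estimates and their variances with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(within_variances)      # average sampling variance
    between = variance(estimates)        # variance of estimates across datasets
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical results from analyzing three imputed datasets.
estimates = [10.0, 12.0, 11.0]
within_variances = [4.0, 4.0, 4.0]
pooled, total = pool_rubin(estimates, within_variances)
```

The total variance is larger than the average within-dataset variance, which is how the extra uncertainty from imputation enters the confidence interval.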

Regression Imputation

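Technique that predicts missing values using a regression model with the other available variables as predictors. This method captures linear relationships between variables, but because the imputed values fall exactly on the fitted line, it understates the variance of the imputed variable and overstates its correlation with the predictors.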
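
As an illustration, the sketch below fits a one-predictor least-squares line on the complete pairs and uses it to fill the gaps. The variable names and data are hypothetical, and None marks a missing value.

```python
from statistics import mean

def regression_impute(x, y):
    """Fill None entries of y from a least-squares line fitted on complete pairs."""
    pairs = [(a, b) for a, b in zip(x, y) if b is not None]
    x_bar = mean(a for a, _ in pairs)
    y_bar = mean(b for _, b in pairs)
    slope = sum((a - x_bar) * (b - y_bar) for a, b in pairs) / sum(
        (a - x_bar) ** 2 for a, _ in pairs
    )
    intercept = y_bar - slope * x_bar
    return [intercept + slope * a if b is None else b for a, b in zip(x, y)]

hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 54.0, None, 58.0, 60.0]
filled = regression_impute(hours, score)   # the gap is filled with 50 + 2 * 3 = 56.0
```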

EM Imputation

Expectation-Maximization algorithm that alternates between estimating the expected values of the missing data given the current model parameters (E-step) and re-estimating the parameters by maximizing the likelihood (M-step). This statistical approach is particularly effective when data are missing under the MAR (Missing At Random) assumption.

Hot-deck Imputation

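Method that replaces each missing value with an observed value from a randomly selected similar donor in the same dataset. Because every imputed value is a real observed value, this technique tends to preserve the variable's distribution; correlations between variables are preserved to the extent that donors are well matched to recipients.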
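
One simple way to define "similar" is to require that donor and recipient share a grouping variable. The sketch below makes that assumption; the field names are hypothetical, and it assumes every group has at least one observed donor.

```python
import random

def hot_deck_impute(records, target, group_key, seed=0):
    """Fill missing `target` values by copying from a random donor in the same group."""
    rng = random.Random(seed)            # seeded for reproducibility
    donors = {}
    for rec in records:
        if rec[target] is not None:
            donors.setdefault(rec[group_key], []).append(rec[target])
    filled = []
    for rec in records:
        rec = dict(rec)                  # copy so the input is left untouched
        if rec[target] is None:
            rec[target] = rng.choice(donors[rec[group_key]])
        filled.append(rec)
    return filled

survey = [
    {"region": "north", "income": 41},
    {"region": "north", "income": 45},
    {"region": "north", "income": None},
    {"region": "south", "income": 70},
    {"region": "south", "income": None},
]
completed = hot_deck_impute(survey, target="income", group_key="region")
```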

Interpolation Imputation

Technique primarily used for time series that estimates missing values based on adjacent temporal values (linear, spline, polynomial). This method maintains temporal continuity and underlying trends.
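
A minimal sketch of the linear case, assuming evenly spaced time steps and None for missing points. Gaps at the start or end of the series are left as they are, since there is no neighbor on one side.

```python
def linear_interpolate(series):
    """Fill interior None gaps linearly between the nearest observed neighbors."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled

temps = [10.0, None, None, 16.0, None, 20.0, None]
filled = linear_interpolate(temps)
# filled == [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, None]
```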

MICE Imputation

Multiple Imputation by Chained Equations, a method that imputes each variable with a specific model adapted to its nature, iterating until convergence. This flexible approach handles different types of variables and complex relationships.

Matrix Completion Imputation

Technique that decomposes the data matrix into low-rank matrices to predict missing values, using methods like SVD (Singular Value Decomposition). This approach captures latent structures in multidimensional data.
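
One common variant is iterative SVD imputation: fill the gaps with a starting guess, take a truncated SVD, overwrite only the missing cells with the low-rank reconstruction, and repeat. The sketch below assumes NumPy is available and that NaN marks missing cells; the rank and iteration count are arbitrary illustrative choices.

```python
import numpy as np

def svd_impute(X, rank=1, n_iter=100):
    """Iteratively fill NaN entries with a rank-`rank` SVD reconstruction."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from the column means of the observed entries.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Only the missing cells are overwritten; observed cells stay fixed.
        filled[missing] = low_rank[missing]
    return filled

# A rank-1 matrix (an outer product) with one entry hidden.
full = np.outer([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
observed = full.copy()
observed[0, 0] = np.nan
completed = svd_impute(observed, rank=1)
```

Because the example matrix is exactly rank 1, the hidden entry is recovered closely; real data are only approximately low-rank, so the reconstruction will be approximate as well.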

Autoencoder Imputation

Deep learning approach that trains a neural network to compress and then reconstruct data, thus learning to predict missing values. This method captures complex non-linear relationships in high-dimensional data.

Bayesian Imputation

Method that uses prior distributions and Bayes' theorem to estimate missing values, generating posterior distributions for each imputation. This approach naturally quantifies uncertainty and incorporates domain knowledge.
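
As a toy illustration, the sketch below uses the simplest conjugate case: a normal model with known noise variance and a normal prior on the mean. All parameter values are made up, and real applications usually need richer models. Imputations are drawn from the posterior predictive distribution, so they vary from draw to draw.

```python
import random

def posterior_predictive_draws(observed, prior_mean, prior_var, noise_var, n_draws, seed=0):
    """Draw imputations from the posterior predictive of a normal model."""
    n = len(observed)
    # Conjugate update for the unknown mean (noise variance assumed known).
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + sum(observed) / noise_var)
    # Predictive spread = uncertainty about the mean + observation noise.
    pred_sd = (post_var + noise_var) ** 0.5
    rng = random.Random(seed)
    draws = [rng.gauss(post_mean, pred_sd) for _ in range(n_draws)]
    return post_mean, post_var, draws

heights = [170.0, 174.0, 178.0, 182.0]   # observed values (hypothetical)
post_mean, post_var, draws = posterior_predictive_draws(
    heights, prior_mean=170.0, prior_var=25.0, noise_var=100.0, n_draws=5
)
# The posterior mean (173.0) sits between the prior mean (170) and the sample mean (176).
```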

MissForest Imputation

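Non-parametric algorithm based on random forests that starts from a simple initial fill and then, for each variable in turn, trains a forest on the rows where that variable is observed and predicts its missing entries, repeating until the imputations stabilize. This method effectively handles non-linear interactions and mixed (numeric and categorical) variable types.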

Clustering Imputation

Technique that groups similar observations then imputes missing values using statistics (mean, median) from the corresponding cluster. This approach preserves underlying structures in multi-modal data.
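
The sketch below covers only the imputation step and assumes cluster labels have already been assigned by some clustering algorithm (for example, one run on the fully observed features). The data and labels are invented.

```python
from collections import defaultdict
from statistics import mean

def cluster_mean_impute(values, labels):
    """Fill None entries with the mean of observed values in the same cluster."""
    by_cluster = defaultdict(list)
    for v, c in zip(values, labels):
        if v is not None:
            by_cluster[c].append(v)
    centers = {c: mean(vs) for c, vs in by_cluster.items()}
    return [centers[c] if v is None else v for v, c in zip(values, labels)]

spend = [10.0, 12.0, None, 100.0, None, 104.0]
labels = [0, 0, 0, 1, 1, 1]              # hypothetical precomputed cluster labels
filled = cluster_mean_impute(spend, labels)
# filled == [10.0, 12.0, 11.0, 100.0, 102.0, 104.0]
```

A single global mean would have filled both gaps with 56.5, a value that belongs to neither group.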

Markov Chain Imputation

Method that models transitions between data states to predict missing values based on previous or subsequent states in a sequence. This technique is particularly suited for sequential and temporal data.
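
A simplified first-order sketch: it counts observed transitions and fills each gap with the most frequent successor of the preceding state. It uses only the previous state (not the subsequent one) and assumes None marks a missing state; both are simplifications chosen for brevity.

```python
from collections import Counter, defaultdict

def markov_impute(sequence):
    """Fill None states with the most frequent successor of the preceding state."""
    transitions = defaultdict(Counter)
    for prev, nxt in zip(sequence, sequence[1:]):
        if prev is not None and nxt is not None:
            transitions[prev][nxt] += 1
    filled = list(sequence)
    for i in range(1, len(filled)):
        # Works left to right, so an imputed state can inform the next gap.
        if filled[i] is None and filled[i - 1] in transitions:
            filled[i] = transitions[filled[i - 1]].most_common(1)[0][0]
    return filled

weather = ["sun", "sun", "rain", "rain", "sun", "sun", None, "rain", "rain", None]
filled = markov_impute(weather)
```

Here "sun" is most often followed by "sun" and "rain" by "rain", so the two gaps are filled accordingly.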

Decision Tree Imputation

Approach that uses decision trees to predict missing values based on segmentation rules learned from complete observations. This method automatically captures non-linear interactions between variables.

PCA Imputation

A technique based on Principal Component Analysis that projects data into a reduced-dimensional space and then reconstructs missing values. This method is effective for multivariate data with strong correlation structure.

Constant Value Imputation

A simple strategy that replaces all missing values with a predefined constant (often 0, -1, or a domain-specific value). This method is fast but can introduce significant bias if the constant is not chosen carefully.
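
A short sketch, assuming None marks missing entries. It also returns a missing-value indicator, a common companion to constant filling that lets a downstream model tell real values from placeholders; the indicator is an addition for illustration, not part of the definition above.

```python
def constant_impute(values, fill_value=0):
    """Replace None with a constant and return a parallel missing-indicator list."""
    was_missing = [v is None for v in values]
    filled = [fill_value if v is None else v for v in values]
    return filled, was_missing

clicks = [3, None, 7, None]
filled, was_missing = constant_impute(clicks, fill_value=-1)
# filled == [3, -1, 7, -1]; was_missing == [False, True, False, True]
```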
