AI Glossary
The complete dictionary of Artificial Intelligence
Stream Classification
Process of assigning predefined labels to data instances that arrive sequentially in a continuous stream, without the possibility of revisiting previous data. Predictions are made in real time, and the model must adapt as the underlying data distribution changes.
Hoeffding Tree
Incremental decision tree algorithm that builds a model from a data stream, using Hoeffding's inequality to decide when enough instances have been seen to split a node. With a probability controlled by a user-set confidence parameter, the resulting tree is asymptotically identical to one built on the same data in batch mode.
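The split test can be illustrated with the Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), where R is the range of the split metric, delta the allowed error probability, and n the number of instances seen at the node. The following is a minimal sketch of that check only, not a full tree learner; the function names `hoeffding_bound` and `should_split` are illustrative assumptions.

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability at least 1 - delta, the true mean
    lies within epsilon of the mean observed over n instances."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split when the observed gap between the two best attributes
    exceeds the Hoeffding bound for the current sample size."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)

# For information gain with two classes the range is log2(2) = 1.
print(should_split(0.30, 0.05, value_range=1.0, delta=1e-7, n=100))
print(should_split(0.30, 0.05, value_range=1.0, delta=1e-7, n=200))
```

The same gain gap that is not yet significant after 100 instances becomes significant after 200, because the bound shrinks as n grows.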
Data Stream Mining
Field of study focusing on algorithms and techniques for extracting knowledge from continuous and potentially infinite data streams. These algorithms must process data in a single pass with limited memory and computational resources.
Incremental Learning
Learning paradigm where the model is continuously updated as new data becomes available, without requiring complete retraining. This approach is essential for systems operating in dynamic environments with continuous data streams.
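As one possible illustration of updating a model without retraining, the sketch below adjusts a linear regressor by a single stochastic gradient step per instance. The class name `OnlineLinearRegressor` and the `learn_one` / `predict_one` method names are assumptions chosen for this example; they are not taken from a specific library.

```python
class OnlineLinearRegressor:
    """Linear model (no bias term) updated one instance at a time."""

    def __init__(self, n_features: int, learning_rate: float = 0.1):
        self.weights = [0.0] * n_features
        self.learning_rate = learning_rate

    def predict_one(self, x):
        return sum(w * xi for w, xi in zip(self.weights, x))

    def learn_one(self, x, y):
        # One gradient step on the squared error of this single instance.
        error = y - self.predict_one(x)
        self.weights = [w + self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]

model = OnlineLinearRegressor(n_features=1)
for _ in range(200):
    for x in (1.0, 0.5, 0.25):
        model.learn_one([x], 2.0 * x)  # stream generated by y = 2x
print(model.weights)
```

Each instance is seen once and then discarded; the weights are the only state that persists.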
Concept Evolution
Phenomenon, distinct from concept drift, in which new classes emerge in the data stream over time. Detecting concept evolution is critical for keeping classification models relevant in environments where the set of class labels changes.
Ensemble Methods for Streams
Techniques combining multiple classifiers to improve performance and robustness in data stream classification. These methods include adaptive bagging, online boosting, and diversity-based approaches to effectively manage concept drift.
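One common form of bagging for streams is the online bagging scheme of Oza and Russell, in which each base learner trains on every incoming instance k times, with k drawn from a Poisson(1) distribution. The sketch below assumes a deliberately trivial base learner, `MajorityClass`, as a stand-in; all class and function names are illustrative.

```python
import math
import random
from collections import Counter

class MajorityClass:
    """Stand-in base learner: predicts the most frequent label seen so far."""
    def __init__(self):
        self.counts = Counter()
    def learn_one(self, x, y):
        self.counts[y] += 1
    def predict_one(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

def poisson(lam, rng):
    """Draw from Poisson(lam) using Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

class OnlineBagging:
    def __init__(self, n_models=5, seed=0):
        self.models = [MajorityClass() for _ in range(n_models)]
        self.rng = random.Random(seed)

    def learn_one(self, x, y):
        # Each member sees the instance a Poisson(1)-distributed number of times,
        # which approximates bootstrap resampling without storing the stream.
        for model in self.models:
            for _ in range(poisson(1.0, self.rng)):
                model.learn_one(x, y)

    def predict_one(self, x):
        votes = Counter(m.predict_one(x) for m in self.models)
        votes.pop(None, None)  # ignore members that have not been trained yet
        return votes.most_common(1)[0][0] if votes else None
```

In practice the base learner would be an incremental classifier such as a Hoeffding tree; the majority-class learner only keeps the example short.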
VFDT (Very Fast Decision Tree)
Pioneering decision tree algorithm for data streams using Hoeffding's inequality to guarantee statistically valid decisions with a minimal number of instances. It forms the basis of many modern stream classification algorithms.
Drift Detection Method (DDM)
Statistical technique for detecting concept drift by monitoring the classifier's error rate and its variations. It uses confidence bounds based on the binomial distribution to identify when model performance degrades significantly.
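A minimal sketch of this idea follows. It tracks the running error rate p and its standard deviation s = sqrt(p(1 - p) / n), remembers the point where p + s was lowest, and signals a warning or a drift when p + s rises more than two or three minimum standard deviations above that point. The class name `DDM`, the state strings, and the 30-instance warm-up are assumptions made for this example, and strict inequalities are used so that an error-free stream never triggers a signal.

```python
import math

class DDM:
    def __init__(self, min_instances=30, warning_level=2.0, drift_level=3.0):
        self.min_instances = min_instances
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error: bool) -> str:
        """Feed one prediction outcome; return 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.errors += int(error)
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        if self.n < self.min_instances:
            return "stable"
        if p + s <= self.p_min + self.s_min:
            self.p_min, self.s_min = p, s  # new best operating point
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()  # start collecting statistics for the new concept
            return "drift"
        if p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"
```

A typical use is to call `update(prediction != label)` after every labeled instance and rebuild or replace the classifier when `'drift'` is returned.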
K-Nearest Neighbors for Streams
Adaptation of the k-nearest neighbors (KNN) algorithm for data streams, using efficient data structures such as kd-trees or locality-sensitive hashing (LSH) to keep neighbor queries fast. These methods must cope with evolving data and with the memory constraints inherent to streams.
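The memory constraint is often handled by keeping only the most recent instances. The sketch below assumes a fixed-size sliding window and a brute-force neighbor search in place of a kd-tree or LSH index, which keeps the example short; the class name `SlidingWindowKNN` and its methods are illustrative.

```python
import math
from collections import Counter, deque

class SlidingWindowKNN:
    def __init__(self, k=3, window_size=100):
        self.k = k
        # The deque discards the oldest instance once the window is full.
        self.window = deque(maxlen=window_size)

    def learn_one(self, x, y):
        self.window.append((tuple(x), y))

    def predict_one(self, x):
        if not self.window:
            return None
        point = tuple(x)
        nearest = sorted(self.window,
                         key=lambda item: math.dist(item[0], point))[:self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
```

Because old instances fall out of the window, the classifier follows changes in the stream at the cost of forgetting earlier data.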
Naive Bayes for Streams
Incremental version of the Naive Bayes classifier that updates its conditional probability estimates as new instances arrive in the stream. It is particularly well suited to high-dimensional data streams because the cost of each update and prediction grows linearly with the number of attributes.
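For categorical attributes, the incremental update amounts to maintaining counts. The sketch below assumes categorical features and Laplace smoothing with parameter `alpha`; the class name `IncrementalNaiveBayes` and its methods are illustrative.

```python
import math
from collections import defaultdict

class IncrementalNaiveBayes:
    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = defaultdict(int)
        self.feature_counts = defaultdict(int)   # (label, index, value) -> count
        self.feature_values = defaultdict(set)   # index -> distinct values seen

    def learn_one(self, x, y):
        # One counter increment per attribute: cost is linear in len(x).
        self.class_counts[y] += 1
        for i, value in enumerate(x):
            self.feature_counts[(y, i, value)] += 1
            self.feature_values[i].add(value)

    def predict_one(self, x):
        if not self.class_counts:
            return None
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            for i, value in enumerate(x):
                n_values = len(self.feature_values[i] | {value})
                numerator = self.feature_counts.get((label, i, value), 0) + self.alpha
                score += math.log(numerator / (count + self.alpha * n_values))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids numerical underflow when many attribute probabilities are multiplied.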
Time-Decay Functions
Mechanisms assigning decreasing weights to older instances in a stream to give more importance to recent data. These functions are essential for adapting models to gradual changes and maintaining their temporal relevance.
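A common choice is exponential decay, where the weight of an instance halves every fixed interval. The sketch below assumes that form and uses it to maintain a time-weighted mean; the names `exponential_decay_weight`, `DecayedMean`, and `half_life` are illustrative.

```python
def exponential_decay_weight(age: float, half_life: float) -> float:
    """Weight of an instance observed `age` time units ago."""
    return 0.5 ** (age / half_life)

class DecayedMean:
    """Running mean in which older observations count progressively less."""

    def __init__(self, half_life: float):
        self.half_life = half_life
        self.weighted_sum = 0.0
        self.weight_total = 0.0
        self.last_time = None

    def update(self, value: float, timestamp: float) -> float:
        if self.last_time is not None:
            # Age everything seen so far by the time elapsed since the last update.
            factor = exponential_decay_weight(timestamp - self.last_time,
                                              self.half_life)
            self.weighted_sum *= factor
            self.weight_total *= factor
        self.weighted_sum += value
        self.weight_total += 1.0
        self.last_time = timestamp
        return self.weighted_sum / self.weight_total
```

Only two accumulators are stored, so the decayed statistic is maintained in constant memory regardless of stream length.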
Resource-Aware Stream Mining
Stream mining approach that dynamically adapts its use of computational and memory resources to system constraints and load. This keeps performance acceptable even under strict resource limits.
Prequential Evaluation
Evaluation methodology specific to data streams where each instance is first used to test the model before being used for training. This test-then-train approach provides a realistic measure of performance on non-stationary data.
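The test-then-train loop can be sketched as follows. The model is assumed to expose `predict_one` and `learn_one` methods, and `MajorityClass` is a trivial stand-in classifier used only to make the example runnable; both are assumptions of this sketch.

```python
from collections import Counter

class MajorityClass:
    """Stand-in classifier: predicts the most frequent label seen so far."""
    def __init__(self):
        self.counts = Counter()
    def predict_one(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None
    def learn_one(self, x, y):
        self.counts[y] += 1

def prequential_accuracy(stream, model) -> float:
    """Test each instance first, then train on it; return overall accuracy."""
    correct = total = 0
    for x, y in stream:
        if model.predict_one(x) == y:   # test on the unseen instance
            correct += 1
        total += 1
        model.learn_one(x, y)           # only then use it for training
    return correct / total if total else 0.0

stream = [(None, "a")] * 4 + [(None, "b")]
print(prequential_accuracy(stream, MajorityClass()))
```

Every instance is scored before the model has seen its label, so no separate held-out test set is needed.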