AI Glossary
The Complete Dictionary of Artificial Intelligence
Distributed Machine Learning
Paradigm for training ML models where computations are distributed across multiple machines to process massive datasets and reduce training time.
Parameter Server
Distributed training architecture that centralizes model parameters on dedicated server nodes; workers pull the current parameters and push gradient updates, often asynchronously.
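A minimal single-process sketch of the pull/push cycle; the ParameterServer class and its methods are illustrative, not a real library API:

```python
import numpy as np

class ParameterServer:
    """Toy parameter server holding the canonical model weights."""

    def __init__(self, dim, lr=0.1):
        self.weights = np.zeros(dim)
        self.lr = lr

    def pull(self):
        # Workers fetch the current parameters before computing gradients.
        return self.weights.copy()

    def push(self, grad):
        # Workers send gradients; the server applies each one as it arrives,
        # without waiting for the others (asynchronous updates).
        self.weights -= self.lr * grad

server = ParameterServer(dim=4)
for _ in range(3):                 # each iteration stands in for one worker update
    w = server.pull()
    grad = np.random.randn(4)      # placeholder for a gradient on local data
    server.push(grad)
```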
AllReduce
Collective communication operation that reduces values across all nodes (typically summing or averaging gradients) and broadcasts the result back to every node in a distributed training job.
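A toy illustration of the reduce-then-broadcast semantics in plain Python; real implementations (ring or tree AllReduce) exchange chunks peer-to-peer instead of gathering centrally:

```python
import numpy as np

def allreduce_mean(per_node_grads):
    # Reduction: combine the gradients contributed by every node.
    total = np.sum(per_node_grads, axis=0)
    mean = total / len(per_node_grads)
    # Broadcast: every node receives the same reduced result.
    return [mean.copy() for _ in per_node_grads]

grads = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
print(allreduce_mean(grads))  # both "nodes" end up with [2.0, 3.0]
```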
Data Parallelism
Parallelization strategy where data is partitioned across multiple machines, each training an identical copy of the model with different batches.
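A single-process sketch of one data-parallel step with NumPy; the shard count and learning rate are arbitrary:

```python
import numpy as np

def worker_gradient(w, X, y):
    # Mean-squared-error gradient for a linear model on one worker's shard.
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 3)), rng.normal(size=1000)
w = np.zeros(3)  # every worker holds an identical copy of these weights

# Partition the data across four "workers", compute local gradients,
# then average them (the step AllReduce performs in a real cluster).
shards = np.array_split(np.arange(len(y)), 4)
grads = [worker_gradient(w, X[idx], y[idx]) for idx in shards]
w -= 0.01 * np.mean(grads, axis=0)
```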
Spark MLlib
Scalable machine learning library built on Apache Spark, offering distributed implementations of classical ML algorithms.
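A minimal PySpark example training a logistic regression; the tiny inline dataset is just for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("mllib-demo").getOrCreate()

# In practice the DataFrame would be read from distributed storage.
train = spark.createDataFrame(
    [(Vectors.dense([0.0, 1.1]), 0.0), (Vectors.dense([2.0, 1.0]), 1.0)],
    ["features", "label"],
)
model = LogisticRegression(maxIter=10, regParam=0.01).fit(train)
```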
TensorFlow Distributed
TensorFlow's distributed training framework using strategies like MirroredStrategy and MultiWorkerMirroredStrategy to scale training.
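A sketch of synchronous multi-GPU training with MirroredStrategy; the model is a placeholder:

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # one replica per local GPU
with strategy.scope():
    # Variables created here are mirrored across all replicas.
    model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation="softmax")])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(...) then splits each batch across the replicas and
# aggregates gradients with AllReduce.
```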
Horovod
Open-source framework developed by Uber that implements ring-AllReduce (over MPI, Gloo, or NCCL) for efficient distributed training of deep learning models.
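A condensed sketch of the standard Horovod/PyTorch setup; the model and hyperparameters are placeholders:

```python
import torch
import horovod.torch as hvd

hvd.init()  # one process per worker, launched e.g. with `horovodrun -np 4 ...`
if torch.cuda.is_available():
    torch.cuda.set_device(hvd.local_rank())  # pin each process to its own GPU

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Wrap the optimizer so gradients are averaged across workers via AllReduce.
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters()
)
# Start every worker from the same initial weights.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
```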
Ray
Distributed computing framework optimized for machine learning and AI, providing simple primitives (remote tasks and actors) for parallel execution and distributed state management.
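A minimal example of Ray's task primitive:

```python
import ray

ray.init()  # starts a local runtime; connects to a cluster if one is configured

@ray.remote
def square(x):
    return x * x

# Tasks are scheduled in parallel across available workers; ray.get blocks.
futures = [square.remote(i) for i in range(8)]
print(ray.get(futures))  # [0, 1, 4, 9, 16, 25, 36, 49]
```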
Petastorm
Library enabling efficient access to large datasets stored in Apache Parquet format for distributed deep learning model training.
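A minimal sketch, assuming a Petastorm-materialized dataset already exists at the placeholder path:

```python
from petastorm import make_reader

# The URL is a placeholder; S3 or HDFS URLs work the same way.
with make_reader("file:///tmp/my_dataset") as reader:
    for row in reader:  # rows stream in without loading the dataset into memory
        print(row)
        break
```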
Dask-ML
Dask extension integrating scalable machine learning algorithms and parallelization tools for ML workflows on clusters.
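A short example clustering a larger-than-memory array with Dask-ML; the shapes and chunk sizes are arbitrary:

```python
import dask.array as da
from dask_ml.cluster import KMeans

# A 100k x 10 array split into chunks that can live on different workers.
X = da.random.random((100_000, 10), chunks=(10_000, 10))
km = KMeans(n_clusters=4)
km.fit(X)  # the computation is distributed across the Dask cluster
```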
Kubeflow
Open-source platform based on Kubernetes for deploying and managing complex ML pipelines at scale with containerized orchestration.
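A sketch of a two-step pipeline using the Kubeflow Pipelines SDK (kfp, v2-style API); the component logic is a stand-in:

```python
from kfp import dsl, compiler

@dsl.component
def preprocess(rows: int) -> int:
    return rows * 2  # stand-in for real preprocessing

@dsl.component
def train(rows: int) -> str:
    return f"trained on {rows} rows"

@dsl.pipeline(name="demo-pipeline")
def demo_pipeline(rows: int = 100):
    prep = preprocess(rows=rows)
    train(rows=prep.output)  # each step runs as a container on Kubernetes

# Produces a YAML spec that can be uploaded to a Kubeflow Pipelines cluster.
compiler.Compiler().compile(demo_pipeline, "pipeline.yaml")
```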
MLflow
Open-source platform for managing the complete lifecycle of ML projects, including tracking, model management, and reproducibility at scale.
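A minimal tracking example:

```python
import mlflow

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("learning_rate", 0.01)  # hyperparameters
    mlflow.log_metric("val_accuracy", 0.93)  # metrics, per step or final
    # mlflow.sklearn.log_model(model, "model") would also version the artifact
```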
Feast
Open-source feature store providing an abstraction layer for managing, versioning, and serving features at scale.
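A sketch of online feature retrieval; the feature view "driver_stats", its field, and the entity key are hypothetical names assumed to be registered in the repository:

```python
from feast import FeatureStore

# Assumes a Feast repo (feature_store.yaml) in the current directory.
store = FeatureStore(repo_path=".")
features = store.get_online_features(
    features=["driver_stats:avg_daily_trips"],  # hypothetical feature view
    entity_rows=[{"driver_id": 1001}],
).to_dict()
```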
Vertex AI
Google Cloud's unified platform for training, deploying, and managing ML models at scale with integrated AutoML and MLOps.
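A sketch of launching a custom training job with the Vertex AI SDK; the project, region, script, and container URI are all placeholders:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

job = aiplatform.CustomTrainingJob(
    display_name="demo-training",
    script_path="train.py",                     # local training script
    container_uri="gcr.io/my-project/trainer",  # placeholder training image
)
job.run(machine_type="n1-standard-4", replica_count=1)
```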
SageMaker
Fully managed AWS service for distributed training, deployment, and monitoring of ML models with automatic resource optimization.
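A sketch using the SageMaker Python SDK's PyTorch estimator; the IAM role, S3 path, versions, and instance settings are placeholders:

```python
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",                               # local training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder IAM role
    instance_count=2,                                     # two machines => distributed
    instance_type="ml.g4dn.xlarge",
    framework_version="2.1",
    py_version="py310",
)
estimator.fit({"training": "s3://my-bucket/train-data"})  # placeholder S3 input
```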
Sharding
Horizontal partitioning of data or model state across multiple nodes to enable parallel processing and reduce the load per machine.
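A toy illustration of assigning data shards to nodes; the round-robin scheme is just one possible layout:

```python
def shard(data, num_nodes):
    # Round-robin partitioning: shard i is processed by node i.
    return [data[i::num_nodes] for i in range(num_nodes)]

print(shard(list(range(10)), num_nodes=3))
# [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]] -- each node handles ~1/3 of the data
```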
Elastic Training
Ability to dynamically adjust the number of workers during training to optimize resource utilization and reduce costs.