AI Glossary
The Complete Dictionary of Artificial Intelligence
Distributed Matrix Factorization
Family of algorithmic techniques for decomposing a very large matrix into a product of smaller factor matrices, distributing data and computation across a cluster of machines to overcome the memory and computing-power limits of a single node.
Distributed Alternating Least Squares (ALS)
Parallelized matrix factorization algorithm that alternates between two least-squares problems, solving for one factor matrix while holding the other fixed. With one factor fixed, each row of the other factor is an independent regularized least-squares problem, so the algorithm maps naturally onto distributed environments such as Spark MLlib.
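The row independence that makes ALS easy to distribute can be seen in a small single-process sketch. It assumes ratings given as `(user, item, rating)` triples, a plain L2 penalty `lam`, and hypothetical helper names (`solve`, `solve_row`, `als`, `rmse`); it is not Spark MLlib's implementation, and the per-row list comprehensions only stand in for tasks a cluster would run in parallel.

```python
import random

# Toy, single-process ALS. Names and hyperparameters are illustrative
# assumptions, not Spark MLlib's API.

def solve(A, b):
    """Solve the small dense system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def solve_row(observed, fixed, k, lam):
    """Ridge solution for one row: (sum v v^T + lam I) x = sum r v over its observed entries."""
    A = [[lam if p == q else 0.0 for q in range(k)] for p in range(k)]
    b = [0.0] * k
    for j, r in observed:
        v = fixed[j]
        for p in range(k):
            b[p] += r * v[p]
            for q in range(k):
                A[p][q] += v[p] * v[q]
    return solve(A, b)

def als(ratings, n_users, n_items, k=2, lam=0.1, iters=20, seed=0):
    rng = random.Random(seed)
    U = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    V = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    for _ in range(iters):
        # V fixed: every user row is an independent problem (one task per row or partition).
        U = [solve_row(by_user[u], V, k, lam) for u in range(n_users)]
        # U fixed: the same holds for every item row.
        V = [solve_row(by_item[i], U, k, lam) for i in range(n_items)]
    return U, V

def rmse(ratings, U, V):
    se = sum((r - sum(x * y for x, y in zip(U[u], V[i]))) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5
```

On a handful of triples such as `[(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]`, `als(ratings, 3, 3)` returns factors with a lower RMSE than the random starting point. In a cluster, each of the two list comprehensions would become a parallel stage, with the fixed factor broadcast to the workers.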
Distributed Stochastic Gradient Descent (SGD)
Parallel variant of stochastic gradient descent in which updates to the factorization parameters are applied synchronously or asynchronously across multiple data partitions. Because concurrent updates can touch the same user or item factors, consistency management mechanisms are required for the algorithm to converge properly in a distributed setting.
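One way to avoid conflicting updates is stratification, in the style of DSGD: users and items are split into `d` groups each, and a "stratum" is a set of `d` blocks that share no users or items, so its blocks can be processed by `d` workers at once. The sketch below simulates this in one process; the modulo partitioning, fixed stratum order, hyperparameters, and function names are assumptions for illustration.

```python
import random

# Stratified SGD simulated in one process. Names and hyperparameters are
# illustrative assumptions.

def stratum_blocks(s, d):
    """Blocks (row_block, col_block) of stratum s: no two share a row or column block."""
    return [(b, (b + s) % d) for b in range(d)]

def dsgd(ratings, n_users, n_items, k=2, d=2, lr=0.05, lam=0.01, epochs=200, seed=0):
    rng = random.Random(seed)
    U = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_items)]
    blocks = {}
    for u, i, r in ratings:
        # Partition users and items into d groups each (modulo hashing).
        blocks.setdefault((u % d, i % d), []).append((u, i, r))
    for _ in range(epochs):
        for s in range(d):
            # The d blocks of a stratum touch disjoint users and items, so d workers
            # could process them concurrently with no conflicting writes; a barrier
            # would separate strata. Here they simply run one after another.
            for key in stratum_blocks(s, d):
                for u, i, r in blocks.get(key, []):
                    err = r - sum(x * y for x, y in zip(U[u], V[i]))
                    for p in range(k):
                        up, vp = U[u][p], V[i][p]
                        U[u][p] += lr * (err * vp - lam * up)
                        V[i][p] += lr * (err * up - lam * vp)
    return U, V

def rmse(ratings, U, V):
    se = sum((r - sum(x * y for x, y in zip(U[u], V[i]))) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5
```

Over all `d` strata every block is visited exactly once per epoch, so the scheme processes the full data set while only ever running conflict-free blocks side by side.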
MapReduce for Factorization
Programming paradigm that decomposes matrix factorization algorithms into two main stages: a 'Map' stage for local computations on data fragments and a 'Reduce' stage to aggregate partial results and update matrix factors, used notably in Hadoop implementations.
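The two stages can be imitated in plain Python to show what each one computes. This is not Hadoop code: the functions below are hypothetical, and the example job computes, for each user, the gradient of half the squared reconstruction error with respect to that user's factor vector.

```python
from collections import defaultdict

# Plain-Python imitation of one MapReduce pass; no Hadoop API is used.
# Result: for every user, the gradient of 0.5 * sum(err^2) w.r.t. U[user].

def map_phase(partition, U, V):
    """Map: for each local rating, emit (user_id, that rating's gradient contribution)."""
    for u, i, r in partition:
        err = r - sum(x * y for x, y in zip(U[u], V[i]))
        yield u, [-err * v for v in V[i]]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the partial gradients of each user component-wise."""
    return {key: [sum(c) for c in zip(*values)] for key, values in groups.items()}

def user_gradients(partitions, U, V):
    mapped = (pair for part in partitions for pair in map_phase(part, U, V))
    return reduce_phase(shuffle(mapped))
```

A gradient step `U[u][p] -= lr * grads[u][p]` would then update the user factors, and a second pass keyed by item would do the same for `V`. The result does not depend on how ratings are split across partitions.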
Spark MLlib ALS
Optimized, distributed implementation of the Alternating Least Squares algorithm in Spark's machine learning library, designed for large-scale matrix factorization. It builds on the RDD or DataFrame programming model and can keep intermediate data in memory across iterations, which suits iterative algorithms such as ALS.
Matrix Partitioning
Strategy for splitting a massive matrix into sub-blocks (by rows, by columns, or by square blocks) distributed across cluster nodes. This choice directly affects workload balance, inter-node communication, and the overall performance of factorization algorithms.
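The effect of the partitioning choice can be measured on a toy matrix. The helpers below are hypothetical and assume contiguous index ranges laid out on a `(row_parts, col_parts)` grid, one block per node; the count of distinct rows and columns a node touches is used only as a rough proxy for communication.

```python
from collections import Counter, defaultdict

# Hypothetical helpers: contiguous range partitioning on a grid of
# (row_parts x col_parts) blocks, one block per node.

def block_index(index, size, parts):
    """Map an index in [0, size) to one of `parts` contiguous ranges."""
    return index * parts // size

def owner(i, j, shape, grid):
    """Node storing entry (i, j). grid=(p, 1): by rows; (1, p): by columns; (q, q): square blocks."""
    n_rows, n_cols = shape
    row_parts, col_parts = grid
    return block_index(i, n_rows, row_parts) * col_parts + block_index(j, n_cols, col_parts)

def load_per_node(entries, shape, grid):
    """Number of stored entries per node (workload balance)."""
    return Counter(owner(i, j, shape, grid) for i, j in entries)

def factor_rows_touched(entries, shape, grid):
    """Distinct (rows, columns) each node touches: a rough proxy for how many
    factor vectors it must receive or send per iteration."""
    rows, cols = defaultdict(set), defaultdict(set)
    for i, j in entries:
        node = owner(i, j, shape, grid)
        rows[node].add(i)
        cols[node].add(j)
    return {node: (len(rows[node]), len(cols[node])) for node in rows}
```

On a dense 4×4 matrix split over four nodes, row partitioning gives each node one row but all four columns, whereas a 2×2 grid gives each node two rows and two columns. With skewed data (for example, one very dense row), row partitioning can leave one node with most of the entries.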
Consistency Model
Rules that define when updates to the matrix factors become visible to other cluster nodes. They range from strong consistency (the Bulk Synchronous Parallel, or BSP, model), which preserves the convergence behavior of the sequential algorithm at the cost of synchronization latency, to weak consistency (the asynchronous model), which speeds up iterations but may compromise stability.
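The stability trade-off can be illustrated with a deliberately tiny model, which is an assumption and not a factorization algorithm: gradient descent on f(x) = x²/2, where each update reads the iterate from `staleness` steps earlier, the way an asynchronous worker may read out-of-date factors. With `staleness=0` every update sees the latest state, as under BSP.

```python
# Toy model (an assumption, not a factorization): gradient descent on
# f(x) = x**2 / 2, whose gradient is x. Each update reads the iterate from
# `staleness` steps earlier, mimicking a worker that sees out-of-date factors.
# staleness=0 corresponds to BSP, where every update sees the latest state.

def run(step_size, staleness, steps, x0=1.0):
    history = [x0] * (staleness + 1)
    for _ in range(steps):
        stale_x = history[-1 - staleness]  # possibly outdated read
        history.append(history[-1] - step_size * stale_x)
    return history[-1]
```

In this toy model, `run(0.9, 0, 100)` shrinks toward zero, `run(0.9, 3, 100)` grows without bound, and lowering the step size to `0.3` restores convergence with the same staleness. This illustrates why asynchronous schemes often need smaller step sizes or limits on how stale reads may be.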
Online Matrix Factorization
Distributed approach suitable for continuous data streams, where the factorization model is updated incrementally as new observations arrive without requiring complete retraining on historical data, often implemented with distributed variants of SGD.
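The incremental update can be sketched as a single SGD step per arriving observation. The class name, learning rate, regularization, and initialization below are hypothetical; in a distributed deployment, the step would be applied by whichever worker owns the affected user and item partitions.

```python
import random

# Online factorization sketch (hypothetical class and parameters): each arriving
# (user, item, rating) triggers one SGD step on just the two affected factor
# vectors, so no retraining on historical data is needed. Unseen users or items
# receive small random factors the first time they appear.

class OnlineMF:
    def __init__(self, k=2, lr=0.05, lam=0.01, seed=0):
        self.k, self.lr, self.lam = k, lr, lam
        self.rng = random.Random(seed)
        self.U, self.V = {}, {}

    def _factor(self, table, key):
        # Lazily create a factor vector for a newly seen id.
        if key not in table:
            table[key] = [self.rng.uniform(0, 0.1) for _ in range(self.k)]
        return table[key]

    def predict(self, user, item):
        u, v = self._factor(self.U, user), self._factor(self.V, item)
        return sum(a * b for a, b in zip(u, v))

    def update(self, user, item, rating):
        """Apply one SGD step for a new observation; return the error before the step."""
        u, v = self._factor(self.U, user), self._factor(self.V, item)
        err = rating - sum(a * b for a, b in zip(u, v))
        for p in range(self.k):
            up, vp = u[p], v[p]
            u[p] += self.lr * (err * vp - self.lam * up)
            v[p] += self.lr * (err * up - self.lam * vp)
        return err
```

Feeding the same observation repeatedly, for instance `model.update("alice", "book", 4.0)`, drives the prediction for that pair toward the observed rating, and a first event for a new user such as `"bob"` simply creates that user's factors.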
Parametric Distributed Matrix Factorization
Advanced method where matrix factors are not learned directly but are generated by shared and distributed parametric functions (e.g., neural networks), thereby reducing the amount of data to communicate between nodes and improving generalization capability.
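The communication saving can be illustrated with the simplest possible parametric function, a shared linear map standing in for a neural network (an assumption made for brevity). Item factors are computed from item features, so only the weight matrix `W` has to be synchronized; the function names and the item, feature, and rank counts in the example are hypothetical.

```python
# Hypothetical sketch: a shared linear map W stands in for the neural network.
# Item factors are computed from item features instead of being stored per item,
# so only W (n_features x k) has to be synchronized between nodes.

def item_factor(features, W):
    """factor[c] = sum_d features[d] * W[d][c], i.e. features @ W."""
    k = len(W[0])
    return [sum(f * W[d][c] for d, f in enumerate(features)) for c in range(k)]

def values_to_sync(n_items, n_features, k, parametric):
    """Number of item-side parameters exchanged per synchronization."""
    return n_features * k if parametric else n_items * k
```

With a hypothetical 1,000,000 items, 50 features, and rank 32, `values_to_sync` gives 1,600 shared values in the parametric case versus 32,000,000 when every item factor is learned and exchanged directly.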
Stragglers (Slow Nodes)
Phenomenon in distributed systems where some machines execute their tasks much more slowly than others, delaying the entire synchronous factorization process. Techniques such as speculative execution (launching backup copies of slow tasks) and delay-tolerant algorithms are designed to mitigate their impact.
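The arithmetic behind speculation fits in a few lines. The policy below is a hypothetical simplification: once a task is still running at `slow_factor` times the median task time, a backup copy is launched and the task counts as finished when either copy finishes. Real schedulers use their own heuristics and settings.

```python
from statistics import median

# Simplified, hypothetical speculation policy: a task still running at
# slow_factor * median(task_times) gets a backup copy, and it counts as
# finished when either copy does.

def bsp_iteration_time(task_times):
    """A synchronous iteration ends only when its slowest task ends."""
    return max(task_times)

def iteration_time_with_backups(task_times, backup_time, slow_factor=1.5):
    launch = slow_factor * median(task_times)
    finish = [min(t, launch + backup_time) if t > launch else t for t in task_times]
    return max(finish)
```

For task times `[10.0, 11.0, 12.0, 60.0]`, the synchronous iteration takes 60.0 time units; with a backup launched at 17.25 that needs 12.0 units, it ends at 29.25.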
Distributed Non-Negative Matrix Factorization (NMF)
Distributed extension of non-negative matrix factorization in which the non-negativity constraints on the factors are enforced through update rules (multiplicative updates or projected gradient steps) adapted for parallel execution; often used for large-scale text clustering.
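One common way to distribute the multiplicative updates of Lee and Seung (H ← H ⊙ WᵀV ⊘ WᵀWH and W ← W ⊙ VHᵀ ⊘ WHHᵀ) is to split V and W by rows. WᵀV and WᵀW are then sums of per-node partial products that need one reduction, while the W update is purely local once H is broadcast. The sketch simulates the nodes in one process; helper names and the `EPS` guard are assumptions.

```python
import random

# Row-partitioned multiplicative-update NMF, simulated in one process.
# V and W are split into the same row blocks (one per node); H is shared.

EPS = 1e-9  # guards against division by zero

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def nmf_row_partitioned(V_blocks, k, iters=100, seed=0):
    rng = random.Random(seed)
    n = len(V_blocks[0][0])
    W_blocks = [[[rng.random() for _ in range(k)] for _ in block] for block in V_blocks]
    H = [[rng.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H update needs W^T V and W^T W: sums of per-node partial products.
        WtV = [[0.0] * n for _ in range(k)]
        WtW = [[0.0] * k for _ in range(k)]
        for Vp, Wp in zip(V_blocks, W_blocks):
            Wpt = transpose(Wp)
            WtV = add(WtV, matmul(Wpt, Vp))  # reduce step: sum over nodes
            WtW = add(WtW, matmul(Wpt, Wp))
        den = matmul(WtW, H)
        H = [[h * a / (b + EPS) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, WtV, den)]
        # H is broadcast; each node updates its own rows of W without communication.
        Ht = transpose(H)
        HHt = matmul(H, Ht)
        W_blocks = [
            [[w * a / (b + EPS) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(Wp, matmul(Vp, Ht), matmul(Wp, HHt))]
            for Vp, Wp in zip(V_blocks, W_blocks)
        ]
    return W_blocks, H

def reconstruction_error(V_blocks, W_blocks, H):
    """Squared Frobenius norm of V - WH, accumulated block by block."""
    return sum((v - wh) ** 2
               for Vp, Wp in zip(V_blocks, W_blocks)
               for rv, rwh in zip(Vp, matmul(Wp, H))
               for v, wh in zip(rv, rwh))
```

Because every factor entry is only ever multiplied by non-negative ratios, W and H stay non-negative, and the partitioned run reproduces the reconstruction of an unpartitioned run with the same initialization up to floating-point rounding.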
Checkpointing in Iterative Algorithms
Technique of periodically saving the state of matrix factors to reliable storage (e.g., HDFS) during the iterations of a distributed algorithm, allowing the computation to resume from an intermediate point in case of node failure and avoiding restarting from scratch.
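The save-and-resume cycle can be sketched with a local JSON file standing in for reliable storage such as HDFS. The function names, the checkpoint interval, the toy `step` function, and the simulated failure are all assumptions for illustration.

```python
import json
import os
import tempfile

# Sketch only: a local JSON file stands in for reliable storage such as HDFS,
# and `step` is any hypothetical function computing one iteration (U, V) -> (U, V).

def save_checkpoint(path, iteration, U, V):
    """Write to a temporary file, then rename it over the target, so a crash
    during the write leaves the previous checkpoint intact."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"iteration": iteration, "U": U, "V": V}, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)

def train(path, total_iters, every, step, U, V, fail_at=None):
    """Resume from the last checkpoint if one exists; save every `every` iterations.
    `fail_at` simulates a node failure at the given iteration."""
    start = 0
    state = load_checkpoint(path)
    if state is not None:
        start, U, V = state["iteration"], state["U"], state["V"]
    for it in range(start, total_iters):
        if it == fail_at:
            raise RuntimeError(f"simulated failure at iteration {it}")
        U, V = step(U, V)
        if (it + 1) % every == 0:
            save_checkpoint(path, it + 1, U, V)
    return U, V

def demo():
    """Crash at iteration 7, then resume from the checkpoint written at iteration 5."""
    path = os.path.join(tempfile.mkdtemp(), "factors.json")

    def halve_u(U, V):  # toy iteration: halve every entry of U
        return [[x * 0.5 for x in row] for row in U], V

    try:
        train(path, 10, 5, halve_u, [[1.0, 2.0]], [[3.0]], fail_at=7)
    except RuntimeError:
        pass
    resumed_from = load_checkpoint(path)["iteration"]
    U, V = train(path, 10, 5, halve_u, [[1.0, 2.0]], [[3.0]])
    return resumed_from, U, V
```

In `demo()`, the second call restarts at iteration 5 rather than 0 and still ends with the same factors as an uninterrupted ten-iteration run; the work lost to the failure is limited to the iterations after the last checkpoint.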
Distributed Tensor Factorization
Generalization of matrix factorization to tensors (multi-dimensional arrays) in a distributed context, used to model data with more than two modes (e.g., users, items, time) and requiring specific parallel algorithms such as distributed CP (PARAFAC) or Tucker decomposition.
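As with ALS on matrices, fixing all but one factor of a CP (PARAFAC) model makes the remaining factor's rows independent, so slices along that mode can be spread across nodes. The sketch restricts itself to rank 1 so the least-squares update has a closed form; entries are `(i, j, k, x)` tuples, and the function names and `owner` mapping are hypothetical.

```python
from collections import defaultdict

# Rank-1 CP (PARAFAC) sketch with hypothetical names. Entries are (i, j, k, x)
# for modes such as (user, item, time). With A and B fixed and broadcast, the
# time-mode factor is updated slice by slice, and slices are spread over nodes.

def cp_entry(A, B, C, i, j, k):
    """CP model: x_ijk is approximated by sum_r A[i][r] * B[j][r] * C[k][r]."""
    return sum(a * b * c for a, b, c in zip(A[i], B[j], C[k]))

def update_time_mode_rank1(entries, A, B, owner):
    """Least-squares update of C for rank 1: c_k = sum x a_i b_j / sum (a_i b_j)^2,
    over the observed entries of slice k. owner(k) gives the node holding slice k."""
    per_node = defaultdict(list)
    for entry in entries:
        per_node[owner(entry[2])].append(entry)
    C = {}
    for local in per_node.values():  # each node would run this on its own slices
        num, den = defaultdict(float), defaultdict(float)
        for i, j, k, x in local:
            f = A[i][0] * B[j][0]
            num[k] += x * f
            den[k] += f * f
        for k in num:
            C[k] = num[k] / den[k]
    return C
```

If the data were generated exactly by a rank-1 model and A and B are already correct, this single update recovers the time factors exactly; a full CP-ALS loop would cycle the same kind of update over all three modes.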
Distributed Loss Function
Computation of the matrix factorization reconstruction error in partitioned form: each node evaluates the loss on its own data subset, and a global reduction step then combines these partial losses into the total loss that guides model updates, whether these are coordinated centrally or in a decentralized way.
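A minimal sketch of the partitioned loss, assuming each partition is one node's list of `(user, item, rating)` triples and hypothetical function names. Each node reports a pair (sum of squared errors, count), and the reduction adds both components; reducing the count as well is what allows a correct global RMSE.

```python
# Hypothetical helpers: each partition is one node's list of (user, item, rating).

def local_stats(partition, U, V):
    """Per-node partial result: (sum of squared errors, number of ratings)."""
    se = sum((r - sum(x * y for x, y in zip(U[u], V[i]))) ** 2 for u, i, r in partition)
    return se, len(partition)

def global_loss(partitions, U, V):
    """Reduce: add up the partial sums reported by all nodes."""
    stats = [local_stats(p, U, V) for p in partitions]  # would run on each node
    return sum(s for s, _ in stats), sum(c for _, c in stats)

def global_rmse(partitions, U, V):
    # Reduce (loss, count) pairs first: averaging per-node RMSE values instead
    # would not, in general, equal the global RMSE.
    total_se, total_count = global_loss(partitions, U, V)
    return (total_se / total_count) ** 0.5
```

Because the reduction is a plain sum, the total is the same however the ratings are split across nodes.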
Distributed Regularization
Application of penalties (such as L2 norm) on matrix factors to prevent overfitting, where the regularization term is computed locally on each node and aggregated during global parameter updates, ensuring consistent regularization across the cluster.
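One way to keep the aggregated penalty consistent is to give every factor row exactly one owning node, which alone adds that row's term; other nodes may hold read-only copies without counting them again. This ownership scheme and the function names are assumptions for illustration.

```python
# Hypothetical ownership scheme: each factor row is "owned" by exactly one node,
# which adds its penalty; summing the per-node terms then counts every row once,
# even if other nodes hold read-only copies of it.

def local_l2(rows, owned_ids, lam):
    """lam * sum of squared entries of the factor rows this node owns."""
    return lam * sum(x * x for rid in owned_ids for x in rows[rid])

def global_l2(U, V, user_owner, item_owner, n_nodes, lam):
    total = 0.0
    for node in range(n_nodes):
        users = [u for u in range(len(U)) if user_owner(u) == node]
        items = [i for i in range(len(V)) if item_owner(i) == node]
        total += local_l2(U, users, lam) + local_l2(V, items, lam)  # reduce: sum of partial terms
    return total

def l2_gradient(row, lam):
    """Gradient of lam * ||row||^2, computable locally by the row's owner."""
    return [2 * lam * x for x in row]
```

The aggregated value equals `lam` times the squared norm of all factor entries, exactly what a single machine would compute.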
Spark GraphX for Factorization
Use of Spark's GraphX graph processing API to model the matrix as a bipartite graph (users-items) and execute factorization algorithms based on message passing between graph nodes, offering an alternative to DataFrame-based implementations.
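The message-passing formulation can be imitated in plain Python to show what flows along the edges. This mirrors the send-message / merge-message / vertex-update pattern of GraphX's `aggregateMessages`, but it is not the GraphX API; the function names, learning rate, and regularization are hypothetical. Each rating edge sends an error-weighted vector to both endpoints, and every vertex then applies its summed message as a gradient step.

```python
# Plain-Python imitation of one message-passing superstep on the bipartite
# user-item graph. It mirrors the sendMsg / mergeMsg / vertex-update pattern of
# GraphX's aggregateMessages, but it is NOT the GraphX API.

def send_messages(edges, U, V):
    """sendMsg: each rating edge sends err * (other endpoint's vector) to both ends."""
    for u, i, r in edges:
        err = r - sum(x * y for x, y in zip(U[u], V[i]))
        yield ("user", u), [err * y for y in V[i]]
        yield ("item", i), [err * x for x in U[u]]

def merge(a, b):
    """mergeMsg: messages addressed to the same vertex are summed."""
    return [x + y for x, y in zip(a, b)]

def superstep(edges, U, V, lr=0.02, lam=0.01):
    inbox = {}
    for vertex, msg in send_messages(edges, U, V):
        inbox[vertex] = merge(inbox[vertex], msg) if vertex in inbox else msg

    def update(vec, kind, idx):
        g = inbox.get((kind, idx), [0.0] * len(vec))
        return [x + lr * (gx - lam * x) for x, gx in zip(vec, g)]

    # Vertex program: all vertices update from the same old state (synchronous round).
    return ([update(U[u], "user", u) for u in range(len(U))],
            [update(V[i], "item", i) for i in range(len(V))])

def loss(edges, U, V):
    return sum((r - sum(x * y for x, y in zip(U[u], V[i]))) ** 2 for u, i, r in edges)
```

Calling `superstep` repeatedly amounts to synchronous, full-batch gradient descent: every round reads only the previous state, which matches the BSP pattern of graph-parallel frameworks.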