Distributed Matrix Factorization

📖

Term

Family of algorithmic techniques for decomposing a very large matrix into a product of smaller factor matrices, distributing computation and data across a cluster of machines to overcome the memory and compute limits of a single node.
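As a point of reference, one common way to write the problem is the regularized squared-error objective below. The symbols are assumptions introduced here for illustration and do not come from the glossary: r_ij is an observed entry, Ω the set of observed positions, u_i and v_j the rows of the factors U and V, k the rank, and λ a regularization weight.

```latex
\min_{U \in \mathbb{R}^{m \times k},\; V \in \mathbb{R}^{n \times k}}
\; \sum_{(i,j) \in \Omega} \left( r_{ij} - u_i^{\top} v_j \right)^2
\; + \; \lambda \left( \lVert U \rVert_F^2 + \lVert V \rVert_F^2 \right)
```

The sum over Ω splits across data partitions, which is what the distributed methods in this glossary exploit.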

Distributed Alternating Least Squares (ALS)

Parallelized matrix factorization algorithm that alternates between the two matrix factors, solving a least squares problem for one while keeping the other fixed. With one factor fixed, each row of the other factor can be solved independently, so the method parallelizes naturally and is offered by distributed frameworks such as Spark MLlib.
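The row independence described above can be sketched in plain Python. This is a minimal single-process illustration, not Spark's implementation: the helper names (`solve`, `update_row`, `als`, `rmse`), the L2 weight `lam`, and the toy parameters are all assumptions made for the example.

```python
import random

def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def update_row(obs, fixed, rank, lam):
    """Least squares for one row: (sum v v^T + lam I) u = sum r v."""
    A = [[lam if a == b else 0.0 for b in range(rank)] for a in range(rank)]
    rhs = [0.0] * rank
    for j, r in obs:
        v = fixed[j]
        for a in range(rank):
            rhs[a] += r * v[a]
            for b in range(rank):
                A[a][b] += v[a] * v[b]
    return solve(A, rhs)

def als(ratings, n_users, n_items, rank=2, lam=0.1, iters=10, seed=0):
    rng = random.Random(seed)
    U = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_items)]
    by_user = {i: [] for i in range(n_users)}
    by_item = {j: [] for j in range(n_items)}
    for i, j, r in ratings:
        by_user[i].append((j, r))
        by_item[j].append((i, r))
    for _ in range(iters):
        # Every row update is independent, so each could run as its own task.
        U = [update_row(by_user[i], V, rank, lam) for i in range(n_users)]
        V = [update_row(by_item[j], U, rank, lam) for j in range(n_items)]
    return U, V

def rmse(ratings, U, V):
    se = sum((r - sum(a * b for a, b in zip(U[i], V[j]))) ** 2 for i, j, r in ratings)
    return (se / len(ratings)) ** 0.5
```

In a cluster setting the two list comprehensions inside the loop are the parts that would be spread over workers, with the fixed factor shipped to each of them.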

Distributed Stochastic Gradient Descent (SGD)

Parallel variant of stochastic gradient descent in which the factorization parameters are updated synchronously or asynchronously across multiple data partitions. Because concurrent updates can touch the same factor rows, it requires a consistency management mechanism to converge properly in a distributed context.
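One way to avoid conflicting updates is to schedule blocks of the matrix so that workers never share a user row or an item column within a round. The sketch below simulates that synchronous, block-scheduled scheme in a single process; it is an illustration under assumptions (the function names, the modulo-based strata, and the hyperparameters are all hypothetical), not a reference implementation.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sgd_block(block, U, V, lr, lam):
    """Plain SGD over one block; it only touches that block's rows of U and V."""
    for i, j, r in block:
        err = r - dot(U[i], V[j])
        for k in range(len(U[i])):
            u, v = U[i][k], V[j][k]
            U[i][k] += lr * (err * v - lam * u)
            V[j][k] += lr * (err * u - lam * v)

def dsgd(ratings, n_users, n_items, workers=2, rank=2,
         lr=0.05, lam=0.01, epochs=200, seed=0):
    rng = random.Random(seed)
    U = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_items)]
    # Block (p, q) holds ratings whose user is in stratum p and item in stratum q.
    blocks = {}
    for i, j, r in ratings:
        blocks.setdefault((i % workers, j % workers), []).append((i, j, r))
    for _ in range(epochs):
        for shift in range(workers):
            # The blocks in this round share no user rows and no item columns,
            # so real workers could process them concurrently without conflicts.
            for p in range(workers):
                sgd_block(blocks.get((p, (p + shift) % workers), []), U, V, lr, lam)
            # A synchronous implementation would place a barrier here.
    return U, V

def rmse(ratings, U, V):
    se = sum((r - dot(U[i], V[j])) ** 2 for i, j, r in ratings)
    return (se / len(ratings)) ** 0.5
```

An asynchronous variant would drop the barrier and let workers pick blocks freely, trading this conflict-free guarantee for fewer idle workers.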

MapReduce for Factorization

Programming paradigm that decomposes a matrix factorization algorithm into two main stages: a 'Map' stage that performs local computations on data fragments and a 'Reduce' stage that aggregates the partial results and updates the matrix factors, used notably in Hadoop implementations.
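The two stages can be imitated with ordinary Python functions. The sketch below performs one gradient step on the item factors: the map stage emits per-item partial gradients from each data fragment and the reduce stage sums them by key. It is a toy in-memory illustration, not Hadoop code, and the names `map_partition`, `reduce_by_key`, and `item_step` are hypothetical.

```python
from collections import defaultdict
from functools import reduce

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def map_partition(partition, U, V):
    """Map: emit (item, partial gradient) pairs from one data fragment."""
    out = []
    for i, j, r in partition:
        err = r - dot(U[i], V[j])
        out.append((j, [-2.0 * err * x for x in U[i]]))
    return out

def reduce_by_key(pairs):
    """Reduce: sum the partial gradients emitted for each item."""
    grouped = defaultdict(list)
    for key, grad in pairs:
        grouped[key].append(grad)
    add = lambda g, h: [a + b for a, b in zip(g, h)]
    return {key: reduce(add, grads) for key, grads in grouped.items()}

def item_step(partitions, U, V, lr):
    """One gradient step on the item factors, with the user factors U held fixed."""
    pairs = [kv for part in partitions for kv in map_partition(part, U, V)]
    grads = reduce_by_key(pairs)
    zero = [0.0] * len(V[0])
    return [[v - lr * g for v, g in zip(V[j], grads.get(j, zero))]
            for j in range(len(V))]
```

Because the reduce is a plain sum, the result does not depend on how the ratings were split into partitions.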

Spark MLlib ALS

Optimized, distributed implementation of the Alternating Least Squares algorithm in Spark's machine learning library, designed for large-scale matrix factorization and available through both the RDD-based and the DataFrame-based APIs; Spark's in-memory caching of intermediate data makes it efficient for iterative computation.

Matrix Partitioning

Strategy for splitting a massive matrix into sub-blocks (by rows, by columns, or as a two-dimensional grid of blocks) distributed across cluster nodes; a crucial choice that directly affects load balance, inter-node communication, and the overall performance of factorization algorithms.
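A small sketch of grid partitioning for a sparse matrix stored as (row, column, value) triples follows. The function name `block_partition` and the contiguous-range assignment are assumptions chosen for the example; real systems often hash or shuffle indices to balance load.

```python
from collections import defaultdict

def block_partition(entries, n_rows, n_cols, row_blocks, col_blocks):
    """Assign each (i, j, value) entry to a block of a row_blocks x col_blocks grid.

    Rows and columns are cut into contiguous ranges of near-equal size.
    """
    blocks = defaultdict(list)
    for i, j, value in entries:
        bi = i * row_blocks // n_rows   # index of the row range containing i
        bj = j * col_blocks // n_cols   # index of the column range containing j
        blocks[(bi, bj)].append((i, j, value))
    return dict(blocks)
```

Setting `col_blocks=1` gives a pure row partitioning and `row_blocks=1` a pure column partitioning, so the same helper covers the three layouts named in the definition.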

Consistency Model

Rules defining when matrix factor updates become visible across cluster nodes, ranging from strong consistency (the BSP model, Bulk Synchronous Parallel), which preserves the convergence behavior of the sequential algorithm at the cost of synchronization latency, to weak consistency (the asynchronous model), which speeds up iterations but lets workers read stale values and may compromise stability.

Online Matrix Factorization

Distributed approach suitable for continuous data streams, where the factorization model is updated incrementally as new observations arrive without requiring complete retraining on historical data, often implemented with distributed variants of SGD.
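The incremental update can be sketched as a model that takes one SGD step per incoming observation and creates factor rows lazily for users or items it has not seen before. This is a single-process illustration under assumptions: the class `OnlineMF`, its defaults, and the initialization range are hypothetical choices.

```python
import random

class OnlineMF:
    """Toy online matrix factorization: one SGD step per observed rating."""

    def __init__(self, rank=2, lr=0.1, lam=0.01, seed=0):
        self.rank, self.lr, self.lam = rank, lr, lam
        self.rng = random.Random(seed)
        self.U, self.V = {}, {}

    def _row(self, table, key):
        # New users and items get a small random factor row on first sight.
        if key not in table:
            table[key] = [self.rng.uniform(0.1, 0.5) for _ in range(self.rank)]
        return table[key]

    def predict(self, user, item):
        u, v = self._row(self.U, user), self._row(self.V, item)
        return sum(a * b for a, b in zip(u, v))

    def update(self, user, item, rating):
        """Apply one gradient step for a single observation; return its error."""
        u, v = self._row(self.U, user), self._row(self.V, item)
        err = rating - sum(a * b for a, b in zip(u, v))
        for k in range(self.rank):
            uk, vk = u[k], v[k]
            u[k] += self.lr * (err * vk - self.lam * uk)
            v[k] += self.lr * (err * uk - self.lam * vk)
        return err
```

For example, calling `model.update("alice", "film", 3.0)` whenever that event arrives adjusts only the two affected factor rows, with no pass over historical data.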

Parametric Distributed Matrix Factorization

Advanced method where matrix factors are not learned directly but are generated by shared and distributed parametric functions (e.g., neural networks), thereby reducing the amount of data to communicate between nodes and improving generalization capability.

Stragglers (Slow Nodes)

Phenomenon in distributed systems where some machines execute their computation tasks much more slowly than others, delaying the entire synchronous factorization process; techniques such as speculative execution or delay-tolerant algorithms are designed to mitigate their impact.

Distributed Non-Negative Matrix Factorization (NMF)

Distributed extension of non-negative matrix factorization, where non-negativity constraints on the factors are enforced through update rules (multiplicative updates or projection onto the non-negative values) adapted for parallel execution, often used for large-scale text clustering.
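The sketch below applies the classic multiplicative update rules to a row-sharded matrix. Each shard computes small local statistics, these are summed, and the shared factor H is updated, while each shard updates its own rows of W locally. It simulates the shards in one process; the function names and the choice of row sharding are assumptions made for illustration.

```python
import random
from functools import reduce

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mult_update(F, num, den, eps=1e-9):
    """Element-wise rule F <- F * num / den; non-negative inputs stay non-negative."""
    return [[f * n / (d + eps) for f, n, d in zip(rf, rn, rd)]
            for rf, rn, rd in zip(F, num, den)]

def nmf_sharded(X_shards, rank, iters=200, seed=0):
    rng = random.Random(seed)
    n_cols = len(X_shards[0][0])
    W_shards = [[[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in Xs]
                for Xs in X_shards]
    H = [[rng.uniform(0.1, 1.0) for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iters):
        # "Map": each shard computes W_s^T X_s (rank x n) and W_s^T W_s (rank x rank).
        WtX = [matmul(transpose(Ws), Xs) for Ws, Xs in zip(W_shards, X_shards)]
        WtW = [matmul(transpose(Ws), Ws) for Ws in W_shards]
        # "Reduce": sum the small statistics, then update the shared factor H.
        H = mult_update(H, reduce(add, WtX), matmul(reduce(add, WtW), H))
        # Each shard updates its own rows of W using the broadcast H.
        Ht = transpose(H)
        HHt = matmul(H, Ht)
        W_shards = [mult_update(Ws, matmul(Xs, Ht), matmul(Ws, HHt))
                    for Ws, Xs in zip(W_shards, X_shards)]
    return W_shards, H

def error(X_shards, W_shards, H):
    """Frobenius norm of X - W H, accumulated shard by shard."""
    total = 0.0
    for Xs, Ws in zip(X_shards, W_shards):
        for rx, rp in zip(Xs, matmul(Ws, H)):
            total += sum((x - p) ** 2 for x, p in zip(rx, rp))
    return total ** 0.5
```

Only the rank-sized statistics and H cross shard boundaries here, which is the main reason multiplicative updates adapt well to parallel execution.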

Checkpointing in Iterative Algorithms

Technique of periodically saving the state of matrix factors to reliable storage (e.g., HDFS) during the iterations of a distributed algorithm, allowing the computation to resume from an intermediate point in case of node failure and avoiding restarting from scratch.
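A minimal sketch of the save-and-resume loop is shown below. It writes to the local filesystem as a stand-in for reliable storage such as HDFS, and the names `save_checkpoint`, `load_checkpoint`, and `run`, as well as the JSON layout, are assumptions for the example. The `step` and `init` arguments are caller-supplied functions.

```python
import json
import os

def save_checkpoint(path, iteration, U, V):
    """Write the factors to a temporary file, then rename it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"iteration": iteration, "U": U, "V": V}, f)
    os.replace(tmp, path)  # atomic rename: a crash never leaves a half-written file

def load_checkpoint(path):
    """Return the saved state, or None when no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)

def run(path, total_iters, every, step, init):
    """Iterate `step`, saving every `every` iterations and resuming if possible."""
    state = load_checkpoint(path)
    if state is None:
        it = 0
        U, V = init()
    else:
        it, U, V = state["iteration"], state["U"], state["V"]
    while it < total_iters:
        U, V = step(U, V)
        it += 1
        if it % every == 0:
            save_checkpoint(path, it, U, V)
    return it, U, V
```

The checkpoint interval is a trade-off: a small `every` costs more I/O per iteration, while a large one means more recomputation after a failure.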

Distributed Tensor Factorization

Generalization of matrix factorization to tensors (multi-dimensional arrays) in a distributed context, used to model data with more than two modes (e.g., users, items, time) and requiring specific parallel algorithms such as distributed PARAFAC (also called CP) or Tucker decomposition.

Distributed Loss Function

Calculation of the matrix factorization reconstruction error in a partitioned manner: each node evaluates the loss on its own data subset, and a global reduction step then sums the partial results into the total loss, which guides model updates in either a centralized or a decentralized scheme.
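The evaluate-then-reduce pattern can be sketched as follows. Each call to the hypothetical `partial_loss` stands for the work of one node, and `global_mse` plays the role of the reduction; both names and the use of mean squared error are assumptions for illustration.

```python
def partial_loss(partition, U, V):
    """Squared-error sum and observation count for one node's data subset."""
    sq = 0.0
    for i, j, r in partition:
        pred = sum(a * b for a, b in zip(U[i], V[j]))
        sq += (r - pred) ** 2
    return sq, len(partition)

def global_mse(partitions, U, V):
    """Reduce the per-node partial results into one mean squared error."""
    partials = [partial_loss(p, U, V) for p in partitions]  # one call per node
    total_sq = sum(s for s, _ in partials)
    total_n = sum(n for _, n in partials)
    return total_sq / total_n
```

Only two numbers per node travel over the network in this scheme, regardless of how many observations each node holds.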

Distributed Regularization

Application of penalties (such as the squared L2 norm) on matrix factors to prevent overfitting, where the regularization term is computed locally on each node and aggregated during global parameter updates, ensuring consistent regularization across the cluster.
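Because a squared L2 penalty is a sum over factor rows, it can be computed per shard and added up. The sketch below shows that aggregation together with the matching per-row gradient step; the function names and the weight `lam` are hypothetical and chosen only for this example.

```python
def local_penalty(factor_rows):
    """Sum of squared entries over the factor rows held by one node."""
    return sum(x * x for row in factor_rows for x in row)

def global_penalty(shards, lam):
    """Aggregate the local sums and apply the shared weight lam once."""
    return lam * sum(local_penalty(shard) for shard in shards)

def regularized_step(row, grad, lam, lr):
    """Gradient step on one factor row: loss gradient plus the L2 term 2*lam*row."""
    return [x - lr * (g + 2.0 * lam * x) for x, g in zip(row, grad)]
```

Using the same `lam` on every node is what keeps the penalty identical to the one an unsharded computation would apply.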

Spark GraphX for Factorization

Use of Spark's GraphX graph processing API to model the matrix as a bipartite graph (users on one side, items on the other) and execute factorization algorithms based on message passing between graph vertices, offering an alternative to DataFrame-based implementations.

AI Terminology