Distributed Computing Models - Artificial Intelligence Glossary

MapReduce

Parallel programming model for processing large datasets on clusters in two main phases: Map, which filters and transforms input records into key-value pairs, and Reduce, which aggregates the values grouped under each key.
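
The two phases can be illustrated with a word count run in a single process. This is a minimal sketch, not a cluster implementation: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the `shuffle` step stands in for the grouping by key that a real framework performs between the phases.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in the input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group values by key, as a framework would between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: aggregate all values that share one key.
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["the quick fox", "the lazy dog"])
print(counts)
```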

Lambda Architecture

Data processing architecture combining a batch layer for comprehensive analysis of the full dataset and a speed layer for low-latency results on recent data, with a serving layer that merges both views to answer queries.
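
As a rough illustration of how the merged view is assembled, the sketch below counts page views. The event shape (`{"page": ...}`) and the function names are assumptions made for the example; a real deployment would run each part on separate systems.

```python
from collections import Counter

def batch_view(master_dataset):
    # Batch path: recompute counts from the full history.
    return Counter(event["page"] for event in master_dataset)

def speed_view(recent_events):
    # Speed path: counts only for events that arrived after the last batch run.
    return Counter(event["page"] for event in recent_events)

def serve(batch, speed):
    # Merge step: combine both views to answer a query.
    return batch + speed

history = [{"page": "/home"}, {"page": "/home"}, {"page": "/docs"}]
recent = [{"page": "/docs"}]
merged = serve(batch_view(history), speed_view(recent))
print(merged)
```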

Kappa Architecture

Simplification of the Lambda architecture that uses a single stream processing pipeline: data is processed in real time, and historical queries are answered by replaying events from the log.
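
Replaying events can be sketched with an in-memory list standing in for the event log. The names `apply_event` and `replay`, and the account-balance events, are hypothetical and chosen only for illustration.

```python
def apply_event(state, event):
    # One stream-processing step: fold a single event into the running state.
    account, amount = event
    state[account] = state.get(account, 0) + amount
    return state

def replay(log, upto=None):
    # Rebuild state by replaying the log from the start.
    # `upto` limits the replay to the first N events (None means all of them).
    state = {}
    for event in log[:upto]:
        apply_event(state, event)
    return state

log = [("alice", 100), ("bob", 50), ("alice", -30)]
current = replay(log)             # state after every event
historical = replay(log, upto=2)  # state as it was after the first two events
print(current, historical)
```

The same code path serves both the live state and the historical query, which is the point of dropping the separate batch path.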

Batch Processing

Processing mode where data is collected and processed in batches at predefined intervals, optimized for throughput rather than latency, typical of traditional ETL jobs.

Stream Processing

Continuous processing of data in motion as it is generated, enabling real-time analysis with minimal latency between capture and processing.
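
The contrast with batch processing can be shown on the same input. In this minimal sketch (function names are hypothetical), the batch version waits for the whole collection and computes once, while the stream version emits an updated result as each record arrives.

```python
def batch_average(readings):
    # Batch: wait for the whole collected batch, then compute once.
    return sum(readings) / len(readings)

def stream_average(readings):
    # Stream: update the result and emit it as each record arrives.
    total, count = 0.0, 0
    for value in readings:
        total += value
        count += 1
        yield total / count

readings = [10.0, 20.0, 30.0]
batch_result = batch_average(readings)
stream_results = list(stream_average(readings))
print(batch_result, stream_results)
```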

Distributed File System

File system that stores data across multiple servers while appearing as a single system to users, using replication to provide fault tolerance.

HDFS

Hadoop Distributed File System, a distributed file system designed to store petabytes of data on commodity hardware, with high fault tolerance achieved through block replication.
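
Block replication can be sketched as follows. The round-robin placement here is a simplification chosen for illustration; it is not HDFS's actual rack-aware placement policy, and the node names, sizes, and function names are hypothetical. The sketch assumes there are at least as many nodes as replicas.

```python
def split_into_blocks(file_size, block_size):
    # Number of fixed-size blocks needed; the last block may be partial.
    return -(-file_size // block_size)  # ceiling division

def place_replicas(num_blocks, nodes, replication=3):
    # Simplified round-robin placement (an assumption, not the HDFS policy).
    placement = {}
    for block in range(num_blocks):
        placement[block] = [nodes[(block + i) % len(nodes)]
                            for i in range(replication)]
    return placement

def readable_blocks(placement, failed_nodes):
    # A block stays readable while at least one replica is on a live node.
    return [block for block, replicas in placement.items()
            if any(node not in failed_nodes for node in replicas)]

nodes = ["dn1", "dn2", "dn3", "dn4"]
num_blocks = split_into_blocks(file_size=300, block_size=128)
placement = place_replicas(num_blocks, nodes)
survivors = readable_blocks(placement, failed_nodes={"dn1", "dn2"})
print(num_blocks, placement, survivors)
```

With three replicas per block, losing two of the four nodes still leaves every block readable in this example.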

YARN

Yet Another Resource Negotiator, Hadoop's resource management layer, which separates cluster resource management and job scheduling from data processing so that multiple frameworks can run on the same cluster.

RDD

Resilient Distributed Dataset, the fundamental data structure of Spark: an immutable, partitioned collection of objects that can be computed in parallel and whose lost partitions are rebuilt automatically from the lineage of transformations that produced them.
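
The toy class below mimics three of these properties in plain Python: immutability, partitioning, and recomputation from recorded transformations. `ToyRDD` is a hypothetical stand-in written for this glossary, not Spark's API, and it runs in a single process.

```python
class ToyRDD:
    def __init__(self, partitions=None, parent=None, fn=None):
        self._source = partitions  # set only for the root dataset
        self.parent = parent       # the dataset this one was derived from
        self.fn = fn               # transformation applied to the parent

    def map(self, fn):
        # A transformation returns a new dataset; the original is never modified.
        return ToyRDD(parent=self, fn=fn)

    def num_partitions(self):
        if self.parent is None:
            return len(self._source)
        return self.parent.num_partitions()

    def compute(self, index):
        # (Re)compute one partition by walking the chain of parents,
        # which is how a lost partition could be rebuilt after a failure.
        if self.parent is None:
            return list(self._source[index])
        return [self.fn(x) for x in self.parent.compute(index)]

    def collect(self):
        return [x for i in range(self.num_partitions()) for x in self.compute(i)]

base = ToyRDD(partitions=[[1, 2], [3, 4]])
squared = base.map(lambda x: x * x)
print(squared.collect(), squared.compute(1))
```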

Data Locality

Distributed computing principle where tasks are scheduled on the nodes that already hold the data they need, so that computation moves to the data instead of the data moving across the network.
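
A scheduler that honors this principle might look like the sketch below. The function `assign_task`, the node names, and the two locality labels are assumptions for illustration; real schedulers typically add intermediate levels such as rack-local placement. The sketch assumes at least one node is free.

```python
def assign_task(block_locations, free_nodes):
    # Prefer a free node that already stores the block.
    for node in block_locations:
        if node in free_nodes:
            return node, "node-local"
    # Otherwise fall back to any free node and accept a network transfer.
    return sorted(free_nodes)[0], "remote"

local_choice = assign_task(["dn2", "dn3"], {"dn1", "dn3"})
remote_choice = assign_task(["dn2"], {"dn1", "dn4"})
print(local_choice, remote_choice)
```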

Speculative Execution

Straggler mitigation mechanism that launches duplicate copies of slow tasks on other nodes and uses whichever copy finishes first, reducing the impact of faulty or overloaded nodes.
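
The idea of taking the first completed result can be sketched with threads from the standard library. This is a simplification under stated assumptions: both attempts start at the same time, the delays are simulated with `time.sleep`, and the slower attempt is left to finish instead of being cancelled. The node names and function names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def run_attempt(node, delay):
    # Simulated task: the delay stands in for a slow or healthy node.
    time.sleep(delay)
    return node, 42

def run_speculatively(attempts):
    with ThreadPoolExecutor(max_workers=len(attempts)) as pool:
        futures = [pool.submit(run_attempt, node, delay)
                   for node, delay in attempts]
        # Block until the first attempt finishes and use its result.
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()

winner, result = run_speculatively([("slow-node", 0.5), ("backup-node", 0.05)])
print(winner, result)
```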

DAG

Directed Acyclic Graph, the representation of a Spark job as a graph whose nodes are processing steps and whose directed edges are dependencies, with no cycles, which lets the scheduler order the steps and run independent ones in parallel.
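
Ordering the steps of such a graph can be sketched with `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The step names are hypothetical and this is not Spark's scheduler; it only shows which steps become runnable together once their dependencies are done.

```python
from graphlib import TopologicalSorter

# Each key depends on the steps listed in its set.
dag = {
    "read_a": set(),
    "read_b": set(),
    "filter_a": {"read_a"},
    "join": {"filter_a", "read_b"},
    "count": {"join"},
}

def parallel_waves(graph):
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        # Every step returned here has all dependencies satisfied,
        # so the steps within one wave could run in parallel.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

waves = parallel_waves(dag)
print(waves)
```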

Fault Tolerance

Ability of a distributed system to continue functioning correctly in case of component failures, typically through redundancy, replication, and automatic recovery mechanisms.

Consistency Model

Contract defining data consistency guarantees in a distributed system, ranging from strong consistency to eventual consistency based on application needs.
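
The two ends of that range can be contrasted with a toy replicated register. `ReplicatedRegister` and its methods are hypothetical, and the model is deliberately simplified: a strong write updates every replica before returning, while an eventual write updates one replica and propagates later.

```python
class ReplicatedRegister:
    def __init__(self, replicas=3):
        self.values = [None] * replicas
        self.pending = []  # writes not yet propagated to all replicas

    def write_strong(self, value):
        # Strong: every replica is updated before the write is acknowledged.
        self.values = [value] * len(self.values)

    def write_eventual(self, value):
        # Eventual: acknowledge after one replica, propagate in the background.
        self.values[0] = value
        self.pending.append(value)

    def sync(self):
        # Background propagation: all replicas converge on the latest write.
        if self.pending:
            self.values = [self.pending[-1]] * len(self.values)
            self.pending.clear()

    def read(self, replica):
        return self.values[replica]

register = ReplicatedRegister()
register.write_strong("a")
register.write_eventual("b")
stale = register.read(2)   # replica 2 has not seen "b" yet
register.sync()
fresh = register.read(2)   # after propagation the replicas agree
print(stale, fresh)
```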

Combiner

MapReduce optimization function executed locally on each mapper's output to pre-aggregate values before the reduce phase, reducing the volume of data transferred during the shuffle.
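
The effect on shuffle volume can be seen by counting the pairs that would cross the network with and without local pre-aggregation. This is a single-process sketch with hypothetical function names; summing is used because it gives the same final result whether or not the combiner runs.

```python
from collections import Counter

def map_words(split):
    # Mapper: emit a (word, 1) pair for every word in one input split.
    return [(word, 1) for word in split.split()]

def combine(pairs):
    # Combiner: pre-aggregate one mapper's output before the shuffle.
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def reduce_counts(all_pairs):
    # Reducer: final aggregation over everything received from the shuffle.
    total = Counter()
    for key, value in all_pairs:
        total[key] += value
    return dict(total)

splits = ["a a a b", "a b b"]
without_combiner = [p for s in splits for p in map_words(s)]
with_combiner = [p for s in splits for p in combine(map_words(s))]
print(len(without_combiner), len(with_combiner))
print(reduce_counts(with_combiner))
```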
