Distributed Computing Models

MapReduce

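Parallel programming model for processing large datasets on clusters. Work is divided into two main phases: Map, which filters and transforms input records into key-value pairs, and Reduce, which aggregates the values grouped under each key. Between the two, a shuffle step groups the pairs by key.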
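
As an illustration of the two phases, here is a minimal single-process word count. It is a sketch only: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real cluster would run the map and reduce calls on different machines.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit one (word, 1) pair per word in the input record.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all values by key so each reducer sees one key's values.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: aggregate the values collected for a single key.
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Calling `word_count(["a b a", "b c"])` counts each word across both input lines.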

Lambda Architecture

Data processing architecture combining a batch layer for comprehensive analysis and a speed layer for real-time results, with a serving layer that merges both views to answer queries.
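
The merge of the two views can be sketched as follows, assuming page-view events stored as dictionaries with a `page` field. The event shape and all function names are assumptions made for illustration.

```python
from collections import Counter

def batch_view(events):
    # Batch path: recompute counts from the complete historical dataset.
    return Counter(e["page"] for e in events)

def speed_view(recent_events):
    # Speed path: counts for events that arrived after the last batch run.
    return Counter(e["page"] for e in recent_events)

def serve(batch, speed, page):
    # Query time: merge both views; a Counter returns 0 for unseen keys.
    return batch[page] + speed[page]
```

In this toy version the two paths share the same logic. In practice they are often separate codebases, which is the maintenance cost that motivates the Kappa Architecture.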

Kappa Architecture

Simplification of the Lambda architecture that uses a single stream processing pipeline: data is processed in real time, and historical queries are satisfied by replaying events from the log.
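
Replaying events to answer a historical query might look like the sketch below. The log is assumed to be an ordered list of `(account, amount)` tuples, and `apply_event` and `replay` are hypothetical names.

```python
def apply_event(state, event):
    # One stream-processing step: fold a single event into the running state.
    account, amount = event
    state[account] = state.get(account, 0) + amount
    return state

def replay(log, upto=None):
    # Rebuild state from scratch by replaying the log from offset 0.
    # `upto` limits the replay to the first `upto` events (None means all).
    state = {}
    for event in log[:upto]:
        state = apply_event(state, event)
    return state
```

The same `apply_event` function serves both live processing and reprocessing, so there is only one code path to maintain.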

Batch Processing

Processing mode where data is collected and processed in batches at predefined intervals, optimized for throughput rather than latency, typical of traditional ETL jobs.

Stream Processing

Continuous processing of data in motion as it is generated, enabling real-time analysis with minimal latency between capture and processing.
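
To make the contrast with batch processing concrete, the sketch below updates a mean after every incoming value instead of waiting for the whole dataset. The generator name `running_mean` is hypothetical, and any iterable stands in for the stream.

```python
def running_mean(stream):
    # Emit an updated result as each value arrives, keeping only O(1) state.
    count, total = 0, 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count
```

A batch job over the same data would produce only the final value, and only after all input had been collected.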

Distributed File System

File system that stores data across multiple servers while appearing as a single system to users, using replication to provide fault tolerance and reliability.

HDFS

Hadoop Distributed File System: a distributed file system designed to store petabytes of data on commodity hardware, achieving high fault tolerance through block replication.
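
Block replication can be illustrated with a toy placement function. This is a simplified sketch, not the HDFS algorithm: it assigns replicas round-robin, whereas real HDFS placement is rack-aware. The function names are hypothetical, and the replicas of a block are distinct only when `replication` does not exceed the number of nodes.

```python
def place_blocks(file_size, block_size, nodes, replication=3):
    # Number of blocks, rounding up so the final partial block is kept.
    n_blocks = -(-file_size // block_size)
    placement = {}
    for b in range(n_blocks):
        # Simplification: round-robin over nodes instead of rack awareness.
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement

def readable_blocks(placement, failed):
    # A block stays readable if at least one replica sits on a live node.
    return [b for b, replicas in placement.items() if set(replicas) - set(failed)]
```

With three replicas per block, every block in this toy layout survives the loss of any two nodes.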

YARN

Yet Another Resource Negotiator: Hadoop's resource manager, which separates resource management from data processing so that multiple frameworks can run on the same cluster.

RDD

Resilient Distributed Dataset: the fundamental data structure of Spark, an immutable, partitioned collection of objects that can be processed in parallel and automatically rebuilt from its lineage after a failure.
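
The idea of immutability plus recomputation can be sketched without Spark. The `ToyRDD` class below is a hypothetical stand-in, not the Spark API: each transformation returns a new object that remembers its parent, so any partition can be recomputed on demand.

```python
class ToyRDD:
    def __init__(self, partitions, parent=None, fn=None):
        self._source = partitions  # only set on the root dataset
        self.parent = parent       # lineage: the dataset this one derives from
        self.fn = fn               # lineage: the per-element function applied

    def map(self, fn):
        # Transformations return a new dataset; the original is never mutated.
        return ToyRDD(None, parent=self, fn=fn)

    def compute(self, i):
        # Recompute partition i from lineage; a lost partition is rebuilt the same way.
        if self.parent is None:
            return list(self._source[i])
        return [self.fn(x) for x in self.parent.compute(i)]

    def num_partitions(self):
        if self.parent is None:
            return len(self._source)
        return self.parent.num_partitions()

    def collect(self):
        return [x for i in range(self.num_partitions()) for x in self.compute(i)]
```

Because `compute(i)` touches only partition `i`, a failure affecting one partition does not require recomputing the others.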

Data Locality

Distributed computing principle where tasks are executed on nodes containing the necessary data, minimizing network transfer and significantly improving performance.
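
A locality-aware scheduling decision could be sketched as below. The function `assign_task` and its return labels are hypothetical, and real schedulers usually add intermediate levels such as rack-local placement.

```python
def assign_task(block_locations, free_nodes):
    # Prefer a free node that already stores the block (node-local execution).
    for node in block_locations:
        if node in free_nodes:
            return node, "local"
    # Otherwise fall back to any free node; the block must cross the network.
    if free_nodes:
        return sorted(free_nodes)[0], "remote"
    # No capacity anywhere: the task waits for a slot.
    return None, "wait"
```

For example, `assign_task(["n1", "n2"], {"n2", "n3"})` picks `n2`, the only free node that holds the block.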

Speculative Execution

Fault tolerance mechanism that launches duplicate copies of slow-running tasks (stragglers) on other nodes and uses whichever copy finishes first, reducing the impact of faulty or overloaded nodes.
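
The "first result wins" behaviour can be sketched with threads standing in for nodes. This is an assumption-laden toy: `run_attempt` is a hypothetical task whose `delay` simulates node slowness, all attempts start at once, and the losing attempt simply runs to completion. Real frameworks typically launch the copy only after detecting a straggler and then kill the loser.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def run_attempt(node, delay, value):
    # Hypothetical task attempt; `delay` simulates how slow the node is.
    time.sleep(delay)
    return node, value * 2

def run_speculatively(value, attempts):
    # Launch the same task on several nodes and keep whichever finishes first.
    with ThreadPoolExecutor(max_workers=len(attempts)) as pool:
        futures = [pool.submit(run_attempt, node, delay, value)
                   for node, delay in attempts]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()
```

This only works safely when the task is deterministic and free of side effects, since it may run more than once.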

DAG

Directed Acyclic Graph: in Spark, the representation of a job as a graph of transformations with no cycles, which the scheduler uses to plan execution and run independent steps in parallel.
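
How a DAG exposes parallelism can be sketched with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The function name `execution_waves` and the step names in the usage note are hypothetical, and this shows generic DAG scheduling rather than Spark's own stage planner.

```python
from graphlib import TopologicalSorter

def execution_waves(deps):
    # `deps` maps each step to the set of steps it depends on.
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        # Every step returned here has all inputs done, so the wave can run in parallel.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

For a job where `join` depends on `clean_a` and `clean_b`, which in turn depend on `read_a` and `read_b`, the two reads form the first wave and the two cleaning steps the second.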

Fault Tolerance

Ability of a distributed system to continue functioning correctly in case of component failures, typically through redundancy, replication, and automatic recovery mechanisms.

Consistency Model

Contract defining data consistency guarantees in a distributed system, ranging from strong consistency to eventual consistency based on application needs.

Combiner

MapReduce optimization: a function run locally on each mapper's output that pre-aggregates values before the shuffle, reducing the volume of data transferred to the reduce phase.
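
The effect of local pre-aggregation can be shown with a small word-count sketch. All function names are hypothetical, and each inner list stands in for the input split handled by one mapper.

```python
from collections import Counter

def map_words(split):
    # Mapper output without a combiner: one (word, 1) pair per occurrence.
    return [(word, 1) for line in split for word in line.split()]

def combine(pairs):
    # Combiner: sum counts locally so each mapper ships one pair per distinct word.
    local = Counter()
    for word, n in pairs:
        local[word] += n
    return list(local.items())

def reduce_counts(all_pairs):
    # Reducer: the same summation, applied to pairs from every mapper.
    totals = Counter()
    for word, n in all_pairs:
        totals[word] += n
    return dict(totals)
```

With the splits `[["a a b"], ["a b b"]]`, the mappers emit six pairs without the combiner and four with it, while the reduced totals are identical. This holds because summation is associative and commutative; a combiner is not safe for operations that lack those properties, such as a plain average.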
