Hadoop Ecosystem - AI Glossary

HDFS

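Hadoop's primary distributed file system, designed to store petabytes of data across clusters of commodity machines. Files are split into large blocks that are replicated across nodes (three copies by default), so the failure of a single machine does not lose data.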
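The storage cost of block splitting and replication can be estimated with simple arithmetic. The sketch below assumes a 128 MB block size and a replication factor of 3, the usual defaults of `dfs.blocksize` and `dfs.replication`; the function name is hypothetical and real clusters may be configured differently.

```python
MB = 1024 * 1024

def hdfs_footprint(file_size_bytes: int,
                   block_size: int = 128 * MB,
                   replication: int = 3) -> tuple[int, int]:
    """Return (number of blocks, raw bytes stored across the cluster).

    Hypothetical helper assuming default HDFS settings. The last block is
    not padded, so raw usage is file size times replication factor.
    """
    # Integer ceiling division: a partial final block still counts as a block.
    blocks = (file_size_bytes + block_size - 1) // block_size
    return blocks, file_size_bytes * replication
```

For example, a 300 MB file occupies three blocks (128 MB, 128 MB, and 44 MB) and consumes 900 MB of raw disk across the cluster.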

MapReduce

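Programming model and implementation for distributed processing of large datasets on clusters. A job is divided into a map phase, which turns input records into intermediate key-value pairs, and a reduce phase, which aggregates all values that share a key.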
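The map, shuffle, and reduce steps can be illustrated with the classic word-count example. This is a single-process sketch in plain Python, not the Hadoop Java API; the function names are hypothetical, and on a real cluster each phase runs in parallel across many nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group intermediate values by key, as the framework does between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Aggregate every value that shares a key.
    return key, sum(values)

def word_count(lines: Iterable[str]) -> dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

Calling `word_count(["the quick fox", "The lazy dog"])` counts `"the"` twice and every other word once.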

YARN

Hadoop's resource management layer (Yet Another Resource Negotiator), which allocates CPU and memory to applications in the form of containers and manages the lifecycle of their tasks across the cluster.
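YARN schedulers round each container request up to a multiple of a minimum allocation and reject requests above a maximum. The simplified, memory-only sketch below assumes `yarn.scheduler.minimum-allocation-mb` = 1024 and `yarn.scheduler.maximum-allocation-mb` = 8192, plus a hypothetical NodeManager offering 8192 MB; real schedulers also account for vcores, queues, and other running applications.

```python
def normalize_request(memory_mb: int,
                      min_alloc_mb: int = 1024,
                      max_alloc_mb: int = 8192) -> int:
    """Round a container memory request up to a multiple of the minimum allocation."""
    if memory_mb > max_alloc_mb:
        raise ValueError(f"request of {memory_mb} MB exceeds the maximum allocation")
    # Integer ceiling to the next multiple of min_alloc_mb, never below the minimum.
    rounded = -(-memory_mb // min_alloc_mb) * min_alloc_mb
    return max(min_alloc_mb, rounded)

def containers_per_node(request_mb: int, node_memory_mb: int = 8192) -> int:
    # How many containers of this (normalized) size fit on one hypothetical node.
    return node_memory_mb // normalize_request(request_mb)
```

A 1500 MB request is normalized to 2048 MB, so only four such containers fit on the 8192 MB node, which is why request sizes close to the minimum-allocation multiples waste less memory.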

HBase

Distributed, non-relational (NoSQL) wide-column database built on HDFS, offering real-time random read and write access to very large tables with strongly consistent row-level operations.
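HBase keeps rows sorted by row key and splits each table into regions, each covering a contiguous key range served by one RegionServer. The toy lookup below models that routing with a binary search over hypothetical region start keys and server names; it is not the HBase client API, which resolves regions through the `hbase:meta` table. Real row keys are byte arrays; ASCII strings stand in for them here.

```python
import bisect

# Hypothetical region boundaries: each entry is the start key of one region.
# The first region starts at the empty key, so it covers everything below "g".
REGION_START_KEYS = ["", "g", "n", "t"]
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs4"]

def locate_region(row_key: str) -> str:
    """Return the server hosting the region whose key range contains row_key."""
    # bisect_right gives the index of the first start key greater than row_key;
    # the region just before that index is the one containing the key.
    index = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return REGION_SERVERS[index]
```

Because neighboring keys land in the same region, row-key design decides whether writes spread across servers or pile onto one (a hotspot), as happens when every key starts with the current timestamp.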

Hive

Data warehouse infrastructure on Hadoop for querying large datasets with a SQL-like language (HiveQL). Queries are compiled into jobs executed by MapReduce or, in newer deployments, by Tez or Spark.

Pig

High-level data analysis platform using the Pig Latin language to express complex data transformation programs executed on Hadoop.

Spark

Fast, general-purpose processing engine for big data that keeps intermediate data in memory where possible, with APIs in Scala, Java, Python, and R and built-in libraries for SQL, streaming, machine learning, and graph processing.

ZooKeeper

Centralized coordination service for distributed applications, used to maintain configuration information and naming and to provide distributed synchronization and group services.
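A common ZooKeeper recipe for leader election has each candidate create an ephemeral sequential znode under a shared parent; the candidate owning the lowest sequence number is the leader, and when its session ends its ephemeral node disappears and the next-lowest candidate takes over. The sketch below simulates only that ordering logic in memory, with a hypothetical class and node names; it does not talk to a ZooKeeper server.

```python
import itertools
from typing import Optional

class ElectionSimulator:
    """Hypothetical in-memory stand-in for a parent znode such as /election."""

    def __init__(self) -> None:
        self._counter = itertools.count()
        self.children: dict[str, str] = {}  # znode name -> candidate id

    def join(self, candidate: str) -> str:
        # ZooKeeper appends a monotonically increasing, zero-padded
        # 10-digit counter to the names of sequential znodes.
        name = f"n_{next(self._counter):010d}"
        self.children[name] = candidate
        return name

    def leave(self, name: str) -> None:
        # Models an ephemeral node vanishing when its owner's session ends.
        del self.children[name]

    def leader(self) -> Optional[str]:
        if not self.children:
            return None
        # Zero padding makes lexicographic order match numeric order.
        return self.children[min(self.children)]
```

In the real recipe each non-leader watches only the node immediately before its own, so a leader failure wakes a single candidate instead of all of them.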

Flume

Distributed, reliable, and available service for collecting, aggregating, and moving large amounts of streaming data (typically logs) into HDFS and other stores. Each agent chains a source, a channel, and a sink.

Sqoop

Tool designed to efficiently transfer bulk data between Hadoop and structured datastores such as relational databases, running each transfer as parallel map tasks.
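When importing a table in parallel, Sqoop divides the value range of a split column (the primary key by default, or the column given with `--split-by`) into roughly equal intervals, one per mapper (`--num-mappers`, 4 by default). The function below is a simplified sketch of that idea for integer keys with a hypothetical name; Sqoop's own splitter differs in details.

```python
def split_ranges(min_id: int, max_id: int, num_mappers: int = 4) -> list[tuple[int, int]]:
    """Divide [min_id, max_id] into contiguous inclusive ranges, one per mapper."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_id < min_id:
        return []
    total = max_id - min_id + 1
    num_mappers = min(num_mappers, total)  # never create empty splits
    base, extra = divmod(total, num_mappers)
    ranges = []
    start = min_id
    for i in range(num_mappers):
        # The first `extra` splits take one more id to absorb the remainder.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges
```

Each range then becomes the `WHERE` condition of one mapper's query. Because the split is by value rather than by row count, large gaps in the ID sequence produce unbalanced mappers, so the choice of split column matters.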

Oozie

Workflow scheduler for managing Hadoop jobs: workflows are defined as directed acyclic graphs of actions, and coordinator jobs trigger those workflows based on time and data availability.

Mahout

Library of scalable machine learning and data mining algorithms, originally implemented on Hadoop MapReduce for processing large datasets.

Ambari

Web-based platform for provisioning, managing, and monitoring Hadoop clusters and the ecosystem services running on them.

HCatalog

Metadata and table management service for the Hadoop ecosystem, providing a unified view of data for tools like Pig, Hive, and MapReduce.

Avro

Data serialization system with support for schema evolution, providing a compact, fast binary format for storing data and exchanging it between Hadoop services.
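Avro's schema evolution works by resolving the writer's schema (used when the data was written) against the reader's schema: fields the reader does not know are ignored, and fields the reader expects but the data lacks are filled from the reader schema's defaults. The sketch below applies just those two rules to plain Python dicts with hypothetical `User` schemas; it does not encode or decode Avro binary data and ignores type promotion, unions, and aliases.

```python
from typing import Any

def resolve_record(record: dict[str, Any], writer_schema: dict, reader_schema: dict) -> dict[str, Any]:
    """Project a record written with writer_schema onto reader_schema."""
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    result: dict[str, Any] = {}
    # Iterating over the reader's fields drops writer-only fields automatically.
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            result[name] = record[name]
        elif "default" in field:
            result[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} missing from data and has no default")
    return result

WRITER_V1 = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "email", "type": "string"},
]}
READER_V2 = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "country", "type": "string", "default": "unknown"},
]}
```

Reading a version-1 record with the version-2 schema drops `email` and fills `country` with `"unknown"`, which is why new fields should carry defaults if old data must remain readable.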

Parquet

Columnar file format optimized for analytical query performance on Hadoop, with efficient compression and support for complex types.
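The benefit of a columnar layout is that a query touching a few columns reads only those columns, and values of the same column sit together, where they compress well. The toy example below transposes rows into columns and applies run-length encoding, one of the encodings Parquet can use; the helper names are hypothetical and it does not produce real Parquet files (a library such as `pyarrow` does that).

```python
from itertools import groupby

def to_columns(rows: list[dict]) -> dict[str, list]:
    # Transpose row-oriented records into one list per column.
    columns: dict[str, list] = {name: [] for name in rows[0]} if rows else {}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns

def run_length_encode(values: list) -> list[tuple[object, int]]:
    # Collapse consecutive repeats into (value, count) pairs.
    return [(value, len(list(group))) for value, group in groupby(values)]
```

With rows `{"country": "ID", "amount": 10}`, `{"country": "ID", "amount": 25}`, `{"country": "ID", "amount": 5}`, and `{"country": "SG", "amount": 40}`, summing `amount` needs only the `amount` column, and the repetitive `country` column shrinks to two runs.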

Impala

Massively parallel SQL query engine for Hadoop providing low-latency interactive query performance on data stored in HDFS and HBase.

Tez

Framework for Hadoop YARN that executes data processing as a directed acyclic graph (DAG) of tasks, speeding up complex jobs by avoiding the unnecessary intermediate stages and HDFS writes of chained MapReduce jobs.
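Tez models a job as a DAG of processing vertices connected by data-movement edges, so a multi-stage query runs as one graph instead of a chain of separate MapReduce jobs. The sketch below uses Python's standard `graphlib` module (3.9+) to group hypothetical vertices into waves that could run in parallel; it only illustrates dependency scheduling and is not the Tez API.

```python
from graphlib import TopologicalSorter

# Hypothetical query plan: each vertex maps to the vertices it depends on.
DAG = {
    "scan_orders": set(),
    "scan_customers": set(),
    "join": {"scan_orders", "scan_customers"},
    "aggregate": {"join"},
    "sort": {"aggregate"},
}

def schedule_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group vertices into waves; every vertex in a wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

Both scans form the first wave, followed by `join`, `aggregate`, and `sort`; as classic MapReduce, the same plan would typically need several jobs, each writing its output to HDFS before the next starts.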

Storm

Distributed real-time stream processing system that integrates with Hadoop, capable of processing massive volumes of data with latencies on the order of milliseconds.

Kafka

High-throughput, fault-tolerant distributed event streaming platform, commonly used in the Hadoop ecosystem to collect and process real-time data streams.
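Kafka topics are split into partitions, and a record with a key is routed to a partition derived from a hash of that key, so all records for the same key land in the same partition and keep their relative order. Kafka's default partitioner hashes keys with murmur2; the sketch below substitutes `zlib.crc32` for simplicity, so its partition numbers will not match a real cluster's, and the function names are hypothetical.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition index in [0, num_partitions)."""
    # crc32 is deterministic across processes, unlike Python's salted hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def route(records: list[tuple[str, str]], num_partitions: int) -> dict[int, list[str]]:
    # Append each record's value to its key's partition, preserving send order.
    partitions: dict[int, list[str]] = {p: [] for p in range(num_partitions)}
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append(value)
    return partitions
```

Ordering is guaranteed only within a partition, so keying events by an entity ID (for example, a user ID) keeps each entity's events in order, while changing the partition count remaps keys.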