Hadoop Ecosystem - AI Glossary

📖

Terms

HDFS

Hadoop's primary distributed file system, designed to store petabytes of data across clusters of commodity machines, with automatic block replication for fault tolerance.
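
To make the replication idea concrete, here is a minimal sketch of how a file might be split into blocks and each block copied to several nodes. The 128 MiB block size and replication factor of 3 are the commonly cited Hadoop defaults, but treat them as assumptions here; the function `plan_blocks` and the round-robin placement are hypothetical and far simpler than the rack-aware policy HDFS actually applies.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size: 128 MiB
REPLICATION = 3                 # assumed replication factor


def plan_blocks(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return a list of (block_index, [nodes holding a replica]).

    Hypothetical helper: replicas are placed round-robin, which is NOT
    the real HDFS placement policy, only an illustration of the idea.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = math.ceil(file_size / block_size)
    plan = []
    for b in range(n_blocks):
        # Pick `replication` distinct nodes, starting at a rotating offset.
        replicas = [nodes[(b + r) % len(nodes)] for r in range(replication)]
        plan.append((b, replicas))
    return plan


if __name__ == "__main__":
    for block, replicas in plan_blocks(300 * 1024 * 1024, ["n1", "n2", "n3", "n4"]):
        print(block, replicas)
```

Because every block lives on several nodes, losing a single machine leaves at least two copies of each of its blocks elsewhere in the cluster.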

MapReduce

Programming model and implementation for distributed processing of large datasets on clusters. Each job is split into a map phase, which transforms input records into key-value pairs, and a reduce phase, which aggregates the values for each key.
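
The two phases can be sketched with the classic word count, run here in a single process. This is an illustration of the programming model only, not the Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

On a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the shuffle moves data across the network so that all values for one key reach the same reducer.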

YARN

Hadoop's resource management layer, which allocates CPU and memory to applications and manages the lifecycle of their tasks across the cluster.

HBase

Distributed, column-oriented, non-relational NoSQL database built on HDFS, offering real-time access to massive data with strong consistency.

Hive

Data warehouse infrastructure on Hadoop that enables querying of large datasets with a SQL-like language (HiveQL). Queries are compiled into jobs for an execution engine such as MapReduce, Tez, or Spark.

Pig

High-level data analysis platform using the Pig Latin language to express complex data transformation programs executed on Hadoop.

Spark

Unified processing engine for Big Data that keeps intermediate data in memory where possible, offering APIs in Scala, Java, Python, and R with support for SQL, streaming, machine learning, and graph processing.

ZooKeeper

Centralized coordination service for distributed applications that maintains configuration information and naming, and provides distributed synchronization and group services.

Flume

Distributed, reliable, and available service for collecting, aggregating, and moving large amounts of streaming data to HDFS with an agent-based architecture.

Sqoop

Tool designed to efficiently transfer bulk data between Hadoop and structured datastores such as relational databases.

Oozie

Workflow and coordinator system for managing and executing complex Hadoop data processing pipelines with time-based and conditional dependencies.

Mahout

Library of distributed machine learning and data mining algorithms for processing large datasets, originally implemented on Hadoop MapReduce.

Ambari

Platform with a web interface for provisioning, managing, and monitoring Hadoop clusters and the ecosystem services running on them.

HCatalog

Metadata and table management service for the Hadoop ecosystem, providing a unified view of data for tools like Pig, Hive, and MapReduce.

Avro

Data serialization system with support for schema evolution, providing a compact and fast binary format for data exchange between Hadoop services.
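
Schema evolution means a reader with a newer schema can still consume records written with an older one. The sketch below imitates the core resolution rule in plain Python: fields the reader does not know are dropped, and fields missing from the record are filled from the reader's default. It does not use the Avro library; `resolve` and the simplified list-of-dicts schema are assumptions made for illustration.

```python
def resolve(record, reader_schema):
    """Project a decoded record onto the reader's schema.

    reader_schema is a simplified stand-in for an Avro schema: a list of
    field dicts with a "name" and an optional "default".
    """
    resolved = {}
    for field in reader_schema:
        name = field["name"]
        if name in record:
            # Field exists in the written data: take it as-is.
            resolved[name] = record[name]
        elif "default" in field:
            # Field was added later: fall back to the reader's default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return resolved


if __name__ == "__main__":
    old_record = {"id": 1, "name": "a", "legacy": True}
    reader = [{"name": "id"}, {"name": "name"}, {"name": "email", "default": None}]
    print(resolve(old_record, reader))
```

A new field without a default is the case that breaks compatibility, which is why the sketch raises an error instead of guessing a value.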

Parquet

Columnar file format optimized for analytical query performance on Hadoop, with efficient compression and support for complex types.
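
Why a columnar layout helps analytics can be shown in a few lines: storing each column contiguously lets a query read only the columns it needs, and runs of repeated values compress well. This is a plain-Python illustration, not the Parquet format or any Parquet library; `to_columns` and `run_length_encode` are hypothetical helpers, and the sketch assumes every row has the same keys.

```python
def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


if __name__ == "__main__":
    rows = [
        {"country": "DE", "amount": 10},
        {"country": "DE", "amount": 20},
        {"country": "FR", "amount": 5},
    ]
    cols = to_columns(rows)
    print(sum(cols["amount"]))                 # touches only one column
    print(run_length_encode(cols["country"]))  # repeated values collapse
```

An aggregate over `amount` never has to read `country`, which is the main source of the format's advantage for analytical queries.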

Impala

Massively parallel SQL query engine for Hadoop providing low-latency interactive query performance on data stored in HDFS and HBase.

Tez

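Framework for executing general directed acyclic graphs (DAGs) of data processing tasks on Hadoop YARN. It speeds up complex processing by running a whole pipeline as one DAG instead of a chain of separate MapReduce jobs, eliminating unnecessary intermediate phases.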
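
The DAG idea can be sketched with Python's standard `graphlib` module (Python 3.9 or later): tasks declare their upstream dependencies and run in dependency order, with results passed along directly. This is not the Tez API; `run_dag` and the task names are hypothetical, and a real engine would run independent tasks in parallel across the cluster.

```python
from graphlib import TopologicalSorter


def run_dag(tasks, dependencies):
    """Run tasks in dependency order and return every task's result.

    tasks:        name -> callable taking a dict of upstream results
    dependencies: name -> set of upstream task names
    """
    # Include tasks without dependencies so that every task is scheduled.
    graph = {name: dependencies.get(name, set()) for name in tasks}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        upstream = {dep: results[dep] for dep in graph[name]}
        results[name] = tasks[name](upstream)
    return results


if __name__ == "__main__":
    tasks = {
        "load": lambda up: [1, 2, 3, 4],
        "filter": lambda up: [x for x in up["load"] if x % 2 == 0],
        "square": lambda up: [x * x for x in up["load"]],
        "join": lambda up: sum(up["filter"]) + sum(up["square"]),
    }
    deps = {"filter": {"load"}, "square": {"load"}, "join": {"filter", "square"}}
    print(run_dag(tasks, deps)["join"])
```

Here `filter` and `square` both consume the output of `load` directly; expressing the same pipeline as chained MapReduce jobs would write each intermediate result to storage between jobs.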

Storm

Distributed real-time stream processing system that integrates with Hadoop, capable of processing massive volumes of data with millisecond-level latencies.

Kafka

High-performance, high-availability distributed messaging platform for collecting and processing real-time data streams in the Hadoop ecosystem.
