Hadoop Ecosystem - Glossario IA

📖

termini

HDFS

Hadoop's primary distributed file system designed to store petabytes of data on standard machine clusters with automatic replication and fault tolerance.

📖

termini

MapReduce

Programming paradigm and implementation for distributed processing of large datasets on clusters, dividing tasks into mapping and reduction phases.

📖

termini

YARN

Hadoop's resource manager that orchestrates the allocation of CPU and memory resources to applications while managing task lifecycles in the cluster.

📖

termini

HBase

Distributed, column-oriented, non-relational NoSQL database built on HDFS, offering real-time access to massive data with strong consistency.

📖

termini

Hive

Data warehouse infrastructure on Hadoop enabling querying of large datasets with a SQL-like language (HiveQL) while using MapReduce for execution.

📖

termini

Pig

High-level data analysis platform using the Pig Latin language to express complex data transformation programs executed on Hadoop.

📖

termini

Spark

Ultra-fast unified processing engine for Big Data, offering APIs in Scala, Java, Python and R with support for SQL, streaming, machine learning and graph processing.

📖

termini

ZooKeeper

Centralized distributed coordination service for maintaining configuration information, naming, distributed synchronization, and group service management.

📖

termini

Flume

Distributed, reliable, and available service for collecting, aggregating, and moving large amounts of streaming data to HDFS with an agent-based architecture.

📖

termini

Sqoop

Tool designed to efficiently transfer bulk data between Hadoop and structured databases such as relational databases.

📖

termini

Oozie

Workflow and coordinator system for managing and executing complex Hadoop data processing pipelines with time-based and conditional dependencies.

📖

termini

Mahout

Library of distributed machine learning and data mining algorithms implemented on Hadoop MapReduce for processing large datasets.

📖

termini

Ambari

Hadoop cluster management and monitoring platform offering a web interface for provisioning, managing, and monitoring the complete Hadoop ecosystem.

📖

termini

HCatalog

Metadata and table management service for the Hadoop ecosystem, providing a unified view of data for tools like Pig, Hive, and MapReduce.

📖

termini

Avro

Data serialization system with evolving schema, providing compact and fast data formats for exchanges between Hadoop services.

📖

termini

Parquet

Columnar file format optimized for analytical query performance on Hadoop, with efficient compression and support for complex types.

📖

termini

Impala

Massively parallel SQL query engine for Hadoop providing low-latency interactive query performance on data stored in HDFS and HBase.

📖

termini

Tez

Generalized acyclic data execution framework for Hadoop YARN, optimizing performance of complex processing by eliminating unnecessary MapReduce phases.

📖

termini

Storm

Distributed real-time stream processing system for Hadoop, capable of processing massive volumes of data with millisecond-level latencies.

📖

termini

Kafka

High-performance, high-availability distributed messaging platform for collecting and processing real-time data streams in the Hadoop ecosystem.

Glossario IA

HDFS

MapReduce

YARN

HBase

Hive

Pig

Spark

ZooKeeper

Flume

Sqoop

Oozie

Mahout

Ambari

HCatalog

Avro

Parquet

Impala

Tez

Storm

Kafka

Nessun risultato trovato