Cloud Computing Platforms

Amazon S3

Highly scalable cloud object storage service from AWS, designed for 99.999999999% (eleven nines) durability and commonly used as the primary repository for Big Data, with storage classes suited to different access patterns.
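
To give a feel for what the durability figure means, here is a small arithmetic sketch. It assumes the percentage is read as an annual, per-object figure; the function name and the 10-million-object example are illustrative only.

```python
from decimal import Decimal


def expected_annual_losses(object_count: int,
                           durability_percent: str = "99.999999999") -> Decimal:
    """Expected number of objects lost per year.

    Assumption: the durability figure applies per object, per year.
    Decimal is used so that the eleven nines are not blurred by float rounding.
    """
    loss_probability = 1 - Decimal(durability_percent) / 100
    return object_count * loss_probability


if __name__ == "__main__":
    losses = expected_annual_losses(10_000_000)
    print("expected objects lost per year:", losses)
    print("average years between single-object losses:", 1 / losses)
```

Under that assumption, storing ten million objects corresponds to an expected loss of about one object every ten thousand years.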

Amazon EMR

Managed AWS service for running Big Data frameworks like Apache Spark, Hadoop, and Presto on dynamic clusters, enabling large-scale distributed processing with simplified infrastructure management.

Amazon Redshift

Fully managed cloud data warehouse from AWS that uses a massively parallel processing (MPP) architecture to analyze petabytes of data, with performance optimized for complex analytical queries.

Amazon Athena

Serverless interactive query service from AWS allowing direct analysis of data in S3 using standard SQL, without requiring infrastructure management or prior data loading.

AWS Glue

Serverless ETL service from AWS that automates data discovery, preparation, and loading with a centralized data catalog and built-in transformation capabilities based on Apache Spark.

Azure Data Lake Storage

Massively scalable and secure data repository from Azure optimized for Big Data analytical workloads, combining the capacity of a data lake with file-system semantics such as a hierarchical namespace.

Azure Synapse Analytics

Unified hybrid analytics platform from Azure integrating data warehousing, data integration, and Big Data analytics with SQL and Spark processing capabilities in the same environment.

Azure Databricks

Unified analytics service based on Apache Spark in Azure, offering a collaborative environment for Big Data processing, machine learning, and real-time analytics with optimized clusters.

Google Cloud Storage

Google Cloud's unified object storage service offering high availability, durability, and performance for Big Data, with storage classes optimized for different access frequencies.

Google BigQuery

Google Cloud's serverless data warehouse enabling real-time analysis of petabytes of data through interactive SQL queries, with compute that scales automatically according to demand.

Google Dataproc

Google Cloud's managed service for running Apache Spark and Hadoop with quickly provisioned clusters, offering native integration with the GCP ecosystem and optimized costs for Big Data processing.

Google Dataflow

Google Cloud's serverless stream and batch processing service based on Apache Beam, enabling execution of distributed data pipelines with autoscaling and simplified management.

Snowflake

Multi-cloud Data Cloud platform offering a fully managed data warehouse whose architecture separates compute from storage, enabling each to scale independently and supporting secure data sharing.

ELT Pipeline

Modern data integration pattern in which data is first loaded raw into a cloud warehouse and then transformed there using the warehouse's own compute, which suits massive volumes better than transforming before loading.
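
The load-then-transform order can be sketched in a few lines. This is a minimal illustration only: Python's built-in sqlite3 module stands in for a cloud warehouse, and the table names, columns, and `run_elt` function are hypothetical.

```python
import sqlite3


def run_elt(raw_rows):
    """Load raw rows untouched, then transform them with SQL inside the 'warehouse'."""
    conn = sqlite3.connect(":memory:")  # stand-in for a cloud warehouse (assumption)

    # Load: everything lands as text, with no cleaning applied beforehand.
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, country TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

    # Transform: cleaning and aggregation run on the warehouse engine itself.
    conn.execute(
        """
        CREATE TABLE revenue_by_country AS
        SELECT UPPER(TRIM(country)) AS country,
               SUM(CAST(amount AS REAL)) AS revenue
        FROM raw_orders
        WHERE TRIM(amount) <> ''
        GROUP BY UPPER(TRIM(country))
        """
    )
    result = conn.execute(
        "SELECT country, revenue FROM revenue_by_country ORDER BY country"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    rows = [("1", " fr ", "10.5"), ("2", "FR", "4.5"), ("3", "de", "20"), ("4", "de", "")]
    print(run_elt(rows))
```

Because the raw table is kept as loaded, the transformation can be rewritten and rerun later without extracting the source data again.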

Auto-scaling Cluster

Capability of cloud Big Data platforms to dynamically adjust the number of compute nodes based on workload, optimizing costs and performance without manual intervention.
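
One common way to make such a scaling decision is to track a target utilization. The sketch below is a hypothetical rule, not any specific provider's algorithm; the function name, the 60% target, and the node limits are all assumptions chosen for illustration.

```python
def desired_node_count(current_nodes: int,
                       avg_cpu_percent: int,
                       target_cpu_percent: int = 60,
                       min_nodes: int = 2,
                       max_nodes: int = 20) -> int:
    """Return the node count that would bring average CPU back to the target.

    Illustrative target-tracking rule: scale the cluster in proportion to
    observed load, round up, then clamp to the configured limits.
    """
    if current_nodes < 1 or target_cpu_percent < 1 or avg_cpu_percent < 0:
        raise ValueError("node count and target must be positive, load non-negative")
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")

    # Integer ceiling division avoids floating-point rounding surprises.
    needed = -(-(current_nodes * avg_cpu_percent) // target_cpu_percent)
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    print(desired_node_count(4, 90))   # busy cluster: scale out
    print(desired_node_count(4, 30))   # idle cluster: scale in
```

The minimum and maximum bounds are what keep costs predictable: the cluster never shrinks below a baseline and never grows past a budgeted ceiling.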

Serverless Analytics

Data analytics paradigm where the underlying infrastructure is fully managed by the cloud provider, allowing users to focus on analytical logic without managing servers or clusters.
