AI Glossary
A complete dictionary of Artificial Intelligence
Amazon S3
Highly scalable cloud object storage service from AWS, designed for 99.999999999% (eleven nines) durability and widely used as the primary repository for Big Data, with storage classes suited to different access patterns.
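To put the 99.999999999% durability figure in perspective, the sketch below converts it into an expected number of lost objects per year. It assumes the figure is an annual, per-object probability and that losses are independent. The function name and the object count are illustrative, not part of any AWS API.

```python
from decimal import Decimal

# Durability figure quoted in the definition above, as an exact decimal.
DURABILITY = Decimal("0.99999999999")


def expected_annual_object_loss(object_count: int) -> Decimal:
    """Expected number of objects lost per year (assumes independent losses)."""
    annual_loss_probability = Decimal(1) - DURABILITY
    return annual_loss_probability * Decimal(object_count)


# Hypothetical bucket holding ten million objects.
loss_per_year = expected_annual_object_loss(10_000_000)
years_per_lost_object = Decimal(1) / loss_per_year
print(f"Expected losses per year: {loss_per_year}")
print(f"Roughly one object lost every {years_per_lost_object:f} years")
```

Decimal arithmetic is used instead of floats because subtracting a value this close to 1 in binary floating point introduces visible rounding error.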
Amazon EMR
Managed AWS service for running Big Data frameworks like Apache Spark, Hadoop, and Presto on dynamic clusters, enabling large-scale distributed processing with simplified infrastructure management.
Amazon Redshift
Fully managed cloud data warehouse from AWS that uses a massively parallel processing (MPP) architecture to analyze petabytes of data, with performance optimized for complex analytical queries.
Amazon Athena
Serverless interactive query service from AWS allowing direct analysis of data in S3 using standard SQL, without requiring infrastructure management or prior data loading.
AWS Glue
Serverless ETL service from AWS that automates data discovery, preparation, and loading, with a centralized data catalog and built-in transformation capabilities based on Apache Spark.
Azure Data Lake Storage
Massively scalable and secure data repository from Azure optimized for Big Data analytical workloads, combining the storage capacity of a data lake with the performance of a file system.
Azure Synapse Analytics
Unified analytics platform from Azure that integrates data warehousing, data integration, and Big Data analytics, with SQL and Spark processing available in the same environment.
Azure Databricks
Unified analytics service based on Apache Spark in Azure, offering a collaborative environment for Big Data processing, machine learning, and real-time analytics with optimized clusters.
Google Cloud Storage
Google Cloud's unified object storage service offering high availability, durability, and performance for Big Data, with storage classes matched to how frequently data is accessed.
Google BigQuery
Google Cloud's serverless data warehouse enabling near-real-time analysis of petabytes of data with interactive SQL queries, scaling automatically according to need.
Google Dataproc
Google Cloud's managed service for running Apache Spark and Hadoop with quickly provisioned clusters, offering native integration with the GCP ecosystem and optimized costs for Big Data processing.
Google Dataflow
Google Cloud's serverless stream and batch processing service based on Apache Beam, enabling execution of distributed data pipelines with autoscaling and simplified management.
Snowflake
Multi-cloud data platform offering a fully managed data warehouse whose compute is separated from storage, enabling independent scaling of each and secure data sharing.
ELT Pipeline
Modern data integration pattern in which raw data is first loaded into a cloud warehouse and then transformed there using the warehouse's own compute, which suits very large volumes.
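The load-then-transform order can be sketched in a few lines. This is a minimal illustration that uses an in-memory SQLite database as a stand-in for a cloud warehouse. The table names, columns, and sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical raw export: every field arrives as untyped text.
RAW_ROWS = [
    ("2024-01-01", "eu", "1990"),
    ("2024-01-01", "us", "500"),
    ("2024-01-02", "eu", "1010"),
]


def run_elt(rows):
    """Load raw rows first, then transform them with SQL inside the database."""
    conn = sqlite3.connect(":memory:")

    # Load: land the data unchanged in a staging table, with no cleaning.
    conn.execute(
        "CREATE TABLE raw_orders (order_date TEXT, region TEXT, amount_cents TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)

    # Transform: cast and aggregate using the database engine itself.
    conn.execute(
        """
        CREATE TABLE revenue_by_region AS
        SELECT region, SUM(CAST(amount_cents AS INTEGER)) AS revenue_cents
        FROM raw_orders
        GROUP BY region
        """
    )
    result = conn.execute(
        "SELECT region, revenue_cents FROM revenue_by_region ORDER BY region"
    ).fetchall()
    conn.close()
    return result


print(run_elt(RAW_ROWS))
```

In an ETL pipeline the casting and aggregation would happen before the insert. Here the raw table is kept, so the transformation can be rerun or changed later without re-extracting the source data.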
Auto-scaling Cluster
Capability of cloud Big Data platforms to dynamically adjust the number of compute nodes based on workload, optimizing costs and performance without manual intervention.
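As a rough illustration of the kind of decision an autoscaler makes, the sketch below derives a target node count from the pending workload and clamps it to configured bounds. The sizing rule, function name, and parameters are assumptions made for this example. Real platforms use their own metrics and policies.

```python
import math


def target_node_count(pending_tasks: int, tasks_per_node: int,
                      min_nodes: int, max_nodes: int) -> int:
    """Return the node count needed for the workload, within [min_nodes, max_nodes]."""
    if tasks_per_node <= 0:
        raise ValueError("tasks_per_node must be positive")
    # Nodes needed to run every pending task at once, rounded up.
    needed = math.ceil(pending_tasks / tasks_per_node)
    # Never go below the floor (availability) or above the ceiling (cost cap).
    return max(min_nodes, min(needed, max_nodes))


# Hypothetical cluster: 10 tasks per node, between 2 and 20 nodes.
for backlog in (0, 95, 1000):
    print(backlog, "pending tasks ->", target_node_count(backlog, 10, 2, 20), "nodes")
```

The lower bound keeps the cluster responsive when idle, while the upper bound caps cost during spikes.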
Serverless Analytics
Data analytics paradigm where the underlying infrastructure is fully managed by the cloud provider, allowing users to focus on analytical logic without managing servers or clusters.