AI Glossary

The complete glossary of artificial intelligence

162 categories, 2,032 subcategories, 23,060 terms

📖 Terms

Apache Spark

Open-source distributed processing framework that accelerates Big Data analytics through in-memory computation and optimized parallel execution.

RDD (Resilient Distributed Dataset)

Fundamental data structure of Spark: an immutable, partitioned collection that achieves fault tolerance by recomputing lost partitions from their lineage (the recorded chain of transformations).
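
The lineage idea can be sketched in plain Python. This is a toy model, not the Spark API: `ToyRDD`, `compute`, and `lose_partition` are hypothetical names, and unlike real RDDs this sketch keeps every computed partition in memory so that a loss can be simulated.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Toy stand-in for an RDD: partitions plus the recipe needed to rebuild them."""

    def __init__(self, partitions: Optional[List[List[int]]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[int], int]] = None):
        self._partitions = partitions          # source data, root dataset only
        self._parent = parent                  # lineage: where the data comes from
        self._fn = fn                          # lineage: how it is derived
        self._materialized: Dict[int, tuple] = {}

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # A new dataset is returned; the parent is never modified.
        return ToyRDD(parent=self, fn=fn)

    def compute(self, index: int) -> tuple:
        # Build the partition on demand, walking back through the lineage.
        if index not in self._materialized:
            if self._parent is None:
                rows = tuple(self._partitions[index])
            else:
                rows = tuple(self._fn(x) for x in self._parent.compute(index))
            self._materialized[index] = rows
        return self._materialized[index]

    def lose_partition(self, index: int) -> None:
        # Simulate an executor failure that drops one computed partition.
        self._materialized.pop(index, None)


base = ToyRDD(partitions=[[1, 2], [3, 4]])
doubled = base.map(lambda x: x * 2)
before = doubled.compute(1)
doubled.lose_partition(1)
after = doubled.compute(1)   # rebuilt from the parent, no replica needed
```

Only the lost partition is recomputed; the other partition of `doubled` is never touched.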

DataFrame

Distributed data collection organized into named columns, similar to a database table, optimized for structured queries.

Spark SQL

Spark module integrating SQL queries and DataFrame operations with automatic optimization via the Catalyst Optimizer.

Spark Streaming

Spark extension enabling real-time data stream processing with micro-batches for near-real-time latency.
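
Micro-batching can be illustrated without Spark. The sketch below is an assumption-laden toy: `micro_batches` is a hypothetical helper that groups timestamped events into fixed-length intervals, which is the general idea behind processing a stream as a sequence of small batches.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def micro_batches(events: List[Tuple[float, str]],
                  interval: float) -> Dict[int, List[str]]:
    """Group (timestamp_seconds, payload) events into batches of `interval` seconds."""
    batches: Dict[int, List[str]] = defaultdict(list)
    for timestamp, payload in events:
        # Every event falling in the same interval lands in the same batch.
        batches[int(timestamp // interval)].append(payload)
    return dict(sorted(batches.items()))


events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (2.5, "d")]
batches = micro_batches(events, interval=1.0)
```

Latency is bounded below by the batch interval, which is why the result is near-real-time rather than strictly real-time.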

MLlib

Spark's distributed machine learning library providing classification, regression, clustering, and recommendation algorithms.

GraphX

Spark API for distributed graph processing, built on RDDs so that graph-parallel computation can be combined with Spark's data-parallel operations.

DAG (Directed Acyclic Graph)

Representation of a Spark job's execution plan as a graph of transformations with no cycles, which lets the scheduler avoid redundant work and run independent steps in parallel.
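
As a rough illustration of how a DAG exposes parallelism, the sketch below uses Python's standard `graphlib` module (Python 3.9+). The plan and its step names are invented for the example, and `execution_waves` is a hypothetical helper; this is not how Spark's scheduler is implemented.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def execution_waves(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Return groups of steps; every step in a group can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()                      # raises CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose inputs are all complete
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Each step maps to the set of steps it depends on (hypothetical plan).
plan = {
    "read_orders": set(),
    "read_users": set(),
    "filter_orders": {"read_orders"},
    "join": {"filter_orders", "read_users"},
    "count": {"join"},
}
waves = execution_waves(plan)
```

The two reads have no dependencies and land in the same wave, while `join` has to wait for both of its inputs.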

Spark Driver

Main process of a Spark application: it creates the SparkContext, divides jobs into stages and tasks, and coordinates their execution on the executors.

Spark Executor

Worker process executing tasks assigned by the Driver on each cluster node, managing memory and partitioned data.

SparkContext

Entry point to Spark's core (RDD) functionality, managing the connection to the cluster and coordinating access to distributed resources.

Partition

Logical unit of data distribution in Spark, enabling parallelism by dividing RDDs/DataFrames into independent fragments.
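
A minimal sketch of why partitions enable parallelism, in plain Python rather than Spark. The helper names are hypothetical, and a thread pool stands in for a cluster of executors.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_partitions(records: List[int], num_partitions: int) -> List[List[int]]:
    """Distribute records round-robin into independent fragments."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def parallel_sum(records: List[int], num_partitions: int) -> int:
    partitions = split_into_partitions(records, num_partitions)
    # Each partition is processed independently; only the small partial
    # results are combined at the end.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_sums = list(pool.map(sum, partitions))
    return sum(partial_sums)


partitions = split_into_partitions(list(range(10)), 3)
total = parallel_sum(list(range(10)), 3)
```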

Shuffle

Costly data redistribution operation between partitions, necessary during aggregations, joins, or groupings in Spark.
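
The sketch below shows, under simplifying assumptions, why a per-key aggregation needs a shuffle: records with the same key start out in different partitions and must be brought together first. The function names are hypothetical, and `zlib.crc32` is used only as a deterministic stand-in for a hash partitioner.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, int]


def target_partition(key: str, num_partitions: int) -> int:
    # Deterministic hash, so a given key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle(partitions: List[List[Record]], num_output: int) -> List[List[Record]]:
    """Redistribute records so that all records of one key share a partition."""
    output: List[List[Record]] = [[] for _ in range(num_output)]
    for partition in partitions:
        for key, value in partition:
            output[target_partition(key, num_output)].append((key, value))
    return output


def sum_by_key(partitions: List[List[Record]], num_output: int) -> Dict[str, int]:
    totals: Dict[str, int] = {}
    for partition in shuffle(partitions, num_output):
        local: Dict[str, int] = defaultdict(int)
        for key, value in partition:
            local[key] += value          # safe: no other partition holds this key
        totals.update(local)
    return totals


data = [[("a", 1), ("b", 2)], [("a", 3), ("b", 4), ("c", 5)]]
totals = sum_by_key(data, num_output=2)
```

In a real cluster the redistribution step moves data over the network and through disk, which is where the cost comes from.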

Catalyst Optimizer

Spark query optimization engine transforming and reorganizing execution plans to improve performance.
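
To give a feel for rule-based plan rewriting, here is a toy rule that merges adjacent filters into one. The plan representation and the functions `combine_adjacent_filters` and `run` are invented for this sketch; it illustrates the general idea of rewriting a plan without changing its result, not Catalyst's actual implementation.

```python
from typing import Callable, Iterable, List, Tuple

Step = Tuple[str, Callable]


def combine_adjacent_filters(plan: List[Step]) -> List[Step]:
    """Rewrite rule: two consecutive filters become a single filter."""
    optimized: List[Step] = []
    for op, fn in plan:
        if op == "filter" and optimized and optimized[-1][0] == "filter":
            previous = optimized.pop()[1]
            # Default arguments bind the current predicates to this lambda.
            merged = lambda row, p=previous, q=fn: p(row) and q(row)
            optimized.append(("filter", merged))
        else:
            optimized.append((op, fn))
    return optimized


def run(plan: List[Step], rows: Iterable[int]) -> List[int]:
    result = list(rows)
    for op, fn in plan:
        if op == "filter":
            result = [r for r in result if fn(r)]
        elif op == "map":
            result = [fn(r) for r in result]
    return result


plan = [
    ("filter", lambda x: x > 1),
    ("filter", lambda x: x % 2 == 0),
    ("map", lambda x: x * 10),
]
optimized = combine_adjacent_filters(plan)
```

The optimized plan makes one pass fewer over the data while producing the same rows.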

Tungsten

Spark execution engine that optimizes memory and CPU usage through a compact binary data representation and runtime code generation.

Cache/Persist

Mechanism for persisting RDDs/DataFrames in memory or on disk for fast reuse and to avoid costly recalculations.
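
The effect of persisting can be shown with a small toy class. `ToyDataset` and its `compute_calls` counter are hypothetical and exist only to make the recomputation visible; this is not Spark's storage layer.

```python
from typing import Callable, List, Optional


class ToyDataset:
    """Recomputes its contents on every collect() unless persisted."""

    def __init__(self, compute: Callable[[], List[int]]):
        self._compute = compute
        self._persisted = False
        self._cached: Optional[List[int]] = None
        self.compute_calls = 0            # instrumentation for the example

    def persist(self) -> "ToyDataset":
        self._persisted = True
        return self

    def collect(self) -> List[int]:
        if self._persisted and self._cached is not None:
            return self._cached           # reuse the stored result
        self.compute_calls += 1
        result = self._compute()
        if self._persisted:
            self._cached = result
        return result


def expensive() -> List[int]:
    return [x * x for x in range(4)]


uncached = ToyDataset(expensive)
uncached.collect()
uncached.collect()                        # computed a second time

cached = ToyDataset(expensive).persist()
cached.collect()
cached.collect()                          # served from the stored copy
```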

Broadcast Variable

Read-only variable shipped once to each executor instead of with every task, minimizing network transfers; a typical use is the small side of a join.
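
A minimal sketch of the join use case, assuming a small lookup table that fits in memory. `broadcast_join` is a hypothetical helper written in plain Python; the read-only view stands in for the broadcast copy each executor would hold.

```python
from types import MappingProxyType
from typing import Dict, List, Tuple


def broadcast_join(partitions: List[List[Tuple[int, str]]],
                   small_table: Dict[int, str]) -> List[List[Tuple[int, str, str]]]:
    """Join each partition against a shared read-only lookup table."""
    lookup = MappingProxyType(small_table)   # read-only view, cannot be mutated
    joined = []
    for partition in partitions:
        # The large dataset stays where it is; only the small table is shared,
        # so the large side never has to be redistributed.
        joined.append([(key, value, lookup[key])
                       for key, value in partition if key in lookup])
    return joined


orders = [[(1, "order-a"), (2, "order-b")], [(1, "order-c"), (3, "order-d")]]
users = {1: "Alice", 2: "Bob"}
joined = broadcast_join(orders, users)
```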

Accumulator

Shared variable that tasks can only add to, used to aggregate counters or sums from parallel tasks safely; its value is read on the driver.
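
The add-only pattern can be sketched with threads standing in for parallel tasks. `ToyAccumulator` and `count_malformed` are hypothetical names; a lock provides the safe concurrent updates in this toy version.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List


class ToyAccumulator:
    """Counter that parallel tasks may only add to."""

    def __init__(self, initial: int = 0):
        self._value = initial
        self._lock = threading.Lock()

    def add(self, amount: int) -> None:
        with self._lock:                  # serialize concurrent updates
            self._value += amount

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def count_malformed(partitions: List[List[str]]) -> int:
    malformed = ToyAccumulator()

    def task(partition: List[str]) -> None:
        for record in partition:
            if not record.isdigit():
                malformed.add(1)          # tasks add, they never read

    with ThreadPoolExecutor() as pool:
        list(pool.map(task, partitions))  # wait for every task to finish
    return malformed.value                # read once, on the "driver" side


bad_records = count_malformed([["1", "x", "3"], ["y", "z", "7"]])
```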

Transformation

Lazy operation that defines a new RDD/DataFrame without executing anything; the computation is deferred until an action triggers it.

Action

Operation triggering the execution of the DAG plan to produce a result, forcing the computation of all previous transformations.
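
The split between lazy transformations and triggering actions can be sketched as follows. `LazyDataset` is a hypothetical toy class in plain Python: `map` and `filter` only record steps, and `collect` plays the role of the action that runs them.

```python
from typing import Callable, List, Tuple


class LazyDataset:
    def __init__(self, data: List[int], steps: Tuple = ()):
        self._data = data
        self._steps = steps               # recorded plan, nothing executed yet

    def map(self, fn: Callable[[int], int]) -> "LazyDataset":
        # Transformation: returns a new dataset with one more recorded step.
        return LazyDataset(self._data, self._steps + (("map", fn),))

    def filter(self, predicate: Callable[[int], bool]) -> "LazyDataset":
        return LazyDataset(self._data, self._steps + (("filter", predicate),))

    def collect(self) -> List[int]:
        # Action: runs every recorded step, in order, and returns the result.
        result = list(self._data)
        for kind, fn in self._steps:
            if kind == "map":
                result = [fn(x) for x in result]
            else:
                result = [x for x in result if fn(x)]
        return result


calls: List[int] = []


def noisy_double(x: int) -> int:
    calls.append(x)                       # records when the function really runs
    return x * 2


pipeline = LazyDataset([1, 2, 3]).map(noisy_double).filter(lambda x: x > 2)
calls_before_action = list(calls)         # still empty: nothing has executed
result = pipeline.collect()
```

Deferring execution is what gives an engine the chance to inspect and optimize the whole plan before running it.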
