AI Glossary
The complete glossary of artificial intelligence
End-to-End Latency
Measurement of the total time elapsed between a user sending a request and receiving the complete response, including all processing steps of the QA system.
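As a minimal sketch, the measurement can be taken by wrapping the whole request handler with a monotonic clock. The names `timed` and `answer` are hypothetical, and `answer` only simulates a QA pipeline with a short sleep. A timer placed on the server like this does not see network transit time, so a true end-to-end figure would be taken on the client side.

```python
import time

def timed(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for one call, using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def answer(question):
    # Hypothetical stand-in for the full QA pipeline (retrieval, reranking, generation).
    time.sleep(0.01)
    return f"answer to: {question}"

result, elapsed = timed(answer, "What is latency?")
print(f"{elapsed * 1000:.1f} ms")
```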
Semantic Cache
Mechanism for temporarily storing answers based on the semantic similarity of queries, allowing for quick serving of pre-computed answers for similar questions without recalculation.
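A minimal sketch of the idea, assuming a toy bag-of-words `embed` function in place of a real embedding model and a linear scan in place of a vector index. The class name `SemanticCache` and the 0.8 threshold are illustrative choices, not taken from any particular library.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a real embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def put(self, query, answer):
        self.entries.append((embed(query), answer))

    def get(self, query):
        """Return the cached answer of the most similar stored query, or None."""
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None

cache = SemanticCache()
cache.put("what is the capital of france", "Paris")
hit = cache.get("what is the capital of france ?")   # near-duplicate wording
miss = cache.get("how tall is mount everest")         # unrelated question
```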
Real-Time Inverted Index
A data structure that continuously updates the mapping of terms to documents, enabling instant querying of newly added or modified data.
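A minimal in-memory sketch, assuming whitespace tokenization and AND semantics for multi-term queries. The class and method names are hypothetical. An update removes the document's old postings before adding the new ones, so changes are visible to the very next query.

```python
from collections import defaultdict

class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids
        self.doc_terms = {}               # doc id -> terms currently indexed

    def remove(self, doc_id):
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def upsert(self, doc_id, text):
        # Replace any previous version of the document in one step.
        self.remove(doc_id)
        terms = set(text.lower().split())
        self.doc_terms[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        return set.intersection(*(self.postings.get(t, set()) for t in terms))

idx = InvertedIndex()
idx.upsert(1, "fast vector search")
idx.upsert(2, "fast keyword search")
before = idx.search("fast search")
idx.upsert(1, "slow batch job")   # document 1 is modified
after = idx.search("fast search")
```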
Dense Retrieval Model
An approach using vector embeddings to represent documents and queries in a common semantic space, optimized for fast and accurate search.
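A minimal sketch of the retrieval step, assuming the embeddings already exist as plain lists of floats and that a brute-force cosine scan is acceptable. Production systems typically replace the scan with an approximate nearest-neighbor index. The function names are illustrative.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents closest to the query in embedding space."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:k]

# Hypothetical 2-dimensional embeddings, for illustration only.
doc_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
nearest = top_k([1.0, 0.0], doc_vecs, k=2)
```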
Online Neural Reranking
The process of rescoring retrieved results at query time with a deep learning model, so that the most relevant answers are ranked first.
Asynchronous Processing Pipeline
An architecture where processing steps run concurrently without blocking the main flow, reducing user-perceived latency.
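A minimal sketch using Python's `asyncio`, assuming two independent retrieval steps that mostly wait on I/O. The coroutine names are hypothetical and the sleeps only simulate that waiting. Because both steps are awaited together, the handler takes roughly as long as the slower step rather than the sum of the two.

```python
import asyncio

async def retrieve_sparse(query):
    await asyncio.sleep(0.05)  # simulated I/O wait (e.g. a keyword index call)
    return [f"sparse hit for {query}"]

async def retrieve_dense(query):
    await asyncio.sleep(0.05)  # simulated I/O wait (e.g. a vector store call)
    return [f"dense hit for {query}"]

async def handle(query):
    # Both steps are scheduled at once; neither blocks the other.
    sparse, dense = await asyncio.gather(retrieve_sparse(query), retrieve_dense(query))
    return sparse + dense

results = asyncio.run(handle("latency"))
```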
Pre-computation of Representations
A strategy of generating and storing document embedding vectors in advance, so that documents do not have to be encoded during real-time queries.
Knowledge Sharding
Horizontal partitioning of the knowledge base across multiple nodes to parallelize searches and increase the throughput of simultaneous queries.
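A minimal single-process sketch of the scatter-gather pattern, assuming hash-based placement and in-memory dictionaries standing in for separate nodes. `NUM_SHARDS` and all function names are illustrative. A stable hash (`zlib.crc32`) is used because Python's built-in `hash` for strings varies between runs.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 3
shards = [dict() for _ in range(NUM_SHARDS)]  # each dict stands in for one node

def shard_for(doc_id):
    # Stable hash so a document always lands on the same shard.
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS

def add(doc_id, text):
    shards[shard_for(doc_id)][doc_id] = text

def search_shard(shard, term):
    return [d for d, text in shard.items() if term in text.lower().split()]

def search(term):
    """Query every shard in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        partials = list(pool.map(lambda s: search_shard(s, term), shards))
    return sorted(d for part in partials for d in part)

add("doc-1", "semantic cache basics")
add("doc-2", "gpu batching")
add("doc-3", "multi level cache design")
found = search("cache")
```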
Low-Latency Filtering
Fast filtering layer using heuristics or lightweight models to eliminate irrelevant candidates before processing by more complex models.
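A minimal sketch of such a cascade, assuming term overlap as the cheap heuristic and an arbitrary `scorer` callable standing in for the expensive model. All names are hypothetical. The point is that the costly scorer only runs on candidates that survive the filter.

```python
def cheap_filter(query, candidates, min_overlap=1):
    """Keep candidates sharing at least min_overlap terms with the query."""
    q_terms = set(query.lower().split())
    return [c for c in candidates
            if len(q_terms & set(c.lower().split())) >= min_overlap]

def cascade(query, candidates, scorer, min_overlap=1):
    # Stage 1: inexpensive heuristic. Stage 2: expensive scoring of survivors only.
    survivors = cheap_filter(query, candidates, min_overlap)
    return sorted(survivors, key=lambda c: scorer(query, c), reverse=True)

calls = []

def scorer(query, candidate):
    # Hypothetical stand-in for a costly model; records each invocation.
    calls.append(candidate)
    return len(candidate)

candidates = ["redis cache layer", "gpu batch inference", "semantic cache"]
ranked = cascade("cache hit", candidates, scorer)
```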
Response Streaming
Method of transmitting responses in successive fragments as they are generated, improving the perceived response time for long answers.
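A minimal sketch using a Python generator, assuming the answer arrives as a sequence of tokens and that fragments are flushed every `chunk_size` tokens. The function name and chunking rule are illustrative. In a real service each yielded fragment would be written to the open connection as soon as it is produced.

```python
def stream_response(tokens, chunk_size=3):
    """Yield the answer in fragments of up to chunk_size tokens."""
    buffer = []
    for token in tokens:
        buffer.append(token)
        if len(buffer) == chunk_size:
            yield " ".join(buffer)
            buffer = []
    if buffer:
        # Flush whatever is left once generation ends.
        yield " ".join(buffer)

chunks = list(stream_response("a b c d e f g".split(), chunk_size=3))
```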
Vector Pruning
Process of reducing the search space by eliminating less relevant vectors based on pre-calculated distance or similarity metrics.
Batched GPU Inference
Optimization technique that groups multiple requests and processes them in a single pass on a GPU, maximizing resource utilization and throughput and lowering the average cost per request, at the price of a short wait while each batch fills.
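A minimal sketch of the grouping step only, with no actual GPU involved. `model_forward` is a hypothetical stand-in for one forward pass over a whole batch, and `max_batch_size` is an assumed tuning parameter. The call counter shows how many passes are needed for a given number of requests.

```python
def make_batches(requests, max_batch_size):
    """Split the pending requests into consecutive batches."""
    for i in range(0, len(requests), max_batch_size):
        yield requests[i:i + max_batch_size]

def model_forward(batch):
    # Hypothetical stand-in for a single GPU forward pass over the batch.
    return [len(text) for text in batch]

def infer_all(requests, max_batch_size=4):
    outputs, calls = [], 0
    for batch in make_batches(requests, max_batch_size):
        outputs.extend(model_forward(batch))  # one pass serves the whole batch
        calls += 1
    return outputs, calls

outputs, calls = infer_all(["a", "bb", "ccc", "dddd", "eeeee"], max_batch_size=2)
```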
Hybrid Search System
Architecture combining keyword-based (sparse) and semantic (dense) search to balance precision and recall while maintaining low latency.
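A minimal sketch of one common way to merge the two result lists, reciprocal rank fusion (RRF). This is an illustrative choice of fusion method, not the only one, and the constant `k=60` is a conventional default rather than something prescribed here. Each document scores the sum of 1 / (k + rank) over the lists in which it appears.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc ids into a single ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

sparse_hits = ["d1", "d2", "d3"]   # hypothetical keyword-search ranking
dense_hits = ["d2", "d4", "d1"]    # hypothetical embedding-search ranking
fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
```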
Persistent Connection (WebSocket)
Bidirectional connection kept open between client and server, allowing messages to be exchanged without the overhead of establishing a new connection for each request.
Multi-Level Caching
Strategy for storing responses at multiple layers (e.g., memory, Redis, CDN) to serve requests from the fastest available cache.
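A minimal sketch of the lookup order, assuming plain dictionaries stand in for each layer (for example process memory and a shared store) and an `origin` callable computes the value on a full miss. The class name is hypothetical. A hit in a slower layer is copied into the faster layers so the next lookup is served sooner.

```python
class MultiLevelCache:
    def __init__(self, levels, origin):
        self.levels = levels  # dict-like caches, ordered fastest first
        self.origin = origin  # called only when every level misses

    def get(self, key):
        """Return (value, level_index), with level_index None on a full miss."""
        for i, level in enumerate(self.levels):
            if key in level:
                value = level[key]
                for faster in self.levels[:i]:
                    faster[key] = value  # promote to the faster layers
                return value, i
        value = self.origin(key)
        for level in self.levels:
            level[key] = value
        return value, None

memory, shared = {}, {"q": "cached"}
cache = MultiLevelCache([memory, shared], origin=lambda key: key.upper())
first = cache.get("q")    # found in the slower layer
second = cache.get("q")   # now served from the fastest layer
```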
Request Path Optimization
Analysis and refinement of a request's journey through the system to eliminate bottlenecks and minimize each network hop or processing step.