Hybrid Retrieval

Retrieval approach that combines vector (semantic) search with keyword (lexical) search to improve both precision and recall in RAG systems. Each method covers the other's blind spots: semantic search matches paraphrases, while lexical search matches exact terms such as identifiers and rare words.

Vector Search

Search method based on the semantic similarity of vector embeddings in a high-dimensional space. Finds relevant documents even without exact keyword matches, because texts with similar meaning map to nearby vectors.

Keyword Search

Traditional search technique based on exact or partial term matching between documents and queries. Uses algorithms like BM25 to evaluate relevance based on term frequency and distribution.

Sparse Retrieval

Search method using sparse text representations, in which each dimension corresponds to a vocabulary term and most entries are zero. Typically cheaper to compute than dense retrieval and well suited to exact keyword matches.
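
A sparse representation can be sketched as a dictionary that stores only the terms actually present in a text. The snippet below is a minimal illustration, assuming whitespace tokenization and raw term counts; the names `sparse_vector` and `sparse_dot` are hypothetical helpers, not part of any library.

```python
from collections import Counter


def sparse_vector(text: str) -> dict[str, int]:
    """Map a text to {term: count}; absent terms are simply not stored."""
    return dict(Counter(text.lower().split()))


def sparse_dot(a: dict[str, int], b: dict[str, int]) -> int:
    """Dot product of two sparse vectors, touching only shared terms."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(weight * b[term] for term, weight in a.items() if term in b)


doc = sparse_vector("error code E1234 in payment service")
query = sparse_vector("E1234 error")
score = sparse_dot(query, doc)
```

Only the overlapping terms contribute to the score, which is why exact identifiers such as `E1234` match reliably.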

Reciprocal Rank Fusion

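Result fusion algorithm that combines rankings from multiple search systems by summing, for each document, the reciprocal of its rank in each list: score(d) = Σ 1 / (k + rank(d)), where k is a smoothing constant. Uses only rank positions, so the systems' raw scores do not need to be comparable.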
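
Reciprocal Rank Fusion can be sketched in a few lines. This is a minimal illustration, assuming each input is a list of document IDs ordered best first and using k = 60, a commonly used default; the function name and sample IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each document gets sum(1 / (k + rank)) over all lists."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["d3", "d1", "d2"]   # ranking from the vector search (assumed)
keyword_hits = ["d1", "d4", "d3"]  # ranking from the keyword search (assumed)
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents that appear near the top of both lists, such as `d1` here, rise above documents that only one system ranks highly.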

BM25 Algorithm

Probabilistic ranking function based on term frequency, inverse document frequency, and document length normalization, widely used in keyword-based search engines. A strong standard baseline for the lexical component of hybrid systems.
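
The scoring idea can be sketched as follows. This is a simplified illustration, assuming whitespace tokenization, the common parameter values k1 = 1.5 and b = 0.75, and the smoothed IDF variant ln(1 + (N - n + 0.5) / (n + 0.5)); production engines differ in tokenization and details.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Return one BM25 score per document for the given query."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Longer-than-average documents are penalized through b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["the cat sat on the mat", "dogs chase the cat", "stock markets fell sharply"]
scores = bm25_scores("cat mat", docs)
```

The first document matches both query terms, including the rarer term "mat", so it scores highest; the third shares no terms and scores zero.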

FAISS

Open-source library from Meta AI (formerly Facebook AI Research) for fast similarity search and clustering of dense vectors. Commonly used to implement the vector component of hybrid retrieval systems.

Cross-Encoder

Neural model architecture that encodes a query and a document jointly, as a single input, to predict their relevance. More accurate but slower than bi-encoders, so it is often used to re-rank hybrid results.

Bi-Encoder

Model architecture that separately encodes queries and documents into independent vectors for efficient vector search. Fundamental for the dense component of large-scale hybrid retrieval systems.

Re-ranking

Process of re-evaluating and reordering initial search results using more complex models to improve final accuracy. Crucial step in hybrid pipelines to refine the selection of the most relevant documents.
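
The two-stage pattern can be sketched as a function that rescores a small candidate set with a more expensive scorer. In this minimal illustration, `overlap_score` is a hypothetical stand-in for a real re-ranking model such as a cross-encoder; only the pipeline shape is the point.

```python
from typing import Callable


def rerank(query: str, candidates: list[str],
           scorer: Callable[[str, str], float], top_n: int = 3) -> list[str]:
    """Rescore first-stage candidates and keep the top_n best."""
    scored = [(scorer(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]


def overlap_score(query: str, doc: str) -> float:
    """Toy scorer (assumption): fraction of query terms found in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


candidates = [
    "reset your password in settings",
    "billing and invoices",
    "how to reset a forgotten password",
]
top = rerank("reset forgotten password", candidates, overlap_score, top_n=2)
```

Because the expensive scorer only sees the candidates returned by the first stage, its cost stays bounded regardless of corpus size.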

Semantic Similarity

Measure of conceptual proximity between two texts based on their meaning rather than their exact words. In hybrid systems it is typically calculated as the cosine similarity between their embeddings.
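
Cosine similarity between two embeddings can be computed as below (cosine distance is 1 minus this value). This is a minimal sketch on plain Python lists; the toy vectors are hand-written assumptions rather than real model outputs.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

The measure depends only on direction, not magnitude, so a vector and a scaled copy of it are treated as maximally similar.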

Embedding Fusion

Technique combining multiple types of embeddings or vector representations to capture different semantic aspects of text. Improves the robustness of vector search in multi-modal hybrid systems.

Query Understanding

Process of analyzing and interpreting user intentions in queries to optimize hybrid search strategy. Involves entity detection, intent classification, and semantic expansion.

ColBERT

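Retrieval model that keeps one contextual embedding per token rather than a single embedding per document. Scores a document by late interaction: each query token is compared with every document token and the best matches are summed, enabling fine-grained token-to-token comparison in hybrid retrieval systems.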
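
The token-to-token comparison can be sketched with the MaxSim operation: for each query token, take its best dot product over all document tokens, then sum. This is a minimal illustration on hand-written toy vectors (an assumption); a real system would use learned, normalized token embeddings.

```python
Vector = list[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_tokens: list[Vector], doc_tokens: list[Vector]) -> float:
    """Sum over query tokens of the maximum similarity to any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


query = [[1.0, 0.0], [0.0, 1.0]]   # two query token embeddings (toy values)
doc_a = [[1.0, 0.0], [0.0, 1.0]]   # covers both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]   # covers only the first query token
score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
```

A document is rewarded for matching every query token somewhere, which a single document-level vector cannot express directly.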

Late Fusion

Combination strategy where vector search and keyword search results are merged after their individual evaluation. Flexible approach allowing dynamic weighting based on query characteristics.
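
One way to realize late fusion with an adjustable weight is to normalize each system's scores and blend them. This is a minimal sketch, assuming min-max normalization and a single weight `alpha` for the vector side; both choices and all names are illustrative assumptions, not a prescribed method.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so the two systems become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_fusion(vector_scores: dict[str, float],
                    keyword_scores: dict[str, float],
                    alpha: float = 0.5) -> list[tuple[str, float]]:
    """Blend normalized scores: alpha * vector + (1 - alpha) * keyword."""
    v, k = min_max(vector_scores), min_max(keyword_scores)
    fused = {doc: alpha * v.get(doc, 0.0) + (1 - alpha) * k.get(doc, 0.0)
             for doc in set(v) | set(k)}
    return sorted(fused.items(), key=lambda pair: (-pair[1], pair[0]))


vector_scores = {"d1": 0.9, "d2": 0.5, "d3": 0.1}     # e.g. cosine similarities
keyword_scores = {"d2": 12.0, "d3": 4.0, "d4": 8.0}   # e.g. BM25 scores
balanced = weighted_fusion(vector_scores, keyword_scores)
vector_heavy = weighted_fusion(vector_scores, keyword_scores, alpha=1.0)
```

Raising `alpha` for conversational queries and lowering it for queries containing identifiers or rare terms is one way to apply the dynamic weighting described above.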

Early Fusion

Hybrid approach combining vector and lexical features at the indexing or document representation level. Allows deep integration of signals, but offers less flexibility to adjust their relative weights at query time.

Dense Passage Retriever

Bi-encoder model that retrieves relevant passages using two BERT encoders, one for questions and one for passages, trained so that matching pairs have a high dot product. Key component for vector search in hybrid RAG systems.
