AI Terminology
A Comprehensive Dictionary of Artificial Intelligence
Hybrid Retrieval
Retrieval approach combining vector search and keyword-based methods to simultaneously optimize precision and recall in RAG systems. This technique leverages the strengths of semantic search and lexical search for more comprehensive results.
Vector Search
Search method based on the semantic similarity of vector embeddings in a high-dimensional space. Finds relevant documents even without exact keyword matches, because embeddings encode meaning and context rather than surface terms.
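As a minimal sketch of the idea, the following brute-force search scores every document embedding against the query embedding and keeps the best matches. The function names and the toy 3-dimensional vectors are assumptions made for illustration; real systems use learned embeddings with hundreds of dimensions and an approximate index instead of a full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)

def vector_search(query_vec, doc_vecs, top_k=3):
    """Brute-force search: score every document, return the top_k (id, score) pairs."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy embeddings, invented for this example.
doc_vecs = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.7, 0.3, 0.1],
    "doc_tax": [0.0, 0.2, 0.9],
}
results = vector_search([1.0, 0.0, 0.0], doc_vecs, top_k=2)
```

Here `results` holds the two documents whose vectors point most nearly in the same direction as the query.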
Keyword Search
Traditional search technique based on exact or partial term matching between documents and queries. Uses algorithms like BM25 to evaluate relevance based on term frequency and distribution.
Sparse Retrieval
Search method using sparse text representations, in which most dimensions are zero and each nonzero dimension marks the presence of a specific term. Typically served from an inverted index, which makes it computationally cheap and well suited to exact keyword matches.
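A rough sketch of how a sparse representation can be served: an inverted index maps each term to the documents that contain it, so a query only touches the postings of its own terms. The names `build_inverted_index` and `lookup`, the whitespace tokenizer, and the boolean AND semantics are all simplifying assumptions for this example.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def lookup(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)

docs = {
    "d1": "hybrid retrieval combines dense and sparse search",
    "d2": "sparse retrieval relies on exact terms",
    "d3": "dense vectors capture meaning",
}
index = build_inverted_index(docs)
matches = lookup(index, "sparse retrieval")
```

Only `d1` and `d2` contain both query terms, so they are the only ids in `matches`.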
Reciprocal Rank Fusion
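Result fusion algorithm that combines rankings from multiple search systems by summing, for each document, the reciprocal of its rank in each list offset by a constant: 1 / (k + rank). Because it uses only rank positions and not raw scores, it stays robust when the underlying systems score on different scales.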
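The fusion step can be sketched in a few lines. This assumes each input ranking is a list of document ids ordered best first; the function name is illustrative, and k = 60 is a commonly used default rather than a required value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each document earns 1 / (k + rank) per list it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

vector_ranking = ["d2", "d1", "d3"]   # e.g. from the dense retriever
keyword_ranking = ["d1", "d4", "d2"]  # e.g. from the lexical retriever
fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
```

Documents that appear near the top of both lists (`d1`, `d2`) end up ahead of documents that only one system returned.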
BM25 Algorithm
Probabilistic ranking function based on term frequency, inverse document frequency, and document length normalization, widely used in keyword-based search engines. A strong and common baseline for the lexical component of hybrid systems.
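To make the scoring concrete, here is a small sketch of one common BM25 variant. It assumes whitespace tokenization, a non-empty corpus, the smoothed IDF form log(1 + (N - n + 0.5) / (n + 0.5)), and the parameter values k1 = 1.5 and b = 0.75; production engines differ in tokenization and in these details.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Longer-than-average documents are penalized through b.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "the cat sat on the mat",
    "dogs chase the cat",
    "stock markets fell sharply today",
]
scores = bm25_scores("cat mat", docs)
```

The first document matches both query terms and scores highest; the third matches neither and scores zero.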
FAISS
Library from Facebook AI Research (now Meta AI) for fast similarity search and clustering of dense vectors in high-dimensional spaces. Commonly used to implement the vector component of hybrid retrieval systems.
Cross-Encoder
Neural model architecture that encodes a query and a document together, as a single input, to predict their relevance. More accurate but slower than bi-encoders, since every query-document pair needs its own forward pass; often used for re-ranking hybrid results.
Bi-Encoder
Model architecture that separately encodes queries and documents into independent vectors for efficient vector search. Fundamental for the dense component of large-scale hybrid retrieval systems.
Re-ranking
Process of re-evaluating and reordering initial search results using more complex models to improve final accuracy. Crucial step in hybrid pipelines to refine the selection of the most relevant documents.
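The two-stage shape of re-ranking can be sketched as follows. The `toy_pair_scorer` below is a deliberately simple stand-in, assumed only for illustration, for the more complex model (such as a cross-encoder) that would normally score each query-document pair.

```python
def rerank(query, candidates, score_fn, top_n=2):
    """Re-score first-stage candidates with a costlier scorer and keep the best top_n."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in scored[:top_n]]

def toy_pair_scorer(query, doc):
    """Hypothetical scorer: fraction of query terms present in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0

# Candidates as they might arrive from a first-stage hybrid retriever.
candidates = [
    "vector databases store embeddings",
    "bm25 ranks documents by term statistics",
    "hybrid search merges bm25 and vector results",
]
top = rerank("bm25 vector hybrid", candidates, toy_pair_scorer, top_n=2)
```

The expensive scorer only sees the short candidate list, which is what keeps this stage affordable.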
Semantic Similarity
Measure of conceptual proximity between two texts based on their meaning rather than exact words. Typically calculated as the cosine similarity between their embeddings in hybrid systems.
Embedding Fusion
Technique combining multiple types of embeddings or vector representations to capture different semantic aspects of text. Improves the robustness of vector search in multi-modal hybrid systems.
Query Understanding
Process of analyzing and interpreting user intentions in queries to optimize hybrid search strategy. Involves entity detection, intent classification, and semantic expansion.
ColBERT
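Late-interaction retrieval model that keeps one contextual embedding per token rather than a single embedding per document. Scores a document by matching each query token to its most similar document token and summing those maxima, which enables fine token-to-token comparisons in hybrid retrieval systems.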
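A minimal sketch of this token-level scoring, often called MaxSim: for each query token, take its best match among the document tokens, then sum. The 2-dimensional token vectors are invented for the example, and plain dot products are assumed as the similarity measure.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best dot-product match among document tokens."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# One embedding per token (toy values).
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]  # covers both query tokens
doc_b = [[0.9, 0.1], [0.8, 0.2]]              # covers only the first

score_a = maxsim_score(query_tokens, doc_a)
score_b = maxsim_score(query_tokens, doc_b)
```

`doc_a` scores higher because each query token finds a close counterpart in it, while the second query token has no good match in `doc_b`.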
Late Fusion
Combination strategy where vector search and keyword search results are merged after their individual evaluation. Flexible approach allowing dynamic weighting based on query characteristics.
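One way to sketch weighted late fusion: normalize each system's scores to a common range, then blend them with a weight. Min-max normalization, the parameter name `alpha`, and treating a missing document as scoring 0 are assumptions of this example, not a fixed recipe.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} map to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def late_fusion(vector_scores, keyword_scores, alpha=0.5):
    """Blend normalized scores: alpha weights the vector side, 1 - alpha the keyword side."""
    v = min_max_normalize(vector_scores)
    k = min_max_normalize(keyword_scores)
    fused = {
        doc_id: alpha * v.get(doc_id, 0.0) + (1 - alpha) * k.get(doc_id, 0.0)
        for doc_id in set(v) | set(k)
    }
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)

vector_scores = {"d1": 0.82, "d2": 0.79, "d3": 0.40}  # e.g. cosine similarities
keyword_scores = {"d2": 12.0, "d4": 7.5, "d1": 3.0}   # e.g. BM25 scores
ranked = late_fusion(vector_scores, keyword_scores, alpha=0.5)
```

Raising `alpha` toward 1 favors the vector results and lowering it favors keyword matches, which is where query-dependent weighting would plug in.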
Early Fusion
Hybrid approach combining vector and lexical features at the indexing or document representation level. Allows deep integration of signals but with less adaptation flexibility.
Dense Passage Retriever
Dense retrieval model that uses two BERT encoders, one for questions and one for passages, trained so that relevant question-passage pairs score highly under a dot product. Key component for vector search in hybrid RAG systems.