AI Glossary
A complete dictionary of Artificial Intelligence
Vector Embedding
Dense numerical representation of a textual or visual object in a multidimensional vector space, capturing its fundamental semantic characteristics. These embeddings enable machines to understand and compare the meaning of data in a quantitative manner.
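A minimal sketch of the idea: the four-dimensional vectors below are made-up values (real models produce hundreds of dimensions), but they show how cosine similarity between embeddings reflects semantic relatedness.

```python
import math

# Hypothetical toy embeddings for illustration only; a real embedding
# model would produce these vectors from the input text.
king  = [0.8, 0.65, 0.1, 0.05]
queen = [0.78, 0.7, 0.12, 0.04]
apple = [0.05, 0.1, 0.9, 0.7]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Semantically related items end up closer in the vector space.
print(cosine_similarity(king, queen) > cosine_similarity(king, apple))  # True
```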
Vector Database
Specialized database optimized for efficiently storing, indexing, and querying high-dimensional vector representations. It uses advanced indexing structures like HNSW or IVF to accelerate similarity searches.
Semantic Search
Search method that understands the intent and semantic context behind a query rather than relying solely on exact keyword matches. It uses embeddings to find conceptually similar documents even without shared vocabulary.
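The mechanism can be sketched with a brute-force search; the `toy_embed` function below is a hypothetical stand-in (character-trigram counts) for a real neural embedding model.

```python
from collections import Counter
import math

def toy_embed(text):
    # Stand-in for a real embedding model: character-trigram counts.
    # A production system would call a neural encoder here.
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_search(query, documents, top_k=2):
    # Embed the query, score every document, return the best matches.
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(d)), d) for d in documents]
    return [d for score, d in sorted(scored, reverse=True)[:top_k]]

docs = ["vector databases store embeddings",
        "cats are popular pets",
        "embedding vectors power semantic search"]
print(semantic_search("vector embedding search", docs, top_k=1))
```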
Dimensionality Reduction
Algorithmic process that reduces the number of dimensions in embeddings while preserving important semantic relationships. Techniques like PCA or t-SNE optimize storage and accelerate similarity calculations.
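A minimal sketch of the idea behind PCA, using power iteration to find the single direction of maximum variance (full PCA and t-SNE are considerably more involved):

```python
import math, random

def top_principal_component(vectors, iters=100, seed=0):
    # Power iteration on the covariance matrix: repeatedly multiply a
    # random vector by the covariance until it converges to the top
    # eigenvector (the direction of maximum variance).
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    centered = [[v[i] - mean[i] for i in range(dim)] for v in vectors]
    rng = random.Random(seed)
    w = [rng.random() for _ in range(dim)]
    for _ in range(iters):
        # Covariance * w without forming the matrix: sum over v of (v.w) v
        nw = [0.0] * dim
        for v in centered:
            proj = sum(v[i] * w[i] for i in range(dim))
            for i in range(dim):
                nw[i] += proj * v[i]
        norm = math.sqrt(sum(x * x for x in nw))
        w = [x / norm for x in nw]
    return w, mean

def reduce_to_1d(vectors):
    # Project each vector onto the top principal component.
    w, mean = top_principal_component(vectors)
    return [sum((v[i] - mean[i]) * w[i] for i in range(len(w))) for v in vectors]
```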
Vector Index
Optimized data structure that organizes vectors to enable fast nearest neighbor searches without exhaustive comparison. Indexes like HNSW, IVF or LSH significantly reduce the time complexity of queries.
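As an illustrative sketch of how an index avoids exhaustive comparison, here is a minimal LSH-style structure (class name and design are hypothetical): vectors hash to buckets by the sign of random hyperplane projections, and a query only compares against its own bucket.

```python
import random

class RandomHyperplaneIndex:
    # Minimal locality-sensitive hashing sketch for illustration.
    def __init__(self, dim, n_planes=4, seed=42):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(n_planes)]
        self.buckets = {}

    def _hash(self, v):
        # One bit per hyperplane: which side of it the vector falls on.
        return tuple(sum(p[i] * v[i] for i in range(len(v))) >= 0
                     for p in self.planes)

    def add(self, key, v):
        self.buckets.setdefault(self._hash(v), []).append((key, v))

    def candidates(self, v):
        # Candidate set for a query: its own bucket only, not the full data.
        return self.buckets.get(self._hash(v), [])
```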
Vector Normalization
Process of scaling vectors to have unit norm, standardizing cosine similarity comparisons. This technique eliminates biases related to vector magnitude and focuses solely on their semantic direction.
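A short sketch: after normalization, the plain dot product of two vectors equals their cosine similarity, which is why many vector databases normalize on ingestion.

```python
import math

def normalize(v):
    # Scale to unit L2 norm; zero vectors are returned unchanged.
    norm = math.sqrt(sum(x * x for x in v))
    return v if norm == 0 else [x / norm for x in v]

# After normalization, dot product == cosine similarity.
a, b = normalize([3.0, 4.0]), normalize([4.0, 3.0])
dot = sum(x * y for x, y in zip(a, b))
```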
Embedding Model
Pre-trained neural network that transforms text or other data into dense vector representations. Models like BERT, Sentence-BERT or OpenAI embeddings capture different semantic nuances based on their architecture.
HNSW (Hierarchical Navigable Small World)
Graph indexing structure that creates multiple layers of connections to accelerate nearest neighbor search. It offers an excellent compromise between construction speed, memory efficiency, and search quality.
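A heavily simplified, single-layer sketch of the greedy routing HNSW performs within each of its layers (the real algorithm adds a layer hierarchy and beam search): hop to whichever neighbor is closest to the query until no neighbor improves.

```python
def greedy_search(graph, vectors, query, entry):
    # graph maps a node index to its neighbor indices; vectors holds the
    # coordinates. Greedy descent over the graph: each step can only
    # decrease the distance to the query.
    def dist(i):
        return sum((vectors[i][d] - query[d]) ** 2 for d in range(len(query)))

    current, best = entry, dist(entry)
    improved = True
    while improved:
        improved = False
        for nb in graph[current]:
            d = dist(nb)
            if d < best:
                current, best = nb, d
                improved = True
    return current, best

vectors = [[0, 0], [1, 0], [2, 0], [3, 0]]
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
idx, d = greedy_search(graph, vectors, query=[3, 0], entry=0)
```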
IVF (Inverted File Index)
Indexing technique that partitions the vector space into regions (inverted lists) to limit searches to relevant areas. A coarse quantizer assigns vectors to lists, and in practice it is often combined with product quantization (IVF-PQ) to balance precision and performance in ANN searches.
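A minimal sketch of the IVF idea, assuming the centroids are given (a real system learns them with k-means): each vector joins the list of its nearest centroid, and a query scans only the `nprobe` nearest lists.

```python
def build_ivf(vectors, centroids):
    # Coarse quantization: assign each vector to its nearest centroid's list.
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    lists = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: d2(v, centroids[c]))
        lists[nearest].append(i)
    return lists

def ivf_search(query, vectors, centroids, lists, nprobe=1):
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # Probe only the nprobe nearest inverted lists instead of every vector.
    probe = sorted(range(len(centroids)),
                   key=lambda c: d2(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    return min(candidates, key=lambda i: d2(query, vectors[i]))

vectors = [[0, 1], [1, 0], [10, 9], [9, 10]]
centroids = [[0, 0], [10, 10]]
lists = build_ivf(vectors, centroids)
best = ivf_search([9.5, 10.0], vectors, centroids, lists, nprobe=1)
```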
Distance Metrics
Mathematical functions that quantify the dissimilarity between two vectors in the embedding space. Common metrics include Euclidean distance, Manhattan distance, and cosine distance (1 minus cosine similarity), each suited to different use cases.
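The three common metrics mentioned above can be written directly from their definitions:

```python
import math

def euclidean(a, b):
    # Straight-line (L2) distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences (L1 distance).
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity: 0 for identical directions, up to 2 for opposite.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

print(euclidean([0, 0], [3, 4]))   # 5.0
print(manhattan([0, 0], [3, 4]))   # 7
```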
Vector Store
RAG architecture component responsible for efficiently storing and retrieving document embeddings. It manages persistence, indexing, and querying of vectors to power the augmented generation system.
Dense Retrieval
Information retrieval approach that uses dense embeddings to capture deep semantic relationships between documents and queries. It often outperforms sparse methods like TF-IDF at capturing context and intent, at the cost of heavier computation.
Embedding Cache
Caching system that stores precomputed embeddings to avoid redundant calculations and speed up responses. It is crucial for the performance of RAG systems handling recurring or similar queries.
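A minimal in-memory sketch of the idea; `embed_fn` below is a hypothetical stand-in for a call to an expensive embedding model or API.

```python
class EmbeddingCache:
    # Minimal dict-backed cache keyed by the input text.
    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.cache = {}
        self.misses = 0  # counts actual calls to the underlying model

    def get(self, text):
        if text not in self.cache:
            self.misses += 1
            self.cache[text] = self.embed_fn(text)
        return self.cache[text]

# Usage with a fake embedding function (illustrative only):
cache = EmbeddingCache(lambda t: [float(len(t))])
cache.get("hello world")
cache.get("hello world")  # served from cache, no second model call
```

A production cache would typically add eviction (LRU) and persistence, but the lookup-before-compute pattern is the same.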
Chunk Embedding
Process of creating embeddings for document segments rather than entire documents, enabling more granular and precise retrieval. The optimal chunk size depends on the domain and context requirements.
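A simple fixed-size sliding-window chunker with overlap (one common strategy among many; the sizes below are illustrative, not recommendations):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    # Overlapping windows so context is not cut off at chunk boundaries;
    # each chunk would then be embedded separately.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```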
Vector Metadata
Information associated with each vector including source document identifier, timestamps, relevance scores, or other filterable attributes. Metadata enables precise refinement of search results.
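A sketch of metadata filtering (the record layout and filter callback are illustrative assumptions): the filter narrows the candidate set before any similarity is computed.

```python
def filtered_search(query, records, metadata_filter, top_k=2):
    # records: list of (vector, metadata) pairs.
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # Apply the metadata filter first, then rank survivors by distance.
    candidates = [(v, m) for v, m in records if metadata_filter(m)]
    return sorted(candidates, key=lambda r: d2(query, r[0]))[:top_k]

records = [([0.1, 0.9], {"source": "doc_a", "year": 2023}),
           ([0.2, 0.8], {"source": "doc_b", "year": 2021}),
           ([0.9, 0.1], {"source": "doc_c", "year": 2023})]
results = filtered_search([0.0, 1.0], records,
                          lambda m: m["year"] == 2023, top_k=1)
```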