Vector Indexing - Artificial Intelligence Glossary

Vector Embedding

Dense numerical representation of a text, image, or other object as a point in a multidimensional vector space, capturing its main semantic characteristics. Because similar items map to nearby points, embeddings let machines compare meaning quantitatively.

Vector Database

Specialized database optimized for storing, indexing, and querying high-dimensional vectors. It uses approximate nearest neighbor (ANN) index structures such as HNSW or IVF to accelerate similarity searches.

Semantic Search

Search method that understands the intent and semantic context behind a query rather than relying solely on exact keyword matches. It uses embeddings to find conceptually similar documents even without shared vocabulary.

Dimensionality Reduction

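Algorithmic process that reduces the number of dimensions in embeddings while preserving as much of their semantic structure as possible. Linear techniques like PCA cut storage and speed up similarity calculations; t-SNE preserves local neighborhoods rather than global distances and is used mainly for visualization, not for building search indexes.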
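As a rough illustration of dimensionality reduction, the sketch below finds the first principal component of a tiny 2-D dataset and projects each point onto it. It is a simplification, not a full PCA: it recovers only the top component, using power iteration on the covariance matrix, and the function names `top_principal_component` and `project` are invented for this example.

```python
import math

def top_principal_component(data, iterations=100):
    """Return (mean, unit vector) for the direction of greatest variance."""
    n, dim = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(dim)]
    centered = [[row[j] - mean[j] for j in range(dim)] for row in data]
    # Sample covariance matrix (dim x dim).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    # Power iteration: repeatedly multiply by cov and renormalize.
    # Assumes the data is not constant and that the start vector is not
    # orthogonal to the top component.
    v = [1.0] * dim
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(data, mean, component):
    """Map each row to a single coordinate along the component."""
    return [sum((x - m) * c for x, m, c in zip(row, mean, component))
            for row in data]

# Toy data lying exactly on the line y = 2x.
points = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
mean, component = top_principal_component(points)
reduced = project(points, mean, component)  # four 1-D coordinates
```

Because the toy points are perfectly collinear, one dimension keeps all of the variance; real embeddings would lose some information at this step.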

Vector Index

Data structure that organizes vectors to enable fast nearest neighbor searches without comparing the query against every stored vector. Indexes like HNSW, IVF, or LSH greatly reduce query time compared with a linear scan, usually in exchange for approximate rather than exact results.
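To make the idea of avoiding exhaustive comparison concrete, here is a minimal sketch of one LSH variant, random-hyperplane hashing. Each vector is hashed to a bit signature, and a query is compared only against vectors in the same bucket. The helper names, the seed, and the number of bits are illustrative assumptions; practical LSH indexes use several hash tables to improve recall.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(v, planes):
    """One bit per hyperplane: which side of the plane v falls on."""
    return tuple(1 if dot(v, p) >= 0 else 0 for p in planes)

def build_index(vectors, planes):
    """Group vector ids by signature."""
    buckets = {}
    for idx, v in enumerate(vectors):
        buckets.setdefault(signature(v, planes), []).append(idx)
    return buckets

def query(q, vectors, planes, buckets):
    """Score only the vectors that share the query's bucket."""
    candidates = buckets.get(signature(q, planes), [])
    return max(candidates, key=lambda i: dot(q, vectors[i]), default=None)

vectors = [[1.0, 0.1], [-1.0, -0.2], [0.1, 1.0]]
planes = make_hyperplanes(dim=2, n_bits=4)
buckets = build_index(vectors, planes)

# A scaled copy of vector 0 points in the same direction, so it hashes
# to the same bucket.
q = [2 * x for x in vectors[0]]
best = query(q, vectors, planes, buckets)
```

The signature depends only on direction, which is why this family of hashes pairs naturally with cosine similarity. A query whose bucket is empty returns `None` here, which is the recall cost of the approximation.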

Vector Normalization

Process of scaling vectors to unit norm (typically L2), so that their dot product equals their cosine similarity. This removes the effect of vector magnitude and makes comparisons depend only on direction.
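A small sketch of the effect, assuming L2 normalization: after scaling to unit length, a plain dot product gives the same value as cosine similarity on the original vectors. The function names are chosen for this example only.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    """Scale v to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        return list(v)
    return [x / norm for x in v]

def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the magnitude
c = [4.0, 3.0]

a_hat, b_hat, c_hat = l2_normalize(a), l2_normalize(b), l2_normalize(c)

same_direction = dot(a_hat, b_hat)   # magnitude no longer matters
via_dot = dot(a_hat, c_hat)          # dot product on unit vectors
via_cosine = cosine_similarity(a, c) # cosine on the raw vectors
```

Normalizing once at indexing time lets a search use the cheaper dot product for every later comparison.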

Embedding Model

Pre-trained neural network that transforms text or other data into dense vector representations. Models like BERT, Sentence-BERT, or OpenAI's embedding models capture different semantic nuances depending on their architecture and training data.

HNSW (Hierarchical Navigable Small World)

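Graph-based index that stacks multiple layers of proximity graphs: sparse upper layers provide long-range links and the dense bottom layer contains every vector. Search descends greedily from the top layer, giving high recall at low query latency, at the cost of more memory and slower construction than partition-based indexes such as IVF.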

IVF (Inverted File Index)

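Indexing technique that partitions the vector space into cells, each with an inverted list of the vectors assigned to it, so that a search scans only the cells closest to the query. A coarse quantizer (typically k-means centroids) assigns vectors to cells; it is often paired with a finer quantizer such as product quantization to compress the stored vectors, balancing precision and performance in approximate nearest neighbor (ANN) searches.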
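The partition-then-probe idea can be sketched in a few lines. This toy version assumes k-means centroids as the partitioning step, initializes them from the first k points to stay deterministic, and stores raw vectors with no compression. The `nprobe` parameter name follows common usage, but every function here is hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))

def kmeans(points, k, iterations=10):
    # Deterministic init from the first k points, for this sketch only.
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cell is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def build_ivf(points, k):
    """Return centroids plus one inverted list of point ids per cell."""
    centroids = kmeans(points, k)
    lists = [[] for _ in range(k)]
    for idx, p in enumerate(points):
        lists[nearest(p, centroids)].append(idx)
    return centroids, lists

def ivf_search(query, points, centroids, lists, nprobe=1):
    """Scan only the nprobe cells whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda i: sq_dist(query, centroids[i]))
    candidates = [idx for cell in order[:nprobe] for idx in lists[cell]]
    if not candidates:
        return None
    return min(candidates, key=lambda idx: sq_dist(query, points[idx]))

points = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0],
          [1.0, 0.0], [10.0, 11.0], [11.0, 10.0]]
centroids, lists = build_ivf(points, k=2)
hit = ivf_search([9.5, 10.2], points, centroids, lists, nprobe=1)
```

Raising `nprobe` scans more cells, which improves recall at the cost of more distance computations; setting it to the number of cells degenerates to an exhaustive scan.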

Distance Metrics

Mathematical functions that quantify how dissimilar two vectors are in the embedding space. Common choices include Euclidean (L2) distance, Manhattan (L1) distance, and cosine distance, defined as 1 minus cosine similarity; each suits different use cases.
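For reference, the three measures can be written out directly. This sketch uses cosine distance, taken here as 1 minus cosine similarity, so that all three functions return larger values for less similar vectors; it assumes non-zero inputs of equal length.

```python
import math

def euclidean(a, b):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences (L1)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity; ignores magnitude, depends only on angle."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

p = [1.0, 0.0]
q = [0.0, 1.0]   # orthogonal to p
r = [5.0, 0.0]   # same direction as p, larger magnitude

d_euclid = euclidean(p, q)
d_manhattan = manhattan(p, q)
d_cosine = cosine_distance(p, q)
d_cosine_scaled = cosine_distance(p, r)  # direction matches, so distance is ~0
```

The last line shows the practical difference: `p` and `r` are far apart by Euclidean distance but identical by cosine distance, which is why the choice of metric should match how the embedding model was trained.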

Vector Store

Component of a retrieval-augmented generation (RAG) architecture responsible for storing and retrieving document embeddings. It manages persistence, indexing, and querying of vectors to supply the generator with relevant context.
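The store's core contract, adding embeddings and querying for the closest ones, can be sketched with an in-memory class. This is a hypothetical minimal interface, not the API of any particular product: it keeps everything in a list, ranks by cosine similarity with a brute-force scan, and omits persistence and indexing entirely.

```python
import math

class InMemoryVectorStore:
    """Toy vector store: brute-force cosine search over an in-memory list."""

    def __init__(self):
        self._records = []  # (doc_id, unit_vector, text)

    @staticmethod
    def _normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            raise ValueError("cannot store or query a zero vector")
        return [x / norm for x in v]

    def add(self, doc_id, vector, text):
        # Normalize once at insert time so queries can use a dot product.
        self._records.append((doc_id, self._normalize(vector), text))

    def query(self, vector, k=3):
        """Return the k best (score, doc_id, text) tuples, highest score first."""
        q = self._normalize(vector)
        scored = [(sum(a * b for a, b in zip(q, vec)), doc_id, text)
                  for doc_id, vec, text in self._records]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:k]

store = InMemoryVectorStore()
store.add("doc-1", [1.0, 0.0, 0.0], "about cats")
store.add("doc-2", [0.0, 1.0, 0.0], "about cars")
store.add("doc-3", [0.9, 0.1, 0.0], "about kittens")

results = store.query([1.0, 0.05, 0.0], k=2)
top_ids = [doc_id for _, doc_id, _ in results]
```

In a RAG pipeline the returned texts would be placed in the generator's prompt. The hand-written vectors stand in for the output of an embedding model.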

Dense Retrieval

Information retrieval approach that encodes queries and documents as dense embeddings and ranks documents by vector similarity. It often outperforms sparse methods like TF-IDF on paraphrased or vocabulary-mismatched queries, although sparse methods remain strong for exact term matches.

Embedding Cache

Caching system that stores precomputed embeddings to avoid redundant model calls and speed up responses. It matters most for RAG systems that see recurring inputs; a cache keyed on the exact text only helps with identical inputs, not merely similar ones.
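One simple way to build such a cache is a dictionary keyed by a hash of the input text. In the sketch below, `toy_embed` is a stand-in for a real model call and `EmbeddingCache` is a hypothetical class. Note that an exact-match key like this only helps with repeated identical inputs.

```python
import hashlib

class EmbeddingCache:
    """Wraps an embedding function and memoizes results by text hash."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, text):
        key = self._key(text)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._embed_fn(text)  # the expensive call
        return self._store[key]

def toy_embed(text):
    # Placeholder for a real model: counts of each vowel, for demonstration only.
    return [float(text.count(ch)) for ch in "aeiou"]

cache = EmbeddingCache(toy_embed)
first = cache.get("vector index")   # computed
second = cache.get("vector index")  # served from the cache
```

A production cache would also need an eviction policy and a key that includes the model name or version, so that embeddings from different models are never mixed.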

Chunk Embedding

Process of creating embeddings for document segments rather than entire documents, enabling more granular and precise retrieval. The optimal chunk size depends on the domain and context requirements.
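The segmentation step might look like the sketch below, which splits text into fixed-size word windows with some overlap so that sentences cut at a boundary still appear whole in one chunk. The word-based splitting, the default sizes, and the function name `chunk_words` are illustrative assumptions; each resulting chunk would then be passed to an embedding model.

```python
def chunk_words(text, chunk_size=5, overlap=2):
    """Split text into overlapping windows of chunk_size words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

chunks = chunk_words("a b c d e f g h", chunk_size=5, overlap=2)
```

Real pipelines often chunk by tokens, sentences, or document structure instead of raw word counts, since the embedding model's input limit is measured in tokens.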

Vector Metadata

Information associated with each vector including source document identifier, timestamps, relevance scores, or other filterable attributes. Metadata enables precise refinement of search results.
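As an illustration of metadata-based refinement, the sketch below applies a filter before ranking by similarity (often called pre-filtering). The record layout, the field names `source` and `year`, and the `filtered_search` helper are all assumptions made for this example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def filtered_search(query, records, k, where=None):
    """Keep records whose metadata passes `where`, then rank by dot product."""
    candidates = [r for r in records if where is None or where(r["meta"])]
    ranked = sorted(candidates, key=lambda r: dot(query, r["vector"]), reverse=True)
    return [r["id"] for r in ranked[:k]]

records = [
    {"id": "a", "vector": [1.0, 0.0], "meta": {"source": "manual.pdf", "year": 2023}},
    {"id": "b", "vector": [0.9, 0.1], "meta": {"source": "blog.html", "year": 2024}},
    {"id": "c", "vector": [0.8, 0.2], "meta": {"source": "manual.pdf", "year": 2024}},
    {"id": "d", "vector": [0.0, 1.0], "meta": {"source": "manual.pdf", "year": 2024}},
]

query = [1.0, 0.0]
unfiltered = filtered_search(query, records, k=2)
recent_manual = filtered_search(
    query, records, k=2,
    where=lambda m: m["source"] == "manual.pdf" and m["year"] >= 2024,
)
```

Filtering first guarantees that every result satisfies the constraint, but it can surface weaker matches, such as record `d` here, when few records pass the filter.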
