Vector Databases — AI/ML Embeddings and Similarity Search
A practical guide to vector databases: embeddings, similarity search, approximate nearest neighbors, and choosing between Pinecone, Weaviate, pgvector, and Chroma.
Overview
Vector databases store high-dimensional numerical vectors (embeddings) generated by machine learning models and enable similarity search via Approximate Nearest Neighbor (ANN) algorithms. They power semantic search, recommendation systems, image retrieval, and Retrieval-Augmented Generation (RAG) for LLMs. Unlike traditional databases that search by exact match or range, vector databases find the “closest” vectors in embedding space — the mathematical representation of meaning, image features, or audio signatures.
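To make "closest in embedding space" concrete, here is a minimal sketch of exact (brute-force) cosine similarity search with NumPy. The function name `cosine_top_k` and the 3-dimensional toy vectors are illustrative assumptions, not part of any vector database API; an ANN index approximates this same ranking without scanning every vector.

```python
import numpy as np

def cosine_top_k(query, vectors, k=3):
    """Return (indices, scores) of the k rows of `vectors` most similar to `query`."""
    query = np.asarray(query, dtype=float)
    vectors = np.asarray(vectors, dtype=float)
    # Cosine similarity = dot product divided by the product of the norms.
    scores = vectors @ query / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order, scores[order]

# Toy 3-dimensional "embeddings"; real models emit hundreds of dimensions.
docs = np.array([
    [0.9, 0.1, 0.0],  # doc 0
    [0.0, 1.0, 0.0],  # doc 1
    [0.8, 0.2, 0.1],  # doc 2
])
indices, scores = cosine_top_k([1.0, 0.0, 0.0], docs, k=2)
print(indices)  # doc 0 ranks first, then doc 2
```

Doc 1 points in a different direction from the query, so it is the one left out of the top two.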
When to Use
- You need semantic search (matching on meaning, not just keywords)
- Your LLM RAG pipeline must retrieve relevant context chunks
- Your recommendation system suggests items similar to user preferences
- You retrieve images, audio, or video by content similarity
- You have pre-trained embedding models and need scalable vector storage
How Vector Search Works
- Embedding: A model (OpenAI, BERT, CLIP) converts text/image into a dense vector (e.g., 768-1536 dimensions)
- Indexing: Vectors are organized into an ANN index (HNSW, IVF, PQ) for fast retrieval
- Query: The query is embedded and the index returns the K nearest neighbors
- Metadata filtering: Combine vector similarity with traditional filtering (date, category, user ID)
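The four steps above can be sketched end to end with a toy in-memory store. Everything here is hypothetical scaffolding: `toy_embed` stands in for a real embedding model, `ToyVectorStore` for a real database, and the "index" is a plain list scanned exhaustively rather than an ANN structure.

```python
import numpy as np

VOCAB = ["vector", "database", "index", "cat", "dog"]  # hypothetical toy vocabulary

def toy_embed(text):
    """Stand-in for a real embedding model: normalized word counts over VOCAB."""
    words = text.lower().split()
    vec = np.array([words.count(w) for w in VOCAB], dtype=float)
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

class ToyVectorStore:
    def __init__(self):
        self.vectors, self.texts, self.metadata = [], [], []

    def add(self, text, **metadata):
        # Embedding + "indexing" (here just appending to a list).
        self.vectors.append(toy_embed(text))
        self.texts.append(text)
        self.metadata.append(metadata)

    def query(self, text, k=2, where=None):
        q = toy_embed(text)
        hits = []
        for vec, doc, meta in zip(self.vectors, self.texts, self.metadata):
            # Metadata filter: skip rows that do not match every key in `where`.
            if where and any(meta.get(key) != value for key, value in where.items()):
                continue
            hits.append((float(vec @ q), doc))
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return hits[:k]

store = ToyVectorStore()
store.add("vector database index", category="tech")
store.add("cat dog", category="pets")
store.add("dog index", category="tech")
results = store.query("vector index", k=2, where={"category": "tech"})
```

In this sketch the filter is applied before scoring (pre-filtering). Real systems differ in whether they filter before, during, or after the ANN search, which affects both recall and latency.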
Comparison
| Database | Deployment | Index / search features | Best For |
|---|---|---|---|
| Pinecone | Managed cloud | HNSW, metadata filters | Production RAG, no ops overhead |
| Weaviate | Self-hosted / cloud | HNSW, BM25 hybrid | Multi-modal, GraphQL interface |
| pgvector | PostgreSQL extension | ivfflat, hnsw | Teams already on Postgres |
| Chroma | Embedded / local | HNSW | Prototyping, small-scale local RAG |
| Milvus/Zilliz | Self-hosted / cloud | IVF, HNSW, GPU | Large-scale, high throughput |
pgvector Example
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create table with vector column
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT,
    embedding vector(1536)
);
-- Create HNSW index for fast ANN search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
-- Insert a document with embedding
INSERT INTO documents (title, content, embedding)
VALUES ('Vector DB Guide', 'A guide to vector databases...', '[0.12, -0.03, ...]');
-- Semantic search: find the 5 most similar documents
-- (<=> is pgvector's cosine distance operator, so 1 - distance = cosine similarity)
SELECT id, title, content,
       1 - (embedding <=> '[0.11, -0.02, ...]') AS similarity
FROM documents
ORDER BY embedding <=> '[0.11, -0.02, ...]'
LIMIT 5;
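From application code, the query vector usually arrives as a list of floats and has to be passed as a parameter. The sketch below shows one way to do that, assuming a psycopg-style driver with `%s` placeholders; `to_pgvector_literal` and `build_search_params` are hypothetical helper names, and the connection itself is omitted.

```python
def to_pgvector_literal(values):
    """Format a sequence of floats in pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"

# Adapted from the search query above, with placeholders instead of inline vectors.
SEARCH_SQL = """
SELECT id, title, 1 - (embedding <=> %s::vector) AS similarity
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""

def build_search_params(query_embedding, limit=5):
    literal = to_pgvector_literal(query_embedding)
    # The literal appears twice: once in SELECT and once in ORDER BY.
    return (literal, literal, limit)

params = build_search_params([0.5, -0.25, 1.0], limit=5)
# With a live connection (not shown): cursor.execute(SEARCH_SQL, params)
```

Passing the vector as a parameter rather than formatting it into the SQL string keeps the query text constant and avoids injection issues. The query embedding must have the same number of dimensions as the column (1536 in the table above).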
Hybrid Search (Vector + Keyword)
# Weaviate (v4 Python client): combine vector similarity with BM25 keyword search
import weaviate

client = weaviate.connect_to_local()
try:
    results = client.collections.get("Article").query.hybrid(
        query="vector database architecture",
        vector=[0.12, -0.03, ...],  # placeholder: pass the full query embedding
        alpha=0.5,  # 0 = pure BM25, 1 = pure vector
        limit=10,
    )
finally:
    client.close()  # release the connection when done
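To build intuition for what `alpha` does, here is a simplified illustration of weighted score fusion. It is not Weaviate's actual fusion algorithm; the function names and the scores are made up, and min-max normalization is just one possible way to put the two score scales on equal footing.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} dict to the 0..1 range."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}

def hybrid_scores(keyword_scores, vector_scores, alpha=0.5):
    """Blend two score sets: alpha=0 is keyword only, alpha=1 is vector only."""
    kw = min_max_normalize(keyword_scores)
    vec = min_max_normalize(vector_scores)
    doc_ids = set(kw) | set(vec)
    fused = {d: alpha * vec.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0) for d in doc_ids}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

bm25 = {"a": 12.0, "b": 3.0, "c": 0.5}      # made-up keyword scores
cosine = {"a": 0.20, "b": 0.90, "c": 0.85}  # made-up vector similarities
ranked = hybrid_scores(bm25, cosine, alpha=0.5)
print([doc_id for doc_id, _ in ranked])  # ['b', 'a', 'c']
```

Document "a" wins on keywords alone and "b" on vectors alone; at `alpha=0.5` the document that does reasonably well on both ("b") comes out on top.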
RAG Pipeline Example
from openai import OpenAI
import chromadb

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Load and chunk documents
# load_and_chunk_documents is a placeholder for your own loader; it should return a list of strings
chunks = load_and_chunk_documents("knowledge_base/")
ids = [f"chunk-{i}" for i in range(len(chunks))]  # Chroma requires a unique ID per item

# 2. Embed and store in Chroma
client = chromadb.Client()
collection = client.create_collection("docs")
embeddings = openai_client.embeddings.create(input=chunks, model="text-embedding-3-small")
collection.add(ids=ids, documents=chunks, embeddings=[e.embedding for e in embeddings.data])

# 3. Retrieve relevant chunks for a query
question = "How do vector indexes work?"
query_embedding = openai_client.embeddings.create(input=question, model="text-embedding-3-small")
results = collection.query(query_embeddings=[query_embedding.data[0].embedding], n_results=5)

# 4. Augment LLM prompt with retrieved context
# results["documents"][0] is the list of matching chunk strings for the first (only) query
context = "\n".join(results["documents"][0])
prompt = f"Context:\n{context}\n\nQuestion: {question}"
response = openai_client.chat.completions.create(model="gpt-4o", messages=[{"role": "user", "content": prompt}])
print(response.choices[0].message.content)
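Step 1 of the pipeline relies on a chunking helper that the example leaves undefined. One possible shape for the chunking part is sketched below, assuming simple word-based windows with overlap; `chunk_text` and its default sizes are illustrative choices, and real pipelines often split on sentences, headings, or tokens instead.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into chunks of `chunk_size` words, each overlapping the previous by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_text(sample, chunk_size=4, overlap=1)
ids = [f"chunk-{i}" for i in range(len(chunks))]  # one stable ID per chunk
print(chunks)  # ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some text twice.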
ANN Algorithms
| Algorithm | Type | Speed | Memory | Best For |
|---|---|---|---|---|
| HNSW | Graph-based | Fast | High | General purpose, high recall |
| IVF | Clustering | Medium | Medium | Large datasets, memory-constrained |
| PQ | Quantization | Fast | Low | Billions of vectors, acceptable recall loss |
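The clustering idea behind IVF can be sketched in a few lines of NumPy. This is a toy under stated simplifications: centroids are sampled from the data rather than trained with k-means, and the names `build_ivf`, `ivf_search`, `n_lists`, and `nprobe` are illustrative (though `nprobe` is the conventional name for the number of clusters scanned).

```python
import numpy as np

def build_ivf(vectors, n_lists, seed=0):
    """Assign each vector to its nearest centroid (centroids sampled from the data)."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), size=n_lists, replace=False)]
    # Squared Euclidean distance from every vector to every centroid.
    dists = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assignments = dists.argmin(axis=1)
    lists = [np.flatnonzero(assignments == c) for c in range(n_lists)]
    return centroids, lists

def ivf_search(query, vectors, centroids, lists, k=5, nprobe=2):
    """Scan only the `nprobe` lists whose centroids are closest to the query."""
    centroid_dists = ((centroids - query) ** 2).sum(axis=1)
    probe = np.argsort(centroid_dists)[:nprobe]
    candidates = np.concatenate([lists[c] for c in probe])
    cand_dists = ((vectors[candidates] - query) ** 2).sum(axis=1)
    return candidates[np.argsort(cand_dists)[:k]]

rng = np.random.default_rng(42)
data = rng.normal(size=(1000, 16))
query = rng.normal(size=16)
centroids, lists = build_ivf(data, n_lists=10)

exact = np.argsort(((data - query) ** 2).sum(axis=1))[:5]  # brute-force ground truth
approx = ivf_search(query, data, centroids, lists, k=5, nprobe=3)
recall = len(set(exact.tolist()) & set(approx.tolist())) / 5
print(f"recall@5 with nprobe=3: {recall:.2f}")
```

Raising `nprobe` scans more clusters, which improves recall and costs latency; with `nprobe` equal to `n_lists` the search degenerates into an exact scan.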
Common Mistakes
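- Wrong distance metric: use the metric your embedding model was trained for. Cosine similarity is the usual choice for text embeddings; on unit-normalized vectors, cosine, dot product, and Euclidean distance produce the same ranking, while on unnormalized vectors dot product also rewards magnitude
- No metadata filtering: pure vector search can return items that are semantically close but out of scope (wrong tenant, outdated, wrong category); add metadata filters wherever such constraints exist
- Ignoring index tuning: default HNSW parameters (M, ef_construction, ef_search) may not suit your recall/latency requirements
- Storing raw vectors without indexing: an exact brute-force scan compares the query against every vector, so latency grows linearly with collection size; that is fine for small collections but too slow at scale
- Using a vector DB for structured queries: vector DBs are poor at aggregation and joins, so pair one with a relational database for that work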
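On the distance-metric point, a quick NumPy check shows why normalization matters. The sketch uses random vectors as stand-ins for embeddings: once vectors are scaled to unit length, ranking by cosine (equal to dot product there) and by Euclidean distance gives the same order, whereas a raw dot product on unnormalized vectors can rank differently.

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(100, 8))  # random stand-ins for embeddings
query = rng.normal(size=8)

# Normalize everything to unit length.
unit_vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
unit_query = query / np.linalg.norm(query)

# On unit vectors, cosine similarity is just the dot product.
by_cosine = np.argsort(-(unit_vectors @ unit_query))
by_euclidean = np.argsort(np.linalg.norm(unit_vectors - unit_query, axis=1))

# On the raw vectors, dot product also rewards magnitude, so its order can differ.
by_raw_dot = np.argsort(-(vectors @ query))

print(by_cosine[:5], by_euclidean[:5])  # same top results
print(by_raw_dot[:5])                   # may differ from the two above
```

The equivalence follows from the identity ‖a − b‖² = 2 − 2·(a·b) for unit vectors: a larger dot product always means a smaller distance.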
FAQ
Do I need a dedicated vector database or can I use Postgres?
For small-to-medium scale (roughly under 1M vectors) and teams already on Postgres, pgvector is usually sufficient. For high-scale, multi-tenant, or fully managed needs, a dedicated service such as Pinecone or Weaviate may be a better fit.
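How do I choose embedding dimensions?
Use the output dimension of your chosen model (OpenAI text-embedding-3-small = 1536, BERT-base = 768). Do not truncate vectors arbitrarily: shortened embeddings only work well when the model was trained for it, as with the OpenAI text-embedding-3 models, which accept a dimensions parameter.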
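Dimension count also drives storage cost. As a rough worked example, assuming vectors are stored as 4-byte float32 values and ignoring index overhead (which for graph indexes such as HNSW can be substantial), the raw size is vectors × dimensions × 4 bytes. The helper name `raw_vector_bytes` is illustrative.

```python
def raw_vector_bytes(n_vectors, dimensions, bytes_per_value=4):
    """Size of the raw float32 vectors only; index overhead is not included."""
    return n_vectors * dimensions * bytes_per_value

for dims in (768, 1536):
    size_gb = raw_vector_bytes(1_000_000, dims) / 1e9
    print(f"{dims} dims x 1M vectors = {size_gb:.2f} GB")
# 768 dims x 1M vectors = 3.07 GB
# 1536 dims x 1M vectors = 6.14 GB
```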
Can I update vectors in place?
Yes, but re-indexing may be required depending on the database and index type. Some systems support incremental updates; others require full rebuilds.
Related Resources
Graph Databases — Neo4j and Property Graph Modeling
A practical guide to graph databases: property graph model, Cypher query language, modeling patterns, and when to choose Neo4j over relational databases.
NoSQL Data Modeling Patterns — Document, Key-Value, Wide-Column, Graph
A practical guide to NoSQL data modeling: embedding vs referencing, access pattern-driven design, and patterns for MongoDB, DynamoDB, Cassandra, and Redis.