Vector Databases — AI/ML Embeddings and Similarity Search

Overview

Vector databases store high-dimensional numerical vectors (embeddings) generated by machine learning models and enable similarity search via Approximate Nearest Neighbor (ANN) algorithms. They power semantic search, recommendation systems, image retrieval, and Retrieval-Augmented Generation (RAG) for LLMs. Unlike traditional databases that search by exact match or range, vector databases find the “closest” vectors in embedding space — the mathematical representation of meaning, image features, or audio signatures.
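As a rough illustration of what "closest" means here, the sketch below runs an exact (brute-force) nearest-neighbor search with cosine similarity. The three-dimensional vectors and the names `cosine_similarity` and `nearest` are made up for this example; a real system would use model-generated embeddings and an ANN index rather than a full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, vectors, k=2):
    """Exact top-k search: score every stored vector, keep the best k."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in vectors.items()]
    return [name for _, name in sorted(scored, reverse=True)[:k]]

# Toy 3-dimensional "embeddings" (illustrative values only).
vectors = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.3],
}

print(nearest([1.0, 0.1, 0.0], vectors))  # ['cat', 'kitten']
```

An ANN index aims to return nearly the same answer while comparing the query against only a small fraction of the stored vectors.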

When to Use

You need semantic search (matching by meaning, not just by keyword)
Your LLM RAG pipeline must retrieve relevant context chunks
Your recommendation system suggests items similar to user preferences
You retrieve images, audio, or video by content similarity
You have pre-trained embedding models and need scalable vector storage

How Vector Search Works

Embedding: A model (OpenAI, BERT, CLIP) converts text/image into a dense vector (e.g., 768-1536 dimensions)
Indexing: Vectors are organized into an ANN index (HNSW, IVF, PQ) for fast retrieval
Query: The query is embedded and the index returns the K nearest neighbors
Metadata filtering: Combine vector similarity with traditional filtering (date, category, user ID)
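The four steps above can be sketched end to end with a toy in-memory store. Everything here is an assumption made for illustration: `embed` is a word-counting stand-in for a real embedding model, `TinyVectorStore` is a hypothetical class, and the "index" is a plain list that is scanned in full where a real database would build HNSW or IVF.

```python
import math

VOCAB = ["vector", "index", "postgres", "recipe", "pasta"]  # toy vocabulary (assumption)

def embed(text):
    """Stand-in for an embedding model: count vocabulary words, then L2-normalize."""
    words = text.lower().split()
    counts = [float(words.count(term)) for term in VOCAB]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

class TinyVectorStore:
    """Keeps (vector, text, metadata) rows and scans all of them at query time."""

    def __init__(self):
        self.rows = []

    def add(self, text, metadata):
        # Steps 1 and 2: embed the text and store it (no real index is built here).
        self.rows.append((embed(text), text, metadata))

    def query(self, text, k=3, where=None):
        q = embed(text)  # Step 3: embed the query
        hits = []
        for vec, doc, meta in self.rows:
            # Step 4: skip rows whose metadata does not match every key in `where`.
            if where and any(meta.get(key) != value for key, value in where.items()):
                continue
            score = sum(a * b for a, b in zip(q, vec))  # dot product of unit vectors
            hits.append((score, doc))
        hits.sort(reverse=True)
        return [doc for _, doc in hits[:k]]

store = TinyVectorStore()
store.add("vector index tuning", {"category": "db"})
store.add("postgres vector extension", {"category": "db"})
store.add("pasta recipe", {"category": "food"})

print(store.query("vector index", k=1))
print(store.query("vector recipe", k=1, where={"category": "food"}))
```

The second query shows the filter restricting candidates to one category before ranking, which is how metadata constraints are typically combined with similarity.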

Comparison

Database	Deployment	Index	Best For
Pinecone	Managed cloud	Proprietary ANN index, metadata filters	Production RAG, no ops overhead
Weaviate	Self-hosted / cloud	HNSW, BM25 hybrid	Multi-modal, GraphQL interface
pgvector	PostgreSQL extension	ivfflat, hnsw	Teams already on Postgres
Chroma	Embedded / local	HNSW	Prototyping, small-scale local RAG
Milvus/Zilliz	Self-hosted / cloud	IVF, HNSW, GPU	Large-scale, high throughput

pgvector Example

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create table with vector column
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT,
    embedding vector(1536)
);

-- Create HNSW index for fast ANN search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Insert a document with embedding
INSERT INTO documents (title, content, embedding)
VALUES ('Vector DB Guide', 'A guide to vector databases...', '[0.12, -0.03, ...]');

-- Semantic search: find 5 most similar documents
SELECT id, title, content,
    1 - (embedding <=> '[0.11, -0.02, ...]') as similarity
FROM documents
ORDER BY embedding <=> '[0.11, -0.02, ...]'
LIMIT 5;
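From application code, the query vector usually arrives as a list of floats and has to be passed to SQL as a parameter. A minimal sketch, assuming a driver with psycopg-style `%s` placeholders and no pgvector adapter registered; `to_pgvector_literal`, `SEARCH_SQL`, and `search_params` are hypothetical helper names, not part of pgvector.

```python
def to_pgvector_literal(values):
    """Format floats as pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"

# Parameterized version of the search query; the cast tells Postgres the type.
SEARCH_SQL = """
SELECT id, title, 1 - (embedding <=> %s::vector) AS similarity
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""

def search_params(query_embedding, limit=5):
    """Build the parameter tuple for SEARCH_SQL; the vector appears twice."""
    literal = to_pgvector_literal(query_embedding)
    return (literal, literal, limit)

print(search_params([0.11, -0.02, 0.5]))
```

With a live connection, the pair would be passed to the driver's execute call, for example `cursor.execute(SEARCH_SQL, search_params(vec))`.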

Hybrid Search (Vector + Keyword)

# Weaviate (v4 Python client): combine vector similarity with BM25 keyword search
import weaviate

client = weaviate.connect_to_local()
try:
    results = client.collections.get("Article").query.hybrid(
        query="vector database architecture",
        vector=[0.12, -0.03, ...],
        alpha=0.5,  # 0 = pure BM25, 1 = pure vector
        limit=10
    )
    for obj in results.objects:
        print(obj.properties)
finally:
    client.close()  # release the connection when done
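To make the role of `alpha` concrete, here is one simple way a hybrid score can be computed: rescale each result list to the 0..1 range, then take a weighted sum. This is a sketch of the general idea, not Weaviate's actual fusion algorithm; `min_max`, `hybrid_ranking`, and the scores are invented for illustration.

```python
def min_max(scores):
    """Rescale a {doc_id: score} dict to the 0..1 range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def hybrid_ranking(keyword_scores, vector_scores, alpha=0.5):
    """Rank doc IDs by alpha * vector score + (1 - alpha) * keyword score."""
    kw, vec = min_max(keyword_scores), min_max(vector_scores)
    fused = {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in set(kw) | set(vec)
    }
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))

bm25 = {"a": 12.0, "b": 3.0, "c": 0.5}      # made-up keyword scores
cosine = {"a": 0.20, "b": 0.90, "c": 0.85}  # made-up vector similarities

print(hybrid_ranking(bm25, cosine, alpha=0.0))  # ['a', 'b', 'c'] keyword only
print(hybrid_ranking(bm25, cosine, alpha=1.0))  # ['b', 'c', 'a'] vector only
print(hybrid_ranking(bm25, cosine, alpha=0.5))  # ['b', 'a', 'c'] blended
```

Normalizing first matters because BM25 scores and cosine similarities live on different scales; without it, one signal would dominate regardless of `alpha`.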

RAG Pipeline Example

from openai import OpenAI
import chromadb

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Load and chunk documents
# load_and_chunk_documents is a placeholder for your own loader; it should return a list of strings
chunks = load_and_chunk_documents("knowledge_base/")
ids = [f"chunk-{i}" for i in range(len(chunks))]  # Chroma needs a unique string ID per item

# 2. Embed and store in Chroma
client = chromadb.Client()
collection = client.create_collection("docs")
embeddings = openai_client.embeddings.create(input=chunks, model="text-embedding-3-small")
collection.add(ids=ids, documents=chunks, embeddings=[e.embedding for e in embeddings.data])

# 3. Retrieve relevant chunks for a query
question = "How do vector indexes work?"
query_embedding = openai_client.embeddings.create(input=question, model="text-embedding-3-small")
results = collection.query(query_embeddings=[query_embedding.data[0].embedding], n_results=5)

# 4. Augment LLM prompt with retrieved context
# results["documents"] holds one list of strings per query, so index 0 is this query's chunks
context = "\n".join(results["documents"][0])
prompt = f"Context:\n{context}\n\nQuestion: {question}"
response = openai_client.chat.completions.create(model="gpt-4o", messages=[{"role": "user", "content": prompt}])
print(response.choices[0].message.content)
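The pipeline above calls `load_and_chunk_documents` without defining it. As a minimal sketch of the chunking half, assuming fixed-size character windows with overlap are acceptable for your content, something like the following could serve; `chunk_text` and its parameters are hypothetical names, and production pipelines often split on sentences or tokens instead.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

sample = "abcdefghij"
pieces = chunk_text(sample, chunk_size=4, overlap=2)
ids = [f"chunk-{i}" for i in range(len(pieces))]  # one unique ID per chunk

print(pieces)  # ['abcd', 'cdef', 'efgh', 'ghij']
print(ids)     # ['chunk-0', 'chunk-1', 'chunk-2', 'chunk-3']
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some text twice.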

ANN Algorithms

Algorithm	Type	Speed	Memory	Best For
HNSW	Graph-based	Fast	High	General purpose, high recall
IVF	Clustering	Medium	Medium	Large datasets, memory-constrained
PQ	Quantization	Fast	Low	Billions of vectors, acceptable recall loss
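To show where the speed and recall trade-off in a clustering index comes from, here is a toy IVF sketch. It assumes the centroids are already known (real implementations learn them with k-means), and `ToyIVF`, `l2`, and the data points are invented for this example.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class ToyIVF:
    """Inverted-file index: each vector is stored in the list of its nearest centroid."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = {i: [] for i in range(len(centroids))}

    def _closest_cells(self, vec, n):
        order = sorted(range(len(self.centroids)), key=lambda i: l2(vec, self.centroids[i]))
        return order[:n]

    def add(self, item_id, vec):
        cell = self._closest_cells(vec, 1)[0]
        self.lists[cell].append((item_id, vec))

    def search(self, query, k=1, nprobe=1):
        # Only the nprobe nearest cells are scanned; the rest are skipped.
        candidates = []
        for cell in self._closest_cells(query, nprobe):
            candidates.extend(self.lists[cell])
        candidates.sort(key=lambda item: l2(query, item[1]))
        return [item_id for item_id, _ in candidates[:k]]

index = ToyIVF(centroids=[[0.0, 0.0], [10.0, 10.0]])
index.add("a", [1.0, 1.0])
index.add("b", [4.95, 4.95])  # lands in cell 0, just across the boundary
index.add("c", [5.6, 5.6])    # lands in cell 1
index.add("d", [9.0, 9.0])

print(index.search([5.1, 5.1], k=1, nprobe=1))  # ['c'] misses the true nearest
print(index.search([5.1, 5.1], k=1, nprobe=2))  # ['b'] found by probing more cells
```

Raising `nprobe` scans more cells, which improves recall and costs latency; that single knob is the kind of tuning the table's "Speed" column glosses over.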

Common Mistakes

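Wrong distance metric: use the metric your embedding model was trained or documented for. Cosine similarity is the usual default for text embeddings; for unit-normalized embeddings, dot product and Euclidean distance produce the same ranking as cosine
No metadata filtering: pure vector search can return results that are semantically close but out of scope (wrong tenant, category, or date range); add metadata filters whenever such constraints exist
Ignoring index tuning: default HNSW parameters (such as M, efConstruction, and efSearch) may not suit your recall/latency requirements
Storing raw vectors without indexing: an exact brute-force scan compares the query against every stored vector, O(n) per query, which becomes too slow as the collection grows
Using a vector DB for structured queries: vector DBs are poor at aggregations and joins; keep that data in a relational database and use the two together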
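The distance-metric point is easy to check numerically. The sketch below, using made-up two-dimensional vectors, shows that raw dot product rewards vector length while cosine does not, and that after normalization squared Euclidean distance is just 2 - 2 * cosine, so the metrics rank results identically.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

def normalize(v):
    n = norm(v)
    return [x / n for x in v]

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

query = [3.0, 4.0]
short_match = [0.6, 0.8]   # same direction as the query, small magnitude
long_other = [40.0, 30.0]  # different direction, large magnitude

# Raw dot product rewards magnitude, so the longer vector scores higher.
print(dot(query, short_match), dot(query, long_other))
# Cosine ignores magnitude, so the same-direction vector scores higher.
print(cosine(query, short_match), cosine(query, long_other))

# For unit vectors, squared Euclidean distance equals 2 - 2 * dot product.
u, v = normalize(query), normalize(long_other)
print(squared_euclidean(u, v), 2 - 2 * dot(u, v))
```

In practice this means the metric choice only changes results when the stored embeddings are not normalized.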

FAQ

Do I need a dedicated vector database or can I use Postgres? For small-to-medium scale (roughly under 1M vectors) and teams already on Postgres, pgvector is usually sufficient. For high-scale, multi-tenant, or fully managed needs, a dedicated service such as Pinecone or Weaviate is often a better fit.

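How do I choose embedding dimensions? Use the output dimension of your chosen model (OpenAI text-embedding-3-small = 1536, BERT-base = 768). Do not truncate or otherwise reduce dimensions arbitrarily; do so only when the model is trained to support shortened embeddings (the OpenAI text-embedding-3 models accept a dimensions parameter for this) or after measuring the recall impact on your own data.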
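Dimension count also drives storage cost. As a back-of-the-envelope sketch, assuming uncompressed float32 values (4 bytes each) and ignoring index overhead and metadata, raw vector storage is simply count times dimensions times bytes per value; `raw_vector_bytes` is a hypothetical helper name.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Bytes needed for the raw vectors alone (float32 by default)."""
    return num_vectors * dimensions * bytes_per_value

for dims in (768, 1536):
    gib = raw_vector_bytes(1_000_000, dims) / 2**30
    print(f"{dims} dims: {gib:.2f} GiB per million vectors")
```

Doubling the dimension doubles this figure, and graph indexes such as HNSW add their own overhead on top.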

Can I update vectors in place? Yes, but re-indexing may be required depending on the database and index type. Some systems support incremental updates; others require full rebuilds.

Related Resources

Graph Databases — Neo4j and Property Graph Modeling

NoSQL Data Modeling Patterns — Document, Key-Value, Wide-Column, Graph