intermediate By StackPractices

Vector Databases — AI/ML Embeddings and Similarity Search

A practical guide to vector databases: embeddings, similarity search, approximate nearest neighbors, and choosing between Pinecone, Weaviate, pgvector, and Chroma.

Note: This guide follows English-language naming conventions and terminology standards common in international development teams. Examples use English identifiers and comments to maximize compatibility across codebases and tooling.

Overview

Vector databases store high-dimensional numerical vectors (embeddings) generated by machine learning models and enable similarity search via Approximate Nearest Neighbor (ANN) algorithms. They power semantic search, recommendation systems, image retrieval, and Retrieval-Augmented Generation (RAG) for LLMs. Unlike traditional databases that search by exact match or range, vector databases find the “closest” vectors in embedding space — the mathematical representation of meaning, image features, or audio signatures.

When to Use

  • You need semantic search (find similar meaning, not just keyword match)
  • LLM RAG pipelines require retrieving relevant context chunks
  • Recommendation systems suggest items similar to user preferences
  • Image, audio, or video retrieval by content similarity
  • You have pre-trained embedding models and need scalable vector storage

How Vector Search Works

  1. Embedding: A model (e.g., an OpenAI embedding model, BERT, or CLIP) converts text or images into a dense vector, typically 768 to 1536 dimensions
  2. Indexing: Vectors are organized into an ANN index (HNSW, IVF, PQ) for fast retrieval
  3. Query: The query is embedded and the index returns the K nearest neighbors
  4. Metadata filtering: Combine vector similarity with traditional filtering (date, category, user ID)

Comparison

Database      | Deployment           | Index                  | Best For
--------------|----------------------|------------------------|------------------------------------
Pinecone      | Managed cloud        | HNSW, metadata filters | Production RAG, no ops overhead
Weaviate      | Self-hosted / cloud  | HNSW, BM25 hybrid      | Multi-modal, GraphQL interface
pgvector      | PostgreSQL extension | ivfflat, hnsw          | Teams already on Postgres
Chroma        | Embedded / local     | HNSW                   | Prototyping, small-scale local RAG
Milvus/Zilliz | Self-hosted / cloud  | IVF, HNSW, GPU         | Large-scale, high throughput

pgvector Example

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create table with vector column
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT,
    embedding vector(1536)
);

-- Create HNSW index for fast ANN search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Insert a document with embedding
INSERT INTO documents (title, content, embedding)
VALUES ('Vector DB Guide', 'A guide to vector databases...', '[0.12, -0.03, ...]');

-- Semantic search: find 5 most similar documents
SELECT id, title, content,
    1 - (embedding <=> '[0.11, -0.02, ...]') as similarity
FROM documents
ORDER BY embedding <=> '[0.11, -0.02, ...]'
LIMIT 5;
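
The vector literals in the SQL above are elided. When the embedding comes from application code, it has to be serialized into pgvector's bracketed text form before it is sent as a query parameter. A minimal sketch of that step follows; `to_pgvector_literal` is a hypothetical helper name, not part of pgvector or any driver.

```python
import math

def to_pgvector_literal(values) -> str:
    """Format a sequence of numbers as pgvector text input, e.g. '[0.1,0.2]'."""
    floats = [float(v) for v in values]
    if not floats:
        raise ValueError("vector must have at least one dimension")
    if not all(math.isfinite(v) for v in floats):
        raise ValueError("vector elements must be finite")
    return "[" + ",".join(repr(v) for v in floats) + "]"

print(to_pgvector_literal([0.12, -0.03, 1]))
```

With a driver such as psycopg, the resulting string would typically be passed as a bound parameter and cast in SQL (for example `%s::vector`) rather than interpolated into the query text.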

Hybrid Search (Vector + Keyword)

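# Weaviate (Python client v4): combine vector similarity with BM25 keyword search
import weaviate

client = weaviate.connect_to_local()
try:
    results = client.collections.get("Article").query.hybrid(
        query="vector database architecture",
        vector=[0.12, -0.03, ...],  # embedding of the query text, from the same model used at index time
        alpha=0.5,  # 0 = pure BM25, 1 = pure vector
        limit=10
    )
finally:
    client.close()  # release the connection when done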
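
To make the effect of `alpha` concrete, here is a simplified sketch of weighted score fusion: each score set is min-max normalized, then blended with `alpha`. It illustrates the idea only and is not Weaviate's implementation; the function names and sample scores are made up.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so vector and keyword scores are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def hybrid_rank(vector_scores: dict[str, float],
                keyword_scores: dict[str, float],
                alpha: float = 0.5) -> list[str]:
    """Rank ids by alpha * vector score + (1 - alpha) * keyword score."""
    v = min_max(vector_scores) if vector_scores else {}
    kw = min_max(keyword_scores) if keyword_scores else {}
    fused = {
        doc_id: alpha * v.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(v) | set(kw)
    }
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))

vector_scores = {"a": 0.9, "b": 0.5, "c": 0.1}   # e.g. cosine similarities
keyword_scores = {"b": 7.0, "c": 3.0, "d": 1.0}  # e.g. BM25 scores

print(hybrid_rank(vector_scores, keyword_scores, alpha=0.5))
```

With `alpha=1` the ranking follows the vector scores alone, and with `alpha=0` it follows the keyword scores alone; values in between reward documents that do well on both.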

RAG Pipeline Example

from openai import OpenAI
import chromadb

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Load and chunk documents
# load_and_chunk_documents is a placeholder for your own loader; it should return a list of strings
chunks = load_and_chunk_documents("knowledge_base/")
ids = [f"chunk-{i}" for i in range(len(chunks))]  # Chroma requires one unique id per document

# 2. Embed and store in Chroma
client = chromadb.Client()  # in-memory client; data is lost when the process exits
collection = client.create_collection("docs")
embeddings = openai_client.embeddings.create(input=chunks, model="text-embedding-3-small")
collection.add(ids=ids, documents=chunks, embeddings=[e.embedding for e in embeddings.data])

# 3. Retrieve relevant chunks for a query
question = "How do vector indexes work?"
query_embedding = openai_client.embeddings.create(input=question, model="text-embedding-3-small")
results = collection.query(query_embeddings=[query_embedding.data[0].embedding], n_results=5)

# 4. Augment LLM prompt with retrieved context
# results["documents"][0] is the list of document strings for the first (and only) query
context = "\n".join(results["documents"][0])
prompt = f"Context:\n{context}\n\nQuestion: {question}"
response = openai_client.chat.completions.create(model="gpt-4o", messages=[{"role": "user", "content": prompt}])
print(response.choices[0].message.content)
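
The example above calls `load_and_chunk_documents` without defining it. The chunking half of such a helper could look like the sketch below, which splits on whitespace and overlaps consecutive chunks so that a sentence cut at a boundary still appears whole in one of them. The name `chunk_text` and the default sizes are assumptions for illustration; production pipelines often chunk by tokens or by document structure instead.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

sample = " ".join(str(i) for i in range(10))
print(chunk_text(sample, chunk_size=4, overlap=1))
```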

ANN Algorithms

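Algorithm | Type         | Speed  | Memory | Best For
----------|--------------|--------|--------|--------------------------------------------
HNSW      | Graph-based  | Fast   | High   | General purpose, high recall
IVF       | Clustering   | Medium | Medium | Large datasets, memory-constrained
PQ        | Quantization | Fast   | Low    | Billions of vectors, acceptable recall loss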
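
The clustering idea behind IVF can be shown in a few lines. The sketch below assumes the centroids are given up front (a real index learns them, usually with k-means) and uses a hypothetical `TinyIVF` class. It also shows where the recall loss comes from: probing a single cluster can miss a neighbor that sits just across a cluster boundary.

```python
import math

def l2(a, b) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class TinyIVF:
    """Inverted-file index with fixed centroids, for illustration only."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = [[] for _ in centroids]  # one posting list per centroid

    def add(self, item_id, vector):
        # Store the vector in the list of its nearest centroid.
        nearest = min(range(len(self.centroids)),
                      key=lambda c: l2(vector, self.centroids[c]))
        self.lists[nearest].append((item_id, vector))

    def search(self, query, k=1, nprobe=1):
        # Scan only the nprobe lists whose centroids are closest to the query.
        order = sorted(range(len(self.centroids)),
                       key=lambda c: l2(query, self.centroids[c]))
        candidates = [entry for c in order[:nprobe] for entry in self.lists[c]]
        candidates.sort(key=lambda entry: l2(query, entry[1]))
        return [item_id for item_id, _ in candidates[:k]]

index = TinyIVF(centroids=[(0.0, 0.0), (10.0, 10.0)])
for item_id, vec in [("a", (1.0, 1.0)), ("b", (4.9, 4.9)),
                     ("c", (9.0, 9.0)), ("d", (6.0, 6.0))]:
    index.add(item_id, vec)

query = (5.1, 5.1)
print(index.search(query, k=1, nprobe=1))  # scans one cluster
print(index.search(query, k=1, nprobe=2))  # scans both clusters (exact here)
```

Here the true nearest neighbor of the query is "b", but it is stored under the other centroid, so the single-probe search returns "d". Raising `nprobe` trades speed for recall.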

Common Mistakes

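  • Wrong distance metric: use the metric your embedding model was trained for. Cosine similarity is the usual choice for text embeddings; for unit-normalized vectors, cosine, dot product, and Euclidean distance all produce the same ranking, while on unnormalized vectors dot product also rewards vector length
  • No metadata filtering: pure vector search can return results that are semantically close but out of scope (wrong tenant, stale date, wrong category); apply metadata filters wherever such constraints exist
  • Ignoring index tuning: default HNSW parameters (such as M, ef_construction, and ef_search) may not suit your recall/latency requirements
  • Storing raw vectors without indexing: an exact scan compares the query against every stored vector, O(n) per query, which is fine for a few thousand vectors but too slow at scale
  • Using a vector DB for structured queries: combine with a relational database; vector DBs are poor at aggregation and joins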
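
The distance-metric point is easy to check numerically. The sketch below uses made-up two-dimensional vectors: on raw vectors, dot product prefers the longer one even though it points in a different direction, and once everything is normalized to unit length the metrics agree.

```python
import math

def dot(a, b) -> float:
    return sum(x * y for x, y in zip(a, b))

def norm(a) -> float:
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b) -> float:
    return dot(a, b) / (norm(a) * norm(b))

def euclidean(a, b) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(a) -> list[float]:
    n = norm(a)
    return [x / n for x in a]

query = [1.0, 0.0]
docs = {"aligned_short": [1.0, 0.1], "off_axis_long": [5.0, 5.0]}

# Raw vectors: cosine looks at direction only, dot product also rewards length.
by_cosine = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
by_dot = max(docs, key=lambda name: dot(query, docs[name]))

# Unit-normalized vectors: dot product and Euclidean distance agree with cosine.
unit = {name: normalize(vec) for name, vec in docs.items()}
by_dot_unit = max(unit, key=lambda name: dot(query, unit[name]))
by_l2_unit = min(unit, key=lambda name: euclidean(query, unit[name]))

print(by_cosine, by_dot, by_dot_unit, by_l2_unit)
```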

FAQ

Do I need a dedicated vector database or can I use Postgres? For small-to-medium scale (< 1M vectors) and teams already on Postgres, pgvector is sufficient. For very large collections, heavy multi-tenancy, or a fully managed service, a dedicated system such as Pinecone or Weaviate is usually the better fit.
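
A quick storage estimate helps with this decision. The sketch below assumes 4-byte floats plus 8 bytes of per-vector overhead, which is the figure the pgvector documentation gives for its `vector` type. It counts raw vector data only; the ANN index, other columns, and row overhead come on top. The function name is hypothetical.

```python
def raw_vector_bytes(num_vectors: int, dims: int,
                     bytes_per_value: int = 4, per_vector_overhead: int = 8) -> int:
    """Approximate bytes needed to store the vectors themselves (no index)."""
    return num_vectors * (dims * bytes_per_value + per_vector_overhead)

size = raw_vector_bytes(1_000_000, 1536)
print(f"{size / 1024**3:.2f} GiB for 1M vectors at 1536 dimensions")
```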

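How do I choose embedding dimensions? Use the output dimension of your chosen model (OpenAI text-embedding-3-small = 1536, BERT-base = 768). Do not truncate or otherwise reduce dimensions arbitrarily: it degrades similarity quality unless the model was trained to support shorter vectors (the OpenAI text-embedding-3 models, for example, accept a dimensions parameter for this) or you validate a reduction technique against your own retrieval benchmarks.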
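
Some models are trained so that a prefix of the embedding remains usable; OpenAI documents this for the text-embedding-3 models. For such models only, shortening a vector by hand amounts to keeping the first components and re-normalizing to unit length, as in this sketch. The helper name is hypothetical, and applying it to a model that was not trained this way will hurt retrieval quality.

```python
import math

def truncate_and_normalize(vec: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and rescale the result to unit length."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    length = math.sqrt(sum(x * x for x in head))
    if length == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / length for x in head]

print(truncate_and_normalize([3.0, 4.0, 12.0], 2))
```

Re-normalizing matters because cosine and dot-product comparisons assume unit-length vectors once the tail has been dropped.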

Can I update vectors in place? Yes, but re-indexing may be required depending on the database and index type. Some systems support incremental updates; others require full rebuilds.