RAG, semantic search, recommendations — the unsung workhorse behind all of them is the embedding (vector). In a phrase, it's "a technique for converting the meaning of words into a sequence of numbers." Unglamorous, but it's the foundation of search and knowledge use in the AI era.

This article explains, for beginners, what an embedding is, why it can measure meaning, what it's used for, how to choose a model, and how to get started with a vector DB.

EMBEDDING · TURN MEANING INTO NUMBERS

The closer the meaning, the closer the vector: the foundation of search, RAG, classification, and recommendations.

  • 🔢 Turn meaning into numbers: convert text into a "sequence of numbers" a machine can work with.
  • 📍 Close = similar: words close in meaning sit at close positions in the space.
  • 🔎 Search by meaning: find things by "closeness of meaning," not exact word match.

1. What Is an Embedding (Vector)?

An embedding is the "meaning" of text (or an image, etc.) converted into a sequence of numbers — a vector. For example, the word "dog" gets replaced by a list of hundreds to thousands of numbers like [0.21, -0.78, 0.34, ...]. To a human it looks like meaningless numbers, but this sequence is a set of "coordinates of meaning."

Picture a "map of meaning." Just as cities that are close on a map are geographically near, in the embedding space words close in meaning are placed near each other. "Dog" and "puppy" are close; "dog" and "car" are far. Being able to compute this "distance" is the whole point.

💡 In one line: an embedding = "a technique that converts the meaning of words into numeric coordinates." A computer can't directly understand the meaning of text, but once it's numbers it can compute "closeness of meaning."

2. Why "Closeness" Can Measure Meaning

Embeddings are built by learning, from huge amounts of text, "which words tend to be used together." As a result, words used in similar contexts get similar numbers. The closeness of two vectors can be quantified with measures like cosine similarity, which ranges from -1 to 1: the closer to 1, the more similar in meaning.
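To make cosine similarity concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are made up for illustration (a real model would return hundreds to thousands of values), and the function name cosine_similarity is our own, not part of any library.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up toy vectors, NOT output from a real embedding model.
dog = [0.9, 0.8, 0.1]
puppy = [0.85, 0.9, 0.15]
car = [0.1, 0.2, 0.95]

print(cosine_similarity(dog, puppy))  # close to 1: similar direction
print(cosine_similarity(dog, car))    # much lower: different direction
```

Cosine similarity looks only at the direction of the vectors, not their length, which is one reason it is a common default for comparing embeddings.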

Closeness in meaning to "dog" (illustrative)

dog ↔ puppy: very close
dog ↔ cat: close (animal)
dog ↔ car: far

※ A conceptual illustration. In a famous example, semantic relationships show up as vector arithmetic — "king − man + woman ≈ queen."
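The "king − man + woman ≈ queen" idea can be tried on a toy scale. The sketch below uses hand-picked two-dimensional vectors whose axes we have labeled "royalty" and "maleness" purely for illustration. Real embeddings have no such labeled axes, and real results are approximate rather than exact. The analogy helper is a hypothetical name.

```python
import math

# Hand-picked 2-D toy vectors: [royalty, maleness]. Not real model output.
toy = {
    "king":  [0.95, 0.90],
    "queen": [0.95, 0.10],
    "man":   [0.10, 0.90],
    "woman": [0.10, 0.10],
    "apple": [0.00, 0.50],
}


def analogy(a: str, b: str, c: str, vectors: dict[str, list[float]]) -> str:
    """Return the word whose vector is nearest to vectors[a] - vectors[b] + vectors[c]."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three input words, as is usual for this kind of demo.
    candidates = (w for w in vectors if w not in (a, b, c))
    return min(candidates, key=lambda w: math.dist(vectors[w], target))


print(analogy("king", "man", "woman", toy))  # queen
```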

A real vector is made of hundreds to thousands of numbers (dimensions). Together they encode many facets of meaning, such as "is it an animal?", "a vehicle?", or "big or small?", each contributing a little. In practice a single dimension rarely maps to one human-readable trait; the facets are spread across many dimensions. More dimensions can capture finer nuance, but storage and compute costs rise in proportion.
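A rough back-of-the-envelope calculation shows how dimension count drives storage. This sketch assumes each number is stored as a 4-byte float32 and counts only the raw vectors. Index structures and metadata in a real system add overhead on top. The dimension counts and the one-million-document corpus are example figures.

```python
def index_size_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw storage for the vectors alone (float32 = 4 bytes per value by default)."""
    return num_vectors * dimensions * bytes_per_value


# Example: one million documents at a few example dimension counts.
for dims in (256, 1024, 3072):
    gb = index_size_bytes(1_000_000, dims) / 1e9
    print(f"{dims:>5} dims x 1,000,000 vectors = {gb:.2f} GB")
```

Because the cost is a simple product, tripling the dimensions triples the raw storage, and brute-force comparison time grows by about the same factor.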

So even when the characters don't match, a machine can judge "whether the meaning is close." That's the real mechanism behind treating "AI" and "artificial intelligence" as the same thing, or finding a document phrased as "steps to cancel and get a refund" from a question like "I want my money back."

3. What's It Used For? (RAG, Semantic Search)

Embeddings are rarely used alone — they underpin various features built on "closeness of meaning." Here are the main uses.

RAG (retrieval-augmented generation)

Find documents close in meaning to the question and hand them to the AI as grounding. The heart of RAG.

Semantic search

Search by meaning, not keyword match. It's found even when worded differently.

Classification & dedup

Auto-sort inquiries, and find similar or duplicate documents.

Recommendations

Surface "products or articles similar to this" by closeness of meaning.

RAG in particular leans heavily on embeddings. A typical system that searches internal documents and has the AI answer works by vectorizing the documents in advance, so that the passages closest to a question can be retrieved at query time. Beyond text, multimodal embeddings that place images and audio in the same space are spreading too.
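As a small illustration of the classification and dedup uses above, the sketch below sorts an inquiry by finding its most similar labeled example, and flags near-duplicate documents whose similarity exceeds a threshold. The vectors are made up, the labels are hypothetical, and the 0.95 threshold is an assumption you would tune on your own data.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def classify(vector: list[float], labeled_examples: list[tuple[str, list[float]]]) -> str:
    """Return the label of the most similar labeled example (1-nearest neighbour)."""
    label, _ = max(labeled_examples, key=lambda item: cosine(vector, item[1]))
    return label


def find_duplicates(vectors: list[list[float]], threshold: float = 0.95) -> list[tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose similarity meets the threshold."""
    return [
        (i, j)
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
        if cosine(vectors[i], vectors[j]) >= threshold
    ]


# Made-up toy vectors standing in for real embeddings.
labeled_examples = [
    ("billing", [0.9, 0.1, 0.0]),
    ("shipping", [0.1, 0.9, 0.1]),
    ("account", [0.0, 0.2, 0.9]),
]
inquiry = [0.8, 0.2, 0.1]
print(classify(inquiry, labeled_examples))  # billing

documents = [[0.9, 0.1, 0.0], [0.88, 0.12, 0.01], [0.0, 0.2, 0.9]]
print(find_duplicates(documents))  # [(0, 1)]
```

The pairwise loop compares every document with every other one, which is fine for small sets but grows quadratically. Larger collections usually rely on a vector DB's nearest-neighbor search instead.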

4. How to Choose an Embedding Model

Embeddings are produced by a dedicated "embedding model." There are broadly two options.

API type (easy, no GPU)

OpenAI (text-embedding-3), Cohere, Google Gemini, Voyage, and others. Just call the API — no infrastructure needed. The easy way to start.

Open-source type (free, self-hosted)

BGE-M3, Nomic Embed, Qwen3 Embedding, and others. No usage fee, but you need your own environment (and hardware) to run them. Good for privacy and, at scale, cost.

💡 Matryoshka: some newer models let you shrink the number of dimensions after the fact. For instance, reducing 3,072 dimensions to 1,024 reportedly keeps about 95% of the quality while cutting storage and search cost to roughly a third. Handy for balancing cost and accuracy.
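Mechanically, shrinking a Matryoshka-style embedding usually means keeping the first N values and rescaling the result to unit length. The sketch below shows only that mechanic. It assumes the model was trained to support truncation, since cutting an ordinary embedding this way can lose far more quality. The short six-value vector stands in for a full-size one, and truncate_embedding is a hypothetical helper name.

```python
import math


def truncate_embedding(vector: list[float], dimensions: int) -> list[float]:
    """Keep the first `dimensions` values and rescale to unit length."""
    if not 0 < dimensions <= len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    head = vector[:dimensions]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / norm for x in head]


# Stand-in for a full-size vector (e.g. 3,072 values in the example above).
full = [0.5, -0.5, 0.5, 0.1, -0.3, 0.2]
short = truncate_embedding(full, 3)
print(len(short), short)
```

Some API-type models offer a request option that returns the shorter vector directly, so check your model's documentation before truncating by hand.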

※ Model names and figures are cited from various guides and disclosures (as of June 2026). The best model varies with language, use case, and budget, so the sure way is to try and choose.

5. Vector DBs and How to Start

The embeddings you create are stored in a vector database (vector DB). It's a specialized DB for quickly finding "the ones close to the question" among huge numbers of vectors — examples include Pinecone, Weaviate, Qdrant, Chroma, and pgvector. This becomes the "search engine" for semantic search and RAG.

Getting started is simple.

  • ① Pick one embedding model: an API type (e.g., OpenAI's text-embedding-3-small) is easy to begin with.
  • ② Vectorize and store documents: turn your documents into vectors with the model and put them in the vector DB.
  • ③ Vectorize the question and search: vectorize the question with the same model and pull out the closest documents.

These three steps are exactly the foundation of implementing RAG. Once the search works, use AI evals to measure and improve its accuracy.
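The three steps above can be sketched end to end in a few lines. To stay runnable without an API key, this example swaps in toy_embed, a hypothetical stand-in that just counts words from hand-made topic lists. It is not a real embedding model, and a plain Python list plays the role of the vector DB. In practice you would pass a function that calls your chosen model and store the vectors in a real vector DB.

```python
import math
from typing import Callable

Vector = list[float]

# Hypothetical stand-in for step 1: hand-made topic word lists, NOT a real model.
TOPICS = [
    {"refund", "money", "cancel"},
    {"shipping", "delivery", "package", "arrive"},
    {"password", "login", "account", "reset"},
]


def toy_embed(text: str) -> Vector:
    """Count how many words of the text fall into each toy topic."""
    words = text.lower().replace("?", " ").replace(".", " ").split()
    return [float(sum(w in topic for w in words)) for topic in TOPICS]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.hypot(*a) * math.hypot(*b)
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def build_index(documents: list[str], embed: Callable[[str], Vector]) -> list[tuple[str, Vector]]:
    """Step 2: vectorize every document and store it next to its vector."""
    return [(doc, embed(doc)) for doc in documents]


def search(question: str, index: list[tuple[str, Vector]],
           embed: Callable[[str], Vector], k: int = 1) -> list[str]:
    """Step 3: vectorize the question with the SAME embed function, return the k closest."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


docs = [
    "Steps to cancel and get a refund",
    "When will my package arrive? Delivery times explained",
    "How to reset your account password",
]
index = build_index(docs, toy_embed)
print(search("I want my money back", index, toy_embed))
# ['Steps to cancel and get a refund']
```

Passing the embed function as a parameter makes it hard to vectorize documents with one model and questions with another by accident, since vectors from different models are not comparable.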

Summary

Three takeaways on embeddings.

  • What it is: a technique that converts the "meaning" of words into a sequence of numbers (a vector). The closer the meaning, the closer the vector.
  • Its role: the foundation of RAG, semantic search, classification, dedup, and recommendations. It lets you work by "meaning," not exact word match.
  • How to start: begin easily with an API-type model. Store in a vector DB and search. Tune cost with Matryoshka.

Embeddings are the first step in building search and knowledge use with AI. Start by vectorizing two sentences with an embedding model and computing their closeness. For the full picture, read the companion articles on RAG and on how LLMs work alongside this one.

To push embedding-search precision even further, the next step is "reranking." Read the article on reranking to learn how to reorder retrieved candidates by relevance and lift RAG accuracy.

FAQ

Q. What's the difference between an embedding and an LLM?

A. They play different roles. An LLM is a model that generates text; an embedding model is a model that turns meaning into numbers. In RAG they cooperate: the embedding model finds the relevant documents, and the LLM turns them into a prose answer.

Q. Are more dimensions always better?

A. Not necessarily. More dimensions raise expressive power but also storage and search cost. With a Matryoshka-capable model you can cut dimensions while keeping quality nearly intact, making it easier to balance cost and accuracy.

Q. Is it free to use?

A. Open-source embedding models (like BGE-M3) are free to download and use, though you bear the cost of the hardware that runs them. API types usually charge a small fee, but embeddings are far cheaper than generation. Starting with a free tier or a small dataset is recommended.

Q. Do I need a vector DB?

A. For small amounts you can search with plain computation, but as documents grow a dedicated vector DB becomes practical. Options range from easy ones like Chroma to add-ons like pgvector for an existing DB, so you can choose by scale.