“Vector Search” is the buzzword of the AI era. But for many SQL-native data engineers, it can feel like black magic.
We’re used to exact matches (WHERE id = 123) or pattern matches (LIKE '%text%'). Vector search is fundamentally
different: it ranks results by semantic similarity rather than by matching characters.
The Concept: Embeddings
Computers don’t understand words; they understand numbers. An Embedding Model transforms a piece of text (sentence, paragraph, or document) into a list of floating-point numbers (a vector).
Example:
- “The cat sat on the mat” -> [0.1, 0.5, -0.3, ...]
- “The feline rested on the rug” -> [0.12, 0.48, -0.29, ...]
In the vector space, these two vectors are mathematically “close” to each other (as measured by a metric such as cosine similarity), even though they share very few words.
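To make “close” concrete, here is a minimal sketch using Snowflake's VECTOR_COSINE_SIMILARITY function. The three-dimensional vectors are made up for illustration and are not real embeddings; cosine similarity compares the direction of two vectors, so values near 1 mean “pointing the same way” and values near 0 mean “unrelated”.

```sql
-- Toy 3-dimensional vectors, invented for illustration
-- (real embeddings have hundreds of dimensions).
SELECT
    -- Same direction, different length: similarity is 1
    -- (up to floating-point rounding).
    VECTOR_COSINE_SIMILARITY(
        [1, 2, 3]::VECTOR(FLOAT, 3),
        [2, 4, 6]::VECTOR(FLOAT, 3)
    ) AS same_direction,
    -- Perpendicular vectors: similarity is 0.
    VECTOR_COSINE_SIMILARITY(
        [1, 0, 0]::VECTOR(FLOAT, 3),
        [0, 1, 0]::VECTOR(FLOAT, 3)
    ) AS unrelated;
```

Two sentences with similar meaning should land much closer to the first case than the second.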
Native Vector Data Type
Snowflake added the VECTOR data type to support this.
CREATE TABLE docs (
    id INT,
    content TEXT,
    embedding VECTOR(FLOAT, 768) -- 768 dimensions is common for standard models
);

You can generate these embeddings using SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', content).
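As a rough sketch of the manual approach, the statements below fill the embedding column of the docs table and then rank rows against a search phrase. The search phrase is a made-up example, and the NULL check assumes rows are inserted without an embedding first; adapt both to your own pipeline.

```sql
-- Populate the embedding column for rows that do not have one yet.
UPDATE docs
SET embedding = SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', content)
WHERE embedding IS NULL;

-- Rank documents by similarity to a query string.
-- The query text must be embedded with the same model as the stored vectors.
SELECT
    id,
    content,
    VECTOR_COSINE_SIMILARITY(
        embedding,
        SNOWFLAKE.CORTEX.EMBED_TEXT_768(
            'snowflake-arctic-embed-m',
            'how do I reset my password'  -- hypothetical search phrase
        )
    ) AS similarity
FROM docs
ORDER BY similarity DESC
LIMIT 5;
```

This query compares the search vector against every row, which is fine for small tables but is exactly the part that gets expensive as the table grows.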
Cortex Search Services
While you can manage vectors manually, managing the index for fast retrieval at scale is hard. Enter Cortex Search.
Cortex Search is a managed service. You point it at a table, tell it which column contains the text, and it handles the indexing, embedding updates, and retrieval.
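-- Conceptual service creation
CREATE CORTEX SEARCH SERVICE my_search_svc
    ON description               -- text column to index and search
    ATTRIBUTES products          -- column(s) available as filters at query time
    WAREHOUSE = my_wh            -- warehouse that runs the source query on creation and refresh
    TARGET_LAG = '1 hour'        -- how far the service may lag behind the base table
    AS SELECT * FROM product_catalog;

Why not just use Elasticsearch?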
Integration and Governance. By keeping the vectors in Snowflake:
- Zero ETL: Data doesn’t leave the platform.
- Security: The search service is a Snowflake object, so access to it is controlled by the same role-based grants as the rest of your data.
- Simplicity: It’s just SQL.
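To illustrate the “just SQL” point, a service like my_search_svc can be queried from a worksheet with SNOWFLAKE.CORTEX.SEARCH_PREVIEW. This is a sketch under assumptions: the search phrase is invented, the service name may need to be fully qualified with database and schema in your account, and the returned columns must be ones the service was created with.

```sql
-- Query the search service and unpack the JSON response.
SELECT PARSE_JSON(
    SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
        'my_search_svc',  -- may need database.schema.service qualification
        '{
            "query": "waterproof hiking boots",
            "columns": ["description"],
            "limit": 5
        }'
    )
)['results'] AS results;
```

The function takes the request as a JSON string and returns a JSON string, which is why the result is wrapped in PARSE_JSON before the results array is extracted.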
Conclusion
Vector search enables capabilities like “Find me products similar to this image” or “Find clauses in contracts that
mention liability.” It’s a new primitive in the data engineer’s toolkit, unlocking use cases that LIKE '%...%' could
never dream of.