Data Flakes

“Vector search” is a buzzword of the AI era, but for many SQL-native data engineers it can feel like black magic. We’re used to exact matches (WHERE id = 123) or pattern matches (LIKE '%text%'). Vector search is fundamentally different: it searches by semantic similarity, that is, by closeness in meaning rather than overlap in characters.

The Concept: Embeddings

Computers don’t understand words; they understand numbers. An Embedding Model transforms a piece of text (sentence, paragraph, or document) into a list of floating-point numbers (a vector).

Example:

  • “The cat sat on the mat” -> [0.1, 0.5, -0.3, ...]
  • “The feline rested on the rug” -> [0.12, 0.48, -0.29, ...]

In the vector space, these two vectors are mathematically “close” to each other, as measured by cosine similarity (the cosine of the angle between them), even though the sentences share very few words.
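To make “close” concrete, here is a minimal sketch using Snowflake's VECTOR_COSINE_SIMILARITY function on hand-written 3-dimensional vectors. The first two vectors are the truncated example values from above. The third is a made-up vector, added only for contrast. Real embeddings have hundreds of dimensions, so treat this purely as an illustration.

```sql
-- Cosine similarity = dot product divided by the product of the magnitudes.
-- Values near 1 mean "pointing the same way"; values near 0 or below mean unrelated or opposed.
SELECT
    VECTOR_COSINE_SIMILARITY(
        [0.1, 0.5, -0.3]::VECTOR(FLOAT, 3),     -- "The cat sat on the mat" (truncated)
        [0.12, 0.48, -0.29]::VECTOR(FLOAT, 3)   -- "The feline rested on the rug" (truncated)
    ) AS similar_pair,
    VECTOR_COSINE_SIMILARITY(
        [0.1, 0.5, -0.3]::VECTOR(FLOAT, 3),
        [-0.4, 0.1, 0.6]::VECTOR(FLOAT, 3)      -- hypothetical unrelated vector
    ) AS dissimilar_pair;
```

The first column should come out very close to 1, while the second comes out negative, which is the numeric version of “these two sentences mean the same thing, that one does not.”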

Native Vector Data Type

Snowflake added the VECTOR data type to support this.

```sql
CREATE TABLE docs (
    id int,
    content text,
    embedding VECTOR(FLOAT, 768) -- 768 dimensions is common for standard models
);
```

You can generate these embeddings using SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', content), which returns a 768-dimensional vector matching the column definition above.
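Putting the table and the embedding function together, a manual similarity search might look like the sketch below. It assumes the docs table defined above is already populated with text, that Cortex functions are available in your account's region, and that the query string is just a placeholder. Note that the query text has to be embedded with the same model as the stored rows, otherwise the vectors are not comparable.

```sql
-- Fill in embeddings for rows that don't have one yet
UPDATE docs
SET embedding = SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', content)
WHERE embedding IS NULL;

-- Return the 5 documents whose meaning is closest to the query text
SELECT
    id,
    content,
    VECTOR_COSINE_SIMILARITY(
        embedding,
        SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', 'feline resting on a rug')
    ) AS similarity
FROM docs
ORDER BY similarity DESC
LIMIT 5;
```

This compares the query against every row, which is fine for small tables but is exactly the part that gets expensive at scale.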

Cortex Search Services

While you can manage vectors manually, managing the index for fast retrieval at scale is hard. Enter Cortex Search.

Cortex Search is a managed service. You point it at a table, tell it which column contains the text, and it handles the indexing, embedding updates, and retrieval.

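```sql
-- Conceptual service creation
CREATE CORTEX SEARCH SERVICE my_search_svc
  ON description            -- text column to index and search
  ATTRIBUTES products       -- column(s) available for filtering; each must appear in the query below
  WAREHOUSE = my_wh         -- warehouse used to build and refresh the index
  TARGET_LAG = '1 hour'     -- maximum staleness relative to the source table
  AS SELECT * FROM product_catalog;
```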
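Once the service exists, one way to try it from a worksheet is the SNOWFLAKE.CORTEX.SEARCH_PREVIEW function, which takes the service name and a JSON request and returns a JSON string. The sketch below assumes the my_search_svc service from the example lives in the current schema; the query text is a placeholder.

```sql
-- Ask the service for the 5 best matches and unpack the JSON response
SELECT PARSE_JSON(
    SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
        'my_search_svc',   -- use the fully qualified name if the service is in another schema
        '{
            "query": "waterproof hiking boots",
            "columns": ["description"],
            "limit": 5
        }'
    )
)['results'] AS results;
```

As the name suggests, this function is intended for testing and validation. Application code would more typically call the service through Snowflake's Python or REST APIs.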

Why not just use Elasticsearch?

Integration and Governance. By keeping the vectors in Snowflake:

  1. Zero ETL: Data doesn’t leave the platform.
  2. Security: Access to the search service is controlled by Snowflake roles and grants, the same model that governs the base table.
  3. Simplicity: It’s just SQL.

Conclusion

Vector search enables capabilities like “Find me products similar to this image” or “Find clauses in contracts that mention liability.” It’s a new primitive in the data engineer’s toolkit, unlocking use cases that LIKE '%...%' could never dream of.
