
Last week we talked about SPCS use cases. Today, we implement the most exciting one: Hosting your own Large Language Model.

Why would you do this when Cortex exists?

  1. Custom Weights: You trained a Llama 3 model on your specific domain data using 10,000 internal documents.
  2. Version Control: You need to guarantee that the model's behavior never changes, even when the vendor updates the base model.
  3. Specialized Models: You need a niche model (e.g., Biology-BERT) not available in Cortex.

1. The Container#

We need a process inside the container that loads the model and serves it over HTTP. Libraries like vLLM or Hugging Face TGI (Text Generation Inference) are excellent for this; both ship a ready-made inference server, so you rarely need to write your own.

Dockerfile:

# NVIDIA base image ships CUDA and PyTorch preinstalled
FROM nvcr.io/nvidia/pytorch:23.10-py3

RUN pip install vllm

# vLLM's built-in API server loads the model and exposes a /generate endpoint on port 8000
CMD ["python", "-m", "vllm.entrypoints.api_server", "--model", "mistralai/Mistral-7B-v0.1"]
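
Before Snowflake can run this image, it has to live in an image repository inside your account. Here is a minimal sketch of that step, assuming the database, schema, and repository are named db, schema, and repo to match the image path used in the service spec later; the tag my-llm:v1 is a placeholder.

-- Create a repository to hold the container image
CREATE IMAGE REPOSITORY IF NOT EXISTS db.schema.repo;

-- The repository_url column tells you where to push, e.g.
-- <org>-<account>.registry.snowflakecomputing.com/db/schema/repo
SHOW IMAGE REPOSITORIES IN SCHEMA db.schema;

-- Then, from your terminal:
--   docker login <repository_url>
--   docker build -t <repository_url>/my-llm:v1 .
--   docker push <repository_url>/my-llm:v1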

2. The Compute Pool#

LLMs need GPUs. Standard warehouses won’t cut it.

CREATE COMPUTE POOL gpu_pool
  MIN_NODES = 1
  MAX_NODES = 1
  INSTANCE_FAMILY = GPU_NV_S; -- Small GPU instance type

Warning: Compute pools bill you for as long as they are running, even when idle. Be diligent about suspending them!
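
A quick sketch of the lifecycle commands (AUTO_SUSPEND_SECS can also be set at creation time):

-- Suspend the pool when you are done with it
ALTER COMPUTE POOL gpu_pool SUSPEND;

-- Resume it before restarting the service
ALTER COMPUTE POOL gpu_pool RESUME;

-- Or let Snowflake suspend it automatically once nothing is running on it
ALTER COMPUTE POOL gpu_pool SET AUTO_SUSPEND_SECS = 600;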

3. The Service Specification#

The YAML spec defines how Snowflake runs the container.

spec:
  containers:
  - name: llm-inference
    image: /db/schema/repo/my-llm:v1
    resources:
      requests:
        nvidia.com/gpu: 1
      limits:
        nvidia.com/gpu: 1
  endpoints:
  - name: api
    port: 8000
    public: false  # Only accessible internally
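
The spec on its own does nothing; it has to be handed to a CREATE SERVICE statement. A sketch, assuming the YAML above has been uploaded to a stage called @specs as llm_spec.yaml (both names are placeholders):

CREATE SERVICE my_llm_service
  IN COMPUTE POOL gpu_pool
  FROM @specs
  SPECIFICATION_FILE = 'llm_spec.yaml';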

4. The Bridge#

To call this from SQL, we create a Service Function. (One caveat: Snowflake sends rows to the endpoint in the same batched JSON format it uses for external functions, so in practice a thin wrapper in front of a raw vLLM server is needed to translate requests and responses.)

CREATE FUNCTION generate_text(prompt text)
  RETURNS text
  SERVICE = my_llm_service
  ENDPOINT = api
  AS '/generate';
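
Once the function exists, any query can reach the model. Two hypothetical calls (support_tickets and its columns are invented for illustration):

-- Single prompt
SELECT generate_text('Write a one-line summary of Snowpark Container Services.');

-- Batch inference over a table
SELECT ticket_id,
       generate_text('Summarize this support ticket: ' || description) AS summary
FROM support_tickets;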

Conclusion#

Now you have a private, custom LLM running entirely inside your Snowflake account. You can call SELECT generate_text('Hello') from any worksheet. This is the ultimate in power and flexibility, balanced with the responsibility of managing your own infrastructure (checking logs, managing memory, updating drivers).
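
On the operational side, two commands worth keeping handy (the instance ID '0' and container name 'llm-inference' match the spec above):

-- Check whether the container is up and ready
SELECT SYSTEM$GET_SERVICE_STATUS('my_llm_service');

-- Tail the last 100 log lines from the inference container
SELECT SYSTEM$GET_SERVICE_LOGS('my_llm_service', '0', 'llm-inference', 100);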
