Snowflake AI Research

We are a team with extensive experience building systems and technology that have significantly reduced the cost of LLM training and inference. Much of our work has been open-sourced to give the AI community more accessible and cost-effective LLMs. The team includes many specialists in natural language processing and search. Working alongside thousands of Snowflake engineers worldwide, we build the cutting-edge technology that powers enterprise AI products in Cortex AI and beyond. Check out what we're working on: https://www.snowflake.com/en/product/ai/ai-research/

Gen AI

Inside Snowflake Intelligence: Five Pillars of Enterprise-Grade Agentic AI

Explore the underlying architecture, orchestration, and system-level optimizations behind Snowflake Intelligence, a production-grade agentic AI system built for enterprise reasoning.

Yuxiong He|Zhewei Yao|Boyi Liu|Hao Zhang|Harshal Pimpalkhute|Anupam Datta|Samyam Rajbhandari|Snowflake AI Research

JUN 03, 2025|13 min read

More posts from Snowflake AI Research

Smaller Models, Smarter SQL: Arctic-Text2SQL-R1 Tops BIRD and Wins Broadly

A deep dive into how Snowflake AI built Arctic-Text2SQL-R1 using simple rewards, strong reasoning, and a scalable approach to real-world SQL generation.

Snowflake AI Research

MAY 29, 2025|14 min read

Arctic Inference with Shift Parallelism: The Fastest Open Source Inference System for Enterprise AI

Built by Snowflake AI Research, Arctic Inference uses Shift Parallelism, SwiftKV, and speculative decoding to deliver the fastest open-source inference for enterprise AI.

Samyam Rajbhandari

Mert Hidayetoglu

Snowflake AI Research

MAY 29, 2025|15 min read

Scaling vLLM for Embeddings: 16x Throughput and Cost Reduction

Learn how we increased embedding throughput 3x in Snowflake Cortex (and 16x vs. vLLM) through smarter serialization, tokenization, and GPU optimization.

Samyam Rajbhandari

Snowflake AI Research

MAY 29, 2025|8 min read

Fastest Speculative Decoding in vLLM with Arctic Inference and Arctic Training

How we enhanced speculative decoding to get 4x faster end-to-end task completion for LLM agents and up to 2.8x faster decoding for conversational, interactive, and coding workloads.

Snowflake AI Research

MAY 01, 2025|18 min read

Evaluating Multimodal vs. Text-Based Retrieval for RAG with Snowflake Cortex

Discover how multimodal retrieval on Snowflake Cortex transforms enterprise PDF search, enhancing accuracy and speed across complex document formats.

Snowflake AI Research

APR 21, 2025|8 min read

Low-Latency and High-Throughput Inference for Long Context with Sequence Parallelism (aka Arctic Ulysses)

Ulysses, a novel sequence parallelism technique, boosts long-context LLM inference performance with 3.4x lower latency and better GPU efficiency.

Mert Hidayetoglu

Samyam Rajbhandari

Snowflake AI Research

APR 03, 2025|14 min read

Think. Execute. Excel: Arctic Text2SQL with Execution-Guided CoT

Learn how Snowflake’s ExCoT optimizes Text2SQL with execution-guided CoT and DPO, setting a new benchmark in natural language to SQL accuracy.

Snowflake AI Research

APR 02, 2025|10 min read

Snowflake Arctic Embed Joins ArcticTraining: Simple and Scalable Embedding Model Training

Arctic Embed is now part of ArcticTraining, giving developers open access to the core training code for building efficient frontier embedding models.

Snowflake AI Research

MAR 25, 2025|10 min read

Arctic Agentic RAG Episode 1: Agentic Query Clarification for Grounded and Speedy Responses

Discover how Arctic Agentic RAG improves AI accuracy with agentic query clarification, delivering grounded, speedy responses for enterprise AI applications.

Snowflake AI Research

FEB 18, 2025|9 min read
