Yuxiong He

Distinguished AI Software Engineer, Snowflake
Yuxiong He is a Distinguished AI Engineer at Snowflake, spearheading the development and research of Large Language Models (LLMs). As a pivotal co-leader of the Arctic project, she collaborates with a team of exceptional AI professionals to develop the Snowflake suite of foundational models. Her dedication to innovation is matched by her commitment to open source and open research, striving to build transformative and high-performing AI technologies. Previously, Yuxiong held the position of Partner Research and Product Manager at Microsoft, where she co-founded and led the DeepSpeed project. This industry-leading, open-source deep learning optimization library introduced groundbreaking innovations like ZeRO, 3D parallelism, and ZeroQuant. These advancements have significantly accelerated and democratized the training and inference processes of cutting-edge LLMs, making them more accessible to everyone in need. Yuxiong has published over 100 papers in major computer science conferences and journals. Her work has been recognized among the best papers at esteemed venues such as SIGIR, ICDE, WSDM, and Middleware, and her research continues to be widely applied in diverse systems and products.

MORE POSTS FROM Yuxiong He

Gen AI

Smaller Models, Smarter SQL: Arctic-Text2SQL-R1 Tops BIRD and Wins Broadly

A deep dive into how Snowflake AI built Arctic-Text2SQL-R1 using simple rewards, strong reasoning, and a scalable approach to real-world SQL generation.
MAY 29, 2025|14 min read
Gen AI

Arctic Inference with Shift Parallelism: The Fastest Open Source Inference System for Enterprise AI

Built by Snowflake AI Research, Arctic Inference uses Shift Parallelism, SwiftKV, and speculative decoding to power the fastest open-source enterprise AI.
MAY 29, 2025|15 min read
Gen AI

Scaling vLLM for Embeddings: 16x Throughput and Cost Reduction

Learn how we increased embedding throughput 3x in Snowflake Cortex—and 16x vs. vLLM—through smarter serialization, tokenization, and GPU optimization.
MAY 29, 2025|8 min read
Gen AI

Low-Latency and High-Throughput Inference for Long Context with Sequence Parallelism (aka Arctic Ulysses)

Ulysses, a novel sequence parallelism technique, boosts long-context LLM inference performance with 3.4x lower latency and better GPU efficiency.
APR 03, 2025|14 min read
Gen AI

Think. Execute. Excel: Arctic Text2SQL with Execution-Guided CoT

Learn how Snowflake’s ExCoT optimizes Text2SQL with execution-guided CoT and DPO, setting a new benchmark in natural language to SQL accuracy.
APR 02, 2025|10 min read
Gen AI

Introducing Arctic Agentic RAG: Smarter, Faster and More Reliable AI for Enterprise

Arctic Agentic RAG — Snowflake’s next-gen AI framework for enterprise retrieval-augmented generation, delivering smarter, faster, and more reliable AI insights.
FEB 18, 2025|5 min read
Product and Technology

SwiftKV from Snowflake AI Research Reduces Inference Costs of Meta Llama LLMs up to 75% on Cortex AI

SwiftKV optimizes Meta Llama LLMs on Snowflake Cortex AI, reducing inference costs by up to 75% while maintaining accuracy for enterprise AI solutions.
||||
JAN 16, 2025|5 min read
