
Inside Snowflake Intelligence: Five Pillars of Enterprise-Grade Agentic AI

Data is everywhere, but for most business users, getting timely, actionable answers remains a struggle. Insights are buried across dashboards, documents, CRMs and chat systems — turning even basic questions like “Why are sales down?” into time-consuming, cross-functional efforts.

Snowflake Intelligence, in public preview soon, will bridge this gap with a unified, agentic interface. Users can ask complex questions in natural language, analyze both structured and unstructured data, and take direct actions — from triggering workflows to updating records — all in one seamless user interface.

At the core of this experience is a production-grade agentic system developed by Snowflake AI Research. It seamlessly integrates structured and unstructured data, plans multi-step tasks, and orchestrates tools in real time to deliver grounded, reliable answers from your data. Purpose-built for the enterprise, it provides a fast, reliable and transparent experience — enabling users to make confident, data-driven decisions at scale.

This agentic system is supported by five tightly integrated pillars:

  1. Agentic orchestration coordinates reasoning, planning, query decomposition and tool use to execute complex tasks end-to-end.

  2. Structured data intelligence translates natural language into precise, executable queries over complex schemas and metrics.

  3. Unstructured data intelligence grounds answers in domain-specific context from documents, transcripts and enterprise content.

  4. Observability and trust make agent decisions explainable, auditable and open to continuous improvement.

  5. Inference systems and optimizations enable fast, scalable and efficient execution through state-of-the-art innovations in decoding and memory management.

Together, these pillars transform Snowflake Intelligence from a simple interface into a reasoning and action system, purpose-built for the enterprise.

1. Agentic orchestration: planning, adapting and composing tools in real time

The orchestration engine of Snowflake Intelligence is Cortex Agents, which powers complex reasoning workflows. Cortex Agents are built on an agentic orchestration system, developed by Snowflake AI Research, that provides the core technology for planning, reasoning and tool orchestration across enterprise tasks.

Most enterprise questions require combining insights from structured databases, unstructured internal documents and third-party systems. Cortex Agents handle this by coordinating tools like Cortex Analyst (for structured SQL reasoning), Cortex Search (for unstructured retrieval) and visualization components in real time — all within a reasoning loop that continuously updates as new context becomes available.

To understand how this works in practice, consider the following example. When asked, “Why did product views drop on April 5?”, a Cortex Agent may first query engagement metrics using Cortex Analyst, then retrieve supporting documentation with Cortex Search, and finally check external context (e.g., calendar data) — composing multiple tools into a grounded reasoning workflow.

Figure 1: The agentic orchestration system dynamically plans and executes a multi-tool workflow in response to a complex, open-ended question.

The agentic orchestration system, operationalized through Cortex Agents, follows a sophisticated reasoning pipeline: 

1. Planning: Dynamic, multi-hop strategy execution
At the center of every interaction are Cortex Agents, powered by a coordination loop between a large reasoning model (LRM) and a large language model (LLM).

Cortex Agents initiate each task and:

  • Interpret and resolve ambiguities in the user’s query

  • Select the appropriate tools and determine their execution order

  • Decompose the query into tool-specific subtasks

  • Define how outputs will be composed into a coherent final response

But this plan isn’t static. As intermediate results arrive, the Cortex Agent updates its approach in real time. For instance, it may begin with Cortex Search for background context, pivot to Cortex Analyst for structured analysis, and then adapt downstream actions based on what it uncovers. This dynamic, multi-hop reasoning enables fluid responses to evolving questions.
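To make this dynamic replanning concrete, here is a minimal Python sketch. All tool functions are hypothetical stand-ins, not the real Cortex APIs: the agent runs an initial two-step plan and, when an intermediate result confirms a metric drop, appends an external-context step to the plan mid-run.

```python
# Hypothetical stand-ins for Cortex tools; not the real Cortex APIs.
def analyst(question):
    return {"tool": "analyst", "metric_drop": True}  # structured metrics query

def search(question):
    return {"tool": "search", "doc": "release notes"}  # document retrieval

def calendar(question):
    return {"tool": "calendar", "event": "public holiday"}  # external context

def run_agent(question):
    # Initial plan: check metrics, then retrieve supporting documents.
    plan, history = [analyst, search], []
    while plan:
        result = plan.pop(0)(question)
        history.append(result)
        # Replan as intermediate results arrive: a confirmed metric drop
        # triggers an extra external-context lookup not in the original plan.
        if result.get("metric_drop") and calendar not in plan:
            plan.append(calendar)
    return history

steps = run_agent("Why did product views drop on April 5?")
print([r["tool"] for r in steps])  # ['analyst', 'search', 'calendar']
```

The loop is the point: the plan is a mutable queue, so new steps can be spliced in whenever an intermediate result warrants them.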

2. Context-aware execution: Reasoning with continuity
Enterprise tasks evolve, and the Cortex Agent must adapt. Each tool call is informed by the broader reasoning context — including the original query, planner intent, prior steps and intermediate findings. This structured context-sharing helps ensure that every step aligns with the overall objective, enabling coherent, goal-driven responses even in open-ended workflows.

3. Modular extensibility: Seamless integration of new capabilities
The agentic orchestration system is designed for growth. New tools, skills and domain-specific capabilities can be added without rearchitecting the system. For example, integrating medical imaging tools enabled the agentic system to perform high-quality reasoning on diagnostic tasks — demonstrating how easily it can scale to new domains and tasks with minimal effort (see paper).

This orchestration engine — Cortex Agents — is what enables Snowflake Intelligence to reason, act and adapt in real time. The following four supporting pillars further strengthen the system, enabling production-grade performance with intelligence, trust and speed in enterprise environments.

2. Structured data intelligence: Cortex Analyst with semantic models for accurate, adaptable SQL

Enterprise data often lives in large, evolving databases with inconsistent naming, limited documentation and intricate business logic. In short, real-world databases are messy: table names are inconsistent, documentation is sparse, and analytics workflows often require multi-table joins and complex filters.

To handle this complexity, agents need more than mere language skills; they need schema understanding. Cortex Agents use Cortex Analyst as a tool to translate natural language into accurate, executable SQL across complex enterprise databases, combining deep language understanding with schema-level awareness.

At the core of Cortex Analyst is our agentic semantic modeling approach. These models automatically explore database schemas, learn structural patterns and build internal representations that map language to meaning — even in poorly documented or nonstandard environments. This structured understanding allows agents to produce reliable SQL, grounded in the actual schema and business semantics of enterprise data.
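As a rough illustration of why a semantic layer helps, here is a minimal sketch. The table names, column expressions and join key are invented for this example, and the real semantic models are far richer — but the core idea is the same: SQL generation draws names from a verified mapping rather than free-form guesses.

```python
# Hypothetical semantic model: business terms mapped to verified schema
# elements (table and column names here are illustrative, not a real schema).
SEMANTIC_MODEL = {
    "revenue": ("FCT_ORDERS", "SUM(order_amount_usd)"),
    "region": ("DIM_GEO", "sales_region"),
}

def to_sql(metric, dimension):
    # Grounding generation in the semantic model keeps table and column
    # references tied to the actual schema and business definitions.
    fact_table, metric_expr = SEMANTIC_MODEL[metric]
    dim_table, dim_col = SEMANTIC_MODEL[dimension]
    return (
        f"SELECT {dim_col}, {metric_expr} AS {metric} "
        f"FROM {fact_table} JOIN {dim_table} USING (geo_id) "
        f"GROUP BY {dim_col}"
    )

print(to_sql("revenue", "region"))
```

Because every identifier comes from the model rather than the language model’s imagination, the generated SQL cannot reference a column that doesn’t exist.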

When we tested across four production data sets, on average, our agentic semantic models improved accuracy by more than 20%, as compared to agents without schema understanding. For technical details, see our post on agentic semantic models for Text-to-SQL.

Figure 2: Results from our testing across four data sets revealed that Cortex Analyst improved Text-to-SQL accuracy by more than 20%, on average, as compared to agents without schema understanding.

We’re actively extending this pillar with new capabilities, including agentic planning and execution verification, inspired by our ReFoRCE research. While not yet available in-product, these innovations represent the direction of our research, as we look to further enhance agent reliability and accuracy in complex database environments.

3. Unstructured data intelligence: Cortex Search to ground agents in enterprise knowledge

Enterprise knowledge is spread across PDFs, emails, wikis, slide decks and logs, often with inconsistent formatting and implicit context. Answering real questions means going beyond keyword search. Agents must retrieve the right information, reason over it and ground their answers in trusted data.

To meet this need, Cortex Agents use Cortex Search as a tool to perform intelligent search over unstructured content. Snowflake AI Research brings three core technologies into Cortex Search and Cortex Agents — enabling agents to resolve ambiguity; reason across multiple hops; and interpret complex, multimodal documents.

Together, these capabilities transform fragmented enterprise content into actionable, verifiable knowledge that powers grounded, high-quality answers in Snowflake Intelligence.

A. Verified Diversification clarifies ambiguous queries

A key challenge in enterprise AI is handling vague or ambiguous questions. Take, for example, the question “What is HP?” When asked in the context of an engineering repository, “HP” could refer to Hewlett-Packard, horsepower or perhaps something domain-specific.

Traditional LLM approaches try to guess multiple interpretations. Without grounding, this often results in irrelevant or brittle answers. In some cases, you might even get “Harry Potter” as a suggestion, clearly off the mark in an enterprise setting!

To address this, Snowflake AI Research developed a new approach, called Verified Diversification. This method delivers higher recall, stronger grounding and far fewer irrelevant answers by combining semantic retrieval with in-context verification.  

Here’s how it works:

  • First, we relax the query and retrieve passages that reflect multiple plausible meanings — but drawn from the user’s own enterprise content. For example, with ambiguous acronyms (“HP”), this might surface documents referring to both “Hewlett-Packard” and “horsepower.”

  • Then, we verify and ground each interpretation against context — narrowing the options to the most relevant and eliminating hallucinations or false guesses.
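The two steps above can be sketched in a few lines of Python. The corpus, matching logic and context keywords are toy stand-ins — the product uses semantic retrieval and in-context verification, not substring checks — but the retrieve-then-verify shape is the same.

```python
# Toy corpus; in the product, retrieval runs over enterprise documents.
CORPUS = [
    "HP (Hewlett-Packard) laptops are issued to all engineers.",
    "The motor is rated at 150 hp (horsepower).",
    "HP fan club (Harry Potter) meets on Friday.",
]

def retrieve_diverse(term):
    # Step 1: relaxed retrieval surfaces every plausible meaning of the term.
    return [doc for doc in CORPus if term.lower() in doc.lower()] if False else \
           [doc for doc in CORPUS if term.lower() in doc.lower()]

def verify(candidates, context_keywords):
    # Step 2: keep only interpretations grounded in the working context,
    # eliminating off-the-mark guesses.
    return [doc for doc in candidates
            if any(k in doc.lower() for k in context_keywords)]

candidates = retrieve_diverse("HP")        # all three meanings surface
grounded = verify(candidates, ["engineer", "motor"])  # Harry Potter drops out
```

Diversify first, then verify: the relaxed first pass guarantees recall across meanings, and the grounding pass restores precision.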

Figure 3: Illustrates how our verified diversification approach replaces vague LLM guesses with citation-backed answers to ambiguous queries like, “What is hp?”

In our internal evaluations, Verified Diversification improved groundedness by 1.8x across both Llama 3.3 70B and GPT-4o, when compared to baseline approaches. Learn more about how this works in our blog post, Arctic Agentic RAG Episode 1: Agentic Query Clarification for Grounded and Speedy Responses.

Figure 4: According to our internal evaluations, Verified Diversification improved groundedness by 1.8x over baseline approaches on Llama 3.3 70B and GPT-4o.

B. Multi-hop query-reasoning builds a chain of evidence

A key challenge in enterprise AI is tackling questions too complex for a single search. Take the following example: “Who was the sales lead for Acme Inc. when we renewed their contract?”

Traditional LLM approaches often stumble on such multi-step queries. They might try to collapse the reasoning into one go, guessing at connections or missing critical context along the way. This can lead to brittle answers or even outright hallucinations.

To address this, Snowflake AI Research developed a multi-hop query-reasoning method. This enables Cortex Agents to decompose complex questions, retrieve and verify intermediate facts, and build a transparent chain of reasoning.

Here’s how it works (see Figure 5 below):

  • First, Agents break down the complex question into a series of simpler sub-questions. For the Acme Inc. example, this would mean first determining when the contract was renewed (e.g., June 2018), then retrieving data about sales leads for the Acme account during that period.

  • Next, information is retrieved and verified at each stage. The Agent then adaptively chains these verified pieces of information — identifying the sales lead (e.g., John Smith) — to synthesize a final, grounded answer (e.g., “The sales lead was John Smith”).

This method improves not only accuracy but also explainability. Users can trace how an answer was built, and developers can debug or improve logic at any stage — essential for establishing trust in enterprise environments. We cover our research in the ComposeRAG paper.

Figure 5: Illustration of multi-hop question answering, decomposing complex queries into simple steps to trace and verify accurate answers.
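The Acme Inc. example can be sketched as a two-hop chain. The fact store below is a hypothetical stand-in for retrieval over enterprise data; the point is that the answer to hop 1 is verified and then substituted into hop 2, and the trace records the full chain of evidence.

```python
# Hypothetical fact store standing in for retrieval over enterprise content.
FACTS = {
    "contract renewal date for Acme Inc.": "June 2018",
    "sales lead for Acme Inc. in June 2018": "John Smith",
}

def answer_multi_hop():
    trace = []
    # Hop 1: resolve the intermediate fact (when was the contract renewed?).
    renewal = FACTS["contract renewal date for Acme Inc."]
    trace.append(("renewal date", renewal))
    # Hop 2: chain the verified intermediate fact into the follow-up lookup.
    lead = FACTS[f"sales lead for Acme Inc. in {renewal}"]
    trace.append(("sales lead", lead))
    return lead, trace

lead, trace = answer_multi_hop()
print(lead)   # John Smith
print(trace)  # the full, inspectable chain of evidence
```

Because each hop's result is recorded before it feeds the next, a wrong final answer can be traced back to the exact hop that went astray.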

C. Multimodal retrieval provides more contextual understanding

Enterprise documents are rich in charts, tables and visual layouts — formats where meaning comes not only from text but also from structure and presentation. Invoices, manuals and slide decks often communicate critical context visually, making traditional text-only retrieval inadequate.

The key insight from our research: No single retrieval method is enough. Pure semantic vectors may miss exact matches. Keyword search lacks contextual depth. Visual signals alone don’t capture meaning. 

Cortex Search uses a hybrid strategy that combines:

  • Multimodal vectors (VM3) to capture layout, visual structure and embedded text

  • Keyword search on Cortex Search to surface high-precision candidates

  • A lightweight reranker to score and select the most relevant passages from multiple retrieval signals

As shown in Figure 6 below, combining these signals leads to significant gains in recall across a variety of document types — including tech manuals and chart-heavy PDFs.

Figure 6: Augmenting multimodal vector retrieval with keyword search and neural reranking on Cortex Search leads to significant quality improvement.

This hybrid stack significantly outperforms any individual method, confirming that multimodal retrieval isn’t optional — it’s necessary for accurate enterprise QA. Learn more about how this works in the blog post, Evaluating Multimodal vs. Text-Based Retrieval for RAG with Snowflake Cortex.
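One common way to combine ranked lists from different retrievers is rank fusion. The sketch below uses reciprocal rank fusion as an illustrative stand-in — the production stack uses a learned reranker to score candidates, and the document IDs here are hypothetical — but it shows why documents surfaced by multiple signals rise to the top.

```python
def reciprocal_rank_fusion(rankings, k=60):
    # Fuse several ranked lists (e.g., multimodal-vector hits and keyword
    # hits) into one ordering; documents ranked highly by more than one
    # retriever accumulate the largest scores.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["chart_pdf_3", "manual_7", "invoice_1"]  # semantic matches
keyword_hits = ["chart_pdf_3", "slide_9", "manual_7"]   # exact-term matches
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused[0])  # chart_pdf_3 — top of both lists wins
```

A neural reranker would then rescore the fused top candidates against the query, which is where the contextual depth comes from.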

4. Observability and trustworthiness: Building confidence in agentic AI

In enterprise AI, intelligence isn’t enough; systems must be transparent, verifiable and cost-aware. That’s why observability is built into Snowflake Intelligence from Day 1, enabling users and developers to trust, debug and govern AI-driven processes with confidence.

Every step of the system’s reasoning — from natural language rewrites to tool execution and final responses — is captured in structured traces. Built on OpenTelemetry, Snowflake Intelligence provides a unified view of performance, cost, latency, groundedness and more.

This tracing is:

  • Dynamic: Captures evolving plans and tool calls in real time

  • Language-agnostic: Works across Go, Java and Python services

  • Semantically grouped: Clusters steps like subqueries and charting into coherent reasoning units
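To make the idea of structured traces concrete, here is a minimal, framework-free sketch. The real system is built on OpenTelemetry; the span fields below are illustrative, not the product's actual schema.

```python
import time

def traced(step_name, fn, trace, **inputs):
    # Record each reasoning step as a structured span: step name, inputs,
    # latency and output — the raw material for debugging and audits.
    start = time.perf_counter()
    output = fn(**inputs)
    trace.append({
        "span": step_name,
        "attributes": inputs,
        "latency_ms": (time.perf_counter() - start) * 1000,
        "output": output,
    })
    return output

trace = []
sql = traced("generate_sql", lambda question: "SELECT COUNT(*) FROM views",
             trace, question="Why did views drop?")
```

Wrapping every tool call this way means the trace is captured as a side effect of execution, not reconstructed after the fact.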

We integrate AI observability — powered by TruLens open source tools — so every AI interaction is structured, inspectable and ready for analysis.  

Whether you're troubleshooting a workflow, reviewing output quality or auditing for compliance, this level of visibility makes it easier to debug, tune and govern AI systems at scale.

Looking ahead, we’re expanding observability with agentic evaluation, a new capability for measuring answer quality, step correctness and execution efficiency.

5. System optimizations: Efficient agentic inference at enterprise scale

To support complex agentic workloads efficiently at scale, Cortex AI integrates a suite of system-level optimizations, developed by Snowflake AI Research, that improve throughput, reduce latency and lower costs.

Whether you're running embedding or generative models like Llama 3.3 70B or Mistral Large 2, these optimizations enhance Cortex Agents for planning, SQL generation and conversational reasoning. At the same time, we’re contributing these advances back to the broader community through the open source Arctic Inference system. Our goal is to support enterprise-grade performance both within Snowflake and across the AI ecosystem.

We apply both agent-specific techniques and general-purpose inference improvements:

  • Prefix caching improves efficiency in agent workflows, especially in planning loops and iterative verification stages.

  • SwiftKV improves production throughput by 2x by reusing intermediate transformer states to reduce redundant computation in long-context tasks. (Read more)

  • Speculative decoding delivers 3-4x faster generation using draft-and-verify methods across tasks like SWE-Bench and HumanEval. (Read more)
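As a rough illustration of the draft-and-verify idea behind speculative decoding — character-level and table-driven here for simplicity, whereas real systems draft and verify model tokens with probabilistic acceptance:

```python
def draft(prefix, k):
    # Small, fast "draft model": a lookup table standing in for a cheap LM.
    guesses = {"the quick brown": " fox jumps over the moon"}
    return guesses.get(prefix, "")[:k]

def verify(prefix, proposed):
    # Large "target model": accepts the longest prefix of the draft it
    # agrees with, checked in a single verification pass.
    truth = {"the quick brown": " fox jumps over the lazy dog"}
    agreed = truth.get(prefix, "")
    n = 0
    while n < len(proposed) and n < len(agreed) and proposed[n] == agreed[n]:
        n += 1
    return n

def speculative_step(prefix, k=16):
    proposed = draft(prefix, k)
    accepted = verify(prefix, proposed)
    # Accepted characters are emitted without rerunning the target model
    # one step at a time — that batching is the source of the speedup.
    return proposed[:accepted]

print(speculative_step("the quick brown"))
```

When the draft model agrees with the target often (as in planning loops with predictable structure), most steps emit many tokens per verification pass.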

These optimizations aren’t theoretical — together, they make Snowflake Intelligence one of the fastest agentic AI systems in production today.

Figure 7: Arctic Inference achieves highest throughput and lowest latency for Llama 3.3 70B across open source inference frameworks.

Figure 7 shows how Arctic Inference achieves both the highest throughput and the lowest latency when serving Llama 3.3 70B across batch, conversational and agentic requests1 — outperforming other leading open source inference systems on real-world enterprise workloads.  (Read more about the system innovations behind this performance in our latest blog post, Arctic Inference with Shift Parallelism.)

We’re building state-of-the-art, end-to-end agentic systems

Snowflake Intelligence is the result of deep, cross-disciplinary engineering, built to operate reliably in the face of real-world complexity: diverse data, evolving schemas, ambiguous queries and high expectations for trust.

We’re excited to keep pushing the frontier of agentic AI and to build with those who share the same goal: systems that are powerful, transparent and production-ready.

Build with us

Much of this infrastructure, including Arctic Inference and TruLens, is open source and shared with the broader community. Try it out, build with it, and share what you’re working on! Join the Snowflake AI Research community!

Contributors

This blog was supported by editorial contributions from Danmei Xu, Canwen Xu, Puxuan Yu, Youngwon Lee, Ruofan Wu, Shayak Sen, David Kurokawa, Aurick Qiao, Krista Muir, Tom Zayats and Kelvin So. 

We also acknowledge the many colleagues across Snowflake AI Research, the broader AI and engineering teams, and our academic collaborators who contributed to the research, product development and experimentation that made this work possible.

 

Forward Looking Statements

This article contains forward-looking statements, including about our future product offerings, which are not commitments to deliver any product offerings. Actual results and offerings may differ and are subject to known and unknown risks and uncertainties. See our latest 10-Q for more information.


1 Latency-optimized and throughput-optimized configurations for vLLM, SGLang and TRT-LLM use TP=8 and DP=1 and TP=1 and DP=8, respectively, along with the best available speculative decoding for each framework. These experiments were run on data sets generated using real-world production traces to compute throughput, and a mixture of ShareGPT, HumanEval and SWEBench to measure latency. As a result, these results are representative of performance achievable in real-world deployments. For more details, see the evaluation methodology in the appendix.
