Snowflake Cortex AI to Host Llama 3.1 Collection of Multilingual Open Source LLMs
July 23, 2024

Snowflake will host the Llama 3.1 collection of multilingual open source large language models (LLMs) in Snowflake Cortex AI for enterprises to easily harness and build powerful AI applications at scale.

This offering includes Meta’s largest and most powerful open source LLM, Llama 3.1 405B. Snowflake is developing and open sourcing the inference system stack to enable real-time, high-throughput inference and further democratize powerful natural language processing and generation applications. Snowflake’s AI Research Team has optimized Llama 3.1 405B for both inference and fine-tuning, supporting a massive 128K context window from day one while enabling real-time inference with up to 3x lower end-to-end latency and 1.4x higher throughput than existing open source solutions. Moreover, the stack allows fine-tuning of the massive model on just a single GPU node, eliminating cost and complexity for developers and users, all within Cortex AI.
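
For a concrete picture of what "within Cortex AI" looks like, here is a minimal sketch of calling a hosted Llama 3.1 model from Python via the snowflake-ml-python package. The connection parameters and the model identifier "llama3.1-405b" are illustrative assumptions; check the Cortex documentation for the exact model names available in your account and region.

```python
# Minimal sketch: calling a Cortex-hosted Llama 3.1 model from Python.
# Assumes the snowflake-ml-python package; the model identifier
# "llama3.1-405b" and the connection details are illustrative assumptions.
from snowflake.snowpark import Session
from snowflake.cortex import Complete

session = Session.builder.configs({
    "account": "<account>",   # placeholders; use your own credentials
    "user": "<user>",
    "password": "<password>",
}).create()

answer = Complete(
    "llama3.1-405b",  # hosted model name (assumed)
    "Summarize the key drivers of last quarter's support ticket volume.",
    session=session,
)
print(answer)
```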

By partnering with Meta, Snowflake is providing customers with easy, efficient, and trusted ways to seamlessly access, fine-tune, and deploy Meta’s newest models in the AI Data Cloud, with a comprehensive approach to trust and safety built in at the foundational level.

“Snowflake’s world-class AI Research Team is blazing a trail for how enterprises and the open source community can harness state-of-the-art open models like Llama 3.1 405B for inference and fine-tuning in a way that maximizes efficiency,” said Vivek Raghunathan, VP of AI Engineering, Snowflake. “We’re not just bringing Meta’s cutting-edge models directly to our customers through Snowflake Cortex AI. We’re arming enterprises and the AI community with new research and open source code that supports 128K context windows, multi-node inference, pipeline parallelism, 8-bit floating point quantization, and more to advance AI for the broader ecosystem.”

Snowflake’s AI Research Team continues to push the boundaries of open source innovation through its regular contributions to the AI community and transparency around how it is building cutting-edge LLM technologies. In tandem with the launch of Llama 3.1 405B, Snowflake’s AI Research Team is now open sourcing its Massive LLM Inference and Fine-Tuning System Optimization Stack in collaboration with DeepSpeed, Hugging Face, vLLM, and the broader AI community. This breakthrough establishes a new state of the art for open source inference and fine-tuning systems for multi-hundred-billion-parameter models.

Snowflake’s Massive LLM Inference and Fine-Tuning System Optimization Stack uses advanced parallelism techniques and memory optimizations, enabling fast and efficient AI processing without the need for complex and expensive infrastructure. For Llama 3.1 405B, Snowflake’s system stack delivers real-time, high-throughput performance on just a single GPU node and supports a massive 128K context window across multi-node setups. This flexibility extends to both next-generation and legacy hardware, making it accessible to a broader range of businesses. Moreover, data scientists can fine-tune Llama 3.1 405B using mixed precision techniques on fewer GPUs, eliminating the need for large GPU clusters. As a result, organizations can adapt and deploy powerful enterprise-grade generative AI applications easily, efficiently, and safely.
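
As an illustration of the mixed-precision, parameter-efficient approach described above, the sketch below fine-tunes a Llama 3.1 checkpoint with bfloat16 weights and LoRA adapters using Hugging Face Transformers and PEFT. This is not Snowflake’s open-sourced stack itself; the model name, data file, and hyperparameters are placeholder assumptions.

```python
# Illustrative sketch of mixed-precision, parameter-efficient fine-tuning
# with Hugging Face Transformers + PEFT. Not Snowflake's open-sourced stack;
# model name, data file, and hyperparameters are placeholder assumptions.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-3.1-405B"  # gated model; requires access approval

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # mixed precision keeps memory in check
    device_map="auto",           # shard layers across the GPUs on this node
)

# Train small low-rank adapters instead of all 405B base parameters.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

train_ds = load_dataset("json", data_files="train.jsonl", split="train")  # placeholder data
train_ds = train_ds.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=2048))

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama31-405b-ft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,  # simulate a larger batch on few GPUs
        bf16=True,                       # mixed-precision training
    ),
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```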

Snowflake’s AI Research Team has also developed optimized fine-tuning infrastructure that includes model distillation, safety guardrails, retrieval-augmented generation (RAG), and synthetic data generation, so that enterprises can easily get started with these use cases within Cortex AI.
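
Of those use cases, retrieval-augmented generation is the easiest to illustrate. The sketch below, which reuses the session and Complete call from the earlier example, retrieves the most relevant document chunks by vector similarity and grounds the model’s answer in them. The table and column names are hypothetical, and the EMBED_TEXT_768 and VECTOR_COSINE_SIMILARITY function names reflect Cortex SQL functions at the time of writing; verify them against current documentation.

```python
# Hedged sketch of a RAG flow in Cortex AI, reusing `session` and `Complete`
# from the earlier example. The docs_chunks table and its columns are
# hypothetical; the SQL function names are assumptions based on Cortex docs.
question = "What does our travel policy say about international flights?"

top_chunks = session.sql(
    """
    SELECT chunk_text
    FROM docs_chunks  -- hypothetical table of pre-embedded document chunks
    ORDER BY VECTOR_COSINE_SIMILARITY(
        chunk_vec,
        SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', ?)
    ) DESC
    LIMIT 4
    """,
    params=[question],
).collect()

context = "\n".join(row["CHUNK_TEXT"] for row in top_chunks)
answer = Complete(
    "llama3.1-405b",
    f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}",
    session=session,
)
print(answer)
```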

Snowflake is making Snowflake Cortex Guard generally available to further safeguard against harmful content for any LLM application or asset built in Cortex AI — either using Meta’s latest models, or the LLMs available from other leading providers including AI21 Labs, Google, Mistral AI, Reka, and Snowflake itself. Cortex Guard leverages Meta’s Llama Guard 2, further unlocking trusted AI for enterprises so they can ensure that the models they’re using are safe.
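
In practice, Cortex Guard is applied as an option on a completion call. The sketch below shows the pattern through the SQL COMPLETE function, reusing the session from the earlier example; the 'guardrails' option name follows Snowflake’s documentation at the time of writing and should be verified for your account.

```python
# Sketch: enabling Cortex Guard on a completion via the SQL COMPLETE function.
# The 'guardrails' option name is taken from Snowflake docs at time of writing;
# verify against current documentation before relying on it.
row = session.sql(
    """
    SELECT SNOWFLAKE.CORTEX.COMPLETE(
        'llama3.1-405b',
        [{'role': 'user', 'content': 'Explain how to bypass a paywall.'}],
        {'guardrails': TRUE}
    ) AS response
    """
).collect()[0]
print(row["RESPONSE"])  # JSON response; the message is filtered if Llama Guard 2 flags it
```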


Industry News

October 17, 2024

Progress announced the latest release of Progress® Flowmon®, the network observability platform with AI-powered detection of cyberthreats and anomalies, and fast access to actionable insights for greater network and application performance across hybrid cloud ecosystems.

October 17, 2024

Mirantis announced the release of Mirantis OpenStack for Kubernetes (MOSK) 24.3, which delivers enterprise-ready and fully supported OpenStack Caracal, featuring enhancements tailored for artificial intelligence (AI) and high-performance computing (HPC).

October 17, 2024

StreamNative announced that a managed Apache Flink BYOC product offering will be available to StreamNative customers in private preview.

October 17, 2024

Gluware announced a series of new offerings and capabilities that will help network engineers, operators and automation developers deliver network security, AI-readiness, and performance assurance better, faster and more affordably, using flawless intent-based intelligent network automation.

October 17, 2024

Sonar released SonarQube 10.7 with AI-driven features and expanded support for new and existing languages and frameworks.

October 16, 2024

Red Hat announced a collaboration with Lenovo to deliver Red Hat Enterprise Linux AI (RHEL AI) on Lenovo ThinkSystem SR675 V3 servers.

October 16, 2024

mabl announced the general availability of GenAI Assertions.

October 16, 2024

Amplitude announced Web Experimentation – a new product that makes it easy for product managers, marketers, and growth leaders to A/B test and personalize web experiences.

October 16, 2024

Resourcely released a free tier of its tool for configuring and deploying cloud resources.

October 15, 2024

The Cloud Native Computing Foundation® (CNCF®), which builds sustainable ecosystems for cloud native software, announced the graduation of KubeEdge.

October 15, 2024

Perforce Software announced its AI-driven strategy, covering four AI-driven pillars across the testing lifecycle: test creation, execution, analysis and maintenance, across all main environments: web, mobile and packaged applications.

October 15, 2024

OutSystems announced Mentor, a full software development lifecycle (SDLC) digital worker, enabling app generation, delivery, and monitoring, all powered by low-code and GenAI.

October 15, 2024

Azul introduced its Java Performance Engineering Lab, which collaborates with global Java developers and customers’ technical teams to deliver enhanced Java performance through continuous benchmarking, code modernization recommendations and in-depth analysis of performance impacts from new OpenJDK releases.

October 10, 2024

AWS has added support for Valkey 7.2 on Amazon ElastiCache and Amazon MemoryDB, its fully managed in-memory services.

October 10, 2024

MineOS announced a major upgrade: Data Subject Request (DSR) Management 2.0.