LLAMA 4 Maverick & Scout AI models are OUT!
Llama 4, developed by Meta, introduces a new auto-regressive Mixture-of-Experts (MoE) architecture. This generation includes two models: Llama 4 Scout and Llama 4 Maverick.
Both models leverage early fusion for native multimodality, enabling them to process text and image inputs. Maverick and Scout are trained on up to 40 trillion tokens of data spanning 200 languages (with specific fine-tuning support for 12 languages, including Arabic, Spanish, German, and Hindi).
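To make the MoE idea concrete, here is a minimal sketch of an MoE feed-forward block in which every token goes through a shared expert plus one routed expert picked by a router, in the spirit of Meta's description. The dimensions, SiLU MLP experts, and top-1 softmax routing below are illustrative assumptions, not Meta's actual implementation.

```python
# Minimal sketch of a Llama-4-style MoE feed-forward block (illustrative only):
# every token passes through a shared expert and one routed expert chosen by a router.
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, n_experts=16):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)  # per-expert routing scores
        def mlp():
            return nn.Sequential(nn.Linear(d_model, d_hidden), nn.SiLU(),
                                 nn.Linear(d_hidden, d_model))
        self.shared = mlp()                                       # shared expert, used by all tokens
        self.experts = nn.ModuleList([mlp() for _ in range(n_experts)])

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.router(x).softmax(dim=-1)  # (tokens, n_experts)
        weight, idx = scores.max(dim=-1)         # top-1 routing: one expert per token
        routed = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                      # tokens assigned to expert e
            if mask.any():
                routed[mask] = weight[mask, None] * expert(x[mask])
        return self.shared(x) + routed           # shared output + routed-expert output

tokens = torch.randn(8, 512)                     # 8 token embeddings
print(MoEFeedForward()(tokens).shape)            # torch.Size([8, 512])
```

Because only one routed expert fires per token, the active parameter count per forward pass stays far below the total parameter count, which is what lets a model with 109B or 400B total parameters run with only 17B active.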
For deployment, Llama 4 Scout is designed for accessibility, fitting on a single server-grade GPU via on-the-fly 4-bit or 8-bit quantization, while Maverick is available in BF16 and FP8 formats. These models are released under the custom Llama 4 Community License Agreement, available on the model repositories.
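As a rough illustration of that deployment story, the sketch below loads a Llama 4 checkpoint with on-the-fly 4-bit quantization through Hugging Face transformers and bitsandbytes. The repo ID, auto classes, and memory footprint are assumptions to verify against the official model cards (Llama 4 support requires a recent transformers release, and the checkpoints are gated).

```python
# Illustrative only: load a Llama 4 checkpoint with on-the-fly 4-bit quantization.
# Assumes a transformers release with Llama 4 support, bitsandbytes installed,
# and access to the gated repo; the repo ID and auto class may need adjusting.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # check the exact repo name

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # quantize weights to 4-bit at load time
    bnb_4bit_compute_dtype=torch.bfloat16,   # keep matmuls in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                       # spread layers across available GPUs
)

prompt = "Summarize the Llama 4 release in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```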
Evaluation results confirm the strength of these models, with state-of-the-art performance for their class that significantly exceeds predecessors such as Llama 3.1 405B. For instance, on reasoning and knowledge tasks, the instruction-tuned Maverick achieves 80.5% on MMLU Pro and 69.8% on GPQA Diamond, while Scout scores 74.3% and 57.2%, respectively.
In short: a 10-million-token context window, multimodal superpowers, and a 2-trillion-parameter behemoth in the works.
We’re excited to share a preview of Llama 4 Behemoth, a teacher model that demonstrates advanced intelligence among models in its class. Llama 4 Behemoth is also a multimodal mixture-of-experts model, with 288B active parameters, 16 experts, and nearly two trillion total parameters. Offering state-of-the-art performance for non-reasoning models on math, multilinguality, and image benchmarks, it was the perfect choice to teach the smaller Llama 4 models.

We codistilled the Llama 4 Maverick model from Llama 4 Behemoth as a teacher model, resulting in substantial quality improvements across end task evaluation metrics. We developed a novel distillation loss function that dynamically weights the soft and hard targets through training. Codistillation from Llama 4 Behemoth during pre-training amortizes the computational cost of resource-intensive forward passes needed to compute the targets for distillation for the majority of the training data used in student training. For additional new data incorporated in student training, we ran forward passes on the Behemoth model to create distillation targets.
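Meta has not published the exact distillation loss, but the idea of dynamically weighting soft (teacher) and hard (label) targets can be sketched as follows. The KL/cross-entropy blend and the linear schedule below are purely illustrative assumptions.

```python
# Illustrative sketch of a distillation loss that blends soft (teacher) and hard
# (label) targets with a weight that shifts over training. Meta's actual loss is
# not public; the KL/cross-entropy blend and linear schedule are assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, step, total_steps,
                      temperature=2.0):
    # Soft target: KL divergence between temperature-scaled teacher and student.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Hard target: ordinary next-token cross-entropy against the data labels.
    hard = F.cross_entropy(student_logits, labels)

    # Dynamic weighting: emphasize the teacher early, the data labels later.
    alpha = 1.0 - step / total_steps
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage: 4 positions over a 32-token vocabulary.
student = torch.randn(4, 32)
teacher = torch.randn(4, 32)
labels = torch.randint(0, 32, (4,))
print(distillation_loss(student, teacher, labels, step=100, total_steps=1000))
```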
Our smaller model, Llama 4 Scout, is a general purpose model with 17 billion active parameters, 16 experts, and 109 billion total parameters that delivers state-of-the-art performance for its class. Llama 4 Scout dramatically increases the supported context length from 128K in Llama 3 to an industry leading 10 million tokens. This opens up a world of possibilities, including multi-document summarization, parsing extensive user activity for personalized tasks, and reasoning over vast codebases.
Llama 4 Scout is both pre-trained and post-trained with a 256K context length, which empowers the base model with advanced length generalization capability. We present compelling results in tasks such as retrieval with “retrieval needle in haystack” for text as well as cumulative negative log-likelihoods (NLLs) over 10 million tokens of code.
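If you want to run a similar cumulative-NLL measurement on your own data, a minimal starting point is to sum per-token negative log-likelihoods over a long token stream, chunk by chunk, as sketched below. This is a generic loop for a Hugging Face-style causal LM, not Meta's evaluation harness, and it does not carry context across chunks the way a true 10M-token evaluation would.

```python
# Generic sketch: cumulative negative log-likelihood over a long 1-D token stream,
# scored chunk by chunk with a Hugging Face-style causal LM. Not Meta's harness;
# it also does not carry context across chunks, unlike a true long-context eval.
import torch
import torch.nn.functional as F

@torch.no_grad()
def cumulative_nll(model, token_ids, chunk_size=4096):
    """Return (total NLL in nats, number of scored tokens) for a 1-D LongTensor."""
    total_nll, total_tokens = 0.0, 0
    for start in range(0, token_ids.numel() - 1, chunk_size):
        chunk = token_ids[start : start + chunk_size + 1].unsqueeze(0)  # (1, <=chunk+1)
        logits = model(chunk[:, :-1]).logits          # predictions for the next tokens
        nll = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),      # (scored tokens, vocab)
            chunk[:, 1:].reshape(-1),                 # shifted targets
            reduction="sum",
        )
        total_nll += nll.item()
        total_tokens += chunk.size(1) - 1
    return total_nll, total_tokens

# Usage (hypothetical): nll, n = cumulative_nll(model, ids); print(nll / n)  # mean NLL per token
```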
Source: https://ai.meta.com/blog/llama-4-multimodal-intelligence/