Meta Sought Funds for Llama AI Model Development from Amazon and Microsoft

Facing high AI expenses, Meta has reportedly sought funding from competitors for its Llama models, offering influence on features in return.

Even Meta Platforms isn’t immune to the staggering costs of the AI race. The company spent parts of the last year approaching competitors, including Microsoft, Amazon, and others, seeking financial help to train its flagship Llama large language models, according to a report by The Information citing four individuals briefed on the discussions.

These overtures, reportedly dubbed the “Llama Consortium” pitch, were driven by apprehension within Meta about the escalating resources needed for its artificial intelligence development, two people said. As a sweetener, Meta apparently discussed giving potential financial backers a say in Llama’s future feature development.

Sources suggest the initial reaction to Meta’s proposal was lukewarm, and it’s uncertain if any formal funding deals were struck. Still, the attempt reveals the intense financial burden involved in building leading AI systems, putting pressure even on companies with Meta’s deep pockets and signalling the high stakes in generative AI.

Llama 4 – Meta’s Latest Models

Meta’s search for funding partners casts its recent Llama 4 announcement in a new light. That release introduced Llama 4 Scout (109B total parameters, 17B active) aimed at single-GPU use with an exceptionally large 10 million token context window – capable of processing roughly 7.5 million words at once.

It also unveiled the much larger Llama 4 Maverick (400B total parameters, 17B active, 128 experts) for bigger workloads. Both employ a Mixture-of-Experts (MoE) architecture, a technique using specialized sub-networks (‘experts’) where only the necessary ones are activated per task, aiming for greater efficiency during operation compared to dense models where all parameters are always used.
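The routing idea behind a Mixture-of-Experts layer can be illustrated in a few lines. The sketch below is a deliberately tiny, hypothetical toy (the dimensions, single-token routing, and linear "experts" are illustrative assumptions, nothing like Llama 4's actual implementation): a router scores every expert for an input token, but only the top-scoring experts are actually executed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions, vastly smaller than Llama 4's.
d_model, num_experts, top_k = 8, 4, 1

# Each "expert" is just a single linear layer here.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(num_experts)]
router = rng.standard_normal((d_model, num_experts))

def moe_forward(x):
    # The router scores every expert for this token...
    scores = x @ router
    # ...but only the top-k experts are actually run (sparse activation).
    chosen = np.argsort(scores)[-top_k:]
    # Softmax over the chosen scores gives mixing weights.
    exp_scores = np.exp(scores[chosen] - scores[chosen].max())
    weights = exp_scores / exp_scores.sum()
    # Output is the weighted sum of only the activated experts' outputs.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = rng.standard_normal(d_model)
out = moe_forward(token)
```

With `top_k = 1` of 4 experts, only a quarter of the expert parameters participate in this forward pass, which is the source of the inference-efficiency claim: total parameter count (capacity) grows with the number of experts, while per-token compute tracks only the activated ones.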

They were also built with native multimodality, handling text and images together using early fusion from the pretraining stage, rather than adding image capabilities later.

Underpinning these is the yet-unreleased Llama 4 Behemoth, a 2-trillion parameter model used internally for distillation (teaching smaller models), which required training across up to 32,000 GPUs. Meta employed techniques like FP8 precision – a lower-precision number format that speeds up calculations – and novel architectural components like interleaved rotary positional embeddings (iRoPE) to handle long sequences effectively.
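The standard rotary positional embedding (RoPE) that iRoPE builds on — Meta's variant interleaves attention layers that use it with layers that omit positional encoding — works by rotating pairs of vector dimensions through position-dependent angles. A minimal sketch of the standard technique (the dimension size and base frequency here are conventional illustrative choices, not Llama 4's settings):

```python
import numpy as np

def rope(x, position, base=10000.0):
    """Standard RoPE: treat the vector as half-dimension pairs and rotate
    each pair by an angle proportional to the token's position, with a
    different frequency per pair."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)   # per-pair rotation frequencies
    angles = position * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    # 2-D rotation applied to each (x1[i], x2[i]) pair.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])
```

Because each pair is only rotated, vector norms are preserved, and the dot product between a rotated query and a rotated key depends on the *relative* distance between their positions — the property that lets attention generalize across long sequences.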

Building, training, and refining models of this scale and complexity—integrating MoE, multimodality, advanced positional encoding, and achieving competitive benchmarks—inherently demands enormous computational power and engineering effort, directly explaining the potential need for shared investment. While MoE offers potential inference efficiency, the upfront training cost remains a significant factor.

Development Hurdles and Data Questions

Beyond raw compute, Meta dedicated resources to tuning Llama 4 for specific outputs and safety. The company publicly stated its goal was countering perceived political biases in LLMs, noting, “It’s well-known that all leading LLMs have had issues with bias—specifically, they historically have leaned left when it comes to debated political and social topics… This is due to the types of training data available on the internet.”

Meta claimed internal tests showed reduced refusal rates and ideological disparities on sensitive topics, alongside deploying safety tools like Llama Guard and the GOAT red-teaming system – a method of adversarial testing to find vulnerabilities. These fine-tuning and safety layers add further development overhead.

Potentially adding to Meta’s financial calculus are persistent legal questions about its training data, representing another facet of development challenges and costs. Active lawsuits, including one involving comedian Sarah Silverman, allege the company trained Llama models on massive datasets of pirated books sourced from libraries like LibGen via BitTorrent file-sharing. Court documents reportedly revealed internal apprehension, with one engineer quoted as saying, “Torrenting from a [Meta-owned] corporate laptop doesn’t feel right.”

Allegations surfaced in late March 2025 that Meta might have also re-uploaded roughly 30% of this data, potentially weakening ‘fair use’ arguments and increasing potential legal liability or the future cost of sourcing alternative, licensed data. Such controversies could represent a substantial, if less visible, driver of overall AI development expenses.

Strategic Plays in a Competitive Field

Meta’s funding outreach aligns with its clear strategy to make Llama central to its operations. The models were integrated into Meta AI features across WhatsApp, Instagram, and Facebook shortly after launch. They were also made available for download and via cloud partners – including Amazon SageMaker JumpStart and Microsoft’s Azure AI Foundry and Azure Databricks – though notably under a custom commercial license, not a typical open-source one. This controlled release strategy keeps Meta involved in Llama’s deployment, balancing openness with commercial interests.

Further underscoring Meta’s focus on its own AI was its reported move to block Apple’s system-wide Apple Intelligence features within Meta’s iOS apps. This prevents iPhone users from using Apple’s AI writing tools or Genmoji inside Facebook or Instagram, pushing them towards Meta’s Llama-based alternatives instead.

This competitive maneuver happened despite earlier, unsuccessful talks in mid-2024 about a potential AI partnership between Meta and Apple, reportedly ending over privacy disagreements. Meta’s approach also differs from Apple’s more privacy-focused, often on-device model, a distinction highlighted by Meta’s public discussion of tuning Llama 4’s political leanings and its simultaneous, controversial roll-back of third-party fact-checking in the US starting January 2025.

Meta plans to share more details at its LlamaCon event scheduled for April 29, potentially offering updates on the massive Behemoth model or the forthcoming Llama 4-V vision model.

Markus Kasanmascheff
Markus has been covering the tech industry for more than 15 years. He holds a Master’s degree in International Economics and is the founder and managing editor of Winbuzzer.com.
