Large Language Models (LLMs) — Latest News and Analysis
News on large language models, foundation model releases, benchmarks, and LLM-powered applications.
The Latest News About Large Language Models
Z.ai's GLM-5.2 has taken the lead among open-weight models on Artificial Analysis' index, with public weights, a 1M-token window, and deployment caveats for coding teams.
Google has introduced DiffusionGemma to speed local AI output through parallel text diffusion, but lower quality than Gemma 4 keeps trade-offs visible.
Anthropic has apologized for invisible Claude Fable 5 safeguards and will show fallback notices after hidden output changes threatened AI model evaluations.
Anthropic has launched Claude Fable 5, bringing Mythos-class AI to regular Claude users with safety routing, a discounted June 22 access window, and usage-credit pricing.
OpenAI has filed a confidential draft S-1 for a possible IPO, leaving timing open as employee share-sale and Anthropic competition pressure starts to build.
A flaw in Meta's AI-assisted Instagram recovery exposed 20,225 accounts, letting attackers redirect password resets and forcing June 19 security notices to users.
OpenAI's planned ChatGPT super app is expected to put agents and Codex inside one hub, turning free prompts into paid tool paths as rivals push competing agent platforms.
DeepSeek has topped Ramp's June AI vendor list as US firms are increasingly betting on cheaper models.
xAI appears to have used a workaround to train its Grok AI with outputs of Anthropic's Claude model after an Anthropic access cutoff in January.
AI leaders have backed DNA and RNA screening rules that would make gene synthesis sellers verify customers and orders before risky designs reach labs.
Sakana AI has opened a Recursive Self-Improvement Lab to test whether AI can cut compute dependence.
Mathematicians warn in the Leiden Declaration that AI proof tools could strain peer review, credit and verification.
OpenAI has expanded its genomics and drug discovery model GPT-Rosalind with life-sciences plugins and controlled access.
Microsoft’s in-house MAI-Thinking-1 faces scrutiny over Common Crawl and public-web training data despite its pitch about clean, commercially licensed data.
Anthropic says Claude now authors over 80% of Anthropic production code, shifting risk from writing software to reviewing AI-made changes before they ship inside live systems.
OpenAI has expanded ChatGPT memory, giving Plus and Pro users in the US editable summaries as Free and Go accounts wait for a global rollout in the coming weeks.
Tencent is reportedly developing a WeChat AI agent that would use mini programs to complete in-app tasks, with review and external tests still ahead.
Google has released Gemma 4 12B, a local multimodal AI model for laptops that tests whether audio, images, code, and tool calls fit in 16GB of memory.
Researchers built a contained AI-powered malware worm that adapts attacks across lab hosts, exposing how local open-weight models complicate malware containment.
Meta AI Support abuse exposed an Instagram recovery gap that let hackers change emails, reset passwords, and briefly seize high-profile accounts before a patch.
Anthropic has disclosed a 31.5% prompt-injection success rate for Claude's browser agent before safeguards, showing how hostile web instructions can reach live tools.
Mistral has rebranded its Le Chat AI assistant as Vibe, folding work automation and remote coding into one AI agent with cloud sandboxes, connectors and tiered pricing.
LiveBrowseComp benchmark results suggest AI search agents often verify hunches instead of gathering fresh web evidence, raising new doubts about benchmark scores for browsing skill.
MiniMax is pushing M3 into the long-context model race with multimodal input and a claimed 1 million-token window.
Rising token-driven AI bills are pushing more companies to ration access, track usage, and steer workers toward cheaper tools.
Terence Tao argues AI could split math research into specialized roles if verification keeps pace and human reviewers filter weak ideas before they spread.
Google has tweaked Gemini quota rules after paid users hit five-hour walls within just a few minutes, capping single-request usage and excluding failed jobs.
Tencent has expanded WorkBuddy globally while betting smaller AI models can win more users against Alibaba and ByteDance in China's intensifying AI race.
Anthropic's SpaceX compute deal is real, but Musk's 180-day lease claim conflicts with payment terms through May 2029, raising Claude planning questions.
Apple's leaked Siri redesign points to a chatbot-style iPhone app, Dynamic Island replies, and Gemini-backed AI features that could debut at WWDC in June.