TNG Technology Consulting has released DeepSeek-R1T-Chimera, an open-weight large language model. The model fuses two distinct AI systems developed by DeepSeek AI, aiming to combine the noted reasoning capability of DeepSeek R1 with the performance efficiency of the more recent DeepSeek V3-0324 checkpoint, released in March. Offered under a permissive MIT license, Chimera was constructed using what TNG Tech referred to in its X announcement as a “novel construction method.”
TNG did not provide full technical details, but its method appears to assemble the model directly from selected neural network components of its parents rather than relying on conventional finetuning or knowledge distillation: Chimera uses V3-0324’s shared expert layers together with a custom merge of the distinct routed expert layers from both R1 and V3-0324. The stated objective was a model that retains R1’s reasoning strength while operating with V3’s speed and lower resource demands.
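TNG has not published its merge code, so any implementation detail remains speculation. Purely as an illustration of the idea described above, a tensor-level merge of two same-architecture checkpoints could look roughly like the sketch below; the file names, the `.mlp.experts.` key pattern, and the blend weight `alpha` are assumptions for this example, not details from the release.

```python
# Hypothetical sketch only -- not TNG's actual method. It keeps V3-0324's
# non-expert tensors (shared experts, attention, embeddings) as-is and
# blends each routed-expert tensor from both parent checkpoints.
from safetensors.torch import load_file, save_file

v3 = load_file("v3_0324_shard.safetensors")   # placeholder shard paths
r1 = load_file("r1_shard.safetensors")
merged = {}

for name, v3_tensor in v3.items():
    if ".mlp.experts." in name:               # routed experts: blend both parents
        alpha = 0.5                           # blend weight, purely illustrative
        mix = alpha * r1[name].float() + (1 - alpha) * v3_tensor.float()
        merged[name] = mix.to(v3_tensor.dtype)
    else:                                     # everything else: keep V3-0324's weights
        merged[name] = v3_tensor

save_file(merged, "chimera_shard.safetensors")
```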
Today we release DeepSeek-R1T-Chimera, an open weights model adding R1 reasoning to @deepseek_ai V3-0324 with a novel construction method.
In benchmarks, it appears to be as smart as R1 but much faster, using 40% fewer output tokens.
The Chimera is a child LLM, using V3s… pic.twitter.com/3HYtHSLwF7
— TNG Technology Consulting GmbH (@tngtech) April 27, 2025
Architecture And Base Model Characteristics
DeepSeek-R1T-Chimera inherits the Mixture-of-Experts (MoE) architecture common to recent DeepSeek models. MoE designs allow models to have a very large total parameter count—685 billion in this case (composed of approximately 41.5 million F32, 3.9 billion BF16, and 680 billion F8_E4M3 parameters)—while only activating a smaller subset (around 37 billion for V3) during inference for a specific task, thus managing computational load.
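For readers unfamiliar with the mechanism, a minimal, generic top-k routing layer (not DeepSeek’s actual implementation, and with toy dimensions) shows why only a fraction of the total parameters are exercised for each token:

```python
# Toy mixture-of-experts layer: a router scores all experts per token,
# but only the top-k experts actually run, so per-token compute scales
# with the active parameters rather than the full parameter count.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.k = k

    def forward(self, x):                                # x: (tokens, dim)
        scores = F.softmax(self.router(x), dim=-1)       # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)       # pick k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

y = TinyMoE()(torch.randn(4, 64))                        # only 2 of 8 experts fire per token
```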
The model utilizes safetensors, a secure format for storing model weights, and is distributed across 163 sharded files. It also employs FP8 quantization, a numerical format that reduces memory footprint compared to traditional 16-bit or 32-bit formats, potentially speeding up calculations with a manageable trade-off in precision. It leverages the `transformers` library and is tagged for `text-generation` tasks.
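As a usage sketch, the checkpoint should load through the standard `transformers` API; the repo id `tngtech/DeepSeek-R1T-Chimera` matches the published Hugging Face model page, though running the full FP8 weights remains a server-class exercise requiring hundreds of gigabytes of GPU memory.

```python
# Sketch of loading the published checkpoint with Hugging Face transformers.
# Assumes the repo id "tngtech/DeepSeek-R1T-Chimera" and sufficient GPU memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tngtech/DeepSeek-R1T-Chimera"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",       # keep the dtypes stored in the safetensors shards
    device_map="auto",        # spread the 163 shards across available GPUs
    trust_remote_code=True,   # DeepSeek-style MoE models ship custom modeling code
)

inputs = tokenizer("Explain mixture-of-experts routing briefly.", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=128)[0]))
```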
The V3-0324 base model, which contributes the efficiency characteristics, gained notice following its March 24 release for its impressive performance on high-end consumer hardware. Developer Awni Hannun reported achieving over 20 tokens per second using a 4-bit quantized version on an Apple Mac Studio, commenting, “It’s the most powerful model I’ve ever run on my laptop.”
Beyond MoE and FP8, V3 incorporates architectural features such as Multi-Head Latent Attention (MLA), which compresses the key-value cache into a smaller latent representation to cut memory use during long-context inference, and Multi-Token Prediction (MTP), which trains the model to predict several future tokens at once and can be used to speed up decoding. At the time, AI researcher Xeophon evaluated it favorably against contemporaries for certain tasks: “Tested the new DeepSeek V3 on my internal bench and it has a huge jump in all metrics on all tests. It is now the best non-reasoning model, dethroning Sonnet 3.5.”
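A toy illustration of the MLA idea (again, not DeepSeek’s code, and with arbitrary layer sizes): caching one small latent vector per token instead of full keys and values shrinks the KV cache by roughly the ratio of the hidden dimension to the latent dimension, with keys and values rebuilt on demand at attention time.

```python
# Toy latent KV compression in the spirit of MLA.
import torch
import torch.nn as nn

dim, latent_dim, seq_len = 1024, 128, 4096
compress = nn.Linear(dim, latent_dim, bias=False)   # down-project hidden state per token
expand_k = nn.Linear(latent_dim, dim, bias=False)   # rebuild keys on demand
expand_v = nn.Linear(latent_dim, dim, bias=False)   # rebuild values on demand

hidden = torch.randn(seq_len, dim)
kv_latent = compress(hidden)                        # cache (4096, 128) instead of 2 x (4096, 1024)
keys, values = expand_k(kv_latent), expand_v(kv_latent)
```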
TNG Tech says Chimera shows promise in inheriting this efficiency: benchmarks cited on its model page suggest it uses around 40% fewer output tokens than R1 for similar reasoning tasks, producing outputs described as “more compact and orderly.”
The DeepSeek R1 component, contributing the reasoning element, had previously been identified as having content filtering mechanisms, particularly on topics sensitive within China.
This was highlighted by Perplexity AI when it released R1 1776, a post-trained version with that censorship removed, around February 20. Perplexity CEO Aravind Srinivas stated at the time: “The post-training to remove censorship was done without hurting the core reasoning ability of the model… Some example queries where we remove the censorship: ‘What is China’s form of government?’, ‘Who is Xi Jinping?’, ‘how Taiwan’s independence might impact Nvidia’s stock price’.” The release materials for Chimera do not specify how or if these filtering characteristics from the R1 parent were handled during the merging process.
Efficiency In A Constrained Environment
The development of specialized models like Chimera fits within DeepSeek AI’s wider pattern of focusing on architectural optimization, a strategy possibly influenced by restricted access to top-tier AI training hardware due to US export controls on advanced GPUs.
This approach gained external validation when Tencent, during its Q4 2024 earnings call, confirmed leveraging DeepSeek models to reduce its own GPU dependency. A Tencent executive noted, “Chinese companies are generally prioritizing efficiency and utilization — efficient utilization of the GPU servers… DeepSeek’s success really sort of symbolize and solidify — demonstrated that — that reality.”
DeepSeek AI’s original R1 model was itself reportedly trained using just 2,048 H800 GPUs, illustrating a historical focus on resource management. The company has also recently open-sourced infrastructure components supporting this focus, such as its 3FS distributed file system and the FlashMLA attention kernel.
The Shadow Of Scrutiny
Technologies originating from DeepSeek AI operate under a complex geopolitical shadow. A report released on April 16 by the US House Select Committee on the CCP labeled DeepSeek AI a national security risk. The report, titled “DeepSeek Unmasked,” alleged that the company engages in espionage, collects user data at scale through infrastructure potentially tied to state-owned China Mobile, enforces CCP censorship, may have illicitly acquired restricted Nvidia chips, and committed intellectual property theft via model distillation.
Regarding potential IP theft, OpenAI provided a statement to the Select Committee, claiming: “Through our review, we found that DeepSeek employees circumvented guardrails in OpenAI’s models to extract reasoning outputs, which can be used in a technique known as ‘distillation’ to accelerate the development of advanced model reasoning capabilities at a lower cost… Additionally, we found that DeepSeek employees used OpenAI models to grade model responses and filter and transform training data… DeepSeek likely also used leading open-source AI models to create high-quality synthetic data.”
Committee Chairman John Moolenaar stated, “This report makes it clear: DeepSeek isn’t just another AI app — it’s a weapon in the Chinese Communist Party’s arsenal…” This background forms part of the context surrounding any model, like Chimera, derived from DeepSeek AI’s foundational work. TNG Technology Consulting can be reached via [email protected] for inquiries regarding their Chimera model.