China Telecom has unveiled TeleChat2-115B, an AI model with over 100 billion parameters that was developed entirely on computing systems built within China. In a notable move to sidestep reliance on Western tech imports, the model was trained on domestic infrastructure, a clear statement of China’s growing capability in the AI field despite international tech sanctions.
According to an update on GitHub, the model was trained with 10 trillion tokens comprising both Chinese and English data. TeleChat2-115B demonstrates how China is pushing forward with AI development while navigating challenges like limited access to advanced foreign hardware.
AI Training Powered by Huawei’s Hardware
At the core of TeleChat2’s development are Huawei’s Ascend Atlas 800T A2 servers. These servers, which house Kunpeng 920 processors, were pivotal to the model’s training. The processors are built on the Armv8.2 architecture, run at speeds of up to 3.0GHz, and are manufactured on a 7nm process. They serve as the backbone for AI models such as TeleChat2, which demand vast computational power even as global sanctions restrict access to foreign hardware.
While TeleChat2 doesn’t match the scale of models like Meta’s Llama 3.1, which has 405 billion parameters, the Chinese model’s size is not the only measure of its potential. By optimizing its training methods and infrastructure, China Telecom demonstrates how effective results can be achieved even without the latest Western hardware.
Efficient AI Development with Local Resources
One of the key aspects of TeleChat2-115B’s creation is how China Telecom used the resources available to it. Despite lacking the high-end GPUs typically considered essential for models of this size, the telecom giant made the most of local infrastructure: Huawei’s Ascend Atlas servers, though not equipped with cutting-edge graphics hardware, are relied upon for their raw processing power.
China Telecom’s broad reach and financial power also provide a critical advantage. With annual revenues surpassing $70 billion and a subscriber base of more than 500 million, the company has the resources to push its AI ambitions forward. This allows it to compete in AI without access to foreign-made technology, demonstrating that large-scale AI projects can still be pursued using locally produced hardware and software.
Enhanced Model Structure and Performance
TeleChat2-115B builds on its predecessor, TeleChat1, offering enhanced capabilities in general question-answering, coding, and mathematical problem-solving. The model’s structure features improvements such as Rotary Embedding for positional encoding, SwiGLU activation functions, and RMSNorm for more efficient training. These optimizations make the model more stable during training and improve its overall performance.
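The components named here are standard building blocks of modern transformers. As a rough illustration only, and not TeleChat2’s own code, RMSNorm and a SwiGLU feed-forward layer can be sketched in PyTorch as follows (class names and dimensions are generic):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescales activations by their RMS
    with a learned gain, skipping the mean-centering step of LayerNorm."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms


class SwiGLUFeedForward(nn.Module):
    """SwiGLU feed-forward block: a SiLU-gated up-projection followed by a
    down-projection, a common replacement for the classic GELU MLP."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```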
Rotary embedding is a technique used in transformer-based natural language processing (NLP) models to encode position. Rather than adding a separate positional vector to the input embeddings, it rotates the query and key vectors inside the attention layers by angles that depend on each token’s position, which helps the model capture the order of tokens and their relative distances.
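As a simplified sketch of that idea, again illustrative rather than TeleChat2’s implementation, rotary embedding can be written as a position-dependent rotation applied to query and key vectors:

```python
import torch


def apply_rotary_embedding(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotary position embedding for a tensor of shape (seq_len, heads, head_dim).
    The channels are split into two halves and each pair is rotated by an angle
    proportional to the token's position, so order is encoded in the vectors
    themselves rather than added as a separate embedding."""
    seq_len, _, head_dim = x.shape
    half = head_dim // 2
    freqs = 1.0 / (base ** (torch.arange(half, dtype=torch.float32) / half))
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), freqs)
    cos, sin = angles.cos()[:, None, :], angles.sin()[:, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


# Example: rotate the query vectors of an 8-token sequence with 4 heads of width 64
queries = torch.randn(8, 4, 64)
rotated = apply_rotary_embedding(queries)
```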
Moreover, new methods were employed during training to improve long-text comprehension. Techniques like RingAttention and NTK-aware scaling, for instance, let the model process inputs of varying lengths efficiently. These structural enhancements make TeleChat2 capable of handling complex tasks that demand a high level of comprehension and reasoning.
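NTK-aware scaling, in its commonly published form, works by enlarging the rotary base frequency so that positions beyond the original training window still map onto the angle range seen during training. A minimal sketch of that widely used formula, with illustrative parameter values rather than TeleChat2’s actual settings:

```python
def ntk_scaled_rope_base(base: float, head_dim: int, scale: float) -> float:
    """NTK-aware scaling of the rotary base: to extend the context window by
    `scale`, the base is increased so low-frequency channels are stretched the
    most while high-frequency channels stay nearly unchanged."""
    return base * scale ** (head_dim / (head_dim - 2))


# Example: extending a model trained with base 10000 and 128-dim heads to 4x context
print(ntk_scaled_rope_base(10000.0, head_dim=128, scale=4.0))  # roughly 40890
```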
Comprehensive Evaluation Across Datasets
TeleChat2-115B has undergone extensive testing, with evaluations on various datasets that measure its ability to tackle subjects ranging from humanities to advanced mathematics. For example, the CEVAL and CMMLU datasets assessed its performance in Chinese-language tasks across multiple educational levels, while GSM8K provided a benchmark for its mathematical reasoning capabilities.
The model also performed well in coding tasks, demonstrated through its evaluation with datasets like HumanEval and MBPP, which test code generation abilities. TeleChat2’s results place it among the top-performing models of its scale in tasks involving both code and general reasoning.
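Both HumanEval and MBPP score a model by executing its generated code against unit tests. A simplified sketch of that pass/fail check, omitting the sandboxing and timeouts a real evaluation harness would add:

```python
def passes_unit_tests(generated_code: str, test_code: str) -> bool:
    """HumanEval/MBPP-style check: run the model's completion together with the
    benchmark's unit tests and report whether every assertion passes."""
    namespace: dict = {}
    try:
        exec(generated_code, namespace)  # define the candidate function
        exec(test_code, namespace)       # execute the benchmark's asserts
        return True
    except Exception:
        return False


# Toy example in the same prompt/test spirit as HumanEval
candidate = "def add(a, b):\n    return a + b\n"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"
print(passes_unit_tests(candidate, tests))  # True
```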