Kuaishou, a prominent short video platform based in Beijing, unveiled its self-developed large language model, KwaiYii, to the public last week, according to a report from TechNode. The company also unveiled its research into Spiking Neural Networks and the development of SpikeGPT.
This release follows a beta test of a ChatGPT-like service for Android devices that began on August 18. The dialog service, built on a 13-billion-parameter version of KwaiYii, reportedly rivals OpenAI's GPT-3.5 in content creation, consultation, and problem-solving.
The LLM is detailed on KwaiYii's GitHub page. The primary application for Kuaishou's AI chatbot has been search, where it draws on original content from the platform to counter AI "hallucinations" – inaccuracies resulting from inadequate training data.
SpikeGPT: A Leap in Energy Efficiency
Kuaishou is positioning itself as a major force in AI research and development, spanning both mainstream consumer products and R&D projects. KwaiYii represents the consumer-facing side, while SpikeGPT, which the company also discussed this week, represents its research efforts.
The computational demands of contemporary large language models (LLMs) are substantial. Spiking Neural Networks (SNNs) have been identified as a more energy-efficient alternative to conventional artificial neural networks, but their efficacy in language generation tasks remains largely unexplored.
A research collaboration between the University of California and Kuaishou Technology has introduced SpikeGPT (via Synced Review), the inaugural generative spiking neural network (SNN) language model. This model, with its 260M parameter version, matches the performance of deep neural networks (DNN) while retaining the energy-saving benefits of spike-based computations.
SpikeGPT is a generative language model characterized by pure binary, event-driven spiking activation units. It integrates recurrence into a transformer block, making it compatible with SNNs. This integration not only eliminates the quadratic computational complexity but also facilitates the representation of words as event-driven spikes.
The model can process streaming data word by word, initiating computation even before a complete sentence has formed, while still capturing the long-range dependencies in intricate syntactic structures. The research team also incorporated several techniques to enhance SpikeGPT's performance, including a binary embedding step, a token shift operator, and vanilla RWKV in place of traditional self-attention.
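The token shift operator mentioned above can be illustrated with a minimal sketch: each token's embedding is blended with the previous token's, giving the model a cheap, recurrent-friendly form of local context. The blending fraction and array shapes here are illustrative assumptions, not details from the paper.

```python
import numpy as np

def token_shift(x, mix=0.5):
    """Blend each token's embedding with the previous token's embedding.

    x:   (seq_len, d_model) array of token embeddings.
    mix: fraction of the previous token blended in (hypothetical value).
    """
    shifted = np.zeros_like(x)
    shifted[1:] = x[:-1]               # previous token's embedding (zeros at t=0)
    return (1 - mix) * x + mix * shifted

x = np.arange(8, dtype=float).reshape(4, 2)  # 4 tokens, d_model = 2
y = token_shift(x)
```

Because the shift only looks one step backward, it can be applied to a stream token by token, which matches the word-by-word processing described above.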
Understanding Spiking Neural Networks
Spiking neural networks (SNNs) are a type of artificial neural network that is inspired by the way biological neurons work. In SNNs, the neurons communicate with each other by sending spikes, which are short bursts of electrical activity. The spikes are not continuous, but rather occur at discrete time intervals. This is in contrast to traditional artificial neural networks, which use continuous values to represent the activation of neurons.
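As an illustration of how such event-driven neurons behave, here is a minimal sketch of a leaky integrate-and-fire (LIF) neuron, a common SNN building block. The threshold and decay values are illustrative, not taken from SpikeGPT.

```python
def lif_neuron(inputs, threshold=1.0, decay=0.9):
    """Simulate a leaky integrate-and-fire neuron over discrete time steps.

    inputs: per-step input currents; returns a binary spike train.
    threshold and decay are illustrative values.
    """
    v = 0.0                    # membrane potential
    spikes = []
    for i in inputs:
        v = decay * v + i      # leak, then integrate the input
        if v >= threshold:     # fire a spike and reset the potential
            spikes.append(1)
            v = 0.0
        else:
            spikes.append(0)
    return spikes

train = lif_neuron([0.4, 0.4, 0.4, 0.0, 1.2])
```

Note that the output is binary and sparse: the neuron stays silent until accumulated input crosses the threshold, which is the source of the energy savings discussed below.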
SNNs offer several potential advantages over traditional artificial neural networks. First, they are more energy-efficient, because spikes are emitted only when needed rather than continuously. Second, they are more biologically realistic, which makes them attractive for applications that benefit from that realism, such as robotics and medical imaging.
However, SNNs also pose challenges. They are harder to train than traditional artificial neural networks: because spikes are discrete events, the error signal cannot be backpropagated through the network in the usual way. They are also less well understood, which makes them harder to design and optimize for specific tasks.
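One common workaround for the backpropagation problem is the surrogate gradient method: the hard threshold is used on the forward pass, while the derivative of a smooth function stands in for the true (zero-almost-everywhere) derivative on the backward pass. The sketch below uses a sigmoid surrogate; the choice of surrogate and the steepness value are illustrative assumptions, not details from the SpikeGPT paper.

```python
import numpy as np

def spike_forward(v, threshold=1.0):
    """Forward pass: a hard threshold produces binary spikes (non-differentiable)."""
    return (v >= threshold).astype(float)

def spike_surrogate_grad(v, threshold=1.0, beta=5.0):
    """Backward pass: the derivative of a steep sigmoid replaces the true
    derivative of the threshold function. beta controls steepness
    (illustrative value)."""
    s = 1.0 / (1.0 + np.exp(-beta * (v - threshold)))
    return beta * s * (1.0 - s)

v = np.array([0.2, 0.9, 1.0, 1.5])   # membrane potentials
spikes = spike_forward(v)
grads = spike_surrogate_grad(v)      # largest near the threshold
```

The surrogate gradient is largest for potentials near the threshold, so learning signal flows mainly to neurons that were close to spiking, while the forward computation stays purely binary.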
How SpikeGPT Performs
In an empirical study, SpikeGPT was trained at three parameter scales (45M, 125M, and 260M parameters) and benchmarked against transformer baselines such as Reformer, Synthesizer, Linear Transformer, and Performer on the Enwik8 dataset. The results showed that SpikeGPT achieved comparable performance with 22 times fewer synaptic operations (SynOps).
This research underscores the potential of training large SNNs to harness the advancements in transformers, suggesting a significant reduction in LLMs' computational demands by applying event-driven spiking activations to language generation. The researchers have expressed their intention to continue refining their model and will be updating their preprint paper accordingly. The code for SpikeGPT is available on the project's GitHub, and the paper detailing the model can be accessed on arXiv.