By Asmita - Dec 28, 2024
DeepSeek unveils DeepSeek-V3, an open-source large language model with 671 billion parameters, surpassing Meta's Llama 3.1. The model features a mixture-of-experts (MoE) architecture, was pre-trained on a dataset of 14.8 trillion tokens, and outperforms competitors across benchmarks. Trained in just 2.788 million GPU hours on Nvidia H800 GPUs, the model is available on Hugging Face under an MIT license and excels in coding competitions and language tasks.
DeepSeek, a Chinese artificial intelligence firm, has unveiled DeepSeek-V3, a groundbreaking open-source large language model (LLM) that sets new standards in the AI landscape. With an impressive 671 billion parameters, the model significantly surpasses previous open-source offerings such as Meta's Llama 3.1, which has 405 billion parameters. DeepSeek-V3 uses a mixture-of-experts (MoE) architecture, activating only the most relevant parameters for each token, which improves both efficiency and accuracy, as sketched below.
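For readers curious how that selective activation works in practice, the snippet below is a minimal, illustrative sketch of top-k expert routing, the general mechanism behind MoE layers. The expert counts and layer sizes are invented for illustration and are not DeepSeek-V3's actual configuration.

```python
# Illustrative top-k mixture-of-experts routing. Only the k highest-scoring
# experts run for each token, so most parameters stay inactive on any input.
# Sizes and expert counts are made up; this is not DeepSeek-V3's configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (num_tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)
        weights, idx = scores.topk(self.k, dim=-1)        # k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = TopKMoE()
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Because only a few experts run per token, compute per token scales with the active experts rather than the full parameter count, which is the efficiency the article describes.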
The model's technical foundations are substantial: it was pre-trained on a massive dataset of 14.8 trillion tokens, roughly 11.1 trillion words, and developed using techniques including Multi-head Latent Attention (MLA), supervised fine-tuning, and reinforcement learning. The researchers claim the model outperforms competitors across multiple benchmarks, including Big-Bench Hard (BBH), Massive Multitask Language Understanding (MMLU), HumanEval, and MATH.
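At a high level, MLA saves memory by caching a small compressed latent vector per token instead of full keys and values, then expanding it when attention is computed. The sketch below illustrates only that low-rank compression idea; the dimensions are invented, and DeepSeek-V3 specifics such as its decoupled rotary position embeddings are omitted.

```python
# Rough sketch of the low-rank key/value compression behind Multi-head Latent
# Attention (MLA). The only per-token cache is the small latent vector; full
# keys and values are reconstructed from it. Dimensions here are illustrative.
import torch
import torch.nn as nn

d_model, d_latent, n_heads, d_head = 64, 16, 4, 16

down_kv = nn.Linear(d_model, d_latent, bias=False)        # compress to latent
up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)  # expand to keys
up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)  # expand to values

x = torch.randn(10, d_model)           # hidden states for 10 tokens
latent = down_kv(x)                    # (10, 16): what actually gets cached
k = up_k(latent).view(10, n_heads, d_head)
v = up_v(latent).view(10, n_heads, d_head)
print(latent.shape, k.shape, v.shape)  # small cache, full-size keys/values
```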
Despite its substantial size, DeepSeek-V3 was trained in just 2.788 million GPU hours on Nvidia H800 GPUs, with an estimated training cost of $5.5 million. Its architecture incorporates innovative load-balancing techniques to minimize performance degradation, a strategy first implemented in its predecessor. Notably, the model activates only 37 billion parameters per token, which keeps its computational cost far below what the headline parameter count suggests. The model is currently available on Hugging Face under an MIT license, allowing both personal and commercial use.
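As a quick sanity check on those figures, the arithmetic below relates the active-parameter fraction and the quoted GPU-hours to the reported cost; the $2-per-GPU-hour rental rate is an assumption used only for illustration, not a figure from the article.

```python
# Back-of-envelope check of the figures quoted above. The $2/GPU-hour rental
# rate is an assumption for illustration, not a number reported in the article.
total_params_b = 671       # total parameters, in billions
active_params_b = 37       # parameters activated per token, in billions
gpu_hours_m = 2.788        # H800 GPU-hours, in millions
assumed_rate = 2.0         # USD per GPU-hour (assumed)

print(f"Active fraction per token: {active_params_b / total_params_b:.1%}")
# -> roughly 5.5% of the model's parameters run for any given token
print(f"Implied training cost: ${gpu_hours_m * assumed_rate:.2f} million")
# -> about $5.58 million, consistent with the ~$5.5 million estimate above
```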
DeepSeek-V3 demonstrates strong performance across a range of domains, particularly in coding competitions and language tasks. Performance evaluations suggest it outperforms major competitors such as Meta's Llama 3.1, OpenAI's GPT-4o, and Alibaba's Qwen 2.5. Its versatility extends to tasks such as essay writing, algorithm development, workflow automation, content generation, and language translation. However, like many AI models developed in China, it has limitations when addressing politically sensitive topics, reflecting regulatory constraints on AI development in the region.