AI computing power surges, driving a revolution in server architectures.
Release time: 2021-11-09
The explosive growth of AI computing power is driving multi-dimensional technological innovations that, in turn, are compelling upgrades to server architectures—and propelling the industry toward a comprehensive overhaul, spanning from the hardware to the system level. At the heart of this transformation lies the exponential demand for computational resources fueled by large-scale AI models, spurring breakthroughs and innovations in areas such as computing units, storage systems, interconnect technologies, power management, and cooling solutions.
I. Computing Units: From CPU-Dominated to Heterogeneous Computing
Traditional servers are centered around CPUs, but AI tasks—such as training large-scale models—demand parallel computing capabilities far beyond what CPUs can deliver. Thanks to their thousands of CUDA cores, GPUs have become the cornerstone of AI processing power, with their market share soaring from 12% in 2018 to 35% in 2023. NVIDIA's H100 GPU, for instance, pairs HBM3 high-bandwidth memory with NVLink interconnect technology delivering up to 900 GB/s, allowing a single card to reach peak performance of 1,979 TFLOPS (FP16) and making it practical to train models with hundreds of billions of parameters.
Meanwhile, domain-specific architecture (DSA) chips such as Google's TPU and AMD's MI300X are outperforming general-purpose GPUs in terms of power efficiency and performance by optimizing their hardware circuits for specific AI tasks—like AIGC inference. McKinsey predicts that by 2030, 95% of AI computing tasks will be handled by DSA architectures, potentially eclipsing the dominant role currently held by GPUs.
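To make the scale of this demand concrete, a common back-of-envelope rule estimates training compute as roughly 6 FLOPs per parameter per training token. The sketch below combines that rule with the H100's quoted 1,979 TFLOPS peak; the 40% utilization figure and the model/token sizes are illustrative assumptions, not figures from the article.

```python
# Rough back-of-envelope: wall-clock time to train a large model on
# H100-class GPUs, using the common ~6 * params * tokens FLOPs rule.
# The 40% sustained-utilization figure is an illustrative assumption.

def training_days(params, tokens, gpus, peak_tflops=1979.0, utilization=0.4):
    total_flops = 6 * params * tokens
    cluster_flops_per_s = gpus * peak_tflops * 1e12 * utilization
    return total_flops / cluster_flops_per_s / 86400

# e.g. a 175-billion-parameter model on 300 billion tokens, 1,024 GPUs:
days = training_days(175e9, 300e9, 1024)
print(f"{days:.1f} days")  # → 4.5 days
```

Even under these optimistic assumptions, a thousand-GPU cluster runs for days per training job, which is why per-card peak throughput matters so much.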
II. Storage Systems: Breaking Through the "Storage Wall" Bottleneck
After AI model parameter scales surpassed the trillion-level mark, data storage and access have become critical bottlenecks limiting system performance. The bandwidth and latency of traditional DRAM and NAND Flash can no longer meet the growing demands, driving innovation in the following technologies:
1. HBM (High-Bandwidth Memory): Using 3D stacking technology, multiple layers of DRAM are vertically integrated and connected by through-silicon vias (TSVs) to enable ultra-fast data transfer. A single HBM3e stack, for instance, delivers bandwidth of up to 1.2 TB/s, allowing AI servers to handle terabyte-scale data in real time.
2. Processing-in-Memory (PIM): This approach integrates computing units directly into the storage chip, minimizing data movement. Samsung’s HBM-PIM solution boosts AI inference speed by 2.5x while reducing power consumption by 40%.
3. NVMe SSD and Optane Persistent Memory: Leveraging the NVMe protocol to reach interface speeds of 64 Gbps, and pairing it with Optane persistent memory in a tiered storage architecture, boosts database server performance by 5 to 8 times.
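The gap between these tiers is easiest to see as time-to-move-data. The sketch below uses the headline bandwidths quoted above; the DDR5 figure is an illustrative assumption, and protocol overhead is ignored.

```python
# Illustrative comparison: how long moving 1 TB takes at each storage
# tier's headline bandwidth. HBM3e and NVMe figures are from the text;
# the DRAM figure is an illustrative assumption. Sequential transfer
# only, no protocol overhead.

TIER_BANDWIDTH_GBPS = {        # GB/s
    "HBM3e stack": 1200.0,     # 1.2 TB/s, per the text
    "DDR5 DRAM": 64.0,         # assumed figure for illustration
    "NVMe SSD": 8.0,           # 64 Gbps interface ≈ 8 GB/s
}

def seconds_to_move(size_gb, tier):
    return size_gb / TIER_BANDWIDTH_GBPS[tier]

for tier in TIER_BANDWIDTH_GBPS:
    print(f"{tier:>11}: {seconds_to_move(1000, tier):8.2f} s per TB")
```

The two-orders-of-magnitude spread between HBM and NVMe is exactly why tiered architectures, rather than any single technology, are the answer to the "storage wall."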
III. Interconnect Technologies: From PCIe to Ultra-High-Speed Networking
As AI cluster sizes continue to expand, data communication between computing units has become a performance bottleneck. Traditional PCIe 5.0 bandwidth, at just 128 GB/s, falls short of meeting the demands of multi-GPU collaboration. NVIDIA’s NVLink and NVSwitch technologies deliver a breakthrough through the following innovations:
- NVLink 4.0: Delivers bandwidth up to 900 GB/s—7 times faster than PCIe 5.0—and enables low-latency communication between GPUs.
- NVSwitch System: A single-chip solution featuring 64 ports and supporting 13.6 Tb/s data transfer, serving as the "neural hub" for building AI clusters spanning tens of thousands of GPUs.
- CXL interconnect protocol: Enables cache coherence between CPUs and accelerators, making it possible to pool computing resources and boosting resource utilization by more than 30%.
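The practical payoff of the bandwidth gap shows up in gradient synchronization. A ring all-reduce moves roughly 2·(n−1)/n times the message size per GPU; the sketch below plugs in the per-link bandwidths quoted above (latency and compute/communication overlap are ignored, and the 350 GB gradient size is an illustrative assumption).

```python
# Rough ring all-reduce time for gradient sync across n GPUs, using the
# standard 2*(n-1)/n * message-size communication volume and the link
# bandwidths quoted in the text. Latency and overlap are ignored; the
# 350 GB gradient size (e.g. ~175B params in FP16) is an assumption.

def allreduce_ms(size_gb, n_gpus, link_gbps):
    volume = 2 * (n_gpus - 1) / n_gpus * size_gb   # GB moved per GPU
    return volume / link_gbps * 1000

grads_gb = 350.0
for name, bw in [("NVLink 4.0", 900.0), ("PCIe 5.0", 128.0)]:
    print(f"{name}: {allreduce_ms(grads_gb, 8, bw):.0f} ms per sync step")
```

On this simplified model, the NVLink path is about 7x faster per synchronization step, matching the raw bandwidth ratio.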
IV. Power Management: From 12V to 48V DC Systems
AI server power consumption has surged dramatically (with individual GPUs exceeding 700W), exposing the inefficiency of traditional 12V power supply solutions. In contrast, a 48V DC system uses a DC/DC conversion module to step down the voltage to as low as 0.8V, significantly reducing current-related transmission losses and boosting energy efficiency by 15%–20%. In China, companies like Huawei are already developing their own DC/DC modules, paving the way for greater autonomy and control over the supply chain.
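The physics behind the switch is simple: for the same delivered power, quadrupling the distribution voltage cuts current fourfold, so resistive (I²R) conduction loss drops by roughly sixteenfold. The path resistance below is an illustrative assumption.

```python
# Why 48 V distribution cuts conduction loss: for the same delivered
# power, 4x the voltage means 1/4 the current, so I^2 * R busbar loss
# falls ~16x. The 5 mOhm path resistance is an illustrative assumption.

def conduction_loss_w(power_w, volts, resistance_ohm=0.005):
    current = power_w / volts
    return current ** 2 * resistance_ohm

p = 700.0  # one high-end GPU, per the text
for v in (12.0, 48.0):
    print(f"{v:>4.0f} V rail: {conduction_loss_w(p, v):6.2f} W lost in distribution")
```

The fixed-percentage efficiency gains quoted above come from this quadratic relationship compounded across thousands of servers.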
V. Cooling Solutions: From Air Cooling to Liquid Cooling—A Revolutionary Shift
When AI servers operate under high load, their power density exceeds 50 kW per cabinet, making traditional air cooling insufficient to meet cooling demands. As a result, liquid cooling and immersion cooling technologies are now becoming the industry standard.
- Liquid cooling technology: Cold plates or circulating coolant in direct contact with heat-generating components boost cooling efficiency by 3 to 5 times, bringing PUE below 1.1.
- Immersion cooling: By fully submerging servers in a fluorinated liquid, cooling efficiency is boosted by 40%, while maintenance costs are reduced by 30%. Microsoft’s underwater data center project achieves a PUE as low as 1.06 thanks to its immersion-cooling technology.
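PUE (power usage effectiveness) is the ratio of total facility power to IT equipment power, so each quoted figure translates directly into cooling-and-distribution overhead. The sketch below applies the PUE values above to the 50 kW cabinet cited earlier; the "typical air-cooled" baseline of 1.5 is an illustrative assumption.

```python
# PUE = total facility power / IT equipment power, so the non-IT
# overhead for a given IT load is it_kw * (PUE - 1). The 50 kW cabinet
# figure is from the text; the 1.5 air-cooled baseline is an assumption.

def overhead_kw(it_kw, pue):
    return it_kw * (pue - 1.0)

it_load = 50.0  # kW per AI cabinet, per the text
for label, pue in [("air-cooled (assumed ~1.5)", 1.5),
                   ("cold-plate liquid (<1.1)", 1.1),
                   ("immersion (Microsoft, 1.06)", 1.06)]:
    print(f"{label}: {overhead_kw(it_load, pue):.1f} kW non-IT overhead")
```

Per cabinet, dropping from an assumed 1.5 to 1.06 cuts overhead from about 25 kW to about 3 kW, which is the economic case for the liquid-cooling shift.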
VI. Architectural Models: From Traditional Virtualization to Cloud-Native
Traditional server architectures rely on virtualization technology, which often creates performance bottlenecks and ties high availability to the underlying host hardware. Cloud-native architectures, by contrast, deliver innovation through the following features:
- Microservices: Breaking down applications into multiple independent services to enable scalable expansion and fault tolerance.
- Containerization: Achieves cross-platform deployment via Docker/Kubernetes, boosting resource utilization by more than 50%.
- Automated operations and maintenance: AI-driven resource scheduling and fault prediction reduce operational costs by 40%.
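One mechanism behind the utilization gains above is scheduling granularity: fine-grained container requests can be bin-packed onto nodes much more tightly than coarse VM-sized allocations. The first-fit sketch below is purely illustrative; the workload sizes are made up.

```python
# Illustrative first-fit bin-packing sketch: fine-grained container
# requests fill nodes more tightly than coarse allocations, one source
# of the utilization gains cited above. Workload sizes are made up.

def first_fit(requests, node_capacity):
    """Pack CPU-core requests onto nodes; return (nodes used, utilization)."""
    nodes = []
    for size in requests:
        for node in nodes:
            if node["free"] >= size:       # fits on an existing node
                node["free"] -= size
                break
        else:                              # no node had room: add one
            nodes.append({"free": node_capacity - size})
    total = len(nodes) * node_capacity
    return len(nodes), 1 - sum(n["free"] for n in nodes) / total

containers = [2, 3, 1, 4, 2, 2, 3, 1]      # per-container core requests
n, util = first_fit(containers, 8)
print(f"{n} nodes, {util:.0%} utilization")  # → 3 nodes, 75% utilization
```

Real schedulers such as Kubernetes use far more sophisticated placement logic, but the principle, smaller units pack tighter, is the same.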
VII. Market and Industry Impact
1. Market Size: In 2024, the global AI server market reached $125.1 billion and is projected to surpass $222.7 billion by 2028, with generative AI servers increasing their share from 29.6% to 37.7%.
2. Industry Chain Restructuring: The direct sales share of traditional OEM manufacturers dropped from 68% in 2015 to 41% in 2023, while the proportion of ODM direct supply models targeting hyperscale data centers surpassed 35%.
3. Domestic Breakthroughs: Chinese companies are accelerating their efforts to replace foreign suppliers in areas such as HBM, DSA chips, and liquid cooling technologies, making the localization rate of the computing power industry a key focus for both policymakers and investors.
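The market-size figures above imply a steady compound growth rate, which is straightforward to back out:

```python
# Implied compound annual growth rate (CAGR) from the market figures in
# the text: $125.1B in 2024 to $222.7B in 2028, i.e. over four years.

def cagr(start, end, years):
    return (end / start) ** (1 / years) - 1

rate = cagr(125.1, 222.7, 4)
print(f"Implied CAGR: {rate:.1%}")  # → Implied CAGR: 15.5%
```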