As the market for high-performance computing (HPC) systems, especially AI servers, continues to expand, both the performance and the power consumption of their core processors (CPUs, GPUs, NPUs, ASICs, and FPGAs) and of their memory and network communication chips keep rising. As performance increases, power management becomes even more critical: the growing power consumption of HPC systems, particularly AI servers, places higher demands on the power management capabilities of the entire system and of its main chips.
In an AI server, the CPU, the GPU boards, the memory (DDR4, DDR5, HBM), and the various interfaces all need their own power supply, which makes the power management system very important. In addition to AC/DC power supplies and DC/DC converters, the passive components used in the power management system (mainly inductors and capacitors) also play a key role. As system performance and power consumption rise, both the performance required of these passive components and the quantity needed increase.
High-performance passive components provide more stable voltage and current, with fast transient response and low ripple, to keep HPC systems such as AI servers operating normally. Low-loss passive components improve the energy efficiency of AI servers and of their key components, which saves energy and benefits the environment. To ensure the reliability and stability of AI servers, higher demands are therefore placed on inductors.
01
Power Supply Challenges of AI Systems
Compared with general servers, AI servers have higher configurations and consume more energy. Because an AI server draws 6 to 8 times the power of a general server, its power supply requirements rise accordingly. General servers on the market today typically require two 800W power supplies, while AI servers may require up to four 1800W power supplies.
As server performance improves, the number of inductors and transformers it needs inevitably increases. Taking chip inductors as an example, a report from one organization points out that, because of the larger number of GPUs, an AI server requires a total of 24 to 48 inductors. With each valued at about 1 US dollar, the value of chip inductors in an AI server is 60% to 220% higher than in a general server.
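The percentages quoted above can be sanity-checked with a little arithmetic. The sketch below assumes that the 24-inductor case pairs with the 60% figure and the 48-inductor case with the 220% figure (the report is not explicit about this), and backs out the general-server value each pair implies. The function name `implied_baseline_value` is illustrative only.

```python
# Back out the general-server chip-inductor value implied by the quoted figures.
# From the text: 24 to 48 inductors per AI server, about 1 USD each,
# worth 60% to 220% more than in a general server.
# Assumption (not stated in the report): 24 pairs with 60%, 48 with 220%.

UNIT_PRICE_USD = 1.0


def implied_baseline_value(ai_inductor_count: int, pct_higher: float) -> float:
    """Return the general-server value (USD) implied by 'pct_higher % higher'."""
    ai_value = ai_inductor_count * UNIT_PRICE_USD
    return ai_value / (1.0 + pct_higher / 100.0)


low_case = implied_baseline_value(24, 60.0)
high_case = implied_baseline_value(48, 220.0)

print(f"Implied general-server value (low case):  {low_case:.2f} USD")
print(f"Implied general-server value (high case): {high_case:.2f} USD")
```

Under that pairing, both ends of the range point to roughly 15 USD of chip inductors in a general server, which suggests the two sets of figures are at least internally consistent.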
Additionally, in AI servers, integrated forms such as multi-phase or coupled inductors are gradually replacing discrete single inductors; to address heat dissipation and losses, ultra-thin inductors and module-style power supplies will become more widespread.
Data centers require an increasing number of AI accelerator cards, each of which carries a large number of processors (xPUs), often in large-scale parallel computing configurations. Compared with general CPUs, xPUs have a large number of small cores, which helps with neural network training and AI inference. However, xPUs consume significant power when they perform AI calculations and move data. In other words, xPUs are very power-hungry chips, and their strict power requirements pose new challenges for AI accelerator cards and can also affect system performance.
When AI systems are running, especially on workloads such as deep learning training and inference, they require extremely high computational power. At the system level, AI accelerators play a crucial role in providing near real-time results. Every xPU has multiple high-end cores, which are built from billions of transistors and draw hundreds of amperes of current. The core voltage of these xPUs has fallen to the 1V level.
The peak current density required by AI accelerator cards is a very heavy burden for any motherboard. The highly dynamic nature of the workload and the extremely large current transients can produce very high di/dt and sharp voltage transients lasting several microseconds, which are highly destructive and may damage the xPU. AI workloads also run for long periods on average, so decoupling capacitors cannot always supply the energy needed to meet instantaneous demand. The transients of the AI accelerator must therefore be suppressed to avoid damaging the entire power distribution network.
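To get a feel for why high di/dt is a problem, the first-order relation V = L · di/dt can be evaluated for a current step across the parasitic inductance of the power delivery path. The values below (50 pH of loop inductance, a 500 A step in 1 µs) are illustrative assumptions, not figures from any specific board or accelerator.

```python
# First-order estimate of the voltage spike across parasitic inductance
# during a load step: V = L * di/dt.
# All numeric values are hypothetical and chosen only for illustration.


def inductive_spike_v(loop_inductance_h: float,
                      current_step_a: float,
                      rise_time_s: float) -> float:
    """Return the voltage (V) developed across an inductance by a linear current ramp."""
    di_dt = current_step_a / rise_time_s  # A/s
    return loop_inductance_h * di_dt


spike_v = inductive_spike_v(
    loop_inductance_h=50e-12,  # 50 pH of parasitic loop inductance (assumed)
    current_step_a=500.0,      # 500 A load step (assumed)
    rise_time_s=1e-6,          # step completes in 1 microsecond (assumed)
)
print(f"Estimated spike: {spike_v * 1e3:.1f} mV")
```

With these numbers the spike is about 25 mV, already a meaningful fraction of a rail near 1 V. Real power distribution networks also have resistance and capacitance that this estimate ignores.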
The requirements for xPU voltage regulators (VRs) are now very different from those for standard point-of-load (PoL) voltage regulators. Some applications must deliver more than 1000A to the xPU at voltages below 1V. Under these conditions power losses must be tightly controlled; otherwise it is difficult for the system to work stably.
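The reason losses must be tightly controlled at these current levels follows from the conduction-loss relation P = I²R. The sketch below uses the 1000 A figure from the text together with an assumed 1.0 V rail and an assumed 0.1 mΩ of total path resistance; both assumed values are illustrative.

```python
# Conduction loss in the delivery path: P_loss = I^2 * R.
# 1000 A comes from the text; the rail voltage and path resistance are assumed.


def conduction_loss_w(current_a: float, path_resistance_ohm: float) -> float:
    """Return the power (W) dissipated in a resistive path carrying current_a."""
    return current_a ** 2 * path_resistance_ohm


current_a = 1000.0
rail_v = 1.0                  # assumed upper bound for a sub-1 V core rail
path_resistance_ohm = 0.1e-3  # 0.1 milliohm, hypothetical

load_power_w = rail_v * current_a
loss_w = conduction_loss_w(current_a, path_resistance_ohm)

print(f"Load power:      {load_power_w:.0f} W")
print(f"Conduction loss: {loss_w:.0f} W "
      f"({100 * loss_w / load_power_w:.0f}% of load power)")
```

Because the loss scales with the square of the current, even a tenth of a milliohm costs about 100 W at 1000 A under these assumptions.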
Reducing the energy consumption of AI systems has become an industry-wide problem. There are currently two main approaches: first, reduce the energy consumption of the AI system's core processor; second, optimize the power management system so that power is delivered to the core processor more efficiently. However, as emerging applications such as AI become widespread, the efficiency of traditional computing power architectures that combine AC/DC, DC/DC, multi-phase power controllers, and DrMOS power stages has hit its ceiling, and more advanced power management solutions are needed.
02
The Server Power Supply System Is Evolving
The miniaturization of processors has lowered supply voltages, but the current drawn has risen rather than fallen, so power consumption keeps increasing. One of the problems raised by this trend toward low voltage and high current is how to respond quickly to load fluctuations.
As the voltage decreases, the allowable voltage tolerance becomes very small. For example, if the core voltage must be supplied with an accuracy of ±3% to keep the processor from malfunctioning, then at 1V the deviation must stay within ±30mV. A server power supply must keep its output voltage as stable as possible even when the load current changes by more than 1000A.
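The ±30 mV figure, and what it implies for output capacitance, can be worked through as follows. The capacitance estimate uses the first-order charge balance C = ΔI · Δt / ΔV and assumes the capacitors alone must carry the full load step for 1 µs before the regulator responds; that response time is an assumption, and capacitor ESR and ESL are ignored.

```python
# Voltage tolerance band and a first-order hold-up capacitance estimate.
# 1 V, +/-3 % and a 1000 A load step come from the text;
# the 1 microsecond regulator response time is an assumption.


def tolerance_v(nominal_v: float, accuracy_pct: float) -> float:
    """Return the allowed one-sided deviation (V) for a given accuracy in percent."""
    return nominal_v * accuracy_pct / 100.0


def holdup_capacitance_f(load_step_a: float,
                         response_time_s: float,
                         allowed_droop_v: float) -> float:
    """Return C = dI * dt / dV, ignoring capacitor ESR and ESL."""
    return load_step_a * response_time_s / allowed_droop_v


band_v = tolerance_v(nominal_v=1.0, accuracy_pct=3.0)
cap_f = holdup_capacitance_f(load_step_a=1000.0,
                             response_time_s=1e-6,
                             allowed_droop_v=band_v)

print(f"Tolerance band: +/-{band_v * 1e3:.0f} mV")
print(f"Capacitance needed: {cap_f * 1e3:.1f} mF")
```

Under these assumptions roughly 33 mF would be needed, which illustrates why a faster regulator response is preferable to simply adding capacitance.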
In practice, the trend toward low voltage and high current continues, and it is usually handled with higher switching frequencies and multi-phase designs. Switching at higher frequencies allows smaller components (such as capacitors and inductors) to be used to manage and smooth the flow of energy in the input and output circuits. For converters based on ordinary silicon power semiconductor devices, the typical switching frequency is 30-80kHz. At these frequencies, common, cost-effective capacitors can be used. Above this range, however, parasitic effects lead to excessive resistive losses and self-heating.
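The link between switching frequency and component size can be seen from the steady-state ripple relation of an ideal buck converter, ΔI = V_out · (1 − V_out/V_in) / (L · f_sw). The sketch below rearranges it to find the inductance needed for a given ripple target. The 12 V input, 1 V output, 10 A ripple target, and the two frequencies are assumed values for illustration, and a single-phase ideal buck stage is assumed rather than any particular regulator topology.

```python
# Inductance required for a target ripple current in an ideal buck converter:
#   L = V_out * (1 - V_out / V_in) / (ripple * f_sw)
# All operating points below are hypothetical.


def required_inductance_h(v_in: float, v_out: float,
                          ripple_a: float, f_sw_hz: float) -> float:
    """Return the inductance (H) giving peak-to-peak ripple_a at f_sw_hz."""
    duty = v_out / v_in
    return v_out * (1.0 - duty) / (ripple_a * f_sw_hz)


l_500k = required_inductance_h(v_in=12.0, v_out=1.0, ripple_a=10.0, f_sw_hz=500e3)
l_1m = required_inductance_h(v_in=12.0, v_out=1.0, ripple_a=10.0, f_sw_hz=1e6)

print(f"500 kHz: {l_500k * 1e9:.1f} nH")
print(f"1 MHz:   {l_1m * 1e9:.1f} nH")
```

Doubling the frequency halves the inductance needed for the same ripple, which is the sense in which higher-frequency switching permits smaller inductors.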
Although raising the frequency does a great deal to improve load response, it also greatly increases the losses in the switching components. Large external capacitors can suppress voltage fluctuations in high-current applications to some extent, but they increase both the board area and the capacitor cost.
Taking all of these factors into account, the TLVR (Trans-Inductor Voltage Regulator) is currently the mainstream circuit configuration for handling rapid load fluctuations in low-voltage, high-current applications. In this scheme, each phase switch drives an inductor that has an additional winding; these additional windings are connected in series with a compensating inductor, so that all phases can supply current simultaneously. TLVR gives processors better transient response, meets load requirements with almost no drop in supply voltage, and reduces power supply losses. It also allows the output capacitance to stay small, which reduces board area and system cost.
03
More Inductor Solutions
In high-performance computing systems, particularly in the power management systems of AI servers, there is an increasing variety of inductor solutions. In addition to the aforementioned TLVR, there are products such as one-piece molded inductors, chip inductors, and ultra-thin one-piece molded inductors.
Chip inductors are used to supply power at the front end of chips, mainly for voltage and current conversion, and are commonly found in power management integrated circuits (PMICs) and FPGA power supply circuits. In high-performance computing systems, chip inductors, capacitors, MOSFETs, and driver chips together form the power supply circuit that meets the power demands of GPUs and CPUs.
Mainstream chip inductors currently use ferrite, but ferrite has poor saturation characteristics. As power modules shrink and currents rise, the size and saturation characteristics of ferrite inductors can no longer meet the requirements of high-performance GPUs. In recent years, inductors based on metal soft magnetic materials have emerged; they offer higher efficiency and smaller size, and respond better to large current changes. Chip inductors using metal soft magnetic materials can operate at switching frequencies from 500kHz to 10MHz.
There is also a type of chip inductor based on semiconductor thin-film processes. It is made with photolithography, which differs from the traditional wound-inductor and one-piece molded-inductor processes. The main advantage of semiconductor thin-film processes is that chip inductors can be produced at full-wafer scale, which raises production efficiency. Traditional power modules based on the SIP (System in Package) process encapsulate the chip and the inductor on a single package substrate, integrating the power inductor with the substrate to achieve a two-in-one solution. Whereas traditional SIP requires "chip + inductor + substrate," the thin-film approach only needs to encapsulate the chip together with its integrated inductors and other components to deliver a complete power module and its peripheral circuit functions, further shrinking the power module and raising power density while reducing cost.
This type of chip inductor utilizes new magnetic materials with excellent permeability and saturation current. At a frequency of 6MHz, the material loss of the inductor accounts for a very low proportion of the total inductor loss.
04
Capacitors Are Also Very Important
In the power management systems of high-performance computing, inductors, capacitors, and thermistors are all being upgraded and replaced.
AI servers still account for a relatively small share of the overall high-performance computing market, so no market research organization has yet tallied how many MLCCs (Multilayer Ceramic Chip Capacitors) AI servers consume. Judging by the trend, however, passive component distributors generally have high expectations for capacitors, particularly MLCCs, in AI servers. Significant growth is expected to emerge in the second half of 2024, with a substantial increase in both MLCC specifications and unit prices.
On the technical front, every computing system processor needs supporting capacitors. Traditionally, tantalum or polymer capacitors have been used for this purpose. To reduce reliance on these decoupling capacitors, a small number of Class II MLCCs (such as X5R, X6S, or X7R devices) can be placed directly next to the processor. Some manufacturers are now working to embed aluminum polymer decoupling capacitors into the chip carrier inside the package, where they work together with on-chip silicon capacitors. This approach can overcome the decoupling challenges faced by high-performance processors and support higher converter frequencies, potentially up to 10MHz in the future.
05
Opportunities for Passive Component Manufacturers
A few days ago, at NVIDIA's GTC conference, server OEM giant Delta Electronics stated that in the power conversion system of an AI server, inductors play a critical role in holding the GPU's operating voltage at 0.8V while the current surges rapidly. They must be able to operate stably under high-current, low-voltage conditions.
AI servers equipped with NVIDIA's new Blackwell architecture accelerator chips have a power consumption of up to 1000W to 1200W, and use 2 to 3 times as many inductors as regular servers. Because of the significant increase in power consumption, higher-specification inductors are also required, resulting in an average selling price (ASP) that is 5 to 8 times higher than in regular servers. Furthermore, as the penetration rate of DDR5 gradually increases, more and better inductors are needed.
The power consumption of AI servers has increased significantly. To improve transient response, TLVR inductors must be added, with 5 to 10 additional inductors required per AI server. The unit price of a TLVR inductor is 3 to 5 times that of a regular inductor.
It's not just the latest AI servers; an increasing number of high-performance computing systems require more and better inductors. Even a simple CPU upgrade in a general server can significantly increase inductor usage. For example, when upgrading from Eagle Stream to Birch Stream, due to a CPU power increase of about 50%, the inductor usage needs to increase by 50% to 70%.
It is evident that for major passive component manufacturers, especially makers of high-quality inductors, new business opportunities are on the horizon. The leading players in this field currently include TDK, Yageo, Sunlord Electronics, Tai-Tech (台庆科), ITG, and EATON, among others.
As mentioned above, the use of chip inductors in the power management systems of high-performance computing is increasing. This is good news for the international giants, and it also gives Chinese domestic enterprises a window of opportunity to improve product quality and market share. The Chinese chip inductor industry started relatively late, and in its early development its technology research and development and its production management lagged behind the international giants, especially well-known companies such as TDK, Murata, Chilisin, and Taiyo Yuden. In recent years, Chinese domestic enterprises such as Sunlord Electronics have worked their way into the global top five. Other domestic chip inductor enterprises worth watching include Boke New Materials, Maijie Technology, Yitong New Materials, Tian Tong Shares, Dongmu Shares, and Hengdian Dongci.
06
Conclusion
As the market for high-performance computing systems, especially AI servers, keeps expanding, the requirements placed on key chips and components keep rising. This applies not only to high-performance processors such as GPUs and CPUs, but also to power management systems, where the quantity and quality required of the related chips and components have increased significantly.
Inductors and capacitors are inconspicuous but indispensable, and are used in large quantities in power management systems; the rising power consumption of computing systems gives them the stage on which to show their full value. New technologies and materials are also expected to keep emerging.
For passive component manufacturers, international giants with high-quality products will continue to enjoy the better business opportunities, while for Chinese domestic enterprises, the huge domestic market offers ample room to grow and more opportunities to take market share from the international giants.