Even as growth has slowed in some parts of the market, the demand for data keeps climbing, and artificial intelligence applications such as ChatGPT have been striking in both how quickly they have developed and how much data they consume. GPT-3, for example, has 175 billion parameters to train. These rapid advances in AI applications have raised the bar for both processors and bandwidth.
However, the industry has long had a blind spot: while computing power has grown dramatically, bandwidth has not kept pace. As a result, much of the available GPU capability sits underutilized on top of abundant compute, which is the root of today's challenges. Frank Ferro, Senior Director of Product Marketing for IP Cores at Rambus, points out that "we need to reexamine this issue and find a solution." In his view, improving bandwidth and optimizing processor architectures will allow GPU resources to be used more fully and push AI applications forward.
Artificial intelligence workloads consist of training and inference, each with different requirements. Training involves analyzing massive amounts of data and is both compute-intensive and time-consuming. Inference, by contrast, demands less compute but is far more sensitive to cost and power consumption. As AI inference moves toward edge devices, the benefits of reduced data transmission, better edge-device performance, and lower latency become apparent. In this shift, GDDR's high bandwidth and low latency make it stand out at the edge.
GDDR was originally developed for high-performance graphics, but it now appears in data centers and network applications as well. Although GDDR remains primarily a graphics memory, its excellent data transfer rate suits many edge-side AI inference and networking workloads. DDR devices, in contrast, face challenges in the number of devices required, cost, and power consumption. Take DDR4 as an example: it runs at 3.2 Gb/s per pin, while GDDR6 has already reached 16 Gb/s, roughly five times faster. The two also differ significantly in capacity density and power consumption. When bandwidth is the decisive metric, GDDR is therefore the better choice; when storage density and cost sensitivity dominate, DDR is the better option.
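To put those rates side by side, here is a minimal back-of-the-envelope sketch. The per-pin rates are the figures quoted above; the x16 DDR4 and x32 GDDR6 interface widths are common device configurations assumed only for illustration.

```python
# Back-of-the-envelope peak-bandwidth comparison.
# Per-pin rates (3.2 Gb/s DDR4, 16 Gb/s GDDR6) are the figures quoted above;
# the x16/x32 interface widths are assumed typical device configurations.

def peak_bandwidth_gb_s(pin_rate_gbps: float, width_bits: int) -> float:
    """Peak bandwidth in GB/s = per-pin rate (Gb/s) x interface width (bits) / 8."""
    return pin_rate_gbps * width_bits / 8

ddr4 = peak_bandwidth_gb_s(3.2, 16)    # ~6.4 GB/s for an x16 DDR4 device
gddr6 = peak_bandwidth_gb_s(16.0, 32)  # ~64 GB/s for an x32 GDDR6 device

print(f"DDR4  x16 @ 3.2 Gb/s: {ddr4:.1f} GB/s per device")
print(f"GDDR6 x32 @ 16 Gb/s : {gddr6:.1f} GB/s per device")
print(f"Per-pin speedup     : {16.0 / 3.2:.1f}x")  # 5.0x
```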
When discussing the application differences between HBM and GDDR6, Frank Ferro pointed out that although HBM can deliver 800 GB/s of bandwidth, that is well beyond the 400-500 GB/s typically needed for AI inference, and it raises costs by a factor of three to four. For AI inference scenarios that need large capacity and high bandwidth without HBM's cost, GDDR6 is therefore the more suitable choice.
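A quick worked comparison of the numbers behind that argument follows; all figures are the ones quoted above, and the point is simply how much paid-for HBM bandwidth would go unused in such a socket.

```python
# Over-provisioning check: how much of HBM's bandwidth would an AI
# inference socket actually use? 800 GB/s, the 400-500 GB/s requirement,
# and the 3-4x cost premium are the figures quoted above.

hbm_bw_gbs = 800
inference_need_gbs = (400, 500)
hbm_cost_premium = (3, 4)   # relative to a GDDR6-based design

low, high = inference_need_gbs
print(f"Utilization at {low}-{high} GB/s: "
      f"{100 * low / hbm_bw_gbs:.0f}-{100 * high / hbm_bw_gbs:.0f}% of HBM bandwidth")
print(f"Unused headroom: {hbm_bw_gbs - high}-{hbm_bw_gbs - low} GB/s, "
      f"at {hbm_cost_premium[0]}-{hbm_cost_premium[1]}x the cost")
```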
Separately, Rambus said it has no plans to develop GDDR6X technology or products. Both GDDR6 and GDDR6X are excellent, very high-performance technologies, but the former is a JEDEC standard while the latter is proprietary and patent-protected. Rambus therefore prefers to develop specific technologies based on customer needs.
Recently, Rambus launched a 24 Gb/s GDDR6 PHY that delivers 96 GB/s of bandwidth per GDDR6 memory device. The product is designed to bring cost-effective, high-bandwidth memory performance to artificial intelligence, graphics, and network applications.
The main features of this PHY IP include:
● Data rates of up to 24 Gb/s, for a maximum bandwidth of 96 GB/s per device;
● Can be paired with the Rambus GDDR6 digital controller IP to form a complete memory subsystem that users can tailor to their systems;
● Includes the LabStation™ development environment for rapid system bring-up, characterization, and debugging;
● Draws on Rambus' 30 years of expertise in high-speed signal integrity and power integrity (SI/PI) to deliver system-level signal integrity;
● Comes with reference designs and support for packaging and PCB.
In practical applications, one side of the PHY connects directly to the DRAM through two 16-bit channels, while the other side connects to the memory controller through a DFI interface, so that the controller handles the logic control of the entire subsystem. Frank Ferro explained that the dual 16-bit read/write channel arrangement has become mainstream: combining the two channels into a 32-bit data width significantly increases data transfer speed and efficiency. A complete GDDR6 memory subsystem can also bring together eight such dual-channel interfaces for a total data width of 256 bits, which yields a very large gain in data transfer rate and system efficiency, along with further room to optimize power management.
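A quick worked calculation makes those figures concrete. The 24 Gb/s per-pin rate and the two-16-bit-channel structure are stated above; reading the 256-bit total as eight x32 (dual 16-bit channel) devices is an interpretation of the description, not an explicitly stated configuration.

```python
# Per-device and subsystem-level peak bandwidth for the 24 Gb/s GDDR6 PHY.
# 24 Gb/s per pin and the 2 x 16-bit channel structure are from the article;
# treating the 256-bit total as eight x32 devices is an interpretation.

PIN_RATE_GBPS = 24          # per-pin data rate of the PHY
CHANNELS_PER_DEVICE = 2     # two independent 16-bit channels per GDDR6 device
CHANNEL_WIDTH_BITS = 16
DEVICES = 8                 # eight dual-channel interfaces -> 256-bit total

device_width = CHANNELS_PER_DEVICE * CHANNEL_WIDTH_BITS   # 32 bits
device_bw = PIN_RATE_GBPS * device_width / 8               # 96 GB/s
total_width = DEVICES * device_width                       # 256 bits
total_bw = PIN_RATE_GBPS * total_width / 8                 # 768 GB/s

print(f"Per-device: {device_width}-bit interface, {device_bw:.0f} GB/s")
print(f"Subsystem : {total_width}-bit interface, {total_bw:.0f} GB/s")
```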
For AI inference applications, each GDDR6 device delivers 96 GB/s of bandwidth, so combining four or five GDDR6 devices readily covers bandwidth requirements in the 400-500 GB/s range.
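A minimal sizing sketch of that claim, using only the 96 GB/s per-device figure quoted above:

```python
# Aggregate peak bandwidth from a small number of 96 GB/s GDDR6 devices.
# The 96 GB/s per-device figure comes from the 24 Gb/s PHY described above.

DEVICE_BW_GBS = 96

for devices in (4, 5):
    print(f"{devices} devices -> {devices * DEVICE_BW_GBS} GB/s aggregate")
# 4 devices -> 384 GB/s, 5 devices -> 480 GB/s: enough to cover the
# 400-500 GB/s range cited for AI inference.
```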