At the recent supercomputing conference in Germany, Intel announced that its upcoming "Falcon Shores" chip will feature 288GB of HBM3 memory with a total memory throughput of 9.8TB/s. As expected, the chip will also support smaller data types such as FP8 and BF16. These details are part of Intel's strategic push to capture the AI processor market and catch up with competitors such as NVIDIA and AMD.
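To put the announced figures in perspective, here is a back-of-the-envelope sketch (my own arithmetic, not an Intel benchmark) of the minimum time needed to stream the chip's entire 288GB of HBM3 once at the quoted 9.8TB/s peak:

```python
# Illustrative arithmetic from the announced specs (decimal GB/TB assumed).
CAPACITY_BYTES = 288e9          # 288 GB of HBM3
BANDWIDTH_BYTES_PER_S = 9.8e12  # 9.8 TB/s peak memory throughput

# Lower bound on the time to touch every byte of memory once.
t_ms = CAPACITY_BYTES / BANDWIDTH_BYTES_PER_S * 1e3
print(f"full-memory sweep: {t_ms:.1f} ms")  # ≈ 29.4 ms
```

In other words, a kernel that reads all of HBM once cannot finish in under roughly 29 ms, a useful sanity check when estimating throughput for bandwidth-bound AI workloads.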
As part of this transformation, Intel is committed to developing high-performance AI processors to meet growing demand. With Falcon Shores, it aims to expand its share of the AI market and offer users more powerful and efficient computing solutions.
Beyond raw performance, support for smaller data types helps improve computational efficiency and reduce power consumption: representing each value in fewer bits cuts both the memory footprint and the bandwidth needed to move it. Formats like FP8 and BF16 are increasingly popular in deep learning and other AI applications, so supporting them directly addresses market demand.
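The effect of precision on capacity is easy to quantify. The sketch below (illustrative arithmetic, not from Intel's announcement) shows how many model parameters fit into 288GB of HBM at each precision; each halving of bytes per value doubles the parameters that fit and roughly halves the bandwidth needed to stream them:

```python
# Bytes per element for common AI number formats.
BYTES_PER_VALUE = {"FP32": 4, "BF16": 2, "FP8": 1}
HBM_BYTES = 288e9  # Falcon Shores' announced HBM3 capacity

for dtype, nbytes in BYTES_PER_VALUE.items():
    params_billion = HBM_BYTES / nbytes / 1e9  # parameters, in billions
    print(f"{dtype}: ~{params_billion:.0f}B parameters fit in memory")
```

By this rough measure, a weights-only FP8 model of up to ~288B parameters could reside entirely in HBM, versus ~72B at FP32 (ignoring activations, KV caches, and other overheads).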
Overall, Intel's "Falcon Shores" chip is shaping up to be a powerful, high-performance AI processor. Its release marks an important step in Intel's AI strategy and sets up direct competition with NVIDIA and AMD.
According to earlier reports, Intel had planned Falcon Shores as an "XPU" that integrated GPU, CPU, and memory in a single package. At Monday's briefing, however, Intel confirmed that Falcon Shores will no longer be an XPU and will instead be a standalone GPU.
The Falcon Shores GPU is part of Intel's Max Series GPU family; it uses standard Ethernet switching and resembles Intel's AI-focused Gaudi architecture. Falcon Shores will also adopt a chiplet-based design, similar to Ponte Vecchio, allowing programmable processing on individual GPUs. The design is built around oneAPI, Intel's cross-architecture programming interface, which is broadly compatible with other CPUs and architectures. Intel lists CXL support as a key differentiator, enabling GPUs, AI chips, and other accelerators to easily access large shared memory pools.
According to Intel, the computing environment is not yet mature enough to blend CPU and GPU kernels into the same Falcon Shores package. As AI and large language models (LLMs) enter the HPC space, the workloads running on different processors are shifting, and the best mix of CPU and GPU cores is shifting with them. Intel is therefore rethinking its next-generation supercomputing architecture, arguing that it is too early to lock customers into a fixed CPU-to-GPU ratio.
Moreover, from a design perspective, advanced supercomputers are highly specialized machines built for specific tasks, and tuning software to the architecture is routine in supercomputing operations. These factors suggest that the CPU/GPU ratio is not the only reason Intel removed CPU cores from the design.
Intel also notes that Falcon Shores lets customers pair it with a variety of CPUs, including AMD's x86 parts and NVIDIA's Arm-based chips, as well as those vendors' GPU designs; Intel will not restrict customers to its own x86 cores. Decoupling the CPU from the GPU gives customers with different workloads more choice.
Intel stated that the purpose of the CXL interface is to let customers build composable architectures, combining CPUs and GPUs in whatever ratio their designs require. However, CXL provides only 64 GB/s of throughput between chips, whereas tightly coupled CPU+GPU designs like NVIDIA's Grace Hopper can provide up to 1 TB/s of memory throughput between CPU and GPU. For many workloads, especially bandwidth-hungry AI workloads, that tighter coupling offers performance and efficiency advantages over a CXL implementation.
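The gap is easiest to see with a concrete transfer. The sketch below uses the two bandwidth figures quoted above to compare the time to move a hypothetical 8GB activation tensor between CPU and GPU (illustrative arithmetic only, not a measured result):

```python
# Bandwidth figures as quoted in the article; the 8 GB tensor is a
# hypothetical workload chosen for illustration.
TENSOR_BYTES = 8e9       # hypothetical 8 GB CPU-to-GPU transfer
CXL_BPS = 64e9           # 64 GB/s over CXL
COUPLED_BPS = 1e12       # 1 TB/s in a coupled design like Grace Hopper

t_cxl = TENSOR_BYTES / CXL_BPS            # transfer time over CXL
t_coupled = TENSOR_BYTES / COUPLED_BPS    # transfer time when coupled
print(f"CXL: {t_cxl * 1e3:.0f} ms, coupled: {t_coupled * 1e3:.0f} ms, "
      f"ratio: {t_cxl / t_coupled:.1f}x")  # 125 ms vs 8 ms, ~15.6x
```

At these quoted rates the coupled design moves the same data roughly 15x faster, which is the trade-off Intel accepts in exchange for composability.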