Intel has announced that its second-generation Xeon Phi hardware (codenamed Knights Landing) is now shipping to early customers. Knights Landing is built on 14nm process technology, with up to 72 Silvermont-derived CPU cores. While the design is derived from Atom, there are some obvious differences between these cores and the ones Intel uses in consumer hardware. Consumer Atom chips based on Silvermont don’t support Hyper-Threading, while Knights Landing supports four threads per core. Knights Landing also supports AVX-512 extensions.
The new Xeon Phi runs at roughly 1.3GHz and is split into tiles, as Anandtech reports. There are two cores (eight threads) per tile, and each core has two VPUs (Vector Processing Units, aka AVX-512 units). Each tile shares 1MB of L2 cache (36MB cache total). Unlike first-gen Xeon Phi, Knights Landing can actually run the OS natively, and its out-of-order cores are supposedly much faster than the in-order, P54C-derived cores that powered Knights Ferry. The chip includes ten memory controllers: eight MCDRAM controllers for a total of 16GB of on-package memory, and two DDR4 controllers driving six channels of DDR4-2400 (up to 384GB total, according to Anandtech).
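The totals quoted above follow from the per-tile and per-channel figures. A quick sketch of that arithmetic in Python, using the numbers from this article; the 8-byte transfer width is an assumption (a standard 64-bit DDR4 channel), and the resulting DDR4 bandwidth is a theoretical peak, not a number Intel has quoted here:

```python
# Figures from the article; the derived totals are plain arithmetic.
CORES = 72
THREADS_PER_CORE = 4
CORES_PER_TILE = 2
L2_PER_TILE_MB = 1
DDR4_CHANNELS = 6
DDR4_MT_PER_S = 2400        # DDR4-2400: 2400 mega-transfers per second
BYTES_PER_TRANSFER = 8      # assumption: standard 64-bit DDR4 channel

tiles = CORES // CORES_PER_TILE                 # number of two-core tiles
threads = CORES * THREADS_PER_CORE              # hardware threads on the chip
l2_total_mb = tiles * L2_PER_TILE_MB            # total L2 across all tiles
ddr4_peak_gb_s = DDR4_CHANNELS * DDR4_MT_PER_S * BYTES_PER_TRANSFER / 1000

print(f"tiles: {tiles}, threads: {threads}, L2: {l2_total_mb}MB")
print(f"theoretical DDR4 peak: {ddr4_peak_gb_s:.1f}GB/s")
```

The 36MB cache figure only works out for the full 72-core part; a chip with fewer active cores would have proportionally fewer tiles.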
Memory accesses can be mapped in different ways, depending on which model best suits the target workload. The 72 CPU cores can treat the entire MCDRAM space as a giant cache, but if they do, accessing main memory incurs a greater penalty in the event of a cache miss. Alternatively, data can be flat mapped to both the DDR4 and MCDRAM and accessed that way. Finally, some MCDRAM can be mapped as a cache (with a higher-latency DDR4 fallback) while the rest is mapped as main memory with less overall latency. The card connects to the rest of the system via 36 PCIe 3.0 lanes. That’s roughly 36GB/s of I/O bandwidth in each direction (72GB/s in total), assuming that all 36 lanes can be dedicated to a single co-processor.
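The 36GB/s figure is a rounded one. As a rough check, here is the same calculation from the PCIe 3.0 signalling rate (8GT/s per lane with 128b/130b encoding). This is a theoretical ceiling that ignores protocol overhead, so real-world throughput would be lower:

```python
GT_PER_S = 8.0            # PCIe 3.0 raw signalling rate per lane
ENCODING = 128 / 130      # 128b/130b line encoding efficiency
LANES = 36                # lane count quoted in the article

per_lane_gb_s = GT_PER_S * ENCODING / 8        # bits to bytes
per_direction_gb_s = per_lane_gb_s * LANES     # one direction, all lanes
total_gb_s = 2 * per_direction_gb_s            # both directions combined

print(f"per lane:      {per_lane_gb_s:.3f}GB/s")
print(f"per direction: {per_direction_gb_s:.1f}GB/s")
print(f"bidirectional: {total_gb_s:.1f}GB/s")
```

The result lands a little under 36GB/s per direction, which is why the article's numbers are best read as approximations.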
The overall picture Intel is painting is that of a serious computing powerhouse, with far more horsepower than the previous generation. According to the company, at least some Xeon Phi workstations are going to ship next year. Intel will target researchers who want to work on Xeon Phi but don’t have access to a supercomputer for testing their software. With 3TFLOPS of double-precision floating point performance, Xeon Phi can lay fair claim to the title of “Supercomputer on a PCB.” 3TFLOPS might not seem like much compared to the modern TOP500, but it’s more than enough to evaluate test cases and optimizations.
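The 3TFLOPS claim can be roughly reproduced from the core count and clock speed given earlier. The sketch below rests on two assumptions that are not stated in Intel's announcement as reported here: that each core has two AVX-512 VPUs, and that each VPU can complete one fused multiply-add (two floating point operations) per 64-bit lane per cycle. It is a theoretical peak, not a measured result:

```python
CORES = 72
CLOCK_GHZ = 1.3           # "roughly 1.3GHz" per the article
VPUS_PER_CORE = 2         # assumption: two AVX-512 units per core
VECTOR_BITS = 512         # AVX-512 register width
DOUBLE_BITS = 64          # one double-precision value
FLOPS_PER_FMA = 2         # assumption: one multiply plus one add per lane per cycle

lanes = VECTOR_BITS // DOUBLE_BITS                        # doubles per vector
flops_per_cycle = CORES * VPUS_PER_CORE * lanes * FLOPS_PER_FMA
peak_tflops = flops_per_cycle * CLOCK_GHZ / 1000          # GFLOPS to TFLOPS

print(f"{flops_per_cycle} double-precision FLOPs per cycle")
print(f"theoretical peak: {peak_tflops:.2f} TFLOPS")
```

Under those assumptions the peak comes out just under 3TFLOPS, in line with Intel's number. With only one VPU per core the same formula would give about half that.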
Intel has no plans to offer Xeon Phi in wide release (at least not right now), but if this program proves successful, we could see a limited run of smaller Xeon Phi coprocessors for application acceleration in other contexts. In theory, any well-parallelized workload that can run on x86 should perform well on Xeon Phi, and while we don’t see Intel making a return to the graphics market, it would be interesting to see the chip deployed as a rendering accelerator.
As far as comparisons to Nvidia are concerned, the only Nvidia Tesla that comes close to 3TFLOPS is the dual-GPU K80 compute module. It’s not clear if that solution can match a single Xeon Phi, given that the K80 has to scale across two discrete chips. Future Nvidia products based on Pascal are expected to pack up to 32GB of on-board memory and should substantially improve Nvidia’s standing relative to Xeon Phi, but we don’t know when that hardware will hit the market.