Numenta has demonstrated that Intel Xeon CPUs, when paired with its novel approach, can vastly outperform the best CPUs and GPUs on AI workloads.
Using a set of techniques branded as the Numenta Platform for Intelligent Computing (NuPIC), the startup has unlocked new levels of AI inference performance on conventional CPUs, according to Serve the Home.
Most strikingly, it can apparently outperform GPUs and CPUs designed specifically for AI inference. For example, Numenta took a workload for which Nvidia had reported performance figures with its A100 GPU, and ran it on an augmented 48-core 4th-Gen Sapphire Rapids CPU. In every scenario, it beat Nvidia’s chip on total throughput. In fact, it was 64 times faster than a 3rd-Gen Intel Xeon processor and 10 times faster than the A100 GPU.
Boosting AI performance with neuroscience
Numenta, known for its neuroscience-inspired approach to AI workloads, leans heavily on the idea of sparse computing – which mirrors how the brain forms connections between neurons.
Most CPUs and GPUs today are designed for dense computing, especially for AI, which is more brute force than the contextual manner in which the brain works. Although sparsity can in principle boost performance considerably, conventional CPUs are not built to exploit it well. This is where Numenta steps in.
The startup looks to unlock the efficiency gains of sparse computing in AI models by applying its “secret sauce” to general-purpose CPUs rather than chips built specifically to handle AI-centric workloads.
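To make the dense-versus-sparse distinction concrete, here is a minimal Python sketch. It is purely illustrative and is not Numenta's method, which the article does not describe. The function names, the toy weight matrix, and the operation counter are all assumptions made for this example. It shows that when most weights are zero, storing and multiplying only the non-zero entries gives the same result with far fewer multiply-adds.

```python
def dense_matvec(weights, x):
    """Dense approach: multiply every weight by its input, zeros included."""
    ops = 0
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            acc += w * xi
            ops += 1  # one multiply-add per weight, even when w == 0
        out.append(acc)
    return out, ops


def to_sparse(weights):
    """Keep only (column index, value) pairs for the non-zero weights."""
    return [[(j, w) for j, w in enumerate(row) if w != 0.0] for row in weights]


def sparse_matvec(sparse_rows, x):
    """Sparse approach: touch only the non-zero weights."""
    ops = 0
    out = []
    for row in sparse_rows:
        acc = 0.0
        for j, w in row:
            acc += w * x[j]
            ops += 1  # one multiply-add per non-zero weight
        out.append(acc)
    return out, ops


# Hypothetical toy layer: 3 outputs, 4 inputs, 9 of 12 weights are zero.
weights = [
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, -1.0],
    [3.0, 0.0, 0.0, 0.0],
]
x = [1.0, 2.0, 3.0, 4.0]

dense_out, dense_ops = dense_matvec(weights, x)
sparse_out, sparse_ops = sparse_matvec(to_sparse(weights), x)
print(dense_out == sparse_out, dense_ops, sparse_ops)
```

In this toy case both versions produce the same outputs, but the sparse one performs 3 multiply-adds instead of 12. The catch is that the sparse version reads inputs at irregular positions, which is the kind of access pattern that hardware tuned for dense arithmetic tends to handle poorly.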
Although its approach can work on both CPUs and GPUs, Numenta adopted Intel Xeon CPUs and built on Intel’s Advanced Vector Extensions 512 (AVX-512) and Advanced Matrix Extensions (AMX), because Intel’s chips were the most available at the time.
These are extensions to the x86 architecture – additional instruction sets that let CPUs perform more demanding vector and matrix operations.
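For readers wondering whether their own servers expose these extensions, the sketch below shows one way to check on Linux. It is a small illustration and not part of NuPIC. It assumes the kernel reports CPU features in the `flags` line of `/proc/cpuinfo`, and that `avx512f` (the AVX-512 foundation set) and `amx_tile` are reasonable indicator flags. The helper names `cpu_flags` and `extension_support` are made up for this example.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            return set(line.split(":", 1)[1].split())
    return set()


def extension_support(cpuinfo_text):
    """Report whether the AVX-512 and AMX indicator flags are present."""
    flags = cpu_flags(cpuinfo_text)
    return {
        "AVX-512": "avx512f" in flags,   # AVX-512 foundation instructions
        "AMX": "amx_tile" in flags,      # AMX tile registers
    }


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(extension_support(f.read()))
    except OSError:
        # /proc/cpuinfo is Linux-specific; other systems need another method.
        print("Could not read /proc/cpuinfo on this system")
```

A 4th-Gen Xeon (Sapphire Rapids) would be expected to report both flags, while earlier Xeon generations generally report AVX-512 without AMX.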
Numenta delivers its NuPIC service using Docker containers, and it can run on a company’s own servers. Should it work in practice, it would be an attractive way to repurpose CPUs already deployed in data centers for AI workloads, especially in light of lengthy wait times for Nvidia’s industry-leading A100 and H100 GPUs.