New network needed to let 36 cores talk
Researchers have unveiled an experimental 36-core processor with a number of intriguing design points.
The exciting new chip design lets cores manage local memory more efficiently using an Internet-style communication network.
For years, some forward-thinking researchers have argued that the massive multi-core chips of the future will need to resemble little Internets, where each core has an associated router and data travels in fixed packets between them.
At the International Symposium on Computer Architecture, Li-Shiuan Peh, the Singapore Research Professor of Electrical Engineering and Computer Science at MIT, unveiled the 36-core chip, which features one idea for the “network-on-chip.”
The new design solves a major problem hampering previous attempts: how to maintain cache coherence, that is, how to make sure that the cores’ locally stored copies of globally accessible data remain up to date.
In today’s processors, cores are linked by a ‘bus’. When two cores need to communicate, they are granted exclusive access to the bus.
But as the core count rises, researchers say, this will quickly become too inefficient.
In MIT’s new network-on-chip, each core is connected only to those immediately adjacent to it.
“You can reach your neighbours really quickly,” says Bhavya Daya, a graduate student in electrical engineering and computer science, and first author on a new paper.
“You can also have multiple paths to your destination. So if you’re going way across, rather than having one congested path, you could have multiple ones.”
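The point about neighbours and multiple paths can be illustrated with a short sketch. It assumes the 36 cores sit on a 6x6 grid and are numbered row by row; the article does not give the layout, and the function names here are hypothetical, chosen only for illustration.

```python
from math import comb

SIDE = 6  # assumption: 36 cores arranged as a 6x6 grid, numbered row by row


def neighbours(core, side=SIDE):
    """Return the cores directly linked to `core` (up, down, left, right)."""
    row, col = divmod(core, side)
    result = []
    if row > 0:
        result.append(core - side)  # core above
    if row < side - 1:
        result.append(core + side)  # core below
    if col > 0:
        result.append(core - 1)     # core to the left
    if col < side - 1:
        result.append(core + 1)     # core to the right
    return result


def hop_count(src, dst, side=SIDE):
    """Minimum number of links a packet crosses between two cores."""
    r1, c1 = divmod(src, side)
    r2, c2 = divmod(dst, side)
    return abs(r1 - r2) + abs(c1 - c2)


def shortest_path_count(src, dst, side=SIDE):
    """Number of distinct minimum-length routes between two cores."""
    r1, c1 = divmod(src, side)
    r2, c2 = divmod(dst, side)
    dr, dc = abs(r1 - r2), abs(c1 - c2)
    # Choose which of the dr + dc hops are the vertical ones.
    return comb(dr + dc, dr)
```

Under these assumptions a corner core has two neighbours and an interior core has four. A packet going "way across" from one corner to the opposite one needs 10 hops, but it can take any of 252 equally short routes, which is the alternative to a single congested path that Daya describes.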
To keep the various copies of the data consistent as it moves between cores, the team equipped the chip with a second network that shadows the first.
The circuits connected to this network are very simple; all they can do is declare that their associated cores have sent requests for data over the main network. Because these declarations are so simple, nodes in the shadow network can combine them and pass them on without incurring delays.
The system creates hierarchical ordering to simulate the chronological ordering of requests sent over a bus. The hierarchy is shuffled during every interval, however, to ensure that in the long run, all the cores receive equal weight.
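A rough sketch of the ordering idea follows. It is not the MIT protocol: the article says only that the hierarchy is "shuffled" each interval, so the sketch substitutes a simple rotating priority, and the bit-vector encoding and function names are hypothetical. Each core's declaration is modelled as one bit, so combining declarations is a single bitwise OR.

```python
NUM_CORES = 36  # core count from the article


def merge_notifications(a, b):
    """Combine two sets of declarations; bit i set means core i sent a request."""
    return a | b


def global_order(notifications, interval, num_cores=NUM_CORES):
    """Order the requesting cores for one time interval.

    Assumption: priority rotates by one core per interval. The real chip's
    shuffling rule is not described in the article.
    """
    start = interval % num_cores  # core with top priority this interval
    order = []
    for offset in range(num_cores):
        core = (start + offset) % num_cores
        if (notifications >> core) & 1:
            order.append(core)
    return order


# Usage: cores 3, 10 and 30 all declare requests in the same interval.
requests = (1 << 3) | (1 << 10) | (1 << 30)
print(global_order(requests, interval=0))
print(global_order(requests, interval=5))
```

Because every core applies the same rule to the same merged declarations, all of them arrive at the same order without a shared bus. Moving the starting point each interval means no core is always served first.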
Cache coherence in multi-core chips “is a big problem, and it’s one that gets larger all the time,” according to Todd Austin, a professor of electrical engineering and computer science at the University of Michigan.
“Their contribution is an interesting one. They’re saying, ‘Let’s get rid of a lot of the complexity that’s in existing networks. That will create more avenues for communication, and our clever communication protocol will sort out all the details’.”
“It’s a much simpler approach and a faster approach. It’s a really clever idea,” he said.
In its next experiments, the team will load a version of Linux, modified to run on 36 cores, to evaluate the performance of real applications and check the accuracy of the group’s theoretical projections.
MIT says it will release the blueprints for the chip, written in the hardware description language Verilog, as open-source code.