26 Comments
ScottSoapbox - Tuesday, November 14, 2017 - link
I'm pretty sure the supercomputer needs 1.21 gigawatts of electricity, Gary.
MrSpadge - Tuesday, November 14, 2017 - link
You mean Jiggawatts, don't you?
LordConrad - Wednesday, November 15, 2017 - link
Only for time travel, which wouldn't work anyways as there is no Flux Capacitor.
Lord of the Bored - Wednesday, November 15, 2017 - link
No flux capacitor THAT YOU KNOW OF. This IS Los Alamos; who KNOWS what crazy stuff they get up to behind the classified signage.
Elstar - Tuesday, November 14, 2017 - link
Very cool. Although I wouldn't call this a "cheap supercomputer". It is surely too slow for any real-world application. As the quotes in the article imply, the machine is a scale model of a supercomputer for training programmers. It helps them test for scalability and fix embarrassing bugs before they power up a real computer and start burning megawatts.
Mil0 - Tuesday, November 14, 2017 - link
Depending on clock speed (and cooling), a Raspberry Pi 3 achieves 3-6 GFlops. Assuming optimal performance and scaling, the cluster achieves 4.5 TFlops.
#500 on the TOP500 list of supercomputers is about 600 TFlops. Assuming there are more than 500 supercomputers in the world, I'd say this beautiful beast gets pretty friggin close.
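Back-of-the-envelope, that arithmetic checks out. A quick sketch (the 750-board count is implied by the 4.5 TFlops figure, and 6 GFlops per board is the optimistic end of the range quoted above):

```python
# Peak-FLOPS estimate for the cluster, using the figures quoted above.
boards = 750                       # implied by 4.5 TFlops at 6 GFlops/board
gflops_per_board = 6.0             # optimistic end of the 3-6 GFlops range

peak_tflops = boards * gflops_per_board / 1000.0
top500_last_place_tflops = 600.0   # rough figure for #500 on the list

print(f"Cluster peak: {peak_tflops:.1f} TFlops")    # 4.5 TFlops
print(f"Versus the #500 system: {peak_tflops / top500_last_place_tflops:.1%}")  # 0.8%
```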
MrSpadge - Tuesday, November 14, 2017 - link
By that standard my desktop with a GTX 1070 (7.2 TFlops) plus an i3 is also a supercomputer.
Elstar - Tuesday, November 14, 2017 - link
And raw FLOPS aren't everything. The core-to-core bandwidth/latency on any GPU will run many circles around the node-to-node bandwidth/latency of this machine.
bananaforscale - Thursday, November 16, 2017 - link
Going by the tech of yesteryear, it is.
Alexvrb - Wednesday, November 15, 2017 - link
Basically, this thing is GREAT as a low-cost development testbed for massively parallel software designed to run on a real supercomputer... but that's about it. Such low performance would be matched by 4 Epyc 7551P boards, which would also eat less power and consume less space.
jptech7 - Wednesday, November 15, 2017 - link
As someone who built a 64-node Pi 3 cluster (and profiled the performance for an HPC paper), the scaling in typical HPC workloads is quite poor due to the 100 Mbit Ethernet. You can also use the USB interface to a 1 Gbit adapter, but the USB is 2.0 and thus limited to less than half of a real Gigabit solution. The limited RAM per node also doesn't help.
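To put a rough number on that, here is a toy strong-scaling model; the halo size, the one-second compute step, and the ~40%-of-line-rate figure for a USB 2.0 gigabit adapter are all illustrative assumptions, not measurements from the paper:

```python
# Toy model: each timestep does t_comp seconds of compute, then pushes
# msg_bits of boundary ("halo") data through the node's uplink, with no
# compute/communication overlap.
def parallel_efficiency(msg_bits, link_bits_per_s, t_comp=1.0):
    t_comm = msg_bits / link_bits_per_s   # seconds spent on the wire
    return t_comp / (t_comp + t_comm)

halo_bits = 64e6                          # 8 MB per step, illustrative
links = {
    "100 Mbit on-board Ethernet": 100e6,
    "1 Gbit adapter behind USB 2.0 (~40% of line rate)": 400e6,
}
for label, bps in links.items():
    print(f"{label}: ~{parallel_efficiency(halo_bits, bps):.0%} efficiency")
# 100 Mbit on-board Ethernet: ~61% efficiency
# 1 Gbit adapter behind USB 2.0 (~40% of line rate): ~86% efficiency
```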
Elstar - Wednesday, November 15, 2017 - link
I think the crappy network bandwidth is what makes this setup interesting and useful to LANL. As the old joke goes: "supercomputers turn computational problems into I/O problems."
Arnulf - Wednesday, November 15, 2017 - link
Why can't they do their testing on a simple many-core single chip, which costs way less than this useless toy? This is a waste of money ($120 per Raspberry Pi board?) and power.
kaidenshi - Wednesday, November 15, 2017 - link
$120 is for each node, and each node has four Raspberry Pi boards, not one. That puts the cost per Pi at $30, well under the $35 retail price.
Mo3tasm - Tuesday, November 14, 2017 - link
Not sure of the rationale behind using SD cards to boot them, but I'd imagine network boot would do a much better job... but who knows?!
wolrah - Tuesday, November 14, 2017 - link
The Pi 0/1/2 do not support network booting, and the Pi 3's network booting has some fairly significant bugs which make it hard to use with high-end switches.
https://www.raspberrypi.org/blog/pi-3-booting-part...
Amusingly it works best with the dumbest switches, but that doesn't really intersect with most of the roles where people would be trying to netboot the things.
There is an updated bootloader available, but there's no way to load it permanently to the hardware. You have to load it off of an SD card (though this does mean it also works on the older models).
Hopefully the Pi 4 or whatever replacement may come around has netbooting done right.
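For reference, whether a given Pi 3 already has its boot-mode OTP bit burned can be checked from the running system. A minimal sketch (assuming Raspbian's vcgencmd tool and the register values documented in the blog post linked above; untested here):

```python
import subprocess

# Adding "program_usb_boot_mode=1" to /boot/config.txt and rebooting once
# permanently burns the Pi 3's boot-mode OTP bit; OTP register 17 then
# reads 0x3020000a instead of the default 0x1020000a.
otp = subprocess.run(["vcgencmd", "otp_dump"],
                     capture_output=True, text=True).stdout
reg17 = next(line.strip() for line in otp.splitlines()
             if line.startswith("17:"))
if reg17 == "17:3020000a":
    print("USB/network boot mode is enabled")
else:
    print(f"Boot-mode bit not set (register reads {reg17})")
```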
CityBlue - Tuesday, November 14, 2017 - link
Seems odd to build a "dense" cluster from Raspberry Pi Model B PCBs connected into a backplane. If the goal is to maximise density, wouldn't it make more sense to use the Compute Module 3 plugged into a backplane that provides IO in a more optimal way (i.e. a built-in switch)? They could probably double the density, or halve the rack space requirement.
DanNeely - Tuesday, November 14, 2017 - link
If the objective was to fill racks with them, yes; but the stated purpose of this appears to be single units provided to people writing supercomputer code, to test scaling during development. Having enough cores to prove the workload scales approximately linearly with core count, and keeping the price as low as possible, are equally important. The more they can plug off-the-shelf components together instead of engineering custom hardware, the better for cost.
If at some point in the future someone does express an interest in buying a few dozen racks' worth of them, then it might be worth engineering more custom parts to ramp the density up.
DanNeely - Tuesday, November 14, 2017 - link
All of that is assuming that the Compute Module was even an announced product when this began. Personally I think they missed the boat in terms of power consumption here. 250 Pis and 1000 cores should still be enough to test scaling; and at only ~1300W it could be plugged into a standard outlet in an office/lab instead of needing to go through the university bureaucracy to get it installed in an official data center.
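The outlet claim is easy to sanity-check. A sketch (the ~5 W draw per loaded Pi 3 and the 120 V / 15 A office circuit are assumed figures):

```python
# Rough power budget for the hypothetical 250-Pi version.
pis = 250
watts_per_pi = 5.2            # assumed draw for a loaded Pi 3

total_w = pis * watts_per_pi
usable_w = 120 * 15 * 0.8     # 15 A outlet, 80% continuous-load derating

print(f"Cluster draw: ~{total_w:.0f} W vs. {usable_w:.0f} W usable")
# Cluster draw: ~1300 W vs. 1440 W usable
```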
Qwertilot - Wednesday, November 15, 2017 - link
Well worth having that last bit :)
edzieba - Wednesday, November 15, 2017 - link
The Compute Module has been available since 2014, and the card-edge interface is forward-compatible between versions.
Alexvrb - Wednesday, November 15, 2017 - link
Hmm, good point... Might be wise to bundle it with a 3-phase 4000-or-so watt diesel generator.
thewacokid000 - Thursday, November 16, 2017 - link
It'll potentially be 10,000+ nodes. You're not thinking big enough for scaling. :)
serendip - Wednesday, November 15, 2017 - link
It would be interesting to run a neural network on this with each core acting as a neuron. If the latency isn't terrible, it would allow for some cool parallelization work.
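Whether the latency is "terrible" depends on the model. A crude order-of-magnitude sketch (every figure below is an assumption chosen for illustration, not a measurement):

```python
# One-neuron-per-core feasibility check for this cluster.
neurons = 3000                # 750 boards x 4 cores
msg_latency_s = 200e-6        # assumed per-message MPI latency on 100 Mbit
offnode_synapses = 100        # assumed fan-out crossing the network

# Worst case: a core delivers its spikes one message at a time.
step_s = offnode_synapses * msg_latency_s
print(f"{neurons} neurons, ~{step_s * 1e3:.0f} ms per spike step")  # ~20 ms
print(f"~{1 / step_s:.0f} network-wide updates per second")         # ~50 Hz
# ~50 Hz is plausible for biologically-paced spiking models, but far too
# slow for the dense layer-by-layer updates of a conventional ANN.
```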
kaidenshi - Wednesday, November 15, 2017 - link
I think latency would definitely be the major issue, but it does sound interesting!
mode_13h - Friday, November 17, 2017 - link
You mean like this?
http://www.artificialbrains.com/spinnaker#hardware