10 Gbit Ethernet, the super I/O pipe for virtual servers? (VMworld 2008)
by Johan De Gelas on March 1, 2008 12:00 AM EST - Posted in Virtualization
10 Gbit Ethernet just got even more attractive. You might remember from our last storage article that we have high hopes for iSCSI as a high-performance yet very cost-effective shared storage solution for SMEs. Our hopes were based on getting 10 Gbit (10GBase-T) over UTP Cat 6 (or even Cat 5e), but unfortunately the only switch I could find (thanks Renée!) that supports 10 Gbit this way was the SMC TigerSwitch 10G. At around $1000 per port, it is not exactly a budget-friendly offering.
Still, 10 Gbit Ethernet is an incredibly interesting solution for a virtualized server, or for an iSCSI storage array that serves data to many servers, virtualized or not.
So maybe it is best to give optical cabling another look. Some of the 10 Gbit Ethernet NICs are getting quite cheap these days, but an enthusiastic Ravi Chalaka, Vice President of Neterion, told us that it might be wise to invest a bit more in NICs with IOV (I/O virtualization) support. According to Neterion, the new Neterion X3100 series is the first adapter to support the new industry standard SR-IOV 1.0 (Single Root I/O Virtualization). SR-IOV is a PCI-SIG workgroup extension to PCI Express. One of the features of such a NIC is that it has multiple channels that can accept requests from multiple virtualized servers, which significantly reduces the latency and overhead of multiple servers sharing the same network I/O. Even more important is that the Neterion X3100 is natively supported in VMware ESX 3.5.
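To make that idea a bit more concrete, here is a minimal, purely conceptual Python sketch of the difference between all VMs funneling packets through one shared, hypervisor-managed queue and each VM owning its own virtual function (VF) queue on the NIC. The class names and queue model are our own illustration, not Neterion's or VMware's code.

from collections import deque

class SharedNic:
    """All VMs share one software-switched queue (the pre-SR-IOV model)."""
    def __init__(self):
        self.queue = deque()

    def send(self, vm_id, packet):
        # Every packet is queued behind traffic from all other VMs and
        # pays the cost of the shared software switch in the hypervisor.
        self.queue.append((vm_id, packet))

class SriovNic:
    """Each VM is assigned its own virtual function (independent hardware queue)."""
    def __init__(self, num_vfs):
        self.vf_queues = {vf: deque() for vf in range(num_vfs)}

    def send(self, vm_id, packet):
        # The VM talks to its own VF directly, so there is no contention
        # with other VMs' traffic inside the hypervisor.
        self.vf_queues[vm_id % len(self.vf_queues)].append(packet)

if __name__ == "__main__":
    shared, sriov = SharedNic(), SriovNic(num_vfs=8)
    for vm in range(8):
        shared.send(vm, b"payload")
        sriov.send(vm, b"payload")
    print("shared queue depth:", len(shared.queue))                              # 8 packets in one queue
    print("per-VF queue depths:", [len(q) for q in sriov.vf_queues.values()])    # 1 packet each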
We will test the Neterion X3100 in the coming months. It seems like a very promising product, as Neterion claims that, compared to a 1 Gbit solution, it delivers:
- 7 times more bandwidth
- 50% less latency
- 40% less TCP overhead
So while many of us are probably quite pleased with the bandwidth of 2 Gbit (2x 1 Gbit MPIO), the 50% lower latency in particular sounds great for iSCSI. Fibre Channel, which is moving towards 8 Gbit, might just have lost another advantage...
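For a rough sense of what that bandwidth gap means in practice, here is a quick back-of-the-envelope sketch; the ~10% protocol-overhead figure is our own assumption, not a vendor number.

def usable_mb_per_s(link_gbit, links=1, efficiency=0.90):
    """Approximate usable payload throughput in MB/s for an iSCSI link."""
    # Gbit/s -> MB/s (divide by 8), scaled by an assumed protocol efficiency.
    return link_gbit * links * 1000 / 8 * efficiency

print(f"2x 1 Gbit MPIO : ~{usable_mb_per_s(1, links=2):.0f} MB/s")   # ~225 MB/s
print(f"1x 10 Gbit     : ~{usable_mb_per_s(10):.0f} MB/s")           # ~1125 MB/s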
14 Comments
FreshPrince - Monday, March 3, 2008
12X InfiniBand QDR is supposed to give us 96Gb bandwidth! Holy cow, Batman!
mvrx - Wednesday, March 5, 2008
From what I understood, everyone's hope was for 100GbE to replace InfiniBand. Even against InfiniBand's 12x implementation, 100GbE will be cheaper and will be adopted as a universal communication medium, while InfiniBand stays niche. When the nerds are building a 1000-machine cluster using OTS computers, will they want to add an InfiniBand infrastructure, or grab low-cost 10/40/100GbE pieces and parts?
FreshPrince - Monday, March 3, 2008
EMC just came out with 4Gb FC drives last year, and you would need to build your SAN with 210 146GB 15K 4Gb FC drives (all running in RAID 0) in order to saturate that 4Gb pipe... that's what their engineers told me. First of all, who in their right mind would buy a SAN and then carve out a RAID 0 LUN? LOL... the whole reason for purchasing a SAN is the redundancy. So we're looking at a cost of around $1M, and I don't think most people have that kind of money to play with. Most IT shops will likely have $100K or less for their SAN, which will get you nowhere near filling that pipe.
We have 105 of those drives in our SAN now and we're fully virtual. I can tell you it's smoking. I/O isn't where VMware lives anyway... it's purely CPU and RAM.
Now, we're talking about iSCSI, so the interface changes from FC to iSCSI, but you still need just as many spindles to fill that bandwidth... so now we're talking about 210 * 2.5 = 525 drives to fill that 10Gb pipe... yeah... I don't think so. Unless money grows on trees and everyone can afford super-duper SANs, this is not practical. Again, I want to stress that VMware is more CPU/memory intensive than it is I/O intensive.
this is just my perspective, I could be wrong...
somedude1234 - Monday, March 3, 2008
A single 15K drive can do large sequential reads in the neighborhood of 120~130 MBps. A single 4 Gbps FC link operates at 4.125 Gbps (with a small b). You also have to factor in protocol overhead, so realistically 400 MBps out of that link would be outstanding. In a straight JBOD or SBOD enclosure (perhaps with RAID 0 on top), you could saturate that link with only 4 drives.
A Hitachi 7K1000 SATA drive can do 80 ~ 90 MBps in large sequential reads. With a proper FC-SATA enclosure, you can saturate your 4 Gbps FC link with only 5 or 6 drives.
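A quick sketch of the same arithmetic, using the rough per-drive sequential figures quoted in this comment and an assumed ~400 MB/s of usable payload on a 4 Gbps FC link:

import math

def drives_to_saturate(link_mb_per_s, drive_mb_per_s):
    """Smallest number of drives whose combined sequential throughput fills the link."""
    return math.ceil(link_mb_per_s / drive_mb_per_s)

FC4_USABLE_MB_S = 400  # realistic payload on a 4 Gbps FC link after overhead
print("15K FC drives (~125 MB/s):", drives_to_saturate(FC4_USABLE_MB_S, 125))  # 4
print("7K1000 SATA   (~85 MB/s) :", drives_to_saturate(FC4_USABLE_MB_S, 85))   # 5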
mvrx - Wednesday, March 5, 2008
For those of you stating that current FC and high-speed networks will easily handle data rates from even the fastest RAID arrays, you really should keep in mind that starting this year, the next-generation flash technologies will begin to easily saturate these networks. This is why the Fusion-io drive, which can sustain around 600MB/sec, is on a PCIe card instead of a SATA interface, which is limited to about 300MB/sec after overhead.
As you may be able to surmise, 1GB/sec data storage is right around the corner. Take a jump to a couple of years from now and imagine a standalone *consumer* iSCSI box with 16 DIMM sockets and a 100GbE interface. It's loaded with, say, 6 SSD disks that are capable of a combined throughput of 3GB/sec. The 16 DIMM sockets each hold an 8GB memory module, which the box uses to intelligently cache read operations. So, let's say this box averages 4GB/sec read throughput and maxes out at 20GB/sec. All of a sudden even the following upcoming standards begin to reach their limits:
Fibre Channel 8GFC (8.50 GHz) 6800 Mbit/s or 850 MB/s
iSCSI over 10G Ethernet 10000 Mbit/s or 1250 MB/s
iSCSI over 100G Ethernet 100000 Mbit/s or 12500 MB/s
NFS over Infiniband 120000 Mbit/s or 15000 MB/s
I don't see a need for either FC or InfiniBand to continue development when the industry could all join together and make 100GbE a reality even for small businesses, and soon after for the home enthusiast.
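The MB/s figures in the list above are simply the raw link rates divided by eight; a small sketch comparing them against the hypothetical 4GB/sec cached-SSD box described in this comment (nothing here is measured, and the box is purely imaginary):

# Raw link rates from the list above, in Mbit/s.
links_mbit = {
    "8GFC Fibre Channel": 6800,
    "10G Ethernet (iSCSI)": 10_000,
    "100G Ethernet (iSCSI)": 100_000,
    "12x InfiniBand (NFS)": 120_000,
}
BOX_READ_MB_S = 4000  # hypothetical averaged read throughput of the cached-SSD box

for name, mbit in links_mbit.items():
    mb_s = mbit / 8  # Mbit/s -> MB/s
    print(f"{name:22s} {mb_s:8.0f} MB/s -> box needs {BOX_READ_MB_S / mb_s:.1f} such links")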
JohanAnandtech - Monday, March 3, 2008
I agree that for many random OLTP-ish workloads you need a lot of disks before you can even think of filling a 4 Gbit pipe. However, not every workload is purely random. Think about Storage VMotion, for example, or a data warehouse where you are requesting relatively large recordsets, an image databank, or, a bit more exotic, a video streaming server. Those kinds of apps will be bottlenecked by 4 Gbit quite quickly. You only need about 5-6 disks in RAID-5 to get more than 400 MB/s.
Secondly, VMware is not only about CPU and memory. If you run a few of your data warehouse/OLTP apps on top of ESX, it will surely become I/O limited.
FreshPrince - Monday, March 3, 2008
Hopefully engineers will carve out a second dedicated storage LUN just for their databases and run that outside of VMware... that's what VMware and M$ recommend anyway. Virtual machines and data should be separated to maximize performance.
Yes, I do agree that some of those applications you pointed out may attempt to fill the 10Gb pipe, but once again, should they be virtual? I truly believe that's more of a business decision than a technical one.
mvrx - Sunday, March 2, 2008
I find it interesting that Intel's Eaglelake chipset, due out mid to late 2008, will feature 10GbE over copper (maybe optical?). I hope that means high-performance 10GbE switches will be right around the corner as well. As it stands now, iSCSI just isn't worth it over 1GbE. At best I think you get something in the range of 105-115MB/sec maximum performance. With SSD devices such as the Fusion-io coming (600MB/sec), higher throughput is required.
It is worth noting that Intel has been pushing 40GbE as the standard to jump to (supporting 10GbE, but somewhat passing it by). According to a presentation I read from Intel, even 4 bonded 10GbE channels don't make the cut for the HPC tech they are talking about. I'd honestly rather see the industry come together and work up a nice shiny 32nm 100GbE chipset that could replace PCIe. As I've blabbed on before, I'd love to see a switched optical interconnect standard take over for peripherals, clustering, and CPU/memory/GPU scaling. I just don't think it is practical to require a 16x 2.0 bus along with a custom SLI chip to stack CPU resources. I'd rather see GPU cards with a 40GbE or 100GbE optical interface that would allow a somewhat linear and limitless ability to add processing to a computer, or a computer cluster.
IBM seems to be making progress in optical technology by leaps and bounds. There is talk of optically interconnected POWER7 and 32nm Cell B.E. chips, and PlayStation 4s (PS4s interconnected via 10GbE or 40GbE Ethernet in computing clusters would be very exciting). With IBM's chipset development expertise, I'm hoping we see something special for an enthusiast workstation / server interconnect, where a system would have optically interconnected daughter cards for memory, CPU(s), PCIe backplanes, etc. See http://www.dailytech.com/article.aspx?newsid=10915 and http://www-03.ibm.com/press/us/en/pressrelease/227...
I really want to be able to build a VMware hypervisor-controlled cluster that, to the guest operating systems, can look like a single high-performance multiprocessor machine. No CPU, memory, GPU, or storage limits.
Sub-10-microsecond latency is needed for IPC in clusters, and massive bandwidth is needed for effective storage over networks (3-10GB/sec will be desirable for some enterprise applications); TOE and iSCSI onboard processing is needed, as a CPU has no hope of dealing with anything close to 10GbE. It is realistic to imagine a future I/O chip that could handle SATA 3, SAS, iSCSI, 10/40/100GbE, USB 3.0, RAID 6 processing, PCIe expansion bridging, secondary memory resource pool (SMRP) interconnection and host/target control, ATA over Ethernet, H.264/RDP hybrid encode/decode, and maybe even PC-over-IP (see teradici.com) functionality. It would not only need to be designed as a 32-45nm high-performance chipset, but would likely have such a degree of complexity that it would run its own embedded OS with heavy virtualization and cluster-aware technologies built in. As crazy as that sounds, yes, I'd gladly pay $150 more for my motherboard if it had all that built in.
I also have hopes that HDMI-over-Ethernet devices come around. 10Gbit/sec will easily handle even 1080p/60 content with any of the new audio standards. I believe AV data should be routable in a home network, not limited to HDMI cabling. I want a real-time HDMI-to-10GbE system, compressed or not.
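A quick sanity check on that 1080p/60 claim, assuming plain uncompressed 24-bit video (a simplification; real HDMI also carries blanking intervals and audio):

# Uncompressed 1080p/60 at 24 bits per pixel, active video only.
width, height, bits_per_pixel, fps = 1920, 1080, 24, 60
video_gbit_s = width * height * bits_per_pixel * fps / 1e9
print(f"Uncompressed 1080p/60: ~{video_gbit_s:.1f} Gbit/s of a 10 Gbit/s link")
# ~3.0 Gbit/s, so even uncompressed video leaves plenty of headroom on 10GbE.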
Anyway, as the author wrote, they have high hopes for iSCSI as a cost-effective standard over 10GbE. I have high hopes for 10/40/100GbE to solve my top 10 list of limitations keeping enthusiast computing from taking the leap it really should.
(References: inter-process communication | IPC, TCP/IP offload engine | TOE, iSCSI offload engine. Teradici.com is a company taking thin client computing in a direction it should have taken 5 years ago.)
Viditor - Monday, March 3, 2008
I agree that this is a truly exciting area right now... Some random recent headlines include:
"IBM has unveiled a new prototype chip that can transmit data at up to 8 TB/sec, or about 5,000 high-def video streams. While this might not be entirely amazing, the fact that they did it using the same amount of juice required to light a 100-watt lightbulb, is."
(IBM hasn't officially announced this yet though...)
Or this one:
"A single-chip, 60 GHz, CMOS wireless transceiver is the latest invention from researchers at the National ICT Australia laboratory.
Project leader Professor Stan Skafidas said the chip can transfer data at up to 5 Gbps – “10 times the current maximum wireless transfer rate, at one-tenth the cost,” his team wrote in a press release. The team includes 10 Ph.D. students from the University of Melbourne, with support from Agilent, Anritsu, Ansoft, Cadence, IBM, Synopsys and SUSS MicroTec.
The chip resulted from a 3-year experiment using IBM’s 130 nm RF CMOS process and could be commercialized in about three years, possibly through a new startup company, a representative of the Melbourne group said.
“The availability of 7 GHz of spectrum results in very high data rates, up to 5 Gbps to users within an indoor environment, usually within a range of 10 meters,” Skafidas noted."
Olaf van der Spek - Monday, March 3, 2008
Why? Is the latency significant?
How does an IOV NIC work?
Is it basically multiple NICs sharing one physical Ethernet port?