Gigabit Over Copper Evaluation

DRAFT

Prepared by Anthony Betz and Paul Gray

April 2, 2002

(Updated May 27, 2002)

University of Northern Iowa

Department of Computer Science

Cedar Falls, IA 50614



Important! Thanks to all for the feedback with similar results or curious findings!


As was expected, there have been many updates that evolve the results contained in this document:


Referencing this document: The benchmarks illustrated within this web document are excerpts from the University of Northern Iowa technical report TR040202-CS.


Given the relatively low cost, backwards compatibility, and wide availability of gigabit over copper network interfaces, the migration to commodity gigabit networks has begun. Copper-based gigabit solutions now provide an alternative to the often more expensive fiber-based network solutions that are typically integrated in high-performance environments such as today's tightly-coupled cluster systems.

But how do these cards compare with their fiber-based counterparts? Are the Linux-based drivers ready for prime-time? The intent of this paper is to provide an extensive comparison of the various Gigabit over copper network interface cards available. Since performance is based on numerous factors such as bus architecture and the network protocol being used, these are the two main subjects of our investigation.

Our bandwidth benchmarks look at sustained throughput using TCP. While other communication protocols are available, and indeed preferred, for high-performance computing, TCP-based benchmarks provide immediate insight into the expected performance of the cards. With PCI-X appearing on more and more motherboards, alongside the multitude of systems with more traditional 32-bit PCI subsystems, numerous cards are available for today's 64bit and 32bit computer systems. The 64bit cards tested were as follows: Syskonnect SK9821, Syskonnect SK9D21, Asante Giganix, Ark Soho-GA2000T, 3Com 3c996BT and Intel's E1000 XT. The 32bit cards were Ark Soho-GA2500T and D-Link DGE500T. Comparisons for the various cards were made with respect to operation in alternate bus configurations and varied maximum transmission unit (MTU) sizes, including jumbo frames. Results were gathered using Netpipe 2.4, which reports the peak sustained throughput as well as the transfer rate for varying packet sizes.
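
To make the throughput figures in this document easier to interpret, the following minimal sketch shows how a ping-pong style measurement can be turned into a throughput number. It assumes, as we understand Netpipe's approach, that the one-way transfer time is taken as half of the measured round-trip time, and that Mbps means 10^6 bits per second. The function names and the sample timings are hypothetical and are not taken from our test runs.

```python
def throughput_mbps(block_bytes, round_trip_seconds):
    """Throughput in Mbps for one block, taking the one-way time as half the round trip."""
    if round_trip_seconds <= 0:
        raise ValueError("round_trip_seconds must be positive")
    one_way_seconds = round_trip_seconds / 2.0
    return (block_bytes * 8) / one_way_seconds / 1e6


def peak_throughput(samples):
    """Return (block_bytes, Mbps) for the best of an iterable of (bytes, round-trip seconds)."""
    return max(
        ((size, throughput_mbps(size, rtt)) for size, rtt in samples),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    # Hypothetical (block size, round-trip time) pairs, for illustration only.
    samples = [(1024, 0.0005), (65536, 0.002), (1048576, 0.02)]
    for size, rtt in samples:
        print(f"{size:>8} bytes: {throughput_mbps(size, rtt):8.2f} Mbps")
    print("peak:", peak_throughput(samples))
```

The peak value reported for each card in the sections below corresponds to the largest such figure over all block sizes tested.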

Note: All cards were tested with the MTU set to 1500, 3000, 4000, and 6000. The drivers for the cards were not modified. Cards based upon the dp83820 chipset were limited to an MTU of 6000 due to driver defaults. All other cards were tested through an MTU of 9000.
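
As a rough illustration of why the MTU matters, the sketch below counts how many TCP segments are needed to carry one message at each MTU used in our tests. It assumes IPv4 and TCP headers of 20 bytes each with no options, which may not match the actual header sizes on the wire in our testbeds; the function names and the 64 KB message size are chosen only for the example.

```python
import math

IP_HEADER_BYTES = 20   # assumption: IPv4 header without options
TCP_HEADER_BYTES = 20  # assumption: TCP header without options


def max_segment_size(mtu):
    """Largest TCP payload that fits in one frame of the given MTU."""
    return mtu - IP_HEADER_BYTES - TCP_HEADER_BYTES


def segments_needed(message_bytes, mtu):
    """Number of TCP segments required to carry a message of the given size."""
    return math.ceil(message_bytes / max_segment_size(mtu))


if __name__ == "__main__":
    for mtu in (1500, 3000, 4000, 6000, 9000):
        print(mtu, max_segment_size(mtu), segments_needed(65536, mtu))
```

Fewer segments per message means fewer frames for the card, the driver, and the host to process, which is one reason jumbo frames tend to raise sustained throughput.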

Cards Tested and Document Links:


Our Testing Environments:


Our testing environment consisted of two testbeds. The first testbed consisted of two server-class Athlon systems with a 266MHz FSB. The second testbed consisted of typical desktop/workstation Pentium-based Dell systems.





D-Link DGE-500T

The D-Link DGE-500T was the first of the gigabit cards tested. This card is based on National Semiconductor's dp83820 chipset and is designed for a 32bit bus. It turned in performance nearly identical to the two Ark cards and the GigaNIX card in our test suite, since all of them use the dp83820 chipset from National Semiconductor. The Linux driver used was the ns83820 as included in the 2.4.17 kernel. Latency on both platforms was .0002 seconds.

Peak throughput while operated in a 32bit bus was 192.21 Mbps. This was achieved in the Dell systems. The Athlon systems only obtained a peak of 172.21 Mbps when these cards were inserted into the 32-bit bus. Both systems show a slight drop in throughput but eventually level out. Peak throughput while operated in a 64bit bus running at 33Mhz was 315.96 Mbps.

When the bus was jumpered to autoselect 66/33Mhz, the performance increase was negligible. Peak throughput was 316.40 Mbps. Comparing the plots of the 66Mhz and 33Mhz run reveals that they are essentially identical.
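
One way to see why the faster bus made so little difference is to compare the measured peaks against the theoretical peak bandwidth of each bus configuration. The sketch below is not part of our benchmark suite. It assumes one transfer per clock at nominal 33 and 66 MHz clocks and ignores arbitration, protocol overhead, and other devices sharing the bus, so the bus figures are upper bounds only. The function names are ours.

```python
def pci_peak_mbps(width_bits, clock_mhz):
    """Theoretical peak bus bandwidth in Mbps (one transfer per clock, no overhead)."""
    return width_bits * clock_mhz


def bus_utilization(measured_mbps, width_bits, clock_mhz):
    """Fraction of the theoretical bus bandwidth used by a measured throughput."""
    return measured_mbps / pci_peak_mbps(width_bits, clock_mhz)


if __name__ == "__main__":
    # Peak throughput figures quoted above for the D-Link DGE-500T.
    results = [
        ("32bit 33Mhz", 32, 33, 192.21),
        ("64bit 33Mhz", 64, 33, 315.96),
        ("64bit 66Mhz", 64, 66, 316.40),
    ]
    for label, width, clock, measured in results:
        share = bus_utilization(measured, width, clock)
        print(f"{label}: bus peak {pci_peak_mbps(width, clock)} Mbps, used {share:.1%}")
```

Under these assumptions the card uses well under a fifth of the available bus bandwidth in every configuration, which is consistent with a limit in the card or driver rather than in the bus.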

For complete testing results, click here.

Price: $45

The cost per Mbps is as follows:

32bit 33Mhz: $45 / ((192.21+172.21) / 2) = $.25

64bit 33Mhz: $45 / 315.96 = $.14

64bit 66Mhz: $45 / 316.40 = $.14
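
The cost figures in this and the following sections all follow the same arithmetic: the price divided by the peak throughput, where the 32bit 33Mhz figure uses the average of the two testbed peaks. A minimal sketch of that calculation, with a function name of our own choosing, is:

```python
def cost_per_mbps(price_usd, peak_mbps_values):
    """Price divided by the mean of one or more peak throughput figures (in Mbps)."""
    peaks = list(peak_mbps_values)
    if not peaks:
        raise ValueError("at least one peak throughput value is required")
    return price_usd / (sum(peaks) / len(peaks))


if __name__ == "__main__":
    # D-Link DGE-500T figures from this section.
    print(f"32bit 33Mhz: ${cost_per_mbps(45, [192.21, 172.21]):.2f}")
    print(f"64bit 33Mhz: ${cost_per_mbps(45, [315.96]):.2f}")
    print(f"64bit 66Mhz: ${cost_per_mbps(45, [316.40]):.2f}")
```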



Ark Soho-GA2500T

The Ark Soho-GA2500T is also a 32-bit PCI card design. Like the D-Link DGE-500T and the Asante GigaNIX cards, this card is based on the National Semiconductor dp83820 chipset. With that in mind, performance was expected to be close to that of the D-Link DGE500T. The driver used was the generic ns83820 included in the 2.4.17 kernel. The latency for both test systems was .0002 seconds.


The peak throughput achieved in a 32bit 33Mhz bus was 192.62 Mbps, in the Dell system. The Athlon system in the same bus setup only reached 172.19 Mbps. As before, there is a performance drop at the 1Kb and 5-10Kb packet sizes.

Peak throughput while operated in a 64bit bus was 610.83 Mbps at 33Mhz and 609.98 Mbps at 66Mhz. As with the Soho-GA2000T, there is no noticeable difference between a 33Mhz and a 66Mhz bus.

For complete testing results, click here.

Price: $44

The cost per Mbps is as follows:

32bit 33Mhz: $44 / ((192.62+172.19) / 2) = $.24

64bit 33Mhz: $44 / 610.83 = $.07

64bit 66Mhz: $44 / 609.98 = $.07




NetGear GA302T

Although a latecomer to our benchmarks, the NetGear GA302T shows that high-performance gigabit cards can come in small packages. At the standard 1500 MTU, the GA302T outperformed all other cards in our test, both 32-bit and 64-bit varieties.


The NetGear GA302T is a 32-bit card based on the Altima (Broadcom) chipset. Although Linux is not officially on the list of supported operating systems, NetGear provided a preliminary version of their Linux-based drivers for our tests. The drivers proved to be very stable and delivered solid throughput.


Neither the Windows nor the Linux drivers (beta) supported Jumbo frames, which limited our testing to an MTU of 1500. Nonetheless, peak throughput was 645.4 Mbps in a 64-bit, 33MHz slot and 880 Mbps in a 64-bit, 66MHz slot.


Price: $99.99


The cost per Mbps is as follows:


32bit 33Mhz: (Sorry. This environment was not available at the time of our testing.)

64bit 33Mhz: $99.99 / 645.40 = $0.15

64bit 66Mhz: $99.99 / 880.00 = $0.11



Illustration 1: Throughput results for the NetGear GA302T. (Longer bars are better.)







Ark Soho-GA2000T

Our transition into cards designed for a 64-bit PCI bus began with the Ark Soho-GA2000T. Like its 32-bit counterpart, this card was designed around the dp83820 chipset, which allows us to examine the performance benefits, if any, in moving from a 32-bit to a 64-bit design.

Designed to run in a 64bit 66Mhz slot, this card is backwards compatible with 32bit and 33Mhz slots. Because it is based on National Semiconductor's dp83820 chipset, performance was expected to be similar to the DGE500T and the Soho-GA2500T. The driver used was the generic ns83820 included in the 2.4.17 kernel. Latency was .0002 seconds on both test platforms.

Peak throughput for a 32bit 33Mhz slot was 189.93 Mbps in the Dell system. The Athlons were only able to reach 172.26 Mbps.

Peak throughput for 64bit 33Mhz was 665.06 Mbps with an MTU of 6000. Peak throughput while running at 66Mhz was 640.60 Mbps. With the exception of the 6000MTU tests, there is no noticeable difference between bus speeds of 33 and 66Mhz.

For complete testing results, click here.

Price: $69

The cost per Mbps is as follows:

32bit 33Mhz: $69 / ((172.26+189.93)/2) = $.38

64bit 33Mhz: $69 / 665.06 = $.10

64bit 66Mhz: $69 / 640.60 = $.11


Asante GigaNIX

The second 64bit card tested was Asante's Giganix. This card is designed for a 64bit bus but is backwards compatible with 32bit and 33Mhz configurations. The Giganix is based on the dp83821 chipset. The driver supplied by Asante failed to compile due to a bug in the code, so the generic ns83820 driver was used again to get the card working. Performance was expected to be similar to the GA2000T. Latency was .0002 seconds on both systems.

Peak throughput for a 32bit 33Mhz configuration was 238.75 Mbps in the Dell systems, with a peak of 172.19 Mbps in the Athlons. Compared to the GA2000T, the Athlon results stay about the same, whereas the Dell results increase by roughly 50Mbps.

Peak throughput for 64bit 33Mhz was 641.02 Mbps with an MTU of 6000. When running at 66Mhz, the peak was 651.51 Mbps, also with the MTU at 6000.

An interesting spike in throughput occurred in the 64bit 66Mhz tests when the MTU was set to 3000. Aside from the 40Mbps difference between the two bus speeds, the plots look very similar. The main difference is the spike at 8KB packets.

For complete testing results, click here.

Price: $138

The cost per Mbps is as follows:

32bit 33Mhz: $138 / ((238.75+172.19) / 2) = $.67

64bit 33Mhz: $138 / 641.02 = $.22

64bit 66Mhz: $138 / 651.51 = $.21


Syskonnect SK9821:

The first of the Syskonnect cards tested was the SK9821. This card is designed for a 64bit bus and is backwards compatible with 32bit and 33Mhz configurations. The driver used was sk98lin from the kernel source. Latency was .000048 seconds on the Dells and .000025 seconds on the Athlons. Of all the 64bit cards tested, the SK9821 is the first to show a noticeable difference in performance between the two bus speeds.

Of all cards tested, the Syskonnect SK9821 gave the most consistent throughput over all packet sizes, and was far-and-away the overall performance leader.

In the server-class testing environment, peak throughput in our 64 bit 33Mhz setup was 782.27Mbps with the MTU set to 9000. The peak for 66Mhz tops off at roughly 940Mbps with jumbo frame MTU sizes of 6000 and 9000.
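
For context on how close roughly 940Mbps is to the limit of the medium, the sketch below estimates the largest TCP payload rate that a 1000 Mbps Ethernet link can carry at a given MTU. It is an approximation under stated assumptions: IPv4 and TCP headers of 20 bytes each, and 38 bytes of per-frame Ethernet overhead (preamble and start delimiter, header, frame check sequence, and interframe gap). TCP options such as timestamps, which we did not record for our runs, would lower the result slightly; the `tcp_option_bytes` parameter and the function name are ours.

```python
# Per-frame Ethernet overhead: preamble/SFD (8) + header (14) + FCS (4) + interframe gap (12).
ETHERNET_OVERHEAD_BYTES = 8 + 14 + 4 + 12
IP_HEADER_BYTES = 20   # assumption: IPv4 without options
TCP_HEADER_BYTES = 20  # assumption: TCP without options


def max_tcp_goodput_mbps(mtu, line_rate_mbps=1000.0, tcp_option_bytes=0):
    """Upper bound on TCP payload throughput for back-to-back full-size frames."""
    payload = mtu - IP_HEADER_BYTES - TCP_HEADER_BYTES - tcp_option_bytes
    on_wire = mtu + ETHERNET_OVERHEAD_BYTES
    return line_rate_mbps * payload / on_wire


if __name__ == "__main__":
    for mtu in (1500, 3000, 4000, 6000, 9000):
        print(f"MTU {mtu}: at most {max_tcp_goodput_mbps(mtu):.1f} Mbps of TCP payload")
```

Under these assumptions, the SK9821 peak at 66Mhz with jumbo frames sits at roughly 95% of what the wire can deliver.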

Peak throughput on 32bit 33Mhz was 365.27 Mbps on the Dells. After the peak is reached, there is a noticeable drop in throughput as it levels off to the 330Mbps range.

For complete testing results, click here.

Price: $570

The cost per Mbps is as follows:

32bit 33Mhz: $570 / ((365.27+163.97) / 2) = $2.15

64bit 33Mhz: $570 / 782.27 = $.73

64bit 66Mhz: $570 / 938.97 = $.61


Syskonnect SK9D21:

The second card tested from Syskonnect was the SK9D21. The SK9D21 is aimed at the desktop/workstation market. While support for this card under Windows environments appears to be solid, there were too many technical issues in our Linux testing environment to complete a full evaluation. The testing environment's mix of kernel, motherboard, Athlon chipset, and Syskonnect drivers made for too many components to debug the problems with this card thoroughly. This card is designed for a 64bit bus and is backwards compatible with 32bit and 33Mhz configurations. While an exhaustive analysis of the card was not possible, it should be noted that the latency was successfully determined to be .000123 seconds.

Our difficulties with this card were limited to the 64-bit bus. Our tests were successful in analyzing the performance in both the QLI Technologies Athlon-based systems and the Pentium-based systems in 32-bit busses.

When the driver issues for this card are resolved, the performance evaluations in this section will be amended.

Peak throughput in the Dell system was 377.53 Mbps. As with the SK9821, there is a drop off after the peak is reached.

For complete testing results, click here.

Price: $228

The cost per Mbps is as follows:

32bit 33Mhz: $228 / 377.53 = $.60


3Com 3c996BT:

The next card in the test suite was 3Com's 3c996BT. This card is designed as a 64bit 133Mhz card, but is backwards compatible with 32bit, 33Mhz, and 66Mhz configurations. The driver used was the bcm5700, version 2.0.28, as supplied by 3Com. Latency was .000103 seconds in the Dells and .000078 seconds in the Athlons.

The peak throughput achieved in this card while in a 32bit 33Mhz slot was 436.23 Mbps in the Dell systems. In the Athlon system, the same bus configuration only reached 184.02 Mbps.

Peak throughput while running in a 64bit 33Mhz slot was 884.09 Mbps, with an MTU of 4000. While running at 66Mhz, the peak was only 546.16 Mbps with an MTU of 6000. These plots are all relatively smooth when compared to the other plots for this card.

Performance in a 66Mhz slot is actually lower for all MTU sizes as compared to a 33Mhz slot.
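
The size of this effect can be expressed as a percentage change in peak throughput between the two bus speeds. The sketch below is an illustration only, with a function name of our own choosing. It uses the peak figures quoted in this document, and note that the two peaks for a given card were not always reached at the same MTU.

```python
def percent_change(baseline, value):
    """Percentage change from baseline to value; negative means a drop."""
    return (value - baseline) / baseline * 100.0


if __name__ == "__main__":
    # (64bit 33Mhz peak, 64bit 66Mhz peak) in Mbps, as quoted in this document.
    peaks = {
        "3Com 3c996BT": (884.09, 546.16),
        "Syskonnect SK9821": (782.27, 938.97),
        "D-Link DGE-500T": (315.96, 316.40),
    }
    for card, (at_33, at_66) in peaks.items():
        print(f"{card}: {percent_change(at_33, at_66):+.1f}% from 33Mhz to 66Mhz")
```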

For complete testing results, click here.

Price: $138

The cost per Mbps is as follows:

32bit 33Mhz: $138 / ((436.23+184.02) / 2) = $.44

64bit 33Mhz: $138 / (884.09) = $.16

64bit 66Mhz: $138 / (546.16) = $.25


Intel Pro 1000/XT:

The final 64bit card tested was Intel's E1000 XT. As with the 3c996BT this card is designed for future PCI-X bus speeds running at 133Mhz. It is compatible with a variety of configurations running at 33 and 66Mhz as well as 32bit. The card uses Intel's e1000 module, version 4.1.7. Latency in the Athlon systems was .000091 seconds. Due to time constraints, we have yet to test this card in the Dell testbed.

Peak throughput achieved was 743.14 Mbps while running in a 64-bit 66Mhz slot with the MTU set to 9000. In a 32-bit configuration, this card turned in the lowest and most erratic throughput of all cards tested. During the throughput tests, the card would drop 100% of packets for extended lengths of time. Initial testing in the 64-bit setup showed performance similar to the Giganix card on a 64-bit bus. Once the MTU was set to 9000, performance became very erratic, stagnated several times, then stabilized once the packet size reached an upper threshold. Note that the drop in performance was not associated with the (expected) phenomenon of packet reassembly when the TCP packet size exceeds the MTU.

As testing continued into the 66Mhz phase, things only got worse. Once the MTU exceeded 3000, performance was no longer predictable. During the 4000 MTU tests, the throughput plummeted to around .4 Mbps for several TCP packet sizes. At an MTU of 6000 and at 9000, the same problem occurred as before in the 64-bit 33Mhz test.

For a visual illustration of this phenomenon, see the ''Complete Test Results'' link for the Intel Pro 1000/XT below.

For complete testing results, click here.

NOTE: The results shown within this document reflect out-of-the-box driver performance with no modifications to the drivers or parameter tuning. Working with Intel Linux support, we were able to overcome the stagnation that was observed in our testing. Using the 4.2.8 driver and a 2.5.x series kernel compiled with gcc-3.0.4, along with adjusting the Rx and Tx delay parameters, increased both performance and reliability for these cards. An example of the increased performance for the 9000 MTU setting can be found here.

Results for modified drivers, tweaked driver options, CPU utilization, and comparisons with Windows 2K performance for all cards in our test are currently being compiled, and will be available as a technical report as soon as our tests are complete.

Price: $169

The cost per Mbps is as follows:

32bit 33Mhz: $169 / 142.02 = $1.19

64bit 33Mhz: $169 / 624.41 = $.27

64bit 66Mhz: $169 / 743.14 = $.23


Comparisons and Observations:

In this section, we compare performance differences between cards in like environments, provide some general performance observations, and examine the cost per megabit as determined by the operating environment.

Head-to-head throughput results:

While the results obtained in this study clearly show that peak performance is not a complete indicator of overall performance across the packet size spectrum, in this section we examine the peak performance results amongst all cards under common environments.
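
One simple way to see why peak throughput alone does not describe performance across the packet size spectrum is a latency-plus-bandwidth model, in which the time to send a message is the latency plus the message size divided by the peak rate. This model is our own simplification for illustration and is not how the measured curves were produced; it ignores MTU effects, driver behavior, and bus contention. The function name is hypothetical, and the inputs are latency and 64bit 66Mhz peak figures quoted in this document.

```python
def modeled_throughput_mbps(message_bytes, latency_s, peak_mbps):
    """Throughput predicted by a simple model: time = latency + size / peak rate."""
    bits = message_bytes * 8
    transfer_seconds = latency_s + bits / (peak_mbps * 1e6)
    return bits / transfer_seconds / 1e6


if __name__ == "__main__":
    cards = {
        # name: (latency in seconds, peak Mbps at 64bit 66Mhz)
        "Syskonnect SK9821 (Athlon)": (0.000025, 938.97),
        "D-Link DGE-500T": (0.0002, 316.40),
    }
    for name, (latency, peak) in cards.items():
        for size in (1024, 65536, 1048576):
            rate = modeled_throughput_mbps(size, latency, peak)
            print(f"{name}, {size} bytes: about {rate:.1f} Mbps")
```

In this model, small messages are dominated by latency, so a card with low latency reaches a useful fraction of its peak much earlier in the packet size range.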



General Observations:

Of the eight cards tested, the clear performance champion was the SK9821 with regard to throughput and consistency. The 3Com 3c996BT has a modest price tag and respectable performance for the entry-level server configuration. If price per megabit is the main concern, the Ark Soho-GA2500T has the lowest cost per Mbps, making it a viable solution for entry-level systems requiring higher throughput than Fast Ethernet.

The D-Link DGE500T and the Soho-GA2500T show nearly identical peaks in the 32-bit bus tests, which is to be expected since the drivers and the chipsets were the same.

The 3Com 3C996BT results on the 64-bit bus were surprising inasmuch as these cards showed better performance on a 33MHz bus than on the faster 66MHz bus.

The Intel E1000 XT proved to be comparable to the Asante GigaNIX card in peak performance, but its erratic overall performance proved too much to overcome.

In referring to the ''Complete Test Results'' sections for the 3C996BT and the SK9821 cards, one sees a very consistent and ''smooth'' transition to the peak throughput of the cards over the complete range of packet sizes.

Some general comparisons that can be derived from the above results include the notion of ''cost per peak megabit.'' Depending upon the environment in which the network device is to be installed, the cost per peak megabit varies greatly. For example, if one wished to upgrade a P-III-based desktop system with a 32-bit, 33MHz PCI bus, the GA2500T is the clear cost-effective solution, but it would not be able to provide throughput at the level of the 3Com 3C996BT.
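
The desktop example can be checked with a short sketch that ranks the cards by cost per peak megabit and by peak throughput in the 32-bit, 33MHz Pentium-based (Dell) tests. The prices and peaks are those quoted in the card sections of this document. Unlike the per-card cost tables, this sketch uses only the Dell peak rather than the average of both testbeds, so the cost figures differ slightly. The Intel and NetGear cards are omitted because no Dell 32-bit results are reported for them. The function and variable names are ours.

```python
# name: (price in USD, peak Mbps in the 32-bit 33MHz Dell tests)
CARDS_32BIT_DELL = {
    "D-Link DGE-500T": (45.00, 192.21),
    "Ark Soho-GA2500T": (44.00, 192.62),
    "Ark Soho-GA2000T": (69.00, 189.93),
    "Asante GigaNIX": (138.00, 238.75),
    "Syskonnect SK9821": (570.00, 365.27),
    "Syskonnect SK9D21": (228.00, 377.53),
    "3Com 3c996BT": (138.00, 436.23),
}


def rank_by_cost_per_mbps(cards):
    """Return (name, cost per Mbps) pairs, cheapest first."""
    costs = {name: price / peak for name, (price, peak) in cards.items()}
    return sorted(costs.items(), key=lambda item: item[1])


def fastest(cards):
    """Return the name of the card with the highest peak throughput."""
    return max(cards, key=lambda name: cards[name][1])


if __name__ == "__main__":
    for name, cost in rank_by_cost_per_mbps(CARDS_32BIT_DELL):
        print(f"{name}: ${cost:.2f} per Mbps")
    print("highest peak:", fastest(CARDS_32BIT_DELL))
```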

In an HPC environment, where sustained throughput is critical and the switch is capable of Jumbo frames, the SK9821 would be the best performer. For gigabit switching hardware that lacks Jumbo Frame support, a comparison of the 1500MTU results shows the SK9821 is still a viable choice, as is the 3Com 3C996BT, which provides a more cost-effective solution.


Paul Gray                                         -o)
323 Wright Hall                                   /\\
University of Northern Iowa                      _\_V
Message void if penguin violated ...  Don't mess with the penguin