Network Troubleshooting – Port Mapping!!
Managing a data center can lead to some interesting opportunities for troubleshooting. In a previous role, I was responsible for the performance and stability of a corporate data center. While most of my time was spent babysitting the systems and running reports, there were those rare occasions that seemed straight out of the Twilight Zone.
One such occasion involved a performance issue with a physical server in the data center. We were halfway through a hardware refresh cycle and had not seen any performance issues up to that point. However, we had recently installed a new server running a new application. Almost immediately we noticed the application performing sluggishly and becoming unresponsive. Time to break out the troubleshooting toolkit and get to business, right?
After verifying it was a widespread issue and not isolated to a few clients, we started with the obvious and checked the speed and duplex settings on the server and switch. Both were 1000/full, ruling out a mismatch. Then, we checked the switch interface expecting to find the problem, but we didn’t see any errors or loss of traffic.
The next obvious step was to break out Wireshark and do some sniffing. We tested basic file transfers and noticed the same sluggish performance as with the application. While looking at the captures, we noticed a large amount of retransmitted traffic. This was a bit puzzling since we didn’t see any issues with the switch port interface. Time to take a closer look at the switch architecture…
When choosing a network switch, the decision is often based strictly on interface speed, number of ports, and feature set. Performance is assumed from the marketed fabric bandwidth. However, this can be misleading.
A switch's advertised fabric bandwidth may be 160 Gbps, and the switch may have 24 x 1 Gbps ports. Since 24 ports at line rate add up to only 24 Gbps in each direction, you would think every port could run flat out with plenty of fabric capacity to spare. However, this may not be the case.
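To put rough numbers on that comparison, here is a minimal sketch using the figures from the example above. The function name is purely illustrative, and the sketch assumes the fabric figure counts traffic in both directions, which should be confirmed against the datasheet for any real switch.

```python
def aggregate_port_demand_gbps(port_count, port_speed_gbps, full_duplex=True):
    """Worst-case traffic if every port runs at line rate at the same time."""
    one_way = port_count * port_speed_gbps
    # Full duplex means each port can send and receive at line rate.
    return one_way * 2 if full_duplex else one_way


fabric_gbps = 160  # advertised fabric bandwidth from the example
demand_gbps = aggregate_port_demand_gbps(24, 1)

print(demand_gbps)                 # 48
print(demand_gbps <= fabric_gbps)  # True
```

On paper the fabric has headroom to spare, which is exactly why the fabric figure alone says nothing about bottlenecks closer to the ports.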
Here is the extremely important piece of information to keep in mind when connecting devices to network switches: the switch ports rely on ASICs (application-specific integrated circuits) to perform certain tasks.
These chips do not map one-to-one to switch ports. It is quite possible for a single ASIC to serve four or more ports, which creates a situation where two devices with heavy traffic loads on the same ASIC can overrun it and cause dropped traffic.
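The effect can be sketched with a simple model. Everything in it is an assumption for illustration: the layout of four consecutive ports per ASIC, the per-ASIC capacity, and the per-port loads are made-up numbers, not figures for any real switch.

```python
from collections import defaultdict

# Hypothetical layout: each consecutive group of four ports shares one ASIC.
PORTS_PER_ASIC = 4
# Hypothetical forwarding capacity of a single ASIC, in Mbps (not a vendor figure).
ASIC_CAPACITY_MBPS = 2000


def asic_for_port(port):
    """Map a 1-based port number to a 0-based ASIC index."""
    return (port - 1) // PORTS_PER_ASIC


def overloaded_asics(port_load_mbps):
    """Return {asic_index: total_load} for every ASIC loaded beyond its capacity."""
    totals = defaultdict(int)
    for port, load in port_load_mbps.items():
        totals[asic_for_port(port)] += load
    return {asic: load for asic, load in totals.items() if load > ASIC_CAPACITY_MBPS}


# Made-up peak loads per port, in Mbps. No single port exceeds 1 Gbps.
loads = {1: 900, 2: 900, 3: 500, 4: 100, 5: 200, 6: 100}
print(overloaded_asics(loads))  # {0: 2400}
```

Each port in this example stays under its own 1 Gbps limit, yet the first ASIC is asked to carry more than it can handle. In this model that is where traffic would be dropped.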
If you don't consider the load of each device you connect, you can easily overrun the capabilities of an ASIC. It is very common to start at port number one and connect devices in order from there. This is the trap I fell into. When connecting the new server, I selected the first open switch port without considering the ASIC-to-port ratio.
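A load-aware alternative to "first open port" might look like the sketch below. The four-ports-per-ASIC layout, the helper names, and the traffic figures are all hypothetical. On a real switch the mapping has to come from the vendor's documentation or diagnostics.

```python
# Hypothetical layout: each consecutive group of four ports shares one ASIC.
PORTS_PER_ASIC = 4


def asic_for_port(port):
    """Map a 1-based port number to a 0-based ASIC index."""
    return (port - 1) // PORTS_PER_ASIC


def pick_port(open_ports, port_load_mbps):
    """Choose the open port whose ASIC currently carries the least traffic."""
    asic_load = {}
    for port, load in port_load_mbps.items():
        asic = asic_for_port(port)
        asic_load[asic] = asic_load.get(asic, 0) + load
    # Ties are broken in favor of the lowest port number.
    return min(open_ports, key=lambda p: (asic_load.get(asic_for_port(p), 0), p))


# Made-up loads (Mbps) on the ports already in use.
in_use = {1: 900, 2: 400, 3: 300, 5: 100}
open_ports = [4, 6, 7, 8, 9, 10]

print(min(open_ports))                # 4, the "first open port" habit
print(pick_port(open_ports, in_use))  # 9, a port on an idle ASIC
```

Port 4 is the first open port, but it shares an ASIC with the three busiest devices. Port 9 sits on an ASIC that carries no traffic yet.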
There are ways of finding the ASIC-to-port mapping on a switch. The process will vary depending on the manufacturer. I strongly recommend fully understanding the architecture of a switch and the traffic load of your network devices during the design phase of your solution. It is not as simple as counting the ports you need and matching the interface speeds of the connected devices.