Some explanation of the graphs below

On Wednesday evening at about 9:30 PM (2130), one of the core switches crashed. This switch, 6509b, connects all the buildings on campus to the network. That night we swapped its supervisory module (the "Sup 720" blade) with the one from our backup Cisco 6509 switch to get the network working again.

In the graphs below, the flat line between Wednesday evening and Thursday afternoon covers the period when the backup Sup 720 blade was in place; we were not collecting data during that time.

It turned out that the crash was related to a firmware bug that was triggered under rare circumstances.

It took most of Thursday, working with Cisco tech support, to determine that. Cisco provided us with updated firmware, and once it was installed we moved the Sup 720 module back into 6509b late Thursday afternoon. The graphs show data collection resuming at that point.

(More detailed data are further below.)

Thursday to Friday broadcast problem

The graphs are from a particular building which has very little traffic. Most of what you see in the blue line is broadcast traffic.

The Monthly graph further down shows that broadcasts had been running at about 340 kb/s for quite some time (except during Thanksgiving week).

As the graph above shows, once we were back up and running the broadcasts exceeded 1000 kb/s. Presumably they were equally high on Thursday, when we were not collecting data.

This high level of broadcasts severely interfered with network performance. It is not clear how the broadcast levels related to the firmware bug in the switch and its crash.

On Friday morning the source of the broadcasts was traced to a particular switch in Mead. There is some indication that the broadcasts were being replicated, multiplying the effect of normal broadcast traffic. Once that switch was rebooted the problem vanished, as the dramatic drop in traffic early Friday shows.

What are broadcasts

Broadcasts are a normal network function. Most network traffic goes directly between two locations on the network, the source and the destination. A broadcast, on the other hand, goes from the source to every location on the local area network.
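To make the difference concrete, here is a toy model of how a switch decides where to send a frame. It is a simplified sketch, not how any particular Cisco switch is implemented, and the MAC-address table, port numbers, and addresses are hypothetical. A frame addressed to the Ethernet broadcast address ff:ff:ff:ff:ff:ff goes out every port except the one it arrived on, while a frame for a known destination goes out only one port.

```python
from typing import Dict, List

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def output_ports(dst_mac: str, in_port: int,
                 mac_table: Dict[str, int], all_ports: List[int]) -> List[int]:
    """Return the ports a simplified switch would send a frame out of.

    mac_table maps a learned MAC address to the port it was seen on
    (a hypothetical, pre-filled table for illustration).
    """
    dst = dst_mac.lower()
    if dst == BROADCAST_MAC or dst not in mac_table:
        # Broadcasts (and frames for addresses the switch hasn't learned yet)
        # are flooded to every port except the one they came in on.
        return [p for p in all_ports if p != in_port]
    # Ordinary unicast: only the port where the destination lives.
    return [mac_table[dst]]


if __name__ == "__main__":
    table = {"00:11:22:33:44:55": 3}
    ports = [1, 2, 3, 4]
    print(output_ports("00:11:22:33:44:55", 1, table, ports))  # [3]
    print(output_ports("FF:FF:FF:FF:FF:FF", 1, table, ports))  # [2, 3, 4]
```

Every host on the LAN has to receive and look at each broadcast, which is why a high broadcast rate affects every machine in the segment, not just the sender and receiver.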

Broadcasts are used, for example, when a computer needs to send traffic to an IP address it hasn't contacted recently. It first sends out a broadcast asking where traffic for that IP address should go (this is the Address Resolution Protocol, ARP). After receiving a reply telling it where the destination is, it sends traffic directly to that destination address.
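The broadcast "question" can be sketched by building the bytes of an ARP request following the RFC 826 layout. This only constructs the frame; actually sending it would need a raw socket and administrator privileges, which the sketch leaves out. The MAC and IP addresses in the usage example are hypothetical (192.0.2.x is a range reserved for documentation).

```python
import socket
import struct

ETH_BROADCAST = b"\xff" * 6   # destination: every host on the LAN
ETHERTYPE_ARP = 0x0806


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Build an Ethernet frame carrying an ARP "who has target_ip?" request."""
    # Ethernet header: destination (broadcast), source, EtherType.
    eth_header = ETH_BROADCAST + sender_mac + struct.pack("!H", ETHERTYPE_ARP)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length (bytes)
        4,                            # protocol address length (bytes)
        1,                            # operation: 1 = request
        sender_mac,                   # sender hardware address
        socket.inet_aton(sender_ip),  # sender IP address
        b"\x00" * 6,                  # target hardware address: unknown, that's the question
        socket.inet_aton(target_ip),  # IP address being asked about
    )
    return eth_header + arp_payload


if __name__ == "__main__":
    frame = build_arp_request(bytes.fromhex("001122334455"), "192.0.2.10", "192.0.2.1")
    print(len(frame))  # 42
```

The owner of the target IP address answers with a unicast ARP reply containing its MAC address, so only the question is broadcast. On the wire the network card pads this 42-byte frame up to Ethernet's 60-byte minimum.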

Another common use is when a computer requests its own IP address. Since it doesn't have one yet, it broadcasts a request on the network asking a DHCP server to assign it an IP address, identifying itself by its Ethernet (MAC) address.
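As a rough illustration, the sketch below builds the payload of a DHCPDISCOVER message using the BOOTP/DHCP layout from RFC 2131. It is not a working DHCP client, and the MAC address and transaction ID in the example are hypothetical. A real client would send these bytes in a UDP datagram from 0.0.0.0 port 68 to the broadcast address 255.255.255.255 port 67, because it has no address of its own and doesn't yet know where the server is.

```python
import os
import struct
from typing import Optional

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the DHCP options


def build_dhcp_discover(client_mac: bytes, xid: Optional[int] = None) -> bytes:
    """Build the UDP payload of a DHCPDISCOVER message."""
    if xid is None:
        xid = int.from_bytes(os.urandom(4), "big")  # random transaction ID
    zero_ip = b"\x00" * 4
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,           # op: BOOTREQUEST
        1,           # htype: Ethernet
        6,           # hlen: MAC address length
        0,           # hops
        xid,         # transaction ID, echoed back by the server
        0,           # secs
        0x8000,      # flags: ask the server to broadcast its reply
        zero_ip,     # ciaddr: client has no IP address yet
        zero_ip,     # yiaddr: filled in by the server
        zero_ip,     # siaddr
        zero_ip,     # giaddr
        client_mac,  # chaddr: the Ethernet address; struct zero-pads it to 16 bytes
        b"",         # sname (zero-filled)
        b"",         # file (zero-filled)
    )
    # Option 53 (DHCP message type), length 1, value 1 = DHCPDISCOVER; 255 = end.
    options = bytes([53, 1, 1, 255])
    return header + DHCP_MAGIC_COOKIE + options


if __name__ == "__main__":
    packet = build_dhcp_discover(bytes.fromhex("001122334455"), xid=0x12345678)
    print(len(packet))  # 244
```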

High broadcast traffic can interfere with network operations. Next summer we plan to segment the network, reducing the number of computers in each segment (VLAN) and therefore the broadcast traffic within each segment. Such a network topology would have reduced the effects of the problems we've just experienced.
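A back-of-the-envelope model shows why smaller segments help. It assumes, purely for illustration, that every computer sends broadcasts at the same average rate and that computers are spread evenly across VLANs. Under those assumptions, each computer hears the broadcasts of everyone else in its own VLAN and no one else, so splitting the network into k VLANs cuts the broadcast load on each machine by roughly a factor of k. The host count and broadcast rate below are hypothetical, not measurements from our network.

```python
import math


def broadcasts_heard_per_host(total_hosts: int, num_vlans: int,
                              broadcasts_per_host_per_s: float) -> float:
    """Broadcast frames per second each host must receive and inspect.

    Simplifying assumptions (hypothetical, for illustration only):
    every host broadcasts at the same average rate, and hosts are
    spread as evenly as possible across the VLANs.
    """
    # Size of the largest VLAN (the worst case for a host in it).
    hosts_per_vlan = math.ceil(total_hosts / num_vlans)
    # A broadcast reaches every other host in the same VLAN, and no one else.
    return (hosts_per_vlan - 1) * broadcasts_per_host_per_s


if __name__ == "__main__":
    # Hypothetical figures: 2000 hosts, each broadcasting 0.5 times per second.
    print(broadcasts_heard_per_host(2000, 1, 0.5))   # 999.5 (one flat network)
    print(broadcasts_heard_per_host(2000, 10, 0.5))  # 99.5 (ten VLANs)
```

The same reasoning applies to a fault like Friday's: broadcasts being replicated inside one segment would only reach the computers in that segment.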

The data

Here are the data for the days surrounding the event. Live data for the core campus switch are also available.

Traffic Analysis for 72 -- 6509b.mtholyoke.edu

System: 6509b.mtholyoke.edu in chp
Maintainer: network@mtholyoke.edu
Description: GigabitEthernet7/6
ifType: ethernetCsmacd (6)
ifName: Gi7/6
Max Speed: 125.0 MBytes/s


The statistics were last updated Saturday, 2 December 2006 at 7:40,
at which time '6509b.mtholyoke.edu' had been up for 1 day, 15:09:43.
Daily Graph (5 Minute Average)
       Max                  Average             Current
In:    0.0 b/s (0.0%)       0.0 b/s (0.0%)      0.0 b/s (0.0%)
Out:   1722.7 kb/s (0.2%)   293.8 kb/s (0.0%)   54.0 kb/s (0.0%)

Weekly Graph (30 Minute Average)
       Max                  Average             Current
In:    0.0 b/s (0.0%)       0.0 b/s (0.0%)      0.0 b/s (0.0%)
Out:   1321.2 kb/s (0.1%)   212.9 kb/s (0.0%)   52.5 kb/s (0.0%)

Monthly Graph (2 Hour Average)
       Max                  Average             Current
In:    0.0 b/s (0.0%)       0.0 b/s (0.0%)      0.0 b/s (0.0%)
Out:   1241.1 kb/s (0.1%)   235.0 kb/s (0.0%)   57.0 kb/s (0.0%)

Yearly Graph (1 Day Average)
       Max                  Average             Current
In:    0.0 b/s (0.0%)       0.0 b/s (0.0%)      0.0 b/s (0.0%)
Out:   357.2 kb/s (0.0%)    228.7 kb/s (0.0%)   269.8 kb/s (0.0%)
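The percentages in parentheses express each rate as a fraction of the interface's Max Speed of 125.0 MBytes/s. The short calculation below reproduces them, assuming that k means 1000 and M means 10^6 in these figures (an assumption, but one that matches every percentage shown). It prints 0.2%, 0.1% and 0.0%, matching the Daily maximum, Weekly maximum and Daily average above.

```python
# "Max Speed: 125.0 MBytes/s" from the header, assuming M = 10**6.
MAX_SPEED_BYTES_PER_S = 125.0e6
LINK_BITS_PER_S = MAX_SPEED_BYTES_PER_S * 8  # 1e9 b/s, i.e. Gigabit Ethernet


def percent_of_link(rate_kbps: float) -> float:
    """Express a rate in kb/s (assuming k = 1000) as a percentage of link speed."""
    return rate_kbps * 1000 / LINK_BITS_PER_S * 100


if __name__ == "__main__":
    for label, kbps in [("Daily max", 1722.7), ("Weekly max", 1321.2),
                        ("Daily average", 293.8)]:
        print(f"{label}: {percent_of_link(kbps):.1f}%")
```

By this arithmetic, even the peak rate of 1722.7 kb/s amounts to under 0.2% of the gigabit link's capacity.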