Radek's IT Blog: Severe latency bottleneck detected on ISL / Trunk port

How can you troubleshoot the "Severe latency bottleneck detected" error on an ISL/trunk port?
What can cause this problem, and how can the root cause be found?

I ran into this problem at one of my customers.
We received this alert on a trunk built from two 8 Gbit/s ISL ports between two Brocade 5100 switches.

Here is the alert message:
Time: Mon Aug 06 2012 20:32:05 CEST
Level: Warning
Message: Severe latency bottleneck detected at slot 0 port 35.
Service: Switch
Number: 1241
Count: 1
Message ID: AN-1010
Switch: XSAN01

The port in the alert message is an ISL port that belongs to a trunk group.

Trunk performance:

XSAN01:admin> trunkshow -perf
1: 1-> 7 10:00:00:05:1e:36:38:62 100 deskew 15 MASTER
   0-> 6 10:00:00:05:1e:36:38:62 100 deskew 24
   Tx: Bandwidth 8.00Gbps, Throughput 37.44Kbps (0.00%)
   Rx: Bandwidth 8.00Gbps, Throughput 51.94Kbps (0.00%)
   Tx+Rx: Bandwidth 16.00Gbps, Throughput 89.38Kbps (0.00%)
2: 5-> 71 10:00:00:05:1e:36:38:62 100 deskew 16 MASTER
   4-> 70 10:00:00:05:1e:36:38:62 100 deskew 15
   Tx: Bandwidth 8.00Gbps, Throughput 33.12Kbps (0.00%)
   Rx: Bandwidth 8.00Gbps, Throughput 58.08Kbps (0.00%)
   Tx+Rx: Bandwidth 16.00Gbps, Throughput 91.20Kbps (0.00%)
3: 35-> 35 10:00:00:05:33:ce:61:f5 203 deskew 15 MASTER   => trunk with alerts
   39-> 39 10:00:00:05:33:ce:61:f5 203 deskew 16
   Tx: Bandwidth 16.00Gbps, Throughput 442.46Kbps (0.00%)
   Rx: Bandwidth 16.00Gbps, Throughput 433.73Kbps (0.00%)
   Tx+Rx: Bandwidth 32.00Gbps, Throughput 876.19Kbps (0.00%)
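The trunk output is easy to misread by eye. The following Python sketch is my own illustration, assuming the trunkshow -perf layout shown above; the function names are hypothetical. It maps the alerted port to its trunk group and converts the throughput figures into utilization. For trunk 3 (ports 35 and 39, neighbor domain 203) it gives well under 0.01% in each direction, so the alert is not caused by a saturated link.

```python
import re

# Output copied from the trunkshow -perf listing above (trunk groups 1 and 3 only).
TRUNKSHOW_PERF = """\
1: 1-> 7 10:00:00:05:1e:36:38:62 100 deskew 15 MASTER
   0-> 6 10:00:00:05:1e:36:38:62 100 deskew 24
   Tx: Bandwidth 8.00Gbps, Throughput 37.44Kbps (0.00%)
   Rx: Bandwidth 8.00Gbps, Throughput 51.94Kbps (0.00%)
   Tx+Rx: Bandwidth 16.00Gbps, Throughput 89.38Kbps (0.00%)
3: 35-> 35 10:00:00:05:33:ce:61:f5 203 deskew 15 MASTER
   39-> 39 10:00:00:05:33:ce:61:f5 203 deskew 16
   Tx: Bandwidth 16.00Gbps, Throughput 442.46Kbps (0.00%)
   Rx: Bandwidth 16.00Gbps, Throughput 433.73Kbps (0.00%)
   Tx+Rx: Bandwidth 32.00Gbps, Throughput 876.19Kbps (0.00%)
"""

# Scale factors for the rate units that appear in the output.
UNITS = {"Kbps": 1e3, "Mbps": 1e6, "Gbps": 1e9}

# "<group>: <local port>-> <remote port> <neighbor WWN> <domain> deskew <n> MASTER"
GROUP_RE = re.compile(r"^(\d+):\s+(\d+)->\s*(\d+)\s+(\S+)\s+(\d+)")
# "<local port>-> <remote port> <neighbor WWN> <domain> deskew <n>"
MEMBER_RE = re.compile(r"^(\d+)->\s*(\d+)\s+(\S+)\s+(\d+)")
# "Tx: Bandwidth 16.00Gbps, Throughput 442.46Kbps (0.00%)"
PERF_RE = re.compile(r"^(Tx\+Rx|Tx|Rx): Bandwidth ([\d.]+)(\w+), Throughput ([\d.]+)(\w+)")


def parse_trunkshow_perf(text):
    """Return {group: {"ports": [...], "domain": int, "Tx": pct, "Rx": pct, "Tx+Rx": pct}}."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines added by copy/paste
        m = GROUP_RE.match(line)
        if m:
            current = {"ports": [int(m.group(2))], "domain": int(m.group(5))}
            groups[int(m.group(1))] = current
            continue
        if current is None:
            continue
        m = MEMBER_RE.match(line)
        if m:
            current["ports"].append(int(m.group(1)))
            continue
        m = PERF_RE.match(line)
        if m:
            bandwidth = float(m.group(2)) * UNITS[m.group(3)]
            throughput = float(m.group(4)) * UNITS[m.group(5)]
            current[m.group(1)] = 100.0 * throughput / bandwidth  # utilization in %
    return groups


def trunk_for_port(groups, port):
    """Return the trunk group number that contains `port`, or None."""
    for group, info in groups.items():
        if port in info["ports"]:
            return group
    return None


groups = parse_trunkshow_perf(TRUNKSHOW_PERF)
group = trunk_for_port(groups, 35)  # port 35 from the AN-1010 alert
info = groups[group]
print(f"port 35 -> trunk {group}, members {info['ports']}, neighbor domain {info['domain']}")
print(f"Tx {info['Tx']:.4f}%  Rx {info['Rx']:.4f}%")
```

The same lookup works for any port number taken from an alert; it returns None when the port is not a trunk member.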

Port errors on ISL ports:

porterrshow
     frames        enc    crc    crc    too    too    bad    enc    disc   link   loss   loss   frjt   fbsy
     tx     rx     in     err    g_eof  shrt   long   eof    out    c3     fail   sync   sig
=========================================================================================================
35:  374.0m 115.7m 0      0      0      0      0      0      0      70     0      1      2      0      0
39:  3.1g   3.8g   0      2      0      0      0      0      0      204    0      1      2      0      0
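Reading these columns by eye is error-prone, so here is a small Python sketch that turns the rows into named counters and lists the non-zero error counters per port. It assumes the 15-column porterrshow layout shown above (newer FOS releases print extra columns, which it skips); the column and function names are my own. For both ISL ports it shows only a handful of errors: some discarded class 3 frames (disc c3), a loss of sync and two loss of signal events, plus two CRC errors on port 39.

```python
import re

# Rows copied from the porterrshow output above.
PORTERRSHOW = """\
35:  374.0m 115.7m 0 0 0 0 0 0 0 70 0 1 2 0 0
39:  3.1g 3.8g 0 2 0 0 0 0 0 204 0 1 2 0 0
"""

# One name per column, joining the two header lines of this porterrshow layout.
COLUMNS = [
    "frames_tx", "frames_rx", "enc_in", "crc_err", "crc_g_eof",
    "too_shrt", "too_long", "bad_eof", "enc_out", "disc_c3",
    "link_fail", "loss_sync", "loss_sig", "frjt", "fbsy",
]

# porterrshow abbreviates large counts with k/m/g suffixes.
SUFFIX = {"k": 10**3, "m": 10**6, "g": 10**9}


def parse_counter(token):
    """Convert a porterrshow count such as '70', '13.9k' or '3.1g' to an int."""
    scale = SUFFIX.get(token[-1], 1)
    digits = token[:-1] if token[-1] in SUFFIX else token
    return int(round(float(digits) * scale))


def parse_porterrshow(text):
    """Return {port: {column: count}} for every '<port>: v1 v2 ...' row."""
    rows = {}
    for line in text.splitlines():
        m = re.match(r"^\s*(\d+):\s+(.+)$", line)
        if not m:
            continue  # header, separator or blank line
        values = m.group(2).split()
        if len(values) != len(COLUMNS):
            continue  # different column layout; not handled in this sketch
        rows[int(m.group(1))] = dict(zip(COLUMNS, map(parse_counter, values)))
    return rows


def nonzero_errors(row):
    """Keep only the error counters (not the frame counts) that are above zero."""
    return {name: count for name, count in row.items()
            if not name.startswith("frames_") and count}


rows = parse_porterrshow(PORTERRSHOW)
for port, row in sorted(rows.items()):
    print(port, nonzero_errors(row))
```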

According to the Brocade documentation, this looks like a buffer credit problem:

Data Center Fabric Resiliency Best Practices:
Bottleneck Detection can detect ports that are blocked due to lost credits and generate special “stuck VC” and “lost credit” alerts for the E_Port with the lost credits (available in FOS 6.3.1b and later).
Example of a “stuck VC” alert on an E_Port:
2010/03/16-03:40:48, [AN-1010], 21761, FID 128, WARNING, sw0, Severe latency bottleneck detected at slot 0 port 38.

Data Center Bottleneck Detection Best Practices Guide:
"timestamp", [AN-1010], "sequence-number",, WARNING, "system-name", Severe latency bottleneck detected at Slot "slot number" port "port number within slot number".
This message identifies the date and time of a credit loss on a link. The platform and port affected and the number of seconds that triggered the threshold.
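If you want to pick these alerts out of a saved switch log automatically, the following Python sketch matches the AN-1010 line layout shown in the two quotes above. The regular expression and the field names are my own assumptions based on those examples, not a documented Brocade format specification. It accepts both the "FID 128" variant and the empty field in the template.

```python
import re

# Field layout assumed from the AN-1010 examples quoted above:
# <timestamp>, [AN-1010], <sequence>, [FID <n>], WARNING, <switch>, Severe latency ...
AN1010_RE = re.compile(
    r"^(?P<timestamp>[^,]+), \[(?P<msg_id>AN-1010)\], (?P<seq>\d+),.*?"
    r"(?P<severity>WARNING), (?P<switch>[^,]+), "
    r"Severe latency bottleneck detected at slot (?P<slot>\d+) port (?P<port>\d+)",
    re.IGNORECASE,  # the template writes "Slot", the real example writes "slot"
)


def parse_an1010(line):
    """Return the fields of an AN-1010 log line as a dict, or None if it is not one."""
    m = AN1010_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["slot"] = int(fields["slot"])
    fields["port"] = int(fields["port"])
    return fields


example = ("2010/03/16-03:40:48, [AN-1010], 21761, FID 128, WARNING, sw0, "
           "Severe latency bottleneck detected at slot 0 port 38.")
print(parse_an1010(example))
```

The slot and port it extracts are what you then look up in trunkshow and porterrshow.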

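But what can cause buffer credit loss?
A slow-drain device that holds on to buffer credits can produce the same latency alert. A physical-layer fault on the link can cause real credit loss: if bit errors corrupt the primitives that return credits to the sender, those credits are never returned, and once enough of them are missing the port stops transmitting.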
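To make the credit-loss mechanism concrete, here is a deliberately simplified Python simulation. It is a toy model, not how FOS implements flow control, and the function and its parameters are hypothetical. It only models credits lost to corrupted credit-return primitives (R_RDY, or VC_RDY on ISLs that use virtual channels); it does not model slow drain, where credits are held rather than lost, and it never recovers a lost credit. It shows why even a low error rate eventually starves a port of credits, which is the "stuck VC" / "lost credit" condition described in the quotes above.

```python
import random


def frames_until_stuck(bb_credits, frames, rdy_loss_rate, seed=0):
    """Toy model of buffer-to-buffer credit flow control on a single link.

    The sender starts with `bb_credits` credits and spends one per frame.
    The receiver answers every frame with one credit-return primitive.
    With probability `rdy_loss_rate` that primitive is corrupted on the
    wire and the credit never comes back. Returns how many frames were
    sent before the sender ran out of credits, or `frames` if it never did.
    """
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    credits = bb_credits
    for sent in range(frames):
        if credits == 0:
            return sent  # no credits left: the port can no longer transmit
        credits -= 1  # one frame sent, one credit consumed
        if rng.random() >= rdy_loss_rate:
            credits += 1  # credit-return primitive arrived intact
    return frames


# A clean link never loses credits; a link with occasional bit errors
# on the credit-return primitives eventually stalls.
print(frames_until_stuck(bb_credits=8, frames=1_000_000, rdy_loss_rate=0.0))
print(frames_until_stuck(bb_credits=8, frames=1_000_000, rdy_loss_rate=0.001))
```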

In my case, the root cause was a single faulty port on the second switch (domain ID 203).

XSAN01:admin> porterrshow
     frames        enc    crc    crc    too    too    bad    enc    disc   link   loss   loss   frjt   fbsy
     tx     rx     in     err    g_eof  shrt   long   eof    out    c3     fail   sync   sig
=========================================================================================================
28:  0      0      0      0      0      0      0      0      13.9k  0      0      0      0      0      0

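The SFP in that port was identified as the failing component in the fabric: port 28 had logged 13.9k enc out errors, which usually point to a physical-layer problem such as a bad SFP or cable. After the SFP was replaced, the problem went away.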
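The counters that mattered here were the physical-layer ones. As a rough triage aid, this Python sketch sums the counters that usually indicate link-level problems (bad SFP, cable or connector) and flags ports above a threshold. Both the grouping of counters and the threshold of 1000 are assumptions for illustration, not Brocade rules. The counters are cumulative since they were last cleared, so compare two snapshots taken some time apart before drawing conclusions.

```python
# Heuristic grouping of porterrshow counters. This split is an assumption
# for triage, not an official Brocade classification.
PHYSICAL_LAYER = {"enc_in", "crc_err", "crc_g_eof", "too_shrt", "too_long",
                  "bad_eof", "enc_out", "link_fail", "loss_sync", "loss_sig"}


def suspect_ports(errors_by_port, threshold=1000):
    """Return (port, physical-layer error total) pairs at or above `threshold`,
    worst first. `threshold` is an arbitrary example value."""
    suspects = []
    for port, counters in errors_by_port.items():
        total = sum(count for name, count in counters.items() if name in PHYSICAL_LAYER)
        if total >= threshold:
            suspects.append((port, total))
    return sorted(suspects, key=lambda item: item[1], reverse=True)


# Non-zero counters from the porterrshow listings above: ports 35 and 39 are
# the ISL ports, port 28 is the faulty port. In real use, key by switch and
# port, since port numbers repeat across switches.
errors_by_port = {
    35: {"disc_c3": 70, "loss_sync": 1, "loss_sig": 2},
    39: {"crc_err": 2, "disc_c3": 204, "loss_sync": 1, "loss_sig": 2},
    28: {"enc_out": 13_900},  # shown as 13.9k in porterrshow's abbreviated notation
}

print(suspect_ports(errors_by_port))
```

With the example threshold only port 28 is flagged, which matches the port whose SFP turned out to be faulty.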


Tuesday, August 28, 2012
