X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C869F9.2770907A@onstor-exch02.onstor.net>; Thu, 7 Feb 2008 19:20:20 -0700
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Content-class: urn:content-classes:message
Subject: RE: Possible link problem in the lab
Date: Thu, 7 Feb 2008 19:20:20 -0700
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E0346AF5E@onstor-exch02.onstor.net>
In-Reply-To: <20080207180504.3557042a@ripper.onstor.net>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: Possible link problem in the lab
Thread-Index: Achp9wUXQrnyx4P4Tcml4TLmishOzgAAdYvA
From: "John Rogers" <john.rogers@onstor.com>
To: "Andy Sharp" <andy.sharp@onstor.com>
Cc: "Brian Baker" <brian.baker@onstor.com>,
	"John VanderWerf" <john.vanderwerf@onstor.com>

I don't think it's a routing problem. The terminal servers actually have
10Mb interfaces and seem to be dropping packets on ping. That's why they
lag out on traceroute from time to time.

They may be sensitive to the traffic on that segment, or the cards may
actually be dying. I will investigate further.
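
One rough way to put numbers on the drops, offered as a sketch only: count
the unanswered probes ("*") per hop in saved traceroute output. The function
name probe_loss and the SAMPLE constant are illustrative assumptions; the
sample text is copied from the 20-second-hang trace quoted below.

```python
import re

# Sample copied from the "traceroute -n 10.2.10.235" run quoted in this thread.
SAMPLE = """\
traceroute to 10.2.10.235 (10.2.10.235), 30 hops max, 52 byte packets
 1  10.0.0.19  0.992 ms  0.863 ms  0.924 ms
 2  10.0.0.1  5.101 ms 10.3.0.1  7.861 ms  2.954 ms
 3  * * *
 4  * * *
 5  10.2.10.235  1.166 ms  0.989 ms  1.021 ms
"""


def probe_loss(traceroute_text):
    """Return {hop_number: (lost_probes, total_probes)} for traceroute output.

    A probe counts as lost when traceroute printed "*" for it, and as
    answered when a "<number> ms" round-trip time was printed.
    """
    stats = {}
    for line in traceroute_text.splitlines():
        match = re.match(r"\s*(\d+)\s+(.*)", line)
        if not match:
            continue  # header, shell prompt, or blank line
        hop = int(match.group(1))
        rest = match.group(2)
        lost = rest.count("*")
        answered = len(re.findall(r"\d+(?:\.\d+)?\s+ms", rest))
        stats[hop] = (lost, lost + answered)
    return stats


if __name__ == "__main__":
    for hop, (lost, total) in sorted(probe_loss(SAMPLE).items()):
        print(f"hop {hop}: {lost}/{total} probes unanswered")
```

This only counts what traceroute printed. It cannot say whether the drops
come from the terminal server cards or from somewhere along the path.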

-----Original Message-----
From: Andy Sharp
Sent: Thursday, February 07, 2008 6:05 PM
To: John Rogers
Cc: Brian Baker; John VanderWerf
Subject: Re: Possible link problem in the lab

Well, no, this is a routing problem.  Besides, that traffic is not
high.  And there are only 7 -- mine is running off flash ~:^)  This NFS
root traffic is decent only when booting/starting up, or if a filer was
trying to write an infinite # of elog messages per second, which we all
know they aren't because there aren't any bugs.

Here's another traceroute I just ran when things hung for 20
seconds:

$ traceroute -n 10.2.10.235
traceroute to 10.2.10.235 (10.2.10.235), 30 hops max, 52 byte packets
 1  10.0.0.19  0.992 ms  0.863 ms  0.924 ms
 2  10.0.0.1  5.101 ms 10.3.0.1  7.861 ms  2.954 ms
 3  * * *
 4  * * *
 5  10.2.10.235  1.166 ms  0.989 ms  1.021 ms

But traceroutes from, say, 10.1.1.189 are always instantaneous:

$ traceroute -n 10.2.10.235
traceroute to 10.2.10.235 (10.2.10.235), 64 hops max, 40 byte packets
 1  10.1.1.1  1.831 ms  0.971 ms  1.70 ms
 2  66.201.51.116  3.670 ms  3.803 ms  2.944 ms
 3  10.0.0.1  2.693 ms  2.881 ms  6.215 ms
 4  10.3.0.1  2.225 ms  3.752 ms  2.196 ms
 5  10.2.10.235  4.280 ms  2.883 ms  2.840 ms
$ traceroute -n 10.2.10.235
traceroute to 10.2.10.235 (10.2.10.235), 64 hops max, 40 byte packets
 1  10.1.1.1  1.857 ms  1.47 ms  0.966 ms
 2  66.201.51.116  4.897 ms  2.830 ms  6.757 ms
 3  10.0.0.1  2.739 ms  5.300 ms  5.778 ms
 4  10.3.0.1  3.736 ms  2.260 ms  3.264 ms
 5  10.2.10.235  2.874 ms  4.502 ms  2.856 ms




On Thu, 7 Feb 2008 17:26:14 -0800 "John Rogers"
<john.rogers@onstor.com> wrote:

> We are looking into this. Offhand, I would say the NFS root file
> systems on rack 10 are pushing the network there to or near
> capacity. There are 8 cougars there, all NFS-root mounted, and most
> of them are in heavy QE test. I don't think I'd be far off in saying
> that the 100Mb network there just ain't cutting it.
>
> ________________________________
>
> From: Brian Baker
> Sent: Thursday, February 07, 2008 5:23 PM
> To: John Rogers; John VanderWerf; Andy Sharp
> Subject: Possible link problem in the lab
>
> Johns,
>
> Andy is experiencing high latency in the lab. Corp appears to hand off
> this traffic, but 10.3.0.1 is lagging to 10.2.10.235.
>
> 3 - 2/7/2008 5:19:05 PM - Brian Baker (Brian Baker)
> <http://altiris.onstor.net/AeXHD/worker/?cmd=viewContact&id=211> - Closed
>
> Andy, thanks for the info. It tells me enough to know it's not my
> problem ;) Corp hands off at 10.0.0.1. The problem appears to be at
> 10.3.0.1. This is elab's domain. I will forward this info to the
> Johns, but you may want to re-enter this ticket through the elab
> support system.
>
> 2 - 2/7/2008 5:11:38 PM - Andy Sharp (Guest)
> <http://altiris.onstor.net/AeXHD/worker/?cmd=viewContact&id=582> - Edit
>
> Another traceroute:
>
> ripper:~$ traceroute 10.2.10.235
> traceroute to 10.2.10.235 (10.2.10.235), 30 hops max, 52 byte packets
> 1  10.0.0.19 (10.0.0.19)  0.990 ms  0.871 ms  0.960 ms
> 2  10.0.0.1 (10.0.0.1)  0.901 ms 10.3.0.1 (10.3.0.1)  2.004 ms  0.454 ms
> 3  * * *
> 4  * * *
> 5  * 10.2.10.235 (10.2.10.235)  28.732 ms *
>
> 1 - 2/7/2008 4:52:15 PM - Andy Sharp (Guest)
> <http://altiris.onstor.net/AeXHD/worker/?cmd=viewContact&id=582> - Create
>
> OK, I got a little excited there, but network routing between my
> workstation on 10.0.0.42 and the terminal servers (10.2.10.23[56])
> seems to be hurting, causing very bad response times: sometimes 5-10
> seconds for a keystroke.
>
> I ran 3 traceroutes in a row; you can see something isn't right:
>
> ripper:~/src/dev$ traceroute 10.2.10.235
> traceroute to 10.2.10.235 (10.2.10.235), 30 hops max, 52 byte packets
> 1  10.0.0.19 (10.0.0.19)  1.001 ms  0.909 ms  0.877 ms
> 2  10.0.0.1 (10.0.0.1)  1.010 ms 10.3.0.1 (10.3.0.1)  1.921 ms  0.399 ms
> 3  * 10.2.10.235 (10.2.10.235)  37.671 ms *
> ripper:~/src/dev$ traceroute 10.2.10.235
> traceroute to 10.2.10.235 (10.2.10.235), 30 hops max, 52 byte packets
> 1  10.0.0.1 (10.0.0.1)  1.061 ms  0.980 ms  0.940 ms
> 2  10.3.0.1 (10.3.0.1)  0.469 ms  0.400 ms  0.448 ms
> 3  * * 10.2.10.235 (10.2.10.235)  1.242 ms
> ripper:~/src/dev$ traceroute 10.2.10.235
> traceroute to 10.2.10.235 (10.2.10.235), 30 hops max, 52 byte packets
> 1  10.0.0.1 (10.0.0.1)  0.987 ms  2.099 ms  0.976 ms
> 2  10.3.0.1 (10.3.0.1)  0.600 ms  0.403 ms  0.397 ms
> 3  10.2.10.235 (10.2.10.235)  1.056 ms  0.917 ms  0.913 ms
