AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20090423113531.15c2e845@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:mail.onstor.net
NSV:
SSH:
R:<brian.stark@onstor.com>,<carlos.mora@onstor.com>,<ed.kwan@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@exch1.onstor.net/INBOX	0	102AB4F33EBBDB4C91915B145C8E9FB31284F9B97D@exch1.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Thu, 23 Apr 2009 11:41:03 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: Brian Stark <brian.stark@onstor.com>
Cc: Carlos Mora <carlos.mora@onstor.com>, Ed Kwan <ed.kwan@onstor.com>
Subject: Re: Defect  TED00026664 (LSI Logic Storage GmbH - 12066) Appears to
 have some performance problems
Message-ID: <20090423114103.3122dae1@ripper.onstor.net>
In-Reply-To: <102AB4F33EBBDB4C91915B145C8E9FB31284F9B97D@exch1.onstor.net>
References: <89a7e2e1-5d35-4889-b859-9beee8e02897@exch1.onstor.net>
	<102AB4F33EBBDB4C91915B145C8E9FB31284F9B97D@exch1.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Thu, 23 Apr 2009 11:29:12 -0700 Brian Stark <brian.stark@onstor.com>
wrote:

> I have some specific questions relating to this email, so I thought
> it would be best to reply directly to this message rather than putting
> another note in the bug.
> 
> The node that has 30% CPU and no virtual servers -- if the writebench
> test is run against that node, is it then slow?  I'm basically trying
> to confirm if there's a correlation between the unexplained 30% CPU and
> the slowdown.

I assume that it can't be run against that node since if there are no
vsvrs, there are no volumes to run it against.  Hence my thought that
this must be an important clue: maybe there is a software bug that
causes the "other" blade to interfere and produce the slowdown.  There
are a couple of ways I know of that it could do that: it could be
banging on the storage/SAN, or it could be firing stuff into the other
blade via the management connection.  If we could reproduce it here, it
would be very instructive to disconnect the idle blade's management
network connections to see if the slowdown disappears.

> Ed, can you try to get writebench running on an internal machine?
> It doesn't sound like it's different than what we've already done,
> but maybe it can wring out the problem more quickly.
> 
> Carlos, I would like to understand some more about the FC ports when
> the problem occurs.  FC ports 0 and 1 are on one QLogic chip, and FC ports
> 2 and 3 are on another QLogic chip.  I would like to see if both ports
> on the same QLogic are affected when the slowdown occurs and also if
> there's an impact on the other QLogic.  For example, if port 0 is seeing
> the slowdown, is port 1 also seeing it?  Same goes for ports 2 and 3
> (the other QLogic) when port 0 or 1 is slow.  Can we get LSI to help
> with this?  This would really help to narrow down the problem.
> 
> 
> Thanks,
> Brian
> 
> 
> -----Original Message-----
> From: Carlos Mora 
> Sent: Thursday, April 23, 2009 5:50 AM
> To: Ed Kwan; Andy Sharp
> Cc: Carlos Mora; Brian Stark
> Subject: Defect TED00026664 (LSI Logic Storage GmbH - 12066) Appears to have some performance problems
> 
> Headline: (LSI Logic Storage GmbH - 12066) Appears to have some performance problems
> id: TED00026664
> Note_Entry: There is an email (a file named "email") in the /n/dw/bycase/case0012001-case0014000/case12271 directory, along with 2 screenshots that describe some of what is going on.
> 
> He is willing to give us the writebench application that he wrote; it is basically a dd with some stats.
> 
> What he is finding is that one blade, which should be idle, is at around 30% CPU on the FPs, while the other node, which has the virtual servers, is down to almost 0. When he reboots the blade with the 30% CPU, that seems to clear it up, and he can run his test without performance problems.
> 
> He is not sure what is driving the blade to 30% CPU, since there is nothing on it.
> State: Assigned
> history: 33787085	Apr 15 2009  7:16AM	carlosm	Submit	no_value	Opened
> 33787096	Apr 15 2009 10:46AM	edk	Assign	Opened	Assigned
> 33787273	Apr 21 2009  6:21AM	carlosm	Modify	Assigned	Assigned
> 33787280	Apr 21 2009  8:00AM	carlosm	Modify	Assigned	Assigned
> 33787329	Apr 22 2009  7:32AM	carlosm	Modify	Assigned	Assigned
> 33787337	Apr 22 2009 10:01AM	edk	Modify	Assigned	Assigned
> 33787374	Apr 22 2009  1:56PM	edk	Modify	Assigned	Assigned
> 33787389	Apr 22 2009  4:26PM	brians	Modify	Assigned	Assigned
> 33787402	04/23/2009 05:49:29 AM	carlosm	Modify	Assigned	Assigned
> company_name: LSI Logic GmbH
> 
