Date: Tue, 6 Oct 2009 23:35:15 -0700
From: Andrew Sharp <andy.sharp@lsi.com>
To: "Irie, Shin" <Shin.Irie@lsi.com>
Cc: "Johnson, Dave" <Dave.Johnson@lsi.com>, "Scheer, Larry"
 <Larry.Scheer@lsi.com>, DL-ONStor-cstech <dl-cstech@lsi.com>
Subject: Re: running processes queue hitting huge spikes on beast...
Message-ID: <20091006233515.21c023c1@ripper.onstor.net>
References: <C5277CB418429641BC1498607A9F480593D3B405@cosmail01.lsi.com>
	<DEC609CD0E54B2448DAF023C89AE9755E250CF1A@cosmail02.lsi.com>
	<C5277CB418429641BC1498607A9F480593D3B47C@cosmail01.lsi.com>
	<A1FEB16D007D2E4DAE212D51980EC3B9DB3601D7@sikmail02.lsi.com>
Organization: LSI
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Tue, 6 Oct 2009 18:27:43 -0600 "Irie, Shin" <Shin.Irie@lsi.com>
wrote:

> From: "Irie, Shin" <Shin.Irie@lsi.com>
> To: "Johnson, Dave" <Dave.Johnson@lsi.com>, "Scheer, Larry"
> <Larry.Scheer@lsi.com>, DL-ONStor-cstech <dl-cstech@lsi.com>
> Subject: RE: running processes queue hitting huge spikes on beast...
> Date: Tue, 6 Oct 2009 18:27:43 -0600

> I think it's related to EMRS.  Lots of upload.cgi processes were
> running around 1700 today.
> 
> top - 17:00:29 up 20 days, 22:13,  5 users,  load average: 19.82, 5.84, 3.70
> Tasks: 334 total,  60 running, 274 sleeping,   0 stopped,   0 zombie
> Cpu(s): 89.4%us,  9.8%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.3%hi,  0.5%si,  0.0%st
> Mem:   7262544k total,  2527472k used,  4735072k free,    96368k buffers
> Swap:  7815580k total,       64k used,  7815516k free,   782512k cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 11598 emrs      15   0 47908  43m 3676 S   12  0.6   0:09.83 EMRS_event_cons
>  4080 emrs      16   0  252m 250m 3072 S    6  3.5 356:06.93 emrs_mq.pl
                               ^^^^
Holy flying pigs, Batman, who gave this thing so much thrust?
250m *resident* for a user-space process?  Not to mention the 10-20m
for each of the others (that we can see).  It says 60 running, including
the ones in iowait.  These processes are total porkers.  No wonder the
system caves.  They all become IO-contentious at some point, I'm betting,
which is what drives the load average so high.  If this were the mysql
daemon, that'd be one thing, but this is a friggin' perl program.
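For reference, a simplified model of how Linux computes the load
average may help explain the lopsided 19.82 / 5.84 / 3.70 split.
Roughly every 5 seconds the kernel folds the number of runnable plus
uninterruptible (D-state) tasks into three exponentially decaying
averages.  This sketch uses floating point rather than the kernel's
fixed-point arithmetic, and the task counts are made-up illustrative
numbers, not measurements from beast.

```python
import math

# Linux refreshes the load average roughly every 5 seconds.
SAMPLE_INTERVAL_S = 5.0
# Per-sample decay for the 1-, 5- and 15-minute averages:
# e^(-5/60), e^(-5/300), e^(-5/900).
DECAY = [math.exp(-SAMPLE_INTERVAL_S / (minutes * 60.0)) for minutes in (1, 5, 15)]


def simulate_loadavg(active_counts, start=(0.0, 0.0, 0.0)):
    """Fold a sequence of per-sample task counts (runnable + D-state) into
    the three exponentially weighted moving averages; returns (1m, 5m, 15m).

    Simplified floating-point model; the kernel uses fixed-point arithmetic.
    """
    loads = list(start)
    for n in active_counts:
        for i, decay in enumerate(DECAY):
            loads[i] = loads[i] * decay + n * (1.0 - decay)
    return tuple(loads)


# Hypothetical scenario: a steady 3 active tasks, then 30 s (6 samples) at 60.
one_m, five_m, fifteen_m = simulate_loadavg([60] * 6, start=(3.0, 3.0, 3.0))
print(f"{one_m:.1f} {five_m:.1f} {fifteen_m:.1f}")  # 25.4 8.4 4.9
```

A half-minute burst of 60 active tasks is enough to push the 1-minute
figure past 25 while the 15-minute figure barely moves, which is the
same shape as the top header above.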

I wonder why it says 0.0%wa when I can see two processes in this short
list alone that are in IO wait (state D).  The CPU contention is also
impressive: sixty processes adding up to almost 100% CPU, most of them
in the 5-6% range.  That's a lot of task switching.  We might help
ourselves by increasing the maximum time slice on this system.
Processes might have to wait longer to get started, but there'd be
less thrashing.  Or just add 3-4 more cores.
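On the 0.0%wa question: Linux only charges iowait to a CPU that is
otherwise idle, so with 0.0%id it can read zero even while tasks sit
in D state.  A more direct check is to count task states straight from
/proc, since the load average includes both R and D tasks.  A minimal
sketch, assuming the Linux /proc/<pid>/stat layout; the function names
are made up for this example.

```python
import os
from collections import Counter


def state_from_stat(stat_line):
    """Return the one-letter state (R, S, D, Z, T, ...) from a /proc/<pid>/stat line.

    Field 2 is the command name in parentheses and may itself contain
    spaces or ')', so parse from the *last* ')' onward.
    """
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def tally_states(proc_root="/proc"):
    """Count processes by state letter; R = runnable, D = uninterruptible (usually IO)."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                counts[state_from_stat(f.read())] += 1
        except OSError:
            continue  # the process exited between listdir() and open()
    return counts
```

Running tally_states() on beast during a spike and comparing
counts["R"] with counts["D"] would show how much of the load is IO
wait rather than CPU demand.  top's own Tasks line may not bucket
D-state tasks the way you'd expect, which is one reason to go to /proc
directly.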

> 11872 emrs      17   0 11760 9.8m 1692 R    6  0.1   0:00.28 upload.cgi
> 11856 emrs      17   0 15996  13m 1816 R    5  0.2   0:00.40 upload.cgi
> 11871 emrs      17   0 11104 9444 1692 R    5  0.1   0:00.26 upload.cgi
> 11820 emrs      15   0 23400  20m 3572 D    5  0.3   0:00.61 upload.cgi
> 11825 emrs      17   0 17952  15m 1844 R    4  0.2   0:00.45 upload.cgi
> 11857 emrs      17   0 17600  15m 1828 R    4  0.2   0:00.44 upload.cgi
> 11881 emrs      17   0 10440 8784 1684 R    4  0.1   0:00.21 upload.cgi
> 11848 emrs      17   0 17820  15m 1828 R    4  0.2   0:00.46 upload.cgi
> 11858 emrs      17   0 16940  14m 1828 R    4  0.2   0:00.40 upload.cgi
> 11869 emrs      17   0 10048 8400 1684 R    4  0.1   0:00.19 upload.cgi
> 11884 emrs      17   0 10044 8360 1684 R    4  0.1   0:00.18 upload.cgi
> 11841 emrs      17   0 17952  15m 1844 D    3  0.2   0:00.44 upload.cgi
> 11852 emrs      17   0 16148  13m 1824 R    3  0.2   0:00.38 upload.cgi
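To put a number on the 10-20m-per-process estimate, the RES column of
the upload.cgi rows in the quoted top output can simply be added up.
A rough sketch; note that RES includes shared pages (the SHR column),
so the total overstates what these processes use between them, and it
only covers the rows visible in this excerpt.

```python
# RES values for the upload.cgi rows in the top excerpt (PID -> RES as shown by top).
UPLOAD_CGI_RES = {
    11872: "9.8m", 11856: "13m", 11871: "9444", 11820: "20m",
    11825: "15m", 11857: "15m", 11881: "8784", 11848: "15m",
    11858: "14m", 11869: "8400", 11884: "8360", 11841: "15m",
    11852: "13m",
}


def res_to_kib(field):
    """Convert a top memory field ('9444', '43m', '1.2g') to KiB.

    A bare number is already KiB; 'm' and 'g' suffixes mean MiB and GiB.
    """
    multipliers = {"m": 1024, "g": 1024 * 1024}
    suffix = field[-1].lower()
    if suffix in multipliers:
        return float(field[:-1]) * multipliers[suffix]
    return float(field)


total_kib = sum(res_to_kib(v) for v in UPLOAD_CGI_RES.values())
avg_mib = total_kib / len(UPLOAD_CGI_RES) / 1024
print(f"{len(UPLOAD_CGI_RES)} processes, {total_kib / 1024:.0f} MiB total, "
      f"{avg_mib:.1f} MiB each")  # 13 processes, 164 MiB total, 12.6 MiB each
```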

