Date: Tue, 6 Oct 2009 23:44:05 -0700
From: Andrew Sharp <andy.sharp@lsi.com>
To: "Irie, Shin" <Shin.Irie@lsi.com>
Cc: "Johnson, Dave" <Dave.Johnson@lsi.com>, "Scheer, Larry"
 <Larry.Scheer@lsi.com>, DL-ONStor-cstech <dl-cstech@lsi.com>
Subject: Re: running processes queue hitting huge spikes on beast...
Message-ID: <20091006234405.36cc1a2a@ripper.onstor.net>
In-Reply-To: <A1FEB16D007D2E4DAE212D51980EC3B9DB3601D7@sikmail02.lsi.com>
References: <C5277CB418429641BC1498607A9F480593D3B405@cosmail01.lsi.com>
	<DEC609CD0E54B2448DAF023C89AE9755E250CF1A@cosmail02.lsi.com>
	<C5277CB418429641BC1498607A9F480593D3B47C@cosmail01.lsi.com>
	<A1FEB16D007D2E4DAE212D51980EC3B9DB3601D7@sikmail02.lsi.com>
Organization: LSI
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Tue, 6 Oct 2009 18:27:43 -0600 "Irie, Shin" <Shin.Irie@lsi.com>
wrote:

> I think it's related to EMRS.  Lots of upload.cgi was running around 1700 today.

> top - 17:00:29 up 20 days, 22:13,  5 users,  load average: 19.82, 5.84, 3.70
> Tasks: 334 total,  60 running, 274 sleeping,   0 stopped,   0 zombie
> Cpu(s): 89.4%us,  9.8%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.3%hi, 0.5%si,  0.0%st
> Mem:   7262544k total,  2527472k used,  4735072k free,    96368k buffers
> Swap:  7815580k total,       64k used,  7815516k free,   782512k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 11598 emrs      15   0 47908  43m 3676 S   12  0.6   0:09.83 EMRS_event_cons
>  4080 emrs      16   0  252m 250m 3072 S    6  3.5 356:06.93 emrs_mq.pl
                               ^^^^
Holy flying pigs, Batman, who gave this thing so much thrust?
250m *resident* for a user space process?  Not to mention the 10-20m
for each of the others (that we can see).  It says 60 running, including
the ones in iowait.  These processes are total porkers.  No wonder the
system caves.  They all get I/O-contentious at some point, I'm betting,
and that's what drives the large load average.  If this were the mysql
daemon, that'd be one thing, but this is a friggin' perl program.  The
system has enough memory that it ought to bottleneck on CPU, but instead
it's thrashing.
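To put numbers on the porkers, here is a minimal sketch that totals resident memory per command from top process lines. It assumes the default top column order shown in the quoted output; `res_to_kib` and `res_by_command` are hypothetical helpers, not part of EMRS or procps. Unsuffixed RES values are treated as KiB, and `m`/`g` suffixes as MiB/GiB.

```python
# Hypothetical helper (not part of EMRS or top): total resident memory per
# command from top(1) process lines, assuming the default column order
# PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND.
from collections import defaultdict


def res_to_kib(field):
    """Convert a top RES field such as '9444', '9.8m' or '1.2g' to KiB."""
    scale = {"k": 1, "m": 1024, "g": 1024 * 1024}
    suffix = field[-1].lower()
    if suffix in scale:
        return float(field[:-1]) * scale[suffix]
    return float(field)  # unsuffixed values are already in KiB


def res_by_command(lines):
    """Sum RES (in KiB) per COMMAND, skipping headers and blank lines."""
    totals = defaultdict(float)
    for line in lines:
        cols = line.split()
        if len(cols) < 12 or not cols[0].isdigit():
            continue
        totals[" ".join(cols[11:])] += res_to_kib(cols[5])
    return dict(totals)


# A few lines copied from the quoted top output.
sample = """\
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
11598 emrs      15   0 47908  43m 3676 S   12  0.6   0:09.83 EMRS_event_cons
 4080 emrs      16   0  252m 250m 3072 S    6  3.5 356:06.93 emrs_mq.pl
11872 emrs      17   0 11760 9.8m 1692 R    6  0.1   0:00.28 upload.cgi
11871 emrs      17   0 11104 9444 1692 R    5  0.1   0:00.26 upload.cgi
""".splitlines()

for cmd, kib in sorted(res_by_command(sample).items(), key=lambda kv: -kv[1]):
    print(f"{cmd:16s} {kib / 1024:8.1f} MiB")
```

Feeding it a full `top -b -n 1` snapshot instead of the sample would show how much the whole upload.cgi herd is holding at once.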

I wonder why it says 0.0%wa when I can see two processes in this short
list alone that are in I/O wait (state D).  Maybe it's because the CPU
contention is also impressive: sixty processes adding up to almost 100%
CPU, most of them in the 5-6% range.  That's a lot of task switching.
We might help ourselves by increasing the maximum time slice on this
system.  Processes might have to wait longer to get started, but there'd
be less thrashing.  Or just add 3-4 more cores.
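On the 0.0%wa question: as far as we understand Linux CPU accounting, a tick is charged to iowait only when the CPU is otherwise idle while some task is blocked on I/O. A CPU busy running other tasks charges the tick to user or system time, even if other processes are sitting in state D. With 0.0%id there's no idle time left to show as %wa. The sketch below mimics how top derives those percentages from two samples of the aggregate `cpu` line in /proc/stat. The helper names are ours, and the deltas are synthetic, picked to reproduce the quoted Cpu(s) line.

```python
# Sketch of how top derives the %us/%sy/%id/%wa/... figures: it samples the
# aggregate "cpu" line of /proc/stat twice and reports each field's share of
# the elapsed ticks. Field order (Linux): user nice system idle iowait irq
# softirq steal. Older kernels may report fewer fields.

FIELDS = ["us", "ni", "sy", "id", "wa", "hi", "si", "st"]


def read_cpu_ticks(path="/proc/stat"):
    """Return the cumulative tick counters from the first 'cpu' line."""
    with open(path) as f:
        parts = f.readline().split()
    return [int(x) for x in parts[1:1 + len(FIELDS)]]


def cpu_percentages(before, after):
    """Percent of elapsed ticks spent in each state between two samples."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * d / total for name, d in zip(FIELDS, deltas)}


# Synthetic deltas mirroring the quoted top line: every tick went to
# user/system/irq/softirq, so nothing is left over for idle or iowait.
before = [0] * 8
after = [894, 0, 98, 0, 0, 3, 5, 0]
print(cpu_percentages(before, after))
```

On a live box, take `read_cpu_ticks()` twice, a second or so apart, and pass both samples to `cpu_percentages`.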

> 11872 emrs      17   0 11760 9.8m 1692 R    6  0.1   0:00.28 upload.cgi
> 11856 emrs      17   0 15996  13m 1816 R    5  0.2   0:00.40 upload.cgi
> 11871 emrs      17   0 11104 9444 1692 R    5  0.1   0:00.26 upload.cgi
> 11820 emrs      15   0 23400  20m 3572 D    5  0.3   0:00.61 upload.cgi
> 11825 emrs      17   0 17952  15m 1844 R    4  0.2   0:00.45 upload.cgi
> 11857 emrs      17   0 17600  15m 1828 R    4  0.2   0:00.44 upload.cgi
> 11881 emrs      17   0 10440 8784 1684 R    4  0.1   0:00.21 upload.cgi
> 11848 emrs      17   0 17820  15m 1828 R    4  0.2   0:00.46 upload.cgi
> 11858 emrs      17   0 16940  14m 1828 R    4  0.2   0:00.40 upload.cgi
> 11869 emrs      17   0 10048 8400 1684 R    4  0.1   0:00.19 upload.cgi
> 11884 emrs      17   0 10044 8360 1684 R    4  0.1   0:00.18 upload.cgi
> 11841 emrs      17   0 17952  15m 1844 D    3  0.2   0:00.44 upload.cgi
> 11852 emrs      17   0 16148  13m 1824 R    3  0.2   0:00.38 upload.cgi
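The S column in the listing above mixes R (runnable) and D (uninterruptible sleep, usually waiting on disk I/O) upload.cgi processes. On Linux the load average counts both kinds of task. That is one way a flock of CGIs can push it toward 20 without every one of them burning CPU. Here is a minimal sketch for tallying the two directly, assuming a Linux /proc layout. The function names are hypothetical, not from any existing tool.

```python
# Sketch: tally process states straight from /proc/<pid>/stat, to see how many
# tasks are runnable (R) versus in uninterruptible sleep (D).
import os


def proc_state(stat_text):
    """Return the state letter from the contents of a /proc/<pid>/stat file."""
    # Field 2 is the command name in parentheses and may itself contain spaces
    # or ')', so locate the state relative to the *last* ')'.
    return stat_text[stat_text.rfind(")") + 2]


def count_states(proc_root="/proc"):
    """Map state letter -> number of processes currently in that state."""
    counts = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                state = proc_state(f.read())
        except OSError:
            continue  # the process exited between listdir() and open()
        counts[state] = counts.get(state, 0) + 1
    return counts


if __name__ == "__main__" and os.path.isdir("/proc/self"):
    states = count_states()
    print("runnable (R):", states.get("R", 0))
    print("uninterruptible (D):", states.get("D", 0))
```

Running it in a loop during the next EMRS upload burst would show whether the spike is mostly R (CPU-bound) or D (I/O-bound) tasks.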



