AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20080514154030.31d6b1fe@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:onstor-exch02.onstor.net
NSV:
SSH:
R:<chris.vandever@onstor.com>,<raj.kumar@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@onstor-exch02.onstor.net/INBOX	0	BB375AF679D4A34E9CA8DFA650E2B04E03E9A865@onstor-exch02.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Wed, 14 May 2008 15:40:51 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: "Chris Vandever" <chris.vandever@onstor.com>
Cc: "Raj Kumar" <raj.kumar@onstor.com>
Subject: Re: Defect  SW-BSD Opened TED00023791
Message-ID: <20080514154051.5ac3081e@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E03E9A865@onstor-exch02.onstor.net>
References: <BB375AF679D4A34E9CA8DFA650E2B04E09EE845A@onstor-exch02.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E03E9A865@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Well, first, kill top, then run ps to find out what the majority of the
519 (!) processes are.  Probably something like emrscron or socat or
something.  Or 380 nfxsh sessions Raj has laying about for some reason.
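For the record, a quick one-liner to tally them, so the runaway command floats to the top (a sketch; assumes a ps that takes -o comm, which varies a bit by platform):

```shell
# Count processes grouped by command name, biggest offenders first.
ps -axo comm= | sort | uniq -c | sort -rn | head -n 10
```

Of course, if sh can't fork, even this pipeline won't run until something gets killed first.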

But it sounds like Raj already did that ~:^)

Cheers,

a

On Wed, 14 May 2008 14:36:25 -0700 "Chris Vandever"
<chris.vandever@onstor.com> wrote:

> Andy, is there some way we can find out what processes are running?
> Can we access /dev/proc or whatever directly?  It looks to me like
> we're out of memory, so how is an administrator supposed to recover
> without rebooting?
> 
> ChrisV
> 
> -----Original Message-----
> From: Raj Kumar 
> Sent: Wednesday, May 14, 2008 2:33 PM
> To: Chris Vandever
> Subject: RE: Defect SW-BSD Opened TED00023791
> 
> Even ps fails.
> 
> # ps
> sh: cannot fork - try again
> 
> 
> 
> 
> -----Original Message-----
> From: Chris Vandever 
> Sent: Wednesday, May 14, 2008 2:32 PM
> To: Raj Kumar
> Subject: RE: Defect SW-BSD Opened TED00023791
> 
> We're out of memory.  There are too many processes running.  Do a ps
> and see, but I suspect we have a bunch of emrs processes wedged.
> 'sh' will fail when we're out of memory.  -13 error from pm is
> RMC_NOMEM.
> 
> ChrisV
> 
> -----Original Message-----
> From: Raj Kumar 
> Sent: Wednesday, May 14, 2008 2:28 PM
> To: Chris Vandever
> Subject: FW: Defect SW-BSD Opened TED00023791
> 
> Since you are looking at the elogs, there is already a defect for the
> pm errors that you will see on g8r9.
> 
> -----Original Message-----
> From: raj.kumar@onstor.com [mailto:raj.kumar@onstor.com] 
> Sent: Wednesday, May 14, 2008 2:10 PM
> To: Andy Sharp
> Cc: Raj Kumar
> Subject: Defect SW-BSD Opened TED00023791
> 
> id: TED00023791
> Headline: S-Soak (G8R9): BSD can not fork any more processes (May 14
> 14:07:59 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs)
> Severity: 2-Major
> Build: Submittal 20 Beta
> Description: Submittal : 20_BETA
> Setup: SS
> Node: G8r9
> Elog at /n/newcorevol/defect_23791
> 
> BSD on this particular node is not able to fork any more processes. 
> 
> I was trying to get a SGA on this node and the CLI failed. Then I
> noticed several pm related messages on the elog. When I tried to look
> at process list using ps, ps failed.
> 
> I wonder whether this is due to the fact that I have started using
> NCM on this node or not.
> 
> # ps ax | grep onstor
> sh: cannot fork - try again
> # Connection to g8r9 closed.
> 
> g8r9 diag> system get all
> % Command failure.
> 
> # nfxsh
> 
> sh: cannot fork - try again
> 
> ************** Elog*********
> 
> May 14 14:07:59 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs:
> not enough pid entries, got(512) need(521)
> May 14 14:07:59 g8r9-2260.onstor.lab : 0:0:pm:WARNING:
> pm_timeout_work: pm_get_procs failed, -13
> May 14 14:08:00 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs:
> not enough pid entries, got(512) need(521)
> May 14 14:08:00 g8r9-2260.onstor.lab : 0:0:pm:WARNING:
> pm_timeout_work: pm_get_procs failed, -13
> May 14 14:08:01 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs:
> not enough pid entries, got(512) need(521)
> May 14 14:08:01 g8r9-2260.onstor.lab : 0:0:pm:WARNING:
> pm_timeout_work: pm_get_procs failed, -13
> May 14 14:08:02 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs:
> not enough pid entries, got(512) need(521)
> May 14 14:08:02 g8r9-2260.onstor.lab : 0:0:pm:WARNING:
> pm_timeout_work: pm_get_procs failed, -13
> May 14 14:08:03 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs:
> not enough pid entries, got(512) need(521)
> May 14 14:08:03 g8r9-2260.onstor.lab : 0:0:pm:WARNING:
> pm_timeout_work: pm_get_procs failed, -13
> May 14 14:08:04 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs:
> not enough pid entries, got(512) need(521)
> May 14 14:08:04 g8r9-2260.onstor.lab : 0:0:pm:WARNING:
> pm_timeout_work: pm_get_procs failed, -13
> May 14 14:08:05 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs:
> not enough pid entries, got(512) need(521)
> May 14 14:08:05 g8r9-2260.onstor.lab : 0:0:pm:WARNING:
> pm_timeout_work: pm_get_procs failed, -13
> May 14 14:08:06 g2r5-2280.onstor.lab : 0:0:cluster2:INFO:
> Cluster_SendMsgSock: sendto to 10.4.1.1 failed, msgId 10452, code 64
> (Host is down)
> May 14 14:08:06 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs:
> not enough pid entries, got(512) need(521)
> May 14 14:08:06 g8r9-2260.onstor.lab : 0:0:pm:WARNING:
> pm_timeout_work: pm_get_procs failed, -13
> May 14 14:08:07 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs:
> not enough pid entries, got(512) need(521)
> May 14 14:08:07 g8r9-2260.onstor.lab : 0:0:pm:WARNING:
> pm_timeout_work: pm_get_procs failed, -13
> May 14 14:08:09 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs:
> not enough pid entries, got(512) need(521)
> May 14 14:08:09 g8r9-2260.onstor.lab : 0:0:pm:WARNING:
> pm_timeout_work: pm_get_procs failed, -13
> May 14 14:08:10 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs:
> not enough pid entries, got(512) need(521)
> May 14 14:08:10 g8r9-2260.onstor.lab : 0:0:pm:WARNING:
> pm_timeout_work: pm_get_procs failed, -13
> May 14 14:08:11 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs:
> not enough pid entries, got(512) need(521)
> May 14 14:08:11 g8r9-2260.onstor.lab : 0:0:pm:WARNING:
> pm_timeout_work: pm_get_procs failed, -13
> May 14 14:08:12 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs:
> not enough pid entries, got(512) need(521)
> May 14 14:08:12 g8r9-2260.onstor.lab : 0:0:pm:WARNING:
> pm_timeout_work: pm_get_procs failed, -13
> May 14 14:08:13 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs:
> not enough pid entries, got(512) need(521)
> May 14 14:08:13 g8r9-2260.onstor.lab : 0:0:pm:WARNING:
> pm_timeout_work: pm_get_procs failed, -13
> May 14 14:08:14 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs:
> not enough pid entries, got(512) need(521)
> May 14 14:08:14 g8r9-2260.onstor.lab : 0:0:pm:WARNING:
> pm_timeout_work: pm_get_procs failed, -13
> May 14 14:08:16 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs:
> not enough pid entries, got(512) need(521)
> May 14 14:08:16 g8r9-2260.onstor.lab : 0:0:pm:WARNING:
> pm_timeout_work: pm_get_procs failed, -13
> 
> 
> Release_Project: Cougar
> 
> 
