X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C8B60B.62913642@onstor-exch02.onstor.net>; Wed, 14 May 2008 14:42:19 -0700
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Content-class: urn:content-classes:message
Subject: RE: Defect  SW-BSD Opened TED00023791
Date: Wed, 14 May 2008 14:42:19 -0700
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E09EE8468@onstor-exch02.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E09EE845B@onstor-exch02.onstor.net>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: Defect  SW-BSD Opened TED00023791
Thread-Index: Aci2BtTJKjsjhoXeRyWXATcxyCFOOQAABypwAACMEIAAACfEUAAAG5GAAAAgJmA=
References: <BB375AF679D4A34E9CA8DFA650E2B04E09EE842B@onstor-exch02.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E09EE8455@onstor-exch02.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E09EE8459@onstor-exch02.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E09EE845B@onstor-exch02.onstor.net>
From: "Raj Kumar" <raj.kumar@onstor.com>
To: "Maxim Kozlovsky" <maxim.kozlovsky@onstor.com>,
	"Andy Sharp" <andy.sharp@onstor.com>
Cc: "Jonathan Goldick" <jonathan.goldick@onstor.com>,
	"Tim Gardner" <tim.gardner@onstor.com>,
	"Sandrine Boulanger" <sandrine.boulanger@onstor.com>

Hmm, that's weird. I have started only 2 ssh sessions, but I see at
least 10 sh sessions. I wonder who kicked these off. There are 521
processes, but most of them are idle.


Based on top, it looks like we have enough memory:

load averages:  1.14,  1.13,  1.09    14:41:34
521 processes: 2 running, 519 idle
CPU states:  1.6% user,  0.0% nice,  0.0% system,  0.0% interrupt, 98.4% idle
Memory: Real: 35M/108M act/tot  Free: 123M  Swap: 4K/30M used/tot

  PID USERNAME PRI NICE  SIZE   RES STATE WAIT     TIME    CPU COMMAND
28107 root       2    0 1132K 1396K sleep select  10:54  0.15% pm
 9876 root      28    0  612K 1220K run   -        0:28  0.00% top
 1356 root       2    0 1428K 2216K run   -        0:06  0.00% vtmd
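
The 521 processes that top reports match the need(521) figure in the pm warnings from the elog. As a rough sketch only (this is not pm code; the function name and regex are illustrative, and it assumes each elog entry has been joined back onto a single line), the shortfall can be pulled out of the elog like this:

```python
import re

# Matches pm warnings of the form seen in the elog:
#   pm_get_procs: not enough pid entries, got(512) need(521)
PM_WARNING = re.compile(
    r"pm_get_procs: not enough pid entries, got\((\d+)\) need\((\d+)\)"
)

def pid_shortfalls(lines):
    """Yield (got, need, need - got) for every pm_get_procs warning found."""
    for line in lines:
        match = PM_WARNING.search(line)
        if match:
            got, need = int(match.group(1)), int(match.group(2))
            yield got, need, need - got

# Two entries copied from the elog, unwrapped onto one line each.
sample = [
    "May 14 14:07:59 g8r9-2260.onstor.lab : 0:0:pm:WARNING: "
    "pm_get_procs: not enough pid entries, got(512) need(521)",
    "May 14 14:07:59 g8r9-2260.onstor.lab : 0:0:pm:WARNING: "
    "pm_timeout_work: pm_get_procs failed, -13",
]

for got, need, short in pid_shortfalls(sample):
    print(f"got={got} need={need} short by {short}")
```

For the two sample entries this prints "got=512 need=521 short by 9". It only reports the gap that pm logs; it says nothing about why fork itself fails.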



-----Original Message-----
From: Maxim Kozlovsky
Sent: Wednesday, May 14, 2008 2:34 PM
To: Raj Kumar; Andy Sharp
Cc: Jonathan Goldick; Tim Gardner; Sandrine Boulanger
Subject: RE: Defect SW-BSD Opened TED00023791

Quit some of the shells that you have started. You have at least 5 in
the "top" output.

>-----Original Message-----
>From: Raj Kumar
>Sent: Wednesday, May 14, 2008 2:31 PM
>To: Maxim Kozlovsky; Andy Sharp
>Cc: Jonathan Goldick; Tim Gardner; Sandrine Boulanger
>Subject: RE: Defect SW-BSD Opened TED00023791
>
>Ps fails.
>
>># ps ax | grep onstor
>>sh: cannot fork - try again
>
>However, I had "top" running in one session, so I have provided that
>information in the defect.
>
>-----Original Message-----
>From: Maxim Kozlovsky
>Sent: Wednesday, May 14, 2008 2:30 PM
>To: Raj Kumar; Andy Sharp
>Cc: Jonathan Goldick; Tim Gardner; Sandrine Boulanger
>Subject: RE: Defect SW-BSD Opened TED00023791
>
>Yes, the list of the processes.
>
>>-----Original Message-----
>>From: Raj Kumar
>>Sent: Wednesday, May 14, 2008 2:11 PM
>>To: Maxim Kozlovsky; Andy Sharp
>>Cc: Jonathan Goldick; Tim Gardner; Sandrine Boulanger
>>Subject: FW: Defect SW-BSD Opened TED00023791
>>
>>Guys,
>>
>>Is there anything that needs to be collected?
>>
>>Thanks.
>>
>>-----Original Message-----
>>From: raj.kumar@onstor.com [mailto:raj.kumar@onstor.com]
>>Sent: Wednesday, May 14, 2008 2:10 PM
>>To: Andy Sharp
>>Cc: Raj Kumar
>>Subject: Defect SW-BSD Opened TED00023791
>>
>>id: TED00023791
>>Headline: S-Soak (G8R9): BSD can not fork any more processes (May 14
>>14:07:59 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs)
>>Severity: 2-Major
>>Build: Submittal 20 Beta
>>Description: Submittal : 20_BETA
>>Setup: SS
>>Node: G8r9
>>Elog at /n/newcorevol/defect_23791
>>
>>BSD on this particular node is not able to fork any more processes.
>>
>>I was trying to get a SGA on this node and the CLI failed. Then I noticed
>>several pm-related messages in the elog. When I tried to look at the
>>process list using ps, ps failed.
>>
>>I wonder whether this is due to the fact that I have started using NCM on
>>this node or not.
>>
>># ps ax | grep onstor
>>sh: cannot fork - try again
>># Connection to g8r9 closed.
>>
>>g8r9 diag> system get all
>>% Command failure.
>>
>># nfxsh
>>
>>sh: cannot fork - try again
>>
>>************** Elog*********
>>
>>May 14 14:07:59 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>May 14 14:07:59 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>May 14 14:08:00 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>May 14 14:08:00 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>May 14 14:08:01 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>May 14 14:08:01 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>May 14 14:08:02 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>May 14 14:08:02 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>May 14 14:08:03 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>May 14 14:08:03 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>May 14 14:08:04 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>May 14 14:08:04 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>May 14 14:08:05 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>May 14 14:08:05 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>May 14 14:08:06 g2r5-2280.onstor.lab : 0:0:cluster2:INFO: Cluster_SendMsgSock: sendto to 10.4.1.1 failed, msgId 10452, code 64 (Host is down)
>>May 14 14:08:06 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>May 14 14:08:06 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>May 14 14:08:07 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>May 14 14:08:07 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>May 14 14:08:09 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>May 14 14:08:09 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>May 14 14:08:10 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>May 14 14:08:10 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>May 14 14:08:11 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>May 14 14:08:11 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>May 14 14:08:12 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>May 14 14:08:12 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>May 14 14:08:13 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>May 14 14:08:13 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>May 14 14:08:14 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>May 14 14:08:14 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>May 14 14:08:16 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>May 14 14:08:16 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>
>>
>>Release_Project: Cougar
>>

