X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C8B613.7F724208@onstor-exch02.onstor.net>; Wed, 14 May 2008 15:40:23 -0700
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C8B613.7F724208"
Content-class: urn:content-classes:message
Subject: RE: Defect  SW-BSD Opened TED00023791
Date: Wed, 14 May 2008 15:40:23 -0700
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E09FCEBB5@onstor-exch02.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E09FCEBAF@onstor-exch02.onstor.net>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: Defect  SW-BSD Opened TED00023791
Thread-Index: Aci2BtTJKjsjhoXeRyWXATcxyCFOOQAABypwAACMEIAAACfEUAAAG5GAAAAgJmAAADCbcAAApOJ+AAAP0xAAARgRUAAAIExg
References: <BB375AF679D4A34E9CA8DFA650E2B04E09EE842B@onstor-exch02.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E09EE8455@onstor-exch02.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E09EE8459@onstor-exch02.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E09EE845B@onstor-exch02.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E09EE8468@onstor-exch02.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E09EE846B@onstor-exch02.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E0422919C@onstor-exch02.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E09EE848C@onstor-exch02.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E09FCEBAF@onstor-exch02.onstor.net>
From: "Maxim Kozlovsky" <maxim.kozlovsky@onstor.com>
To: "Raj Kumar" <raj.kumar@onstor.com>,
	"Andy Sharp" <andy.sharp@onstor.com>
Cc: "Jonathan Goldick" <jonathan.goldick@onstor.com>,
	"Tim Gardner" <tim.gardner@onstor.com>,
	"Sandrine Boulanger" <sandrine.boulanger@onstor.com>

This is a multi-part message in MIME format.

------_=_NextPart_001_01C8B613.7F724208
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Good job. Please change the defect subsystem. Wasn't this supposed to
have been fixed several times over already? Each time this happens,
somebody tells me that this time we've got it and it will never happen
again, yet somehow it keeps coming back.

=20

________________________________

From: Raj Kumar=20
Sent: Wednesday, May 14, 2008 3:36 PM
To: Maxim Kozlovsky; Andy Sharp
Cc: Jonathan Goldick; Tim Gardner; Sandrine Boulanger
Subject: RE: Defect SW-BSD Opened TED00023791

=20

Looks like EMRS. The complete process list is at
/n/newcorevol/defect_23791/

=20

=20

=20

________________________________

From: Maxim Kozlovsky=20
Sent: Wednesday, May 14, 2008 3:05 PM
To: Raj Kumar; Andy Sharp
Cc: Jonathan Goldick; Tim Gardner; Sandrine Boulanger
Subject: RE: Defect SW-BSD Opened TED00023791

=20

You've got to find out which one; otherwise it will go straight to MI.
kill is a shell builtin, so it should work without forking. Try killing
off some of the processes in the top output. Maybe top has an option to
kill a process.
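
To illustrate the point about kill, here is a minimal sketch. It assumes
an sh where kill is a builtin, as stated above. The background sleep is
only a stand-in target so the example is safe to run; on the stuck node
you would use a PID read from the top display instead, since starting a
new process there would itself need a fork.

```shell
# Stand-in target process (an assumption for this sketch, not from the defect).
sleep 300 &
victim=$!

# 'type' is a builtin too; it reports how the shell would run 'kill'.
type kill

# Send SIGTERM from inside the shell itself; a builtin kill needs no fork.
kill -TERM "$victim"

# Collect the exit status; a signal death is reported as 128 + signal number.
status=0
wait "$victim" 2>/dev/null || status=$?
echo "target exited with status $status"
```

If type reports an external path instead of a builtin, this approach
would not help on a node that cannot fork.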

=20

________________________________

From: Raj Kumar=20
Sent: Wednesday, May 14, 2008 3:01 PM
To: Maxim Kozlovsky; Andy Sharp
Cc: Jonathan Goldick; Tim Gardner; Sandrine Boulanger
Subject: RE: Defect SW-BSD Opened TED00023791

=20

It's not human. Very few people access this setup. Some process is
kicking off these shells.

=20

The only different test that's running today on this node is NCM.
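
One way to narrow down which process is kicking off the shells is to
count sh processes per parent PID. The sketch below is hedged: the
function name count_shells_by_parent is hypothetical, and it assumes
input lines of the form "PID PPID COMMAND", which may not match the
format of the saved process list.

```shell
# Hypothetical helper: reads "PID PPID COMMAND" lines on stdin and prints
# "count parent-pid" for every parent of sh processes, busiest parent first.
count_shells_by_parent() {
    awk '$3 == "sh" || $3 == "-sh" { n[$2]++ }
         END { for (p in n) print n[p], p }' | sort -rn
}

# Example with made-up data (not taken from the defect):
printf '10 1 sh\n11 1 sh\n12 5 sh\n13 5 top\n' | count_shells_by_parent
```

On a node that can still fork, something like
"ps -ax -o pid,ppid,comm | count_shells_by_parent" would feed it live
data; the header line is ignored because its third column is not sh.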

=20

________________________________

From: Maxim Kozlovsky
Sent: Wed 5/14/2008 2:43 PM
To: Raj Kumar; Andy Sharp
Cc: Jonathan Goldick; Tim Gardner; Sandrine Boulanger
Subject: RE: Defect SW-BSD Opened TED00023791

Well, find the one who did it and kick him.

WAD.

>-----Original Message-----
>From: Raj Kumar
>Sent: Wednesday, May 14, 2008 2:42 PM
>To: Maxim Kozlovsky; Andy Sharp
>Cc: Jonathan Goldick; Tim Gardner; Sandrine Boulanger
>Subject: RE: Defect SW-BSD Opened TED00023791
>
>Hmm, that's weird. I have only 2 ssh sessions that I have started, but I
>see at least 10 sh sessions. Wonder who kicked these off. There are 521
>processes but most of them are idle.
>
>
>Based on top, it looks like we have enough memory:
>
>load averages:  1.14,  1.13,  1.09
>14:41:34
>521 processes: 2 running, 519 idle
>CPU states:  1.6% user,  0.0% nice,  0.0% system,  0.0% interrupt, 98.4% idle
>Memory: Real: 35M/108M act/tot  Free: 123M  Swap: 4K/30M used/tot
>
>  PID USERNAME PRI NICE  SIZE   RES STATE WAIT     TIME    CPU COMMAND
>28107 root       2    0 1132K 1396K sleep select  10:54  0.15% pm
> 9876 root      28    0  612K 1220K run   -        0:28  0.00% top
> 1356 root       2    0 1428K 2216K run   -        0:06  0.00% vtmd
>
>
>
>-----Original Message-----
>From: Maxim Kozlovsky
>Sent: Wednesday, May 14, 2008 2:34 PM
>To: Raj Kumar; Andy Sharp
>Cc: Jonathan Goldick; Tim Gardner; Sandrine Boulanger
>Subject: RE: Defect SW-BSD Opened TED00023791
>
>Quit some of the shells that you have started. You have at least 5 in the
>"top" output.
>
>>-----Original Message-----
>>From: Raj Kumar
>>Sent: Wednesday, May 14, 2008 2:31 PM
>>To: Maxim Kozlovsky; Andy Sharp
>>Cc: Jonathan Goldick; Tim Gardner; Sandrine Boulanger
>>Subject: RE: Defect SW-BSD Opened TED00023791
>>
>>Ps fails.
>>
>>># ps ax | grep onstor
>>>sh: cannot fork - try again
>>
>>However, I had "top" running in a session, so I have provided that
>>information in the defect.
>>
>>-----Original Message-----
>>From: Maxim Kozlovsky
>>Sent: Wednesday, May 14, 2008 2:30 PM
>>To: Raj Kumar; Andy Sharp
>>Cc: Jonathan Goldick; Tim Gardner; Sandrine Boulanger
>>Subject: RE: Defect SW-BSD Opened TED00023791
>>
>>Yes, the list of the processes.
>>
>>>-----Original Message-----
>>>From: Raj Kumar
>>>Sent: Wednesday, May 14, 2008 2:11 PM
>>>To: Maxim Kozlovsky; Andy Sharp
>>>Cc: Jonathan Goldick; Tim Gardner; Sandrine Boulanger
>>>Subject: FW: Defect SW-BSD Opened TED00023791
>>>
>>>Guys,
>>>
>>>Is there anything that needs to be collected?
>>>
>>>Thanks.
>>>
>>>-----Original Message-----
>>>From: raj.kumar@onstor.com [mailto:raj.kumar@onstor.com]
>>>Sent: Wednesday, May 14, 2008 2:10 PM
>>>To: Andy Sharp
>>>Cc: Raj Kumar
>>>Subject: Defect SW-BSD Opened TED00023791
>>>
>>>id: TED00023791
>>>Headline: S-Soak (G8R9): BSD can not fork any more processes (May 14
>>>14:07:59 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs)
>>>Severity: 2-Major
>>>Build: Submittal 20 Beta
>>>Description: Submittal : 20_BETA
>>>Setup: SS
>>>Node: G8r9
>>>Elog at /n/newcorevol/defect_23791
>>>
>>>BSD on this particular node is not able to fork any more processes.
>>>
>>>I was trying to get a SGA on this node and the CLI failed. Then I noticed
>>>several pm-related messages in the elog. When I tried to look at the
>>>process list using ps, ps failed.
>>>
>>>I wonder whether this is due to the fact that I have started using NCM
>>>on this node.
>>>
>>># ps ax | grep onstor
>>>sh: cannot fork - try again
>>># Connection to g8r9 closed.
>>>
>>>g8r9 diag> system get all
>>>% Command failure.
>>>
>>># nfxsh
>>>
>>>sh: cannot fork - try again
>>>
>>>************** Elog*********
>>>
>>>May 14 14:07:59 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>>May 14 14:07:59 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>>May 14 14:08:00 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>>May 14 14:08:00 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>>May 14 14:08:01 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>>May 14 14:08:01 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>>May 14 14:08:02 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>>May 14 14:08:02 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>>May 14 14:08:03 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>>May 14 14:08:03 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>>May 14 14:08:04 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>>May 14 14:08:04 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>>May 14 14:08:05 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>>May 14 14:08:05 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>>May 14 14:08:06 g2r5-2280.onstor.lab : 0:0:cluster2:INFO: Cluster_SendMsgSock: sendto to 10.4.1.1 failed, msgId 10452, code 64 (Host is down)
>>>May 14 14:08:06 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>>May 14 14:08:06 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>>May 14 14:08:07 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>>May 14 14:08:07 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>>May 14 14:08:09 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>>May 14 14:08:09 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>>May 14 14:08:10 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>>May 14 14:08:10 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>>May 14 14:08:11 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>>May 14 14:08:11 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>>May 14 14:08:12 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>>May 14 14:08:12 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>>May 14 14:08:13 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>>May 14 14:08:13 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>>May 14 14:08:14 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>>May 14 14:08:14 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>>May 14 14:08:16 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_get_procs: not enough pid entries, got(512) need(521)
>>>May 14 14:08:16 g8r9-2260.onstor.lab : 0:0:pm:WARNING: pm_timeout_work: pm_get_procs failed, -13
>>>
>>>
>>>Release_Project: Cougar
>>>


------_=_NextPart_001_01C8B613.7F724208--
