AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20080616091432.6fce12ff@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:onstor-exch02.onstor.net
NSV:
SSH:
R:<kumarv@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@onstor-exch02.onstor.net/INBOX	0	BB375AF679D4A34E9CA8DFA650E2B04E0A6E8D58@onstor-exch02.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Mon, 16 Jun 2008 09:14:48 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: "Kumar Vakacharla (HCL)" <kumarv@onstor.com>
Subject: Re: Review Request : TED22005
Message-ID: <20080616091448.37dd22b3@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E0A6E8D58@onstor-exch02.onstor.net>
References: <BB375AF679D4A34E9CA8DFA650E2B04E0A60C96D@onstor-exch02.onstor.net>
	<20080612114827.609ea49c@ripper.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E0A6E85FD@onstor-exch02.onstor.net>
	<20080612131534.7825d252@ripper.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E0A6E8D58@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Hi Kumar,

The changelist 29620 has no files in it.

They're probably still in your default changelist.  You can move them
to 29620 with the reopen command:

p4 reopen -c 29620 <file>
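
For reference while the files get sorted out, here is a rough sketch of
what the fix description for 29620 talks about: running the reboot
command from a child that sits in its own process group, so that a
kill(0, SIGTERM) from something in the pm group does not reach it.  The
helper name run_in_own_pgrp() is made up for illustration and is not
taken from the changelist, and the use of POSIX setpgid(0, 0) in place
of setpgrp is an assumption.  Treat it as a sketch, not as the genlib
code.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the changelist: run `cmd` through
 * /bin/sh in a child that leads its own process group, so a
 * kill(0, SIGTERM) issued inside the caller's group does not reach it.
 * SIGTERM is left at its default disposition in the child, so the
 * command can still be killed deliberately.
 *
 * Returns the child's exit status, or -1 on error or abnormal
 * termination.  If pgid_out is non-NULL it receives the child's
 * process group id.
 */
int run_in_own_pgrp(const char *cmd, pid_t *pgid_out)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: leave the caller's process group before exec. */
        if (setpgid(0, 0) < 0)
            _exit(126);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);             /* exec failed */
    }

    /*
     * Parent: make the same call to close the race with the child.
     * A failure here (e.g. EACCES once the child has exec'd) is
     * harmless, because the child has then already made the call.
     */
    (void)setpgid(pid, pid);

    if (pgid_out != NULL)
        *pgid_out = getpgid(pid);

    /* Wait for the child, retrying if a signal interrupts the wait. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would use it along the lines of run_in_own_pgrp("reboot", NULL).
Because the child keeps the default SIGTERM handling, the reboot program
is moved out of the line of fire without being made immune to the signal.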



On Fri, 13 Jun 2008 22:03:42 -0700 "Kumar Vakacharla (HCL)"
<kumarv@onstor.com> wrote:

> Andy, 
> 
>   I have modified the code according to your comments, and it is
> ready for review.
> 
>   Please let me know if you see any problems. 
> 
> P4CLIENT=kumarv-DEV
> P4 Change Id: 29620
> PATH: /homes/kumarv/work/dev/
> ========================================================================
> kumarv@compile2>p4 describe 29620
> Change 29620 by perforce@kumarv-DEV on 2008/06/10 18:23:56 *pending*
> 
>         TED00022005 (LSI-PA 6989) Each system hung after a number of crashes
>         (minority pcc state)
> 
>         Fix Description:
>         Make pgid of reboot process different from pm process group.
>         The processes in pm group may receive SIGTERM during the reboot
>         operation.
> 
> Affected files ...
> 
> ... //depot/dev/nfx-tree/code/sm-chassis/chassisd-bc.c#12 edit
> ... //depot/dev/nfx-tree/code/sm-chassis/chassisd-cg.c#10 edit
> ... //depot/dev/nfx-tree/code/sm-chassis/chassisd-msg.c#12 edit
> ... //depot/dev/nfx-tree/code/ssc-genlib/genlib.c#1 edit
> ========================================================================
> 
> Since I need to provide a patch for 3.2.0.5, I have made similar
> changes in the r320 branch. Please review them too.
> 
> P4CLIENT=kumarv-r320rel
> P4 Change Id: 29678
> PATH: /homes/kumarv/work/r320rel/
> ========================================================================
> kumarv@linux-compile>p4 describe 29678
> Change 29678 by perforce@kumarv-r320rel on 2008/06/13 15:26:31 *pending*
> 
>         TED00022005 (LSI-PA 6989) Each system hung after a number of crashes
>         (minority pcc state)
> 
>         Fix Description:
>         Make pgid of reboot process different from pm process group.
>         The processes in pm group may receive SIGTERM during the reboot
>         operation.
> 
> Affected files ...
> 
> ... //depot/r320rel/nfx-tree/code/sm-chassis/chassisd-bc.c#1 edit
> ... //depot/r320rel/nfx-tree/code/sm-chassis/chassisd-msg.c#1 edit
> ... //depot/r320rel/nfx-tree/code/sm-chassis/chassisd.c#1 edit
> ... //depot/r320rel/nfx-tree/code/ssc-genlib/cm-reboot-linux.c#1 edit
> ... //depot/r320rel/nfx-tree/code/ssc-genlib/cm-reboot-openbsd.c#1 edit
> ... //depot/r320rel/nfx-tree/code/ssc-genlib/genlib-linux.c#2 edit
> ... //depot/r320rel/nfx-tree/code/ssc-genlib/genlib-openbsd.c#1 edit
> 
> ========================================================================
> 
> Thanks,
> Kumar.
> 
> -----Original Message-----
> From: Andy Sharp 
> Sent: Thursday, June 12, 2008 1:16 PM
> To: Kumar Vakacharla (HCL)
> Subject: Re: Review Request : TED22005
> 
> Feel free to come by and talk about it.  Right now, I'm doing some
> follow-on work to code that Chris Vandever will soon check in.  It
> will be the start of an effort to consolidate every place where our
> code reboots the system, including daemons and nfxsh.  So if you
> concentrate on adding the fix to the genlib code, that might be
> enough for now; the other places in our code that unwisely do
> something like system("reboot") on their own will be cleaned up
> later.
> 
> BTW, I don't think the reboot program should be immune to SIGTERM.
> Perhaps I might want to be able to kill the reboot program from some
> other program, who knows?
> 
> 
> On Thu, 12 Jun 2008 12:15:37 -0700 "Kumar Vakacharla (HCL)"
> <kumarv@onstor.com> wrote:
> 
> > Hi Andy, 
> > 
> > I understand. In fact, I initially tried something similar in our
> > code. Then I realized that there are many places where we reboot
> > the system using system("reboot"). So I thought that, instead of
> > changing it in multiple places, I could make the change in the BSD
> > reboot code itself, so that even future calls to system("reboot")
> > won't break it. I think the reboot process is not supposed to be
> > terminated by SIGTERM from other processes, and that's why I made
> > the fix there.
> > 
> > Anyway, I will try to do it as you suggested.
> > 
> > Thanks,
> > Kumar.
> > 
> > -----Original Message-----
> > From: Andy Sharp 
> > Sent: Thursday, June 12, 2008 11:48 AM
> > To: Kumar Vakacharla (HCL)
> > Subject: Re: Review Request : TED22005
> > 
> > Hi Kumar,
> > 
> > I've had a chance to take a look at this, and while you're right,
> > this is one viable approach, I would much prefer to stick to a
> > design philosophy of modifying our code first and system/distro
> > code only as a last resort.
> > 
> > > Can we instead code up a method whereby the reboot command is run
> > > in a process that is not part of the initial process group?  I.e.,
> > > do a fork; setpgrp; do_system(reboot) kind of thing?
> > 
> > Thanks,
> > 
> > a
> > 
> > On Tue, 10 Jun 2008 18:28:36 -0700 "Kumar Vakacharla (HCL)"
> > <kumarv@onstor.com> wrote:
> > 
> > > > Andy,
> > > > 
> > > > Can you please review the fix for this defect?
> > > > 
> > > > Defect:
> > > > 
> > > > TED00022005 (LSI-PA 6989) Each system hung after a number of
> > > > crashes (minority pcc state)
> > > > 
> > > > Root Cause:
> > > > 
> > > > The "reboot" process is getting killed in the middle of the
> > > > reboot operation, hence the system hangs.
> > > > 
> > > > Details:
> > > > 
> > > > During the reboot process:
> > > > 
> > > > - The reboot program (/sbin/reboot) issues "kill(-1, SIGTERM)"
> > > >   to kill all the processes in the system except "init" and
> > > >   itself.
> > > > 
> > > > - When any of the forked shells (e.g. "sh support.sh" or shells
> > > >   created by the system() call) receives this signal, it
> > > >   sometimes in turn sends that signal to its whole process group
> > > >   using kill(0, SIGTERM). Since the reboot process belongs to
> > > >   the same process group, it gets killed and the system hangs.
> > > > 
> > > > Fix Description:
> > > > 
> > > > Initially I tried cleaning up the processes (support.sh,
> > > > emrscron, pm, etc.) before reboot issues "kill(-1, SIGTERM)". I
> > > > also tried changing the order in which we terminate these
> > > > processes during the cleanup. Neither approach worked.
> > > > 
> > > > Finally, the fix is to ignore the SIGTERM signal during the
> > > > "reboot" operation.
> > > > 
> > > > Affected Files:
> > > > 
> > > > *         /homes/kumarv//work/dev/openbsd/src/sbin/reboot/reboot.c
> > > > 
> > > > P4CLIENT=kumarv-DEV
> > > > 
> > > > P4 Change Id: 29620
> > > > 
> > > > Please let me know if you need any clarifications.
> > > > 
> > > > Thanks,
> > > > Kumar.
> > > 
