AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20080612131115.74f17a69@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:onstor-exch02.onstor.net
NSV:
SSH:
R:<kumarv@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@onstor-exch02.onstor.net/INBOX	0	BB375AF679D4A34E9CA8DFA650E2B04E0A6E85FD@onstor-exch02.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Thu, 12 Jun 2008 13:15:34 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: "Kumar Vakacharla (HCL)" <kumarv@onstor.com>
Subject: Re: Review Request : TED22005
Message-ID: <20080612131534.7825d252@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E0A6E85FD@onstor-exch02.onstor.net>
References: <BB375AF679D4A34E9CA8DFA650E2B04E0A60C96D@onstor-exch02.onstor.net>
	<20080612114827.609ea49c@ripper.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E0A6E85FD@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Feel free to come by and talk about it.  Right now I'm doing some
follow-on work to code that Chris Vandever will check in soon.  That
code is the start of an effort to consolidate every place where our
code reboots the system, including the daemons and nfxsh.  So if you
concentrate on adding it to the genlib code, that might be enough for
now; the other places in our code that unwisely do something like
system("reboot") on their own will be cleaned up later.

BTW, I don't think the reboot program should be immune to SIGTERM.
I might want to be able to kill the reboot program from some other
program, who knows?


On Thu, 12 Jun 2008 12:15:37 -0700 "Kumar Vakacharla (HCL)"
<kumarv@onstor.com> wrote:

> Hi Andy, 
> 
> I understand. In fact, I initially tried something similar in our
> code. Then I realized that there are many places where we reboot the
> system using system("reboot"). So I thought that instead of changing
> it in multiple places, I could make the change in the BSD reboot code
> itself, so that even future calls to system("reboot") won't break.
> I think the reboot process is not supposed to be terminated by
> SIGTERM from other processes, and that's why I made the fix there.
> 
> Anyways, I will try to do it as you suggested.
> 
> Thanks,
> Kumar.
> 
> -----Original Message-----
> From: Andy Sharp 
> Sent: Thursday, June 12, 2008 11:48 AM
> To: Kumar Vakacharla (HCL)
> Subject: Re: Review Request : TED22005
> 
> Hi Kumar,
> 
> I've had a chance to take a look at this, and while you're right, this
> is one viable approach, I would much prefer to stick to a design
> philosophy of modifying our code first and system/distro code only as
> a last resort.
> 
> Can we instead code up a method whereby the reboot command is run in a
> process that is not part of the initial process group?  I.e., do a
> fork;setpgrp;do_system(reboot) kind of thing?
> 
> Thanks,
> 
> a
> 
> On Tue, 10 Jun 2008 18:28:36 -0700 "Kumar Vakacharla (HCL)"
> <kumarv@onstor.com> wrote:
> 
> > Andy,
> > 
> > Can you please review the fix for this defect?
> > 
> > Defect:
> > 
> > TED00022005 (LSI-PA 6989) Each system hung after a number of crashes
> > (minority pcc state)
> > 
> > Root Cause:
> > 
> > The "reboot" process is getting killed in the middle of the reboot
> > operation, hence the system hangs.
> > 
> > Details:
> > 
> > During the reboot process:
> > 
> > - The reboot program (/sbin/reboot) issues "kill(-1, SIGTERM)" to
> >   kill all the processes in the system except "init" and itself.
> > 
> > - When any of the forked shells (e.g. "sh support.sh" or shells
> >   created by the system command) receives this signal, it sometimes
> >   in turn sends that signal to its whole process group using
> >   kill(0, SIGTERM). Since the reboot process also belongs to the
> >   same process group, it gets killed, hence the system hangs.
> > 
> > Fix Description:
> > 
> > Initially I tried cleaning up the processes (support.sh, emrscron,
> > pm, etc.) before reboot issues "kill(-1, SIGTERM)". I also tried
> > changing the order in which we terminate these processes during the
> > cleanup. Neither approach worked.
> > 
> > Finally, the fix is to ignore the SIGTERM signal during the "reboot"
> > operation.
> > 
> > Affected Files:
> > 
> > * /homes/kumarv//work/dev/openbsd/src/sbin/reboot/reboot.c
> > 
> > P4CLIENT=kumarv-DEV
> > 
> > P4 Change Id: 29620
> > 
> > Please let me know if you need any clarifications.
> > 
> > Thanks,
> > Kumar.
> > 
