AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20080612114655.6c1ce900@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:onstor-exch02.onstor.net
NSV:
SSH:
R:<kumarv@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@onstor-exch02.onstor.net/INBOX	0	BB375AF679D4A34E9CA8DFA650E2B04E0A60C96D@onstor-exch02.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Thu, 12 Jun 2008 11:48:27 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: "Kumar Vakacharla (HCL)" <kumarv@onstor.com>
Subject: Re: Review Request : TED22005
Message-ID: <20080612114827.609ea49c@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E0A60C96D@onstor-exch02.onstor.net>
References: <BB375AF679D4A34E9CA8DFA650E2B04E0A60C96D@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Hi Kumar,

I've had a chance to look at this. You're right that this is one
viable approach, but I would much prefer to stick to a design
philosophy of modifying our own code first and touching system/distro
code only as a last resort.

Can we instead code up a method whereby the reboot command is run in a
process that is not part of the initial process group?  I.e., do a
fork(); setpgrp(); do_system(reboot) kind of thing?

Thanks,

a

On Tue, 10 Jun 2008 18:28:36 -0700 "Kumar Vakacharla (HCL)"
<kumarv@onstor.com> wrote:

> Andy,
> 
> Can you please review the fix for this defect?
> 
> Defect:
> 
> TED00022005 (LSI-PA 6989) Each system hung after a number of crashes
> (minority pcc state)
> 
> Root Cause:
> 
> The "reboot" process is getting killed in the middle of the reboot
> operation, so the system hangs.
> 
> Details:
> 
> During the reboot process:
> 
> - The reboot program (/sbin/reboot) issues "kill(-1, SIGTERM)" to
>   kill all the processes in the system except "init" and itself.
> 
> - When any of the forked shells (e.g. "sh support.sh" or shells
>   created by the system command) receives this signal, it sometimes
>   in turn sends that signal to its whole process group using
>   kill(0, SIGTERM). Since the reboot process belongs to the same
>   process group, it gets killed too, and the system hangs.
> 
> Fix Description:
> 
> Initially I tried cleaning up the processes (support.sh, emrscron, pm,
> etc.) before reboot issues "kill(-1, SIGTERM)". I also tried changing
> the order in which we terminate these processes during the cleanup.
> Neither approach worked.
> 
> Finally, the fix is to ignore the SIGTERM signal during the "reboot"
> operation.
> 
> Affected Files:
> 
> * /homes/kumarv//work/dev/openbsd/src/sbin/reboot/reboot.c
> 
> P4CLIENT=kumarv-DEV
> P4 Change Id: 29620
> 
> Please let me know if you need any clarifications.
> 
> Thanks,
> Kumar.
> 
