Date: Wed, 20 May 2009 11:20:13 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: Ed Kwan <ed.kwan@onstor.com>
Subject: Re: please review code change for TED 26769 GUI hangs, requires
 restarting sscccc
Message-ID: <20090520112013.57fede16@ripper.onstor.net>
In-Reply-To: <2779531E7C760D4491C96305019FEEB52AC92D8145@exch1.onstor.net>
References: <2779531E7C760D4491C96305019FEEB52AC92D80CB@exch1.onstor.net>
	<20090519232821.4aa8544e@ripper.onstor.net>
	<2779531E7C760D4491C96305019FEEB52AC92D8145@exch1.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Wed, 20 May 2009 00:25:22 -0700 Ed Kwan <ed.kwan@onstor.com> wrote:

> Line 370, -10 because the leak can be in the SSL library code, and I
> don't want to use up all the fd and have the failure there and
> possibly cause the GUI hang that the customer reported.  Maybe -5 or
> -3 is good enough.

-5 should be more than enough.

> Line 373, sct_terminate() is a signal handler, and I thought the
> convention is not to call the handler directly.

Doesn't bother me if it doesn't bother you, but exit(1) works too ~:^)
I don't care either way; you can leave it as it is.
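
For what it's worth, here is a rough sketch of the same check with the
smaller margin and a plain exit(1) instead of the signal.  The names
near_fd_limit(), restart_if_near_fd_limit() and FD_MARGIN are made up
for illustration, and fprintf() stands in for E_LOG; this is not the
code in sct.c.  The comparison is written as an addition so nothing
wraps around if the soft limit is very small or unlimited.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Hypothetical margin; the thread settles on something like 5. */
#define FD_MARGIN 5

/*
 * Return 1 if maxfd is within `margin` descriptors of the soft limit,
 * 0 otherwise.  Same test as maxfd > soft_limit - margin, but written
 * as an addition so unsigned rlim_t arithmetic cannot wrap.
 * Assumes margin is non-negative.
 */
int near_fd_limit(int maxfd, rlim_t soft_limit, int margin)
{
    if (maxfd < 0 || soft_limit == RLIM_INFINITY)
        return 0;
    return (rlim_t)maxfd + (rlim_t)margin > soft_limit;
}

/* Log and exit so that whatever supervises the process can restart it. */
void restart_if_near_fd_limit(int maxfd, const struct rlimit *rl)
{
    if (near_fd_limit(maxfd, rl->rlim_cur, FD_MARGIN)) {
        /* fprintf() is a stand-in for the real logging macro. */
        fprintf(stderr, "Too many fd allocated; restarting\n");
        exit(1);
    }
}
```

In the main loop this would be called right after maxfd is fetched,
with the struct rlimit filled in once by getrlimit(RLIMIT_NOFILE, ...)
at startup.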

> -----Original Message-----
> From: Andy Sharp 
> Sent: Tuesday, May 19, 2009 11:28 PM
> To: Ed Kwan
> Subject: Re: please review code change for TED 26769 GUI hangs,
> requires restarting sscccc
> 
> 
> = Change 32453 by edk@edk-dev on 2009/05/19 16:33:34 *pending*
> = 
> = 	For TED 26769 GUI hangs, requires restarting sscccc
> = 	Restart sscccc if we approach the "open file" resource limit.
> = 	Reviewed by
> = 
> 
> nfx-tree/code/sm-sct/sct.c
> 
>      line 370, why -10?  that's a lot of perfectly good fd's to waste
> 
>      line 373, why are you sending yourself a signal?  just call exit
>      or sct_terminate()
> 
> 
> 
> 
> 
> On Tue, 19 May 2009 16:35:11 -0700 Ed Kwan <ed.kwan@onstor.com> wrote:
> 
> > Hi Andy,
> > 
> > I'm implementing the quick and dirty fix of restarting sscccc if we
> > approach the "open file" resource limit.
> > 
> > [edk@edk-linux sm-sct]$ p4 info
> > User name: edk
> > Client name: edk-dev
> > Client host: edk-linux.onstor.net
> > Client root: /homes/edk/p4/dev
> > Current directory: /homes/edk/p4/dev/nfx-tree/code/sm-sct
> > Client address: 10.0.0.137:40471
> > Server address: liszt.onstor.net:1717
> > Server root: /p4data/p4root
> > Server date: 2009/05/19 16:33:42 -0700 PDT
> > Server version: P4D/LINUX26X86_64/2007.2/122958 (2007/05/23)
> > Server license: ONStor 40 users (support ends 2009/05/20)
> > 
> > [edk@edk-linux sm-sct]$ p4 describe 32453
> > Change 32453 by edk@edk-dev on 2009/05/19 16:33:34 *pending*
> > 
> >         For TED 26769 GUI hangs, requires restarting sscccc
> >         Restart sscccc if we approach the "open file" resource limit.
> >         Reviewed by
> > 
> > Affected files ...
> > 
> > ... //depot/dev/nfx-tree/code/sm-sct/sct.c#14 edit
> > 
> > [edk@edk-linux sm-sct]$ p4 diff -dc ...
> > ==== //depot/dev/nfx-tree/code/sm-sct/sct.c#14 - /homes/edk/p4/dev/nfx-tree/code/sm-sct/sct.c ====
> > ***************
> > *** 29,35 ****
> >   #include <netinet/in.h>
> >   #include <arpa/inet.h>
> >   #include <errno.h>
> > !
> >   #include <time.h>
> > 
> >   #include "nfx-incl.h"
> > --- 29,35 ----
> >   #include <netinet/in.h>
> >   #include <arpa/inet.h>
> >   #include <errno.h>
> > ! #include <sys/resource.h>
> >   #include <time.h>
> > 
> >   #include "nfx-incl.h"
> > ***************
> > *** 235,240 ****
> > --- 235,241 ----
> >       struct timeval timeout={30,0}; //timeout 30 seconds
> >       int loop;
> >       fd_set rset;
> > +     struct rlimit rlim_nofile;
> > 
> >       // initialize variables...
> >       logMinTempFileSize = (unsigned long) -1;    // default - no log
> > ***************
> > *** 347,352 ****
> > --- 348,356 ----
> >       sct_start_listener();
> > 
> >       read_sscccc_hosts_deny();
> > +     getrlimit(RLIMIT_NOFILE, &rlim_nofile);
> > +     E_LOG(class_1, info_s, SSCCCC_APP_ID, 0, 0, 0,
> > +         ("soft nofile resource limit = %d", (int)rlim_nofile.rlim_cur));
> > 
> >       // main loop
> >       while (1) {
> > ***************
> > *** 363,368 ****
> > --- 367,377 ----
> >           ccc_getfdset(&rset);
> > 
> >           maxfd = ccc_getmaxfd();
> > +         if (maxfd > (rlim_nofile.rlim_cur-10)) {
> > +             E_LOG(class_1, err_s, SSCCCC_APP_ID, 0, 0, 0,
> > +                   ("Too many fd allocated; restarting"));
> > +             kill(getpid(), SIGTERM);
> > +         }
> > 
> >           rmc_get_fds(&rset, &maxfd);
> > 
