Subject: RE: vsd issue
Date: Tue, 25 Mar 2008 18:44:54 -0700
From: "Svati Chandra" <schandra@onstor.com>
To: "Mike Lee" <mike.lee@onstor.com>,
	"Tim Gardner" <tim.gardner@onstor.com>,
	"Rendell Fong" <rendell.fong@onstor.com>,
	"Kumar Vakacharla (HCL)" <kumarv@onstor.com>
Cc: "Larry Scheer" <larry.scheer@onstor.com>,
	"Maxim Kozlovsky" <maxim.kozlovsky@onstor.com>,
	"Andy Sharp" <andy.sharp@onstor.com>


Hi everyone,

It appears that signal() defaults to the BSD semantic (see the header
excerpt below). Could someone please confirm?

Thanks,
Svati.



/usr/include/signal.h

/* Set the handler for the signal SIG to HANDLER, returning the old
   handler, or SIG_ERR on error.
   By default `signal' has the BSD semantic.  */
__BEGIN_NAMESPACE_STD
#ifdef __USE_BSD
extern __sighandler_t signal (int __sig, __sighandler_t __handler)
     __THROW;
#else
/* Make sure the used `signal' implementation is the SVID version. */
# ifdef __REDIRECT_NTH
extern __sighandler_t __REDIRECT_NTH (signal,
                                      (int __sig, __sighandler_t _handler),
                                      __sysv_signal);








-----Original Message-----
From: Mike Lee
Sent: Tuesday, March 25, 2008 6:30 PM
To: Tim Gardner; Svati Chandra; Rendell Fong; Kumar Vakacharla (HCL)
Cc: Larry Scheer; Maxim Kozlovsky; Andy Sharp
Subject: RE: vsd issue


Tim:
Recall the following email that I forwarded to you, which initiated the
conversion effort that Kumar has completed and I'm reviewing.
I just consulted Larry to make sure that we're using bsd/signal.h for
our bobcat/cheetah compile, and it does not seem like we are.  I will
forward Larry's finding shortly.
Given that, do you still think we need to go through with this
conversion?
Thanks.
-Mike


-----Original Message-----
From: Svati Chandra
Sent: Tuesday, March 18, 2008 3:47 PM
To: Rendell Fong
Cc: Mike Lee
Subject: RE: vsd issue



Hi Rendell,

Linux and BSD semantics for the signal() function differ,
as described at the link below. BSD implements reliable signals;
to get the same behavior on Linux we would need
to compile with #include <bsd/signal.h>.

http://tldp.org/LDP/lpg/node136.html

When using bsd/signal.h, setjmp and longjmp save and restore the
signal mask; with plain signal.h they do not.
However, sigrelse is not available in Linux, so it may be
better to use the POSIX sigsetjmp and siglongjmp functions
if we want to use reliable signals.

Let me know how things work out with this change.

In general, "sigaction" is more portable than "signal".

Thanks,
Svati.




-----Original Message-----
From: Svati Chandra
Sent: Tuesday, March 18, 2008 10:36 AM
To: Mike Lee; Rendell Fong
Cc: Tim Gardner
Subject: RE: vsd issue


Hi Mike,

The good thing is that you can reproduce it.
Could you try a change that keeps the transaction entries on the stack
and doesn't clear them as they complete? This may help us localize the
area of structure corruption.

In the function vsd_popTxnState, we could bracket the area that clears
the stack under #ifndef DEBUG.

#ifndef DEBUG
    txn->txnStack[txn->numStackElements].procFunc      = NULL;
    txn->txnStack[txn->numStackElements].context       = NULL;
    txn->txnStack[txn->numStackElements].returnState   = 0;
    txn->txnStack[txn->numStackElements].evtDetectFlag = 0;
#endif


Thanks Mike,
Svati.





-----Original Message-----
From: Mike Lee
Sent: Tuesday, March 18, 2008 12:48 AM
To: Rendell Fong; Svati Chandra
Cc: Raj Kumar; Tim Gardner
Subject: RE: vsd issue


Rendell:
I just tried it.  The change did not cure vsd, and vsd crashed after 3
enables/disables.
However, I think we still need to take care of the rmc changes you
indicated.
Thanks.
-Mike

-----Original Message-----
From: Rendell Fong
Sent: Mon 3/17/2008 10:06 PM
To: Mike Lee; Svati Chandra
Cc: Raj Kumar; Tim Gardner
Subject: RE: vsd issue

I haven't tried it.  If you start down that path, there are also several
other places inside rmc that might need the same adjustment.

So far I've linked in the dmalloc library and done some testing.  The
problems seem to go away with dmalloc usage.
This suggests that something is timing sensitive, because
dmalloc will just slow things down.

Or perhaps there are conflicts between malloc/free calls and the
rmc signal handler?



-----Original Message-----
From: Mike Lee
Sent: Mon 3/17/2008 8:10 PM
To: Mike Lee; Rendell Fong; Svati Chandra
Cc: Raj Kumar
Subject: RE: vsd issue

Rendell & Svati:

I noticed that vsd_receiveMessage() is using nfx_select(), which uses
select(), which, as Chris V discovered last week, requires a timeout
adjustment.
Have we tried making that change yet?
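
The thread does not spell out what the adjustment is. One likely reading, and it is only an assumption, is that Linux select() may overwrite the timeval with the time remaining, so the timeout has to be re-initialized before every call. A minimal sketch of that pattern follows; wait_readable is a hypothetical helper and not nfx_select().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not nfx_select).
 * Waits up to timeout_ms for fd to become readable.
 * Returns select()'s result: 1 if readable, 0 on timeout, -1 on error.
 */
int wait_readable(int fd, long timeout_ms)
{
    int ready;

    assert(fd >= 0 && fd < FD_SETSIZE);
    assert(timeout_ms >= 0);

    do {
        fd_set readable;
        struct timeval timeout;

        /* Rebuild both arguments on every pass: select() may modify
         * the fd set and, on Linux, the timeout as well. */
        FD_ZERO(&readable);
        FD_SET(fd, &readable);
        timeout.tv_sec = timeout_ms / 1000;
        timeout.tv_usec = (timeout_ms % 1000) * 1000;

        ready = select(fd + 1, &readable, NULL, NULL, &timeout);
        /* A retry after EINTR starts over with the full timeout. */
    } while (ready < 0 && errno == EINTR);

    return ready;
}
```

A loop that reuses one timeval across calls without resetting it can end up polling with a zero timeout, which is the failure mode this pattern avoids.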

I don't think that it is the solution, since I observed a corrupt program
counter between two of my breakpoints (that somehow recovered).

However, I think it is worthwhile to fix this one if it hasn't been done
yet.

-Mike
-----Original Message-----
From: Mike Lee
Sent: Monday, March 17, 2008 5:03 PM
To: Rendell Fong; Svati Chandra
Subject: RE: vsd issue


Rendell:
If you/Svati are still on this one, is this problem reproducible?
If so, can you please tell me how?
Thanks.
-Mike

-----Original Message-----
From: Rendell Fong
Sent: Monday, March 17, 2008 10:23 AM
To: Svati Chandra
Cc: Mike Lee
Subject: RE: vsd issue


I'm open to suggestions.  Where the crash occurs does not seem
indicative of the area in which the bug is.

After studying the vsd code, I've noticed the following so far:

1) vs-daemon.c line 12660: same flag tested 3 times

2) vs-daemon.c line 6354: when ctx is NULL, the code can't goto
VSD_CREATE_VS_RUN_TIME_FAILED, since ctx is referenced later (must be
non-NULL)

3) vs-util.c: vsd_allocTxn() always allocates transaction entries
starting from the beginning of the array.  Not necessarily an error, but
it may cause messages from a prior transaction to get processed in the
context of a new transaction.
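
For item 3, one possible guard against stale messages landing in a reused entry is to tag each slot with a generation number and have every message carry the generation it was issued under. This is only a sketch of the idea under made-up names (txn_slot, txn_handle, txn_alloc, txn_free, txn_handle_is_current); it does not reflect how vsd_allocTxn() is actually written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TXN_SLOTS 4   /* hypothetical table size */

/* Hypothetical slot: generation is bumped on every allocation. */
struct txn_slot {
    int      in_use;
    uint32_t generation;
};

/* Hypothetical handle carried by messages belonging to a transaction. */
struct txn_handle {
    size_t   index;
    uint32_t generation;
};

static struct txn_slot slots[TXN_SLOTS];

/* Allocates the first free slot, scanning from the start of the array.
 * Returns 0 and fills *out on success, -1 if the table is full. */
int txn_alloc(struct txn_handle *out)
{
    size_t i;

    assert(out != NULL);

    for (i = 0; i < TXN_SLOTS; i++) {
        if (!slots[i].in_use) {
            slots[i].in_use = 1;
            slots[i].generation++;
            out->index = i;
            out->generation = slots[i].generation;
            return 0;
        }
    }
    return -1;
}

/* Returns 1 if h still refers to the live transaction in its slot. */
int txn_handle_is_current(struct txn_handle h)
{
    return h.index < TXN_SLOTS &&
           slots[h.index].in_use &&
           slots[h.index].generation == h.generation;
}

/* Releases the slot; stale or repeated frees are ignored. */
void txn_free(struct txn_handle h)
{
    if (txn_handle_is_current(h))
        slots[h.index].in_use = 0;
}
```

A message whose handle fails txn_handle_is_current() belongs to an earlier transaction in the same slot and can be dropped or logged instead of being processed.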


-----Original Message-----
From: Svati Chandra
Sent: Monday, March 17, 2008 7:21 AM
To: Rendell Fong; Mike Lee
Cc: Tim Gardner
Subject: RE: vsd issue


Hi Rendell,

dmalloc has a couple of tunables, e.g. how often the heap is checked.
Maybe we could verify/experiment with those.

Thanks,
Svati.


-----Original Message-----
From: Rendell Fong
Sent: Saturday, March 15, 2008 8:00 AM
To: Mike Lee
Cc: Svati Chandra; Tim Gardner
Subject: RE: vsd issue

I think that VSD_RETURN_TO_PARENT pops the current vsdTxn (ctx entry)
off the stack.  When the same ctx entry is reused the context parameter
is being initialized (set to NULL).  So I don't think it's necessary.
What I don't understand is why dmalloc didn't catch this problem.


-----Original Message-----
From: Mike Lee
Sent: Fri 3/14/2008 7:18 PM
To: Rendell Fong
Cc: Svati Chandra; Tim Gardner
Subject: RE: vsd issue

Rendell:

At the very least, do you think we should set txn->context to NULL after
those three calls to VSD_CHECK_AND_FREE_PTR()?
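
As a sketch of what that could look like, the pattern below frees a pointer and clears it in one step, so a later free of the same field becomes a harmless free(NULL). FREE_AND_NULL, demo_txn and demo_txn_release_context are hypothetical names; the real definition of VSD_CHECK_AND_FREE_PTR() is not shown in this thread and may already do part of this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical macro: free the pointer, then clear it so a stale copy
 * of the address cannot be freed or dereferenced again through p. */
#define FREE_AND_NULL(p) \
    do {                 \
        free(p);         \
        (p) = NULL;      \
    } while (0)

/* Hypothetical stand-in for a transaction entry. */
struct demo_txn {
    void *context;
};

/* Releases the context; safe to call more than once on the same entry. */
void demo_txn_release_context(struct demo_txn *txn)
{
    assert(txn != NULL);
    FREE_AND_NULL(txn->context);
}
```

This only protects the field that is cleared; other copies of the same address elsewhere would still be dangling.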

-Mike

-----Original Message-----
From: Rendell Fong
Sent: Fri 3/14/2008 6:28 PM
To: Mike Lee
Cc: Svati Chandra; Tim Gardner
Subject: RE: vsd issue

After further analysis, that place wasn't it.  Somehow the ctx->req
address is invalid, such that a seg fault occurs when it is freed.  Where
it's getting messed up isn't obvious at the moment.


-----Original Message-----
From: Rendell Fong
Sent: Friday, March 14, 2008 6:03 PM
To: Mike Lee
Cc: Svati Chandra; Tim Gardner
Subject: FW: vsd issue

It looks like vs-daemon.c: line 12629 needs to be deleted.


-----Original Message-----
From: Rendell Fong
Sent: Friday, March 14, 2008 5:56 PM
To: Mike Lee; Svati Chandra
Cc: Tim Gardner
Subject: RE: vsd issue

This is the vsd crash that we were able to reproduce by just enabling
the vsvr CSOAK-VS3 on g11r10.

Program received signal SIGSEGV, Segmentation fault.
0x2b6df5f4 in free () from /lib/libc.so.6
(gdb) bt
#0  0x2b6df5f4 in free () from /lib/libc.so.6
#1  0x0043e06c in vsd_removeVsReqProc (vsdTxn=0x2bbb3008) at vs-daemon.c:12733
#2  0x00453530 in vsd_processTxnRunQueue (maxProcess=128) at vs-util.c:1455
#3  0x0040ddf8 in main (argc=2, argv=0x7f914e24) at vs-daemon.c:2736
(gdb)

-----Original Message-----
From: Mike Lee
Sent: Friday, March 14, 2008 5:52 PM
To: Svati Chandra
Cc: Jobi Ariyamannil; Rendell Fong; Vikas Saini
Subject: vsd issue


Hi Svati:

Rendell and Vikas are chasing a VSD crash related to free().

I recall that you had found a TAILQ-related bug last night.
If so, do you plan to check in a fix for it?

Thanks.
-Mike






