X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C80AD7.CC1632A7@onstor-exch02.onstor.net>; Tue, 9 Oct 2007 16:52:13 -0800
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Content-class: urn:content-classes:message
Subject: RE: common ssc core stack
Date: Tue, 9 Oct 2007 16:52:13 -0800
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E030E38D0@onstor-exch02.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E05EF0DCB@onstor-exch02.onstor.net>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: common ssc core stack
Thread-Index: AcgK098jmD4/9WmiSQuOFK8Q3ghr0QAAJYlAAACvXFA=
From: "Mike Lee" <mike.lee@onstor.com>
To: "dl-Cougar" <dl-Cougar@onstor.com>

Thanks all.
Since Max has already introduced dmalloc into our code, I will look into
dmalloc first.
However, I don't think it is part of the "golden" root file system yet.

I will try to install it first; I'll check out mpatrol if dmalloc does
not work out.
-Mike

-----Original Message-----
From: Tim Gardner
Sent: Tuesday, October 09, 2007 5:30 PM
To: Mike Lee; dl-Cougar
Subject: RE: common ssc core stack


This is the exact same core that I saw when I worked on the spm problem,
which turned out to be a free() of a pointer that was offset from the
address returned by malloc(). Linux memory allocation/deallocation
appears to be much stricter about proper coding than BSD.
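
A minimal sketch of that class of bug, for illustration only. The names
skip_header, read_payload, and HDR_LEN are hypothetical and not taken
from our code. The commented-out call shows the mistake: free() must be
given the exact address malloc() returned. glibc typically detects the
bad pointer and calls abort(), which would show up as signal 6.

```c
#include <stdlib.h>
#include <string.h>

#define HDR_LEN 4  /* hypothetical header size, for illustration only */

/* Returns a pointer into the middle of buf.
 * The result must never be passed to free(). */
static char *skip_header(char *buf)
{
    return buf + HDR_LEN;
}

/* Copies the payload into out and releases the buffer correctly.
 * Returns 0 on success, -1 on bad arguments or allocation failure. */
static int read_payload(char *out, size_t out_len)
{
    if (out == NULL || out_len == 0)
        return -1;

    char *base = malloc(16);
    if (base == NULL)
        return -1;
    memcpy(base, "HDR:payload", 12);  /* 11 characters plus the NUL */

    char *payload = skip_header(base);
    strncpy(out, payload, out_len - 1);
    out[out_len - 1] = '\0';

    /* free(payload);   BUG: payload is base + HDR_LEN, not the address
     *                  malloc() returned. Undefined behavior; glibc
     *                  usually aborts here. */
    free(base);         /* correct: free the original pointer */
    return 0;
}
```

Keeping the original pointer in its own variable (base above) and only
ever freeing that one is the simplest way to avoid the problem.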

Tim

-----Original Message-----
From: Mike Lee
Sent: Tuesday, October 09, 2007 5:24 PM
To: dl-Cougar
Subject: common ssc core stack


Hi Team:

In three of the ssc daemon crashes I've analyzed thus far, we're getting
the same stack:

Program terminated with signal 6, Aborted.
#0  0x2b52ab04 in kill () from /lib/libc.so.6
(gdb) where
#0  0x2b52ab04 in kill () from /lib/libc.so.6
#1  0x2b52c200 in abort () from /lib/libc.so.6
#2  0x2b568454 in __fsetlocking () from /lib/libc.so.6
#3  0x2b568454 in __fsetlocking () from /lib/libc.so.6
Previous frame identical to this frame (corrupt stack?)
(gdb)
The instruction addresses in the stack frames are not the same, but the
function names are.

Specifically, this stack was observed in:
defect 20632 - spm crash
defect 20649 - vsd crash
defect 20651 - sanmd crash

So, I think we're seeing manifestations of the same problem.
Please let me know if you have recommendations/insights on this symptom.
For now, I'm trying to reproduce the crash on a case-by-case basis.

Thanks.

-Mike
