X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C80B6C.494E43CF@onstor-exch02.onstor.net>; Wed, 10 Oct 2007 10:35:09 -0800
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Content-class: urn:content-classes:message
Subject: RE: common ssc core stack
Date: Wed, 10 Oct 2007 10:35:09 -0800
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E05EF11A5@onstor-exch02.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E030E38D0@onstor-exch02.onstor.net>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: common ssc core stack
Thread-Index: AcgK098jmD4/9WmiSQuOFK8Q3ghr0QAAJYlAAACvXFAAJTUtwA==
References: <BB375AF679D4A34E9CA8DFA650E2B04E05EF0DCB@onstor-exch02.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E030E38D0@onstor-exch02.onstor.net>
From: "Maxim Kozlovsky" <maxim.kozlovsky@onstor.com>
To: "Mike Lee" <mike.lee@onstor.com>,
	"dl-Cougar" <dl-Cougar@onstor.com>

I've just gotten a "good" core from eventd on eng131, together with a
dmalloc log that shows a memcpy overwriting memory outside the bounds
of an allocated block.

If you want to take a look, the core file and the logs are in
eng131:/var/run.

>-----Original Message-----
>From: Mike Lee
>Sent: Tuesday, October 09, 2007 5:52 PM
>To: dl-Cougar
>Subject: RE: common ssc core stack
>
>Thanks all.
>Since Max has already introduced dmalloc into our code, I will look into
>dmalloc first.
>However, I don't think it is part of the "golden" root file system yet.
>I will try to install it first, and I'll check out mpatrol if dmalloc
>does not work out.
>-Mike
>
>-----Original Message-----
>From: Tim Gardner
>Sent: Tuesday, October 09, 2007 5:30 PM
>To: Mike Lee; dl-Cougar
>Subject: RE: common ssc core stack
>
>
>This is the exact same core that I saw when I worked on the spm problem,
>which turned out to be a free of a pointer that was offset from the
>location returned by malloc. Linux memory allocation/deallocation
>appears to be much stricter about proper coding than BSD.
>
>Tim
>
>-----Original Message-----
>From: Mike Lee
>Sent: Tuesday, October 09, 2007 5:24 PM
>To: dl-Cougar
>Subject: common ssc core stack
>
>
>Hi Team:
>
>In three of the ssc daemon crashes I've analyzed thus far, we're getting
>the same stack:
>
>Program terminated with signal 6, Aborted.
>#0  0x2b52ab04 in kill () from /lib/libc.so.6
>(gdb) where
>#0  0x2b52ab04 in kill () from /lib/libc.so.6
>#1  0x2b52c200 in abort () from /lib/libc.so.6
>#2  0x2b568454 in __fsetlocking () from /lib/libc.so.6
>#3  0x2b568454 in __fsetlocking () from /lib/libc.so.6
>Previous frame identical to this frame (corrupt stack?)
>(gdb)
>The instruction addresses in the stack frames differ from crash to crash,
>but the function names are the same.
>
>Specifically, this stack was observed in:
>defect 20632 - spm crash
>defect 20649 - vsd crash
>defect 20651 - sanmd crash
>
>So, I think we're seeing manifestations of the same problem.
>Please let me know if you have recommendations/insights on this symptom.
>For now, I'm trying to reproduce the crash on a case-by-case basis.
>
>Thanks.
>
>-Mike
