AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20070719093940.7ee4a0bc@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:onstor-exch02.onstor.net
NSV:
SSH:
R:<charissa.willard@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@onstor-exch02.onstor.net/INBOX	0	BB375AF679D4A34E9CA8DFA650E2B04E04936D6D@onstor-exch02.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Thu, 19 Jul 2007 09:39:51 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: "Charissa Willard" <charissa.willard@onstor.com>
Subject: Re: BSD crashes: fs_abort
Message-ID: <20070719093951.06cf4abb@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E04936D6D@onstor-exch02.onstor.net>
References: <BB375AF679D4A34E9CA8DFA650E2B04E04936D6D@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Asserts should become no-ops in production builds; in debug builds they
should behave basically the same as a panic.
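
To make that concrete, here is a rough sketch of the usual pattern. It is
not our actual macro: demo_panic(), PRODUCTION_BUILD and checked_div() are
made-up names for illustration only.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the real panic routine: report and stop. */
void demo_panic(const char *what, const char *file, int line)
{
    fprintf(stderr, "Panic : %s (%s:%d)\n", what, file, line);
    abort();
}

#ifdef PRODUCTION_BUILD   /* hypothetical build flag */
/* Production: the check compiles away; cond is not even evaluated. */
#define ASSERT(cond) ((void)0)
#else
/* Debug: a failed check goes down the same path as a panic. */
#define ASSERT(cond) \
    ((cond) ? (void)0 : demo_panic("assertion failed: " #cond, __FILE__, __LINE__))
#endif

/* Example caller: the precondition is only checked in debug builds. */
int checked_div(int num, int den)
{
    ASSERT(den != 0);
    return num / den;
}
```

One consequence of this pattern: because the condition is not evaluated in
a production build, it must not have side effects the code relies on.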

Cheers,

a

On Thu, 19 Jul 2007 08:13:45 -0700 "Charissa Willard"
<charissa.willard@onstor.com> wrote:

> Andy,
> 
>  
> 
> When a function calls PANIC or ASSERT does that cause a BSD crash? In
> particular, I see a lot of "Panic: fs_abort"s (which is coming from
> the FS code) in the 1.2 files in /var/crash.
> 
>  
> 
> Thanks,
> 
> Charissa
> 
>  
> 
> ________________________________
> 
> From: Jobi Ariyamannil 
> Sent: Wednesday, July 18, 2007 4:24 PM
> To: Charissa Willard
> Subject: RE: BSD crashes: fs_abort
> 
>  
> 
> BSD crashes are not caused by filesystem problems.  You should not be
> looking at the 1.2 crash file when BSD crashes.
> 
> 1.2 keeps track of FP crashes, and there should be another crash file
> that keeps track of BSD crashes.
> 
>  
> 
> Regards,
> 
> Jobi
> 
>  
> 
> ________________________________
> 
> From: Charissa Willard 
> Sent: Wednesday, July 18, 2007 4:20 PM
> To: Jobi Ariyamannil
> Subject: RE: BSD crashes: fs_abort
> 
>  
> 
> We should be able to add code to distinguish where the panics are
> being thrown. This will help us debug the root cause of why the
> panics are being thrown. (I found that "failing now" is coming from
> rmc_util for RMC_FAIL cases. That's the least helpful message I've
> found so far.) 
> 
>  
> 
> I got the information for the fs_abort from looking at /var/crash on
> eng93, eng62 and g2r5 (see bugs below). These were all in triage this
> morning. With regard to the one on g2r5 (number 3 below), Erik was
> running a script that may not be a real work flow. I also noticed he
> has other crash dumps on his system. 
> 
>  
> 
> 1.) TED00019839 - BSD Crash: panic: malloc: out of space in kmem_map
> (eng62)
> 
>  
> 
> eng62# ls -l
> 
> total 213
> 
> -rw-r--r--  1 root  wheel  45850 Jul 13 10:05 0.0
> 
> -rw-r--r--  1 root  wheel  71241 Jul 13 09:48 1.2
> 
> -rw-r--r--  1 root  wheel  77679 Jul 13 10:05 2.0
> 
> -rw-r--r--  1 root  wheel  20735 Jul 13 09:55 locks
> 
> -rw-r--r--  1 root  wheel      5 Feb  7 21:53 minfree
> 
> eng62# more 1.2
> 
> "TERMCAP", line 0, col 15, terminal 'vt100': older tic versions may
> treat the description field as an alias
> 
>  
> 
> crashdump_begin: Thu Jul 12 09:01:19 2007
> 
>  
> 
> Crashdump:
> 
> ----------
> 
> Panic : fs_abort
> 
> Image Version : NFP_FP : R3.1.0.0BCDBG-070207 : Mon Jul  2 22:33:59
> 2007
> 
> PROM Version  : PROM_SIBYTE_BC : prom-2.0.0 : Mon May 16 15:58:00 2005
> 
> Boot Time  : Wed Jul 11 22:06:28 GMT 2007
> 
> Crash Time : Thu Jul 12 06:20:57 GMT 2007
> 
>  
> 
>  
> 
>  
> 
> 2.) TED00019839 - BSD crash followed by FP crash on eng93 two days in
> a row (would probably repeat again) (eng93)
> 
>  
> 
> # cd /var/crash
> 
> 
> # ls -l
> 
> total 212
> 
> -rw-r--r--  1 root  wheel  18340 Jul 15 19:36 0.0
> 
> -rw-r--r--  1 root  wheel  82136 Jul 18 08:58 1.2
> 
> -rw-r--r--  1 root  wheel   5220 Jul 17 08:48 locks
> 
> -rw-r--r--  1 root  wheel      5 Jun  4 15:54 minfree
> 
> # more 1.2
> 
> "TERMCAP", line 0, col 15, terminal 'vt100': older tic versions may
> treat the description field as an alias
> 
> crashdump_begin: Sat Jul 14 18:39:42 2007
> 
>  
> 
> Crashdump:
> 
> ----------
> 
> Panic : fs_abort
> 
> Image Version : NFP_FP : R3.1.0.0BCDBG-071207 : Thu Jul 12 22:11:17
> 2007
> 
> PROM Version  : PROM_SIBYTE_BC : prom-2.0.2 : Fri Apr  7 13:13:34 2006
> 
> Boot Time  : Fri Jul 13 11:09:40 GMT 2007
> 
> Crash Time : Fri Jul 13 23:08:41 GMT 2007
> 
> Regs  :
> 
>  
> 
>  
> 
> 3.) TED00019861 - BSD crashed and filer rebooted when TTE tests were
> started on TTE filer (g2r5)
> 
>  
> 
> g2r5# more 1.2
> 
> "TERMCAP", line 0, col 15, terminal 'vt100': older tic versions may
> treat the description field as an alias
> 
> Jul  6 11:00:02 g2r5 newsyslog[2967]: logfile turned over
> 
> crashdump_begin: Mon Jul  9 06:56:36 2007
> 
>  
> 
> Crashdump:
> 
> ----------
> 
> Panic : fs_abort
> 
> Image Version : NFP_FP : R3.1.0.0BCDBG-070207 : Mon Jul  2 22:33:59
> 2007
> 
> PROM Version  : PROM_SIBYTE_BC : prom-2.0.4 : Fri Sep 29 19:04:54 2006
> 
>  
> 
>  
> 
> ________________________________
> 
> From: Jobi Ariyamannil 
> Sent: Wednesday, July 18, 2007 3:41 PM
> To: Charissa Willard
> Subject: RE: BSD crashes: fs_abort
> 
>  
> 
> fs_abort can happen from dump and mirroring code as well, which can be
> figured out by looking at the stack.
> 
> Do we know the stack traces of the asserts that were hit?  What are
> the defect numbers?
> 
>  
> 
> Regards,
> 
> Jobi
> 
>  
> 
> ________________________________
> 
> From: Charissa Willard 
> Sent: Wednesday, July 18, 2007 2:59 PM
> To: Jobi Ariyamannil
> Subject: RE: BSD crashes: fs_abort
> 
>  
> 
>  
> 
> Should these be assigned to you then?
> 
>  
> 
> ________________________________
> 
> From: Jobi Ariyamannil 
> Sent: Wednesday, July 18, 2007 2:33 PM
> To: Charissa Willard
> Subject: RE: BSD crashes: fs_abort
> 
>  
> 
> Yes Charissa.
> 
>  
> 
> ________________________________
> 
> From: Charissa Willard 
> Sent: Wednesday, July 18, 2007 2:32 PM
> To: Jobi Ariyamannil
> Subject: BSD crashes: fs_abort
> 
>  
> 
> Jobi,
> 
>  
> 
> John K. is seeing several BSD crashes on his system (eng93). Raj has
> one of these, too, on eng62. Does the "Panic: fs_abort" mean that
> this is coming from the FS code?
> 
>  
> 
> Thanks,
> 
> Charissa
> 
>  
> 
> # cd /var/crash
> 
> # ls -l
> 
> total 212
> 
> -rw-r--r--  1 root  wheel  18340 Jul 15 19:36 0.0
> 
> -rw-r--r--  1 root  wheel  82136 Jul 18 08:58 1.2
> 
> -rw-r--r--  1 root  wheel   5220 Jul 17 08:48 locks
> 
> -rw-r--r--  1 root  wheel      5 Jun  4 15:54 minfree
> 
> # more 1.2
> 
> "TERMCAP", line 0, col 15, terminal 'vt100': older tic versions may
> treat the description field as an alias
> 
>  
> 
>  
> 
>  
> 
> crashdump_begin: Sat Jul 14 18:39:42 2007
> 
>  
> 
>  
> 
> Crashdump:
> 
> ----------
> 
> Panic : fs_abort
> 
> Image Version : NFP_FP : R3.1.0.0BCDBG-071207 : Thu Jul 12 22:11:17
> 2007
> 
> PROM Version  : PROM_SIBYTE_BC : prom-2.0.2 : Fri Apr  7 13:13:34 2006
> 
> Boot Time  : Fri Jul 13 11:09:40 GMT 2007
> 
> Crash Time : Fri Jul 13 23:08:41 GMT 2007
> 
> Regs  :
> 
>  
> 
