Date: Wed, 2 Apr 2008 14:23:54 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: "Brian Stark" <brian.stark@onstor.com>
Cc: "Manohar Divate" <manohar.divate@onstor.com>, "dl-Cougar"
 <dl-Cougar@onstor.com>, "Chris Vandever" <chris.vandever@onstor.com>
Subject: Re: Data bus error, epc == ffffffff8218afe4, ra == ffffffff8204f614
Message-ID: <20080402142354.779271ea@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E09321DB2@onstor-exch02.onstor.net>
References: <BB375AF679D4A34E9CA8DFA650E2B04E050CFA07@onstor-exch02.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E09321DB2@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

It's definitely software.  A bug has already been filed for it.


On Wed, 2 Apr 2008 14:20:54 -0700 "Brian Stark"
<brian.stark@onstor.com> wrote:

> I don't think this is hardware.  A data bus error typically results
> when the CPU accesses a bogus address.
> 
> 
> Brian
> 
> 
> > _____________________________________________ 
> > From: 	Manohar Divate  
> > Sent:	Wednesday, April 02, 2008 1:56 PM
> > To:	dl-Cougar
> > Cc:	Chris Vandever
> > Subject:	Data bus error, epc == ffffffff8218afe4, ra == ffffffff8204f614
> > 
> > In a 3-node Cougar cluster, one node rebooted as expected when it
> > lost its sc interface (ifconfig eth0 down). After the node rejoined
> > the cluster, the PCC started going down and hit this panic.
> > 
> > Is it a hardware error?
> > 
> > ThanX
> > manny
> > 
> > 
> > 
> > Apr  2 13:46:59 g9r204 : 0:0:vsd:ERROR: vsd_doDomainOp : Error doing domain opearation for VS 1
> > Apr  2 13:46:59 g9r204 : 0:0:vsd:ERROR: vsd_doDomainOp : Error doing domain opearation for VS 5
> > Apr  2 13:46:59 g9r204 : 0:0:vsd:ERROR: vsd_disableVsProc : Not able to stop authentication service for VS 1
> > Apr  2 13:46:59 g9r204 : 0:0:vsd:ERROR: vsd_disableVsProc : Not able to stop authentication service for VS 5
> > Apr  2 13:46:59 g9r204 : 0:0:vsd:ERROR: vsd_removeVsReqProc : remove VS proc failed for VS 1
> > Apr  2 13:46:59 g9r204 : 0:0:vsd:ERROR: vsd_removeVsReqProc : remove VS proc failed for VS 5
> > Apr  2 13:47:00 g9r204 : 0:0:vsd:ERROR: vsd_doDomainOp : Error doing domain opearation for VS 3
> > Apr  2 13:47:00 g9r204 : 0:0:vsd:ERROR: vsd_disableVsProc : Not able to stop authentication service for VS 3
> > Apr  2 13:47:00 g9r204 : 0:0:vsd:ERROR: vsd_removeVsReqProc : remove VS proc failed for VS 3
> > Apr  2 13:47:00 g9r204 : 0:0:cluster2:ERROR: Node going down for reboot! (cluster_server: invalidating clusDb).
> > Apr  2 13:47:01 g9r204 : 0:0:eventd:CRITICAL: Process-EVENT Node: Name 'local', State Down, Msg 'Node going down for reboot! (cluster_server: invalidating clusDb).'
> > Apr  2 13:47:01 g9r204 : 0:0:spm:NOTICE: spm_ncmNodeEvent: Lost connect for
> > Apr  2 13:47:01 g9r204 : 0:0:spm:NOTICE: spm_ncmNodeEvent: disconnected
> > INIT: Sending processes the TERM signal Rcvd post request APP: unknown EVENT: N
> > Apr  2 13:47:01 g9r204 : 0:0:nfxsh:NOTICE: cmd[0]: clu show clu : status[0]
> > Stopping deferred execution scheduler: atd.
> > Stopping periodic command scheduler: crond.
> > Stopping MTA: exim4_liste
> > Stopping internet superserver: inetd.
> > Stopping OpenBSD Secure Shell server: sshd.
> > Stopping automounter: done.
> > Stopping NTP server: ntpd.
> > Saving the system clock..
> > Stopping NFS common utilities: statd.
> > Stopping kernel log daemon: klogd.
> > Stopping system log daemon: syslogd.
> > Stopping ONStor services:DBE physical address: 0041001000
> > Data bus error, epc == ffffffff8218afe4, ra == ffffffff8204f614
> > Oops[#1]:
> > Cpu 0
> > $ 0   : 0000000000000000 0000000030001fe1 ffffffffffffffff 9000000041001000
> > $ 4   : 0000000000000039 a8000000049bc800 0200000000000000 0000000000000000
> > $ 8   : a8000000049bc800 9000000000000000 ffffffff822f019c 6f62657220726f66
> > $12   : 0000000030001fe0 000000001000001f 0000000000000000 620a7064752f3436
> > $16   : a80000000490e520 0000000000000000 0000000000000039 0000000000000000
> > $20   : 0000000000000001 900000f81a7084a0 0000000000000500 a80000008eca88e0
> > $24   : 0000000000000010 ffffffff82195ec0
> > $28   : a80000008e4c4000 a80000008e4c78c0 0000000000000000 ffffffff8204f614
> > Hi    : 0000000000000000
> > Lo    : 0000000000001398
> > epc   : ffffffff8218afe4 yenta_interrupt+0x14/0x118     Not tainted
> > ra    : ffffffff8204f614 handle_IRQ_event+0x6c/0xe8
> > Status: 30001fe3    KX SX UX KERNEL EXL IE
> > Cause : 0080841c
> > PrId  : 00040103
> > Modules linked in: autofs4
> > Process eventd (pid: 996, threadinfo=a80000008e4c4000, task=a80000008e4c0cb8)
> > Stack : ffffffff8204f614 fffffffffffffbff ffffffff822b6878 0000000000000039
> >         a80000000490e520 fffffffffffffbff ffffffff82310000 ffffffff8204f764
> >         0200000000000000 a80000008e95a810 a80000008acdfb40 900000008f0012e0
> >         900000008f004b40 ffffffff820011a4 0000000000000000 ffffffff82001840
> >         0000000000000000 00000000004aaa20 900000f81a5a2700 00000000005a2700
> >         900000f81a5a2860 a80000008e95a972 00000000000003b0 ffffff0000000000
> >         0000000000000000 696f672065646f4e 206e776f6420676e 6f62657220726f66
> >         0000000000000010 a80000008e4c7de8 0000000000000000 620a7064752f3436
> >         0000000000000510 a80000008e95a810 a80000008acdfb40 900000008f0012e0
> >         900000008f004b40 900000f81a7084a0 0000000000000500 a80000008eca88e0
> >         ...
> > Call Trace:
> > [<ffffffff8218afe4>] yenta_interrupt+0x14/0x118
> > [<ffffffff8204f614>] handle_IRQ_event+0x6c/0xe8
> > [<ffffffff8204f764>] __do_IRQ+0xd4/0x160
> > [<ffffffff820011a4>] plat_irq_dispatch+0x1e4/0x1f0
> > [<ffffffff82001840>] ret_from_irq+0x0/0x4
> > [<ffffffff8212e3e4>] src_unaligned_dst_aligned+0xc/0x50
> > [<ffffffff82195fd8>] mgmtbus_hard_start_xmit+0x118/0x178
> > [<ffffffff821a9ea4>] dev_queue_xmit+0x30c/0x458
> > [<ffffffff82220938>] eee_dgram_sendmsg+0x2b8/0x440
> > [<ffffffff82199540>] sock_sendmsg+0x98/0xe8
> > [<ffffffff821997d8>] sys_sendmsg+0x248/0x320
> > [<ffffffff8200fec8>] handle_sys+0x108/0x124
> > 
> > 
> > Code: ffbf0000  dca30010  8c620000 <0040202d> ac620000  dca60010  8cc30000  90c20804  1480001f
> > Kernel panic - not syncing: Fatal exception in interrupt
> > Rebooting in 5 seconds..
> > SiByte Watchdog in danger of initiating system reset in 4.1 seconds
> > SiByte Watchdog in danger of initiating system reset in 4.1 seconds
> > 
> > 
> > 
> > PowerOn Self Test........OK
> > 
> > Initializing System......please wait
> > 
> > Virtual servers on nas gateway g7r204
> > 
> >  ID  State                             Name
> > ====================================================
> > 6    Enabled                           VS_MGMT_1883
> > Cluster Name: g9r204       Cluster State:   On
> > NAS Gateways        IP              State   PCC
> > ------------------------------------------------------
> > g9r204              10.2.204.9      UP      YES
> > g10r204             10.2.204.10     DOWN    NO
> > g7r204              10.2.204.7      UP      NO
> > Virtual servers on nas gateway g9r204
> > 
> >  ID  State                             Name
> > ====================================================
> > 1    Enabled                           VS_MGMT_1874
> > 2    Disabled                          G9R204-VS-2
> > 3    Enabled                           VLANTAG
> > 5    Enabled                           NOLPORT
> > 8    Disabled                          G10R204-VS-3
> > Virtual servers on nas gateway g7r204
> > 
> >  ID  State                             Name
> > ====================================================
> > 6    Enabled                           VS_MGMT_1883
> > Cluster Name: g9r204       Cluster State:   On
> > NAS Gateways        IP              State   PCC
> > ------------------------------------------------------
> > g9r204              10.2.204.9      UP      YES
> > g10r204             10.2.204.10     UP      NO
> > g7r204              10.2.204.7      UP      NO
> > Virtual servers on nas gateway g9r204
> > 
> >  ID  State                             Name
> > ====================================================
> > 1    Enabled                           VS_MGMT_1874
> > 2    Disabled                          G9R204-VS-2
> > 3    Enabled                           VLANTAG
> > 5    Enabled                           NOLPORT
> > 8    Disabled                          G10R204-VS-3
> > Virtual servers on nas gateway g7r204
> > 
> >  ID  State                             Name
> > ====================================================
> > 6    Enabled                           VS_MGMT_1883
> > Cluster Name: g9r204       Cluster State:   On
> > NAS Gateways        IP              State   PCC
> > ------------------------------------------------------
> > g9r204              10.2.204.9      UP      YES
> > g10r204             10.2.204.10     UP      NO
> > g7r204              10.2.204.7      UP      NO
> > Virtual servers on nas gateway g9r204
> > 
> >  ID  State                             Name
> > ====================================================
> > 1    Enabled                           VS_MGMT_1874
> > 2    Disabled                          G9R204-VS-2
> > 3    Enabled                           VLANTAG
> > 5    Enabled                           NOLPORT
> > 8    Disabled                          G10R204-VS-3
> > 
> > 
