Date: Wed, 17 Feb 2010 17:14:51 -0800
From: Andrew Sharp <andy.sharp@lsi.com>
To: "Seidel, Jan" <Jan.Seidel@lsi.com>
Cc: "Greiveldinger, Chris" <Chris.Greiveldinger@lsi.com>, "Ariyamannil,
 Jobi" <Jobi.Ariyamannil@lsi.com>, "Kozlovsky, Maxim"
 <Maxim.Kozlovsky@lsi.com>
Subject: Re: Debian 5 kernel errors
Message-ID: <20100217171451.552a3c82@ripper.onstor.net>
In-Reply-To: <D41A5B864986B546A595D85C4701A9C0B70639F5@cosmail03.lsi.com>
References: <D41A5B864986B546A595D85C4701A9C0B70639EA@cosmail03.lsi.com>
	<20100217155820.427b2d42@ripper.onstor.net>
	<D41A5B864986B546A595D85C4701A9C0B70639F5@cosmail03.lsi.com>
Organization: LSI
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

It most likely becomes unresponsive because too many threads are blocked
waiting for I/O on the NFS volume that isn't responding.  Just like
when your workstation becomes mostly unusable when MD is out to lunch.
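
To check whether that is what's going on, one option is to count the
tasks sitting in uninterruptible sleep (state D) on the client.  What
follows is only a minimal sketch, assuming a Linux /proc filesystem;
the helper names parse_stat_state and blocked_tasks are made up for
illustration, and it only looks at each process's main thread (the
other threads live under /proc/<pid>/task/<tid>/stat).

```python
import os


def parse_stat_state(stat_line):
    """Return (pid, comm, state) from the text of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the first '(' and the last ')' instead of splitting on
    # whitespace.
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    pid = int(stat_line[:lparen].strip())
    comm = stat_line[lparen + 1:rparen]
    # The state letter is the first field after the closing parenthesis.
    state = stat_line[rparen + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for processes whose main thread is in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or the line was malformed
        if state == "D":
            blocked.append((pid, comm))
    return blocked


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

If the list keeps growing while the server is not responding, and the
names are mostly the stress test binary, that points at blocked NFS
I/O rather than at something else on the client.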

Or it could be a bug in the kernel of some kind.  But that's less
likely.  Except for the fact that there are infinite bugs in the
kernel.  Oops, sorry, didn't mean to get all zen.


On Wed, 17 Feb 2010 17:09:42 -0700 "Seidel, Jan" <Jan.Seidel@lsi.com>
wrote:

> Alright, thanks for the information. I should have included this in
> the original e-mail: After several of these messages Chris' client
> became unresponsive. That's why I'm worried about this. I also copied
> this part below. I thought those 2 problems were related. But maybe
> they're not. Maybe the stress test is just overloading this client?
> 
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.997621] =======================
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.997636] fsstress_oper D b9ae4430     0 13583      1
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.997642]        cb001140 00000046 f7ae1000 b9ae4430 00027a55 cb0012cc c180bfa0 00000000
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.997660]        f8bd2ba2 c945ab84 00612005 00000000 f8bcda72 00000000 00000000 00000246
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.997672]        d0dbdd80 00000000 d0dbdd88 c1800644 f8c49749 c02b8b86 f8c49724 f03e2828
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.997697] Call Trace:
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.997724]  [<f8bd2ba2>] __rpc_execute+0x5e/0x1d9 [sunrpc]
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.997765]  [<f8bcda72>] rpc_run_task+0x40/0x45 [sunrpc]
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.997815]  [<f8c49749>] nfs_wait_bit_killable+0x25/0x2a [nfs]
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.997843]  [<c02b8b86>] __wait_on_bit+0x33/0x58
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.997852]  [<f8c49724>] nfs_wait_bit_killable+0x0/0x2a [nfs]
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.997888]  [<f8c49724>] nfs_wait_bit_killable+0x0/0x2a [nfs]
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.997932]  [<c02b8c0a>] out_of_line_wait_on_bit+0x5f/0x67
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.997954]  [<c01319c9>] wake_bit_function+0x0/0x3c
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.997977]  [<f8c4971d>] nfs_wait_on_request+0x1d/0x24 [nfs]
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.998014]  [<f8c4cb7a>] nfs_sync_mapping_wait+0xd2/0x245 [nfs]
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.998080]  [<f8c4ceea>] __nfs_write_mapping+0x22/0x3b [nfs]
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.998119]  [<f8c4cf37>] nfs_write_mapping+0x34/0x52 [nfs]
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.998181]  [<f8c42c27>] nfs_do_fsync+0x13/0x2d [nfs]
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.998211]  [<f8c43013>] nfs_file_flush+0x63/0x83 [nfs]
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.998264]  [<c0172c8e>] filp_close+0x2e/0x53
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.998280]  [<c0123fc9>] put_files_struct+0x60/0xa8
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.998307]  [<c012508f>] do_exit+0x1fe/0x5bb
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.998335]  [<c01254b0>] do_group_exit+0x64/0x8d
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.998352]  [<c012c755>] get_signal_to_deliver+0x30d/0x32d
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.998385]  [<c0102fc6>] do_notify_resume+0x7e/0x649
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.998485]  [<c025a6ef>] net_rx_action+0x9c/0x1b9
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.998508]  [<c0174abe>] vfs_write+0xe5/0x120
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.998534]  [<c017502e>] sys_write+0x3c/0x63
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.998561]  [<c01039a8>] work_notifysig+0x13/0x1b
> Feb 16 19:06:49 cgreivel-linux kernel: [777678.998618] =======================
> Feb 16 19:11:56 cgreivel-linux kernel: [778007.605043] nfs: server 10.3.76.122 not responding, still trying
> Feb 16 19:11:56 cgreivel-linux kernel: [778007.605088] nfs: server 10.3.76.122 not responding, still trying
> Feb 16 19:11:56 cgreivel-linux kernel: [778007.613035] nfs: server 10.3.76.122 not responding, still trying
> Feb 16 19:11:56 cgreivel-linux kernel: [778007.613080] nfs: server 10.3.76.122 not responding, still trying
> Feb 16 19:11:56 cgreivel-linux kernel: [778007.613102] nfs: server 10.3.76.122 not responding, still trying
> Feb 16 19:11:56 cgreivel-linux
> Feb 17 09:13:18 cgreivel-linux syslogd 1.5.0#5: restart.
> Feb 17 09:13:18 cgreivel-linux kernel: klogd 1.5.0#5, log source = /proc/kmsg started.
> Feb 17 09:13:18 cgreivel-linux kernel: [    0.000000] Initializing cgroup subsys cpuset
> Feb 17 09:13:18 cgreivel-linux kernel: [    0.000000] Initializing cgroup subsys cpu
> Feb 17 09:13:18 cgreivel-linux kernel: [    0.000000] Linux version 2.6.26-2-686 (Debian 2.6.26-19lenny1) (dannf@debian.org) (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 SMP Sat Oct 17 17:59:23 UTC 200
> 
> 
> -----Original Message-----
> From: Andrew Sharp [mailto:andy.sharp@lsi.com] 
> Sent: Wednesday, February 17, 2010 3:58 PM
> To: Seidel, Jan
> Cc: Greiveldinger, Chris; Ariyamannil, Jobi; Kozlovsky, Maxim
> Subject: Re: Debian 5 kernel errors
> 
> Nothing to worry about.  I don't know the exact circumstances under
> which it does this, but generally it happens when NFS operations time
> out.  I get it on my 2.6.32.7 kernel when MD craps its drawers and
> takes too long to fail over.  By 'too long' I mean too long to suit
> the kernel.
> 
> If you want to file a kernel bug, be my guest.  But do it right.  You
> will want to include a tcpdump of the NFS session (gonna be huge in
> this case, obviously), as well as a copy of the test code, or a
> portion of it that gives the developer some idea of what was
> happening.
> 
> 
> On Wed, 17 Feb 2010 16:47:07 -0700 "Seidel, Jan" <Jan.Seidel@lsi.com>
> wrote:
> 
> > Hi Andy,
> > 
> > Chris G gets the following kernel errors when running the filesystem
> > stress test. Max recommended opening a defect on a debian bug list.
> > Can you please have a quick look at this? Maybe you already know
> > something about similar problems or just have a good idea what to do
> > about it :)
> > 
> > The client IP is 10.0.0.56 and the kernel version is:
> > Linux cgreivel-linux 2.6.26-2-686 #1 SMP Sat Oct 17 17:59:23 UTC
> > 2009 i686 GNU/Linux
> > 
> > fsstress_operations is the name of the binary that's used in the
> > filesystem stress test to generate nfs traffic.
> > 
> > Thanks,
> > Jan
> > 
> > Feb 16 17:58:29 cgreivel-linux kernel: [773157.468818] =======================
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.393864] fsstress_oper D ba3112f6     0 13577      1
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.393871]        d3f10ca0 00000082 00000293 ba3112f6 00027a55 d3f10e2c c180bfa0 00000000
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.393884]        c65f266c 0e0a5256 00000001 00000000 0e0a5256 c65f266c 0e0a5256 ddef1f18
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.393895]        c180bfa0 01451000 ddef1f18 c1801370 c02b8998 ddef1f10 00000000 c0156a1
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.393907] Call Trace:
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.393980]  [<c02b8998>] io_schedule+0x49/0x80
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.393996]  [<c0156a1d>] sync_page+0x33/0x36
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.394004]  [<c02b8b86>] __wait_on_bit+0x33/0x58
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.394010]  [<c01569ea>] sync_page+0x0/0x36
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.394025]  [<c0156c17>] wait_on_page_bit+0x57/0x5d
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.394043]  [<c01319c9>] wake_bit_function+0x0/0x3c
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.394057]  [<c0156fc8>] wait_on_page_writeback_range+0x51/0xf4
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.394779]  [<c018f1d5>] do_fsync+0x59/0x83
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.394796]  [<c018f21c>] __do_fsync+0x1d/0x2b
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.394806]  [<c0103853>] sysenter_past_esp+0x78/0xb1
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.394850] =======================
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.394863] fsstress_oper D b9ae4430     0 13583      1
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.394868]        cb001140 00000046 f7ae1000 b9ae4430 00027a55 cb0012cc c180bfa0 00000000
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.394880]        f8bd2ba2 c945ab84 00612005 00000000 f8bcda72 00000000 00000000 00000246
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.394891]        d0dbdd80 00000000 d0dbdd88 c1800644 f8c49749 c02b8b86 f8c49724 f03e2828
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.394902] Call Trace:
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.394923]  [<f8bd2ba2>] __rpc_execute+0x5e/0x1d9 [sunrpc]
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.394961]  [<f8bcda72>] rpc_run_task+0x40/0x45 [sunrpc]
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.395059]  [<f8c49749>] nfs_wait_bit_killable+0x25/0x2a [nfs]
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.395167]  [<c02b8b86>] __wait_on_bit+0x33/0x58
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.395176]  [<f8c49724>] nfs_wait_bit_killable+0x0/0x2a [nfs]
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.395278]  [<f8c49724>] nfs_wait_bit_killable+0x0/0x2a [nfs]
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.395304]  [<c02b8c0a>] out_of_line_wait_on_bit+0x5f/0x67
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.395325]  [<c01319c9>] wake_bit_function+0x0/0x3c
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.395351]  [<f8c4971d>] nfs_wait_on_request+0x1d/0x24 [nfs]
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.395382]  [<f8c4cb7a>] nfs_sync_mapping_wait+0xd2/0x245 [nfs]
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.395442]  [<f8c4ceea>] __nfs_write_mapping+0x22/0x3b [nfs]
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.395478]  [<f8c4cf37>] nfs_write_mapping+0x34/0x52 [nfs]
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.395533]  [<f8c42c27>] nfs_do_fsync+0x13/0x2d [nfs]
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.395562]  [<f8c43013>] nfs_file_flush+0x63/0x83 [nfs]
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.395595]  [<c0172c8e>] filp_close+0x2e/0x53
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.395607]  [<c0123fc9>] put_files_struct+0x60/0xa8
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.395627]  [<c012508f>] do_exit+0x1fe/0x5bb
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.395650]  [<c01254b0>] do_group_exit+0x64/0x8d
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.395663]  [<c012c755>] get_signal_to_deliver+0x30d/0x32d
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.395689]  [<c0102fc6>] do_notify_resume+0x7e/0x649
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.395751]  [<c025a6ef>] net_rx_action+0x9c/0x1b9
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.395768]  [<c0174abe>] vfs_write+0xe5/0x120
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.395789]  [<c017502e>] sys_write+0x3c/0x63
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.395806]  [<c01039a8>] work_notifysig+0x13/0x1b
> > Feb 16 18:02:03 cgreivel-linux kernel: [773389.395848] =======================
