Date: Fri, 9 Jan 2009 17:05:28 -0800
From: Andrew Sharp <andy.sharp@onstor.com>
To: Jan Seidel <jan.seidel@onstor.com>
Subject: Re: Problem with nfs client on SSC
Message-ID: <20090109170528.44ee7435@ripper.onstor.net>
In-Reply-To: <2779531E7C760D4491C96305019FEEB51762FDD27A@exch1.onstor.net>
References: <20090109100459.3ea81d2f@ripper.onstor.net>
	<2779531E7C760D4491C96305019FEEB51762FDD27A@exch1.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Yeah, I saw this one already.  Unfortunately the douchebags didn't
quite realize that the deadlock is not really related to unmounting, or
they might have fixed it in 2.6.22 itself.  Nor did they post any
patches that definitively fix the problem.  And the archive doesn't
include email attachments, so Trond's patches aren't visible.  Sigh.
I'll see what else I can dig up.

Try to figure out what's going on with the mount of .snapshot.  It
may well be related to this problem: this deadlock should be hard to
hit, not easy, so if we're doing something we shouldn't be, there may
be a way to work around it.

Cheers,

a

On Fri, 9 Jan 2009 16:10:10 -0800 Jan Seidel <jan.seidel@onstor.com>
wrote:

> Hi Andy,
> 
> This is the thread I was talking about:
> http://www.mail-archive.com/linux-nfs@vger.kernel.org/msg00708.html
> 
> But I don't know enough about rpciod to tell whether this is the same
> problem.
> 
> Regards,
> Jan
> 
> 
> -----Original Message-----
> From: Andy Sharp 
> Sent: Friday, January 09, 2009 10:05 AM
> To: Jan Seidel
> Subject: Re: Problem with nfs client on SSC
> 
> OK, I'll have to take a look at it.  Let me know when it's in that
> state.
> 
> On Thu, 8 Jan 2009 18:31:03 -0800 Jan Seidel <jan.seidel@onstor.com>
> wrote:
> 
> > OK, it happened again. Here is the output of the three commands:
> > 
> > g1r8:~# cat /proc/loadavg
> > 0.83 0.40 0.25 1/88 9258
> > g1r8:~# cat /proc/meminfo
> > MemTotal:       433692 kB
> > MemFree:        331216 kB
> > Buffers:          7364 kB
> > Cached:          36408 kB
> > SwapCached:          0 kB
> > Active:          64776 kB
> > Inactive:        21956 kB
> > SwapTotal:       30232 kB
> > SwapFree:        30232 kB
> > Dirty:             316 kB
> > Writeback:           4 kB
> > AnonPages:       42972 kB
> > Mapped:          13260 kB
> > Slab:             9040 kB
> > SReclaimable:     2376 kB
> > SUnreclaim:       6664 kB
> > PageTables:       2372 kB
> > NFS_Unstable:        0 kB
> > Bounce:              0 kB
> > CommitLimit:    247076 kB
> > Committed_AS:    99732 kB
> > VmallocTotal: 1073741824 kB
> > VmallocUsed:        20 kB
> > VmallocChunk: 1073741804 kB
> > g1r8:~#
> > g1r8:~# df / /var
> > Filesystem           1K-blocks      Used Available Use% Mounted on
> > /dev/sdb1               701408    294944    406464  43% /
> > /dev/sdb3               168871     79281     89590  47% /var
> > 
> > I don't see any problem in the output, but maybe you do...
> > 
> > Thanks for your help,
> > Jan
> > 
> > -----Original Message-----
> > From: Andy Sharp 
> > Sent: Thursday, January 08, 2009 5:55 PM
> > To: Jan Seidel
> > Cc: Sandrine Boulanger; Raj Kumar
> > Subject: Re: Problem with nfs client on SSC
> > 
> > I don't know what "standalone" means in this context.  Standalone as
> > opposed to what?
> > 
> > Next time this happens, run these:
> > 
> > cat /proc/loadavg
> > cat /proc/meminfo
> > df / /var
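A minimal sketch of how those checks could be captured in one go, together with a list of tasks stuck in uninterruptible sleep (state D) and the kernel function each one is waiting in. It assumes a procps `ps` that accepts `-eo` with the `wchan` field; the function names `filter_d_state` and `snapshot` are hypothetical and not part of any existing tooling.

```shell
#!/bin/sh
# Hypothetical helper: capture the three checks requested above plus the
# tasks stuck in uninterruptible sleep. Assumes procps ps (for -eo and the
# wchan field); busybox ps may not accept these options.

# Read ps output on stdin; keep the header row and every task whose STAT
# starts with D (D, D<, D+, ...), i.e. uninterruptible sleep.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

snapshot() {
    echo "=== $(date) ==="
    cat /proc/loadavg
    cat /proc/meminfo
    df / /var          # local filesystems only, so df itself should not block
    echo "--- tasks in uninterruptible sleep ---"
    # wchan shows the kernel function each task is waiting in (it may print
    # "-" or an address if kernel symbols are unavailable). No TTY column is
    # requested; the assumption is that this spares ps the tty lookup that
    # made "ps aux" stat() the /proc/<pid>/fd/2 target and block.
    ps -eo pid,stat,wchan:32,args | filter_d_state
}

snapshot
```

Writing the output to local disk (for example under /var/tmp) rather than to the NFS mount keeps the capture itself from blocking while the client is hung.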
> > 
> > 
> > On Thu, 8 Jan 2009 15:15:34 -0800 Jan Seidel <jan.seidel@onstor.com>
> > wrote:
> > 
> > > Hi Andy,
> > >
> > > I've got a problem with the NFS client running on the SSC, and
> > > Sandrine told me that you might be able to help with it:
> > > 
> > > I'm currently working on the NDMP automation, where I use tail to
> > > follow three files during NDMP runs. I redirect the tail output to an
> > > NFS-mounted directory:
> > > 
> > > 10.3.0.222:/tfw/log on /mnt/ndmplogs type nfs
> > > (rw,hard,tcp,nfsvers=3,rsize=32768,wsize=32768,intr,timeo=600,addr=10.3.0.222)
> > >
> > > tail -f /var/onstor/ndmpd.trace >& $log_dir/ndmpd.trace &
> > >
> > > where $log_dir
> > > is /mnt/ndmplogs/users/jseidel/test-logs/t/all/features/ndmp/ndmplogs/
> > > 
> > > At the end of the run I terminate the processes running in the
> > > background:
> > > 
> > > ps aux | grep "tail -f $logfile" | grep -v grep | awk '{ print $2 }' | xargs -r kill
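The `ps aux | grep` pipeline depends on `ps`, which itself blocked during the hang, and it selects processes by matching command-line text. A minimal sketch of an alternative, assuming the tails are started from the same shell script that later stops them: record each background PID from `$!` and kill exactly those. `start_tail`, `stop_tails`, and the `LOG_DIR` default are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: follow several local logs into LOG_DIR and later stop
# exactly those tail processes, without grepping ps output.
LOG_DIR=${LOG_DIR:-/tmp/ndmplogs}   # placeholder; the thread writes to an NFS mount
mkdir -p "$LOG_DIR"

TAIL_PIDS=""

# start_tail SRC: follow SRC in the background, sending stdout and stderr
# to $LOG_DIR/<basename of SRC> (the POSIX spelling of bash's ">&").
start_tail() {
    tail -f "$1" > "$LOG_DIR/$(basename "$1")" 2>&1 &
    TAIL_PIDS="$TAIL_PIDS $!"       # $! is the PID of the job just started
}

# stop_tails: signal every recorded tail, then reap them.
stop_tails() {
    [ -n "$TAIL_PIDS" ] || return 0
    # Unquoted on purpose so each PID becomes a separate argument.
    kill $TAIL_PIDS 2>/dev/null
    wait $TAIL_PIDS 2>/dev/null
    TAIL_PIDS=""
}
```

Usage would be one `start_tail /var/onstor/ndmpd.trace` (and likewise for the other two files) before the NDMP run and a single `stop_tails` after it. A tail that is already in uninterruptible sleep on the NFS mount may not act on the signal until the call it is blocked in returns, so this changes how the processes are found, not whether they can be killed during the hang.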
> > >
> > > The script runs without problems when I run it standalone (I ran
> > > it in a loop 1000 times). But when it runs together with the NDMP
> > > tests, after a while the NFS client on the SSC seems to hang:
> > >
> > > rpciod ends up in uninterruptible sleep and no NFS operation goes
> > > through anymore (even a umount -fl fails).
> > >
> > > root     26874  0.0  0.0      0     0 ?        D<   14:08   0:00 [rpciod/0]
> > >
> > > ps also hangs when it tries to look up the tail target:
> > >
> > > g1r8:~# strace ps aux
> > > [..]
> > > open("/proc/29636/cmdline", O_RDONLY)   = 6
> > > read(6, "tail\0-f\0/var/onstor/ndmpd.trace\0", 2047) = 32
> > > close(6)                                = 0
> > > stat("/dev/pts2", 0x7fb74eb0)           = -1 ENOENT (No such file or directory)
> > > stat("/dev/pts", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> > > readlink("/proc/29636/fd/2", "/mnt/ndmplogs/users/jseidel/test-logs/t/all/features/ndmp/ndmplogs/ndmpd.trace.10.2.8.1", 127) = 87
> > > stat("/mnt/ndmplogs/users/jseidel/test-logs/t/all/features/ndmp/ndmplogs/ndmpd.trace.10.2.8.1", <unfinished ...>
> > >
> > > After a while ps completes again, and the tail processes also
> > > show up in uninterruptible sleep:
> > >
> > > root     29636  0.0  0.1   2016   472 ?        D    14:20   0:00 tail -f /var/onstor/ndmpd.trace
> > > root     29637  0.0  0.1   2016   468 ?        D    14:20   0:00 tail -f /var/log/onstor/messages
> > > root     29638  0.0  0.1   2016   468 ?        D    14:20   0:00 tail -f /var/log/messages
> > >
> > > An ls of the /mnt/ndmplogs directory hangs as well:
> > >
> > > g1r8:~# strace ls /mnt/ndmplogs
> > > [..]
> > > open("/proc/mounts", O_RDONLY|O_LARGEFILE) = 3
> > > fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
> > > old_mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aac5000
> > > read(3, "rootfs / rootfs rw 0 0\n/dev/root"..., 1024) = 599
> > > read(3, "", 1024)                       = 0
> > > close(3)                                = 0
> > > munmap(0x2aac5000, 65536)               = 0
> > > ioctl(1, TIOCNXCL, {B38400 opost isig icanon echo ...}) = 0
> > > ioctl(1, 0x40087468, 0x7fc4fc20)        = 0
> > > stat64(0x7fc4fe34, 0x45a0d8 <unfinished ...>
> > >
> > > Both ping and showmount against the server still work:
> > >
> > > g1r8:~# showmount -e 10.3.0.222
> > > Export list for 10.3.0.222:
> > > /tfw      *
> > > /tftpstor *
> > > g1r8:~# ping 10.3.0.222
> > > PING 10.3.0.222 (10.3.0.222) 56(84) bytes of data.
> > > 64 bytes from 10.3.0.222: icmp_seq=1 ttl=254 time=4.64 ms
> > > 64 bytes from 10.3.0.222: icmp_seq=2 ttl=254 time=4.83 ms
> > >
> > > Do you have any idea why the NFS client might hang in this case? As
> > > I said, running the script standalone works without problems.
> > >
> > > Sorry for the long mail, but I tried to include everything I found
> > > out about this problem :)
> > >
> > > Any help would be greatly appreciated!
> > >
> > > Thanks,
> > > Jan
