AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20090108174922.76eeeddd@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:exch1.onstor.net
NSV:
SSH:
R:<jan.seidel@onstor.com>,<sandrine.boulanger@onstor.com>,<raj.kumar@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@exch1.onstor.net/INBOX	0	2779531E7C760D4491C96305019FEEB51762FDD26F@exch1.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Thu, 8 Jan 2009 17:55:07 -0800
From: Andrew Sharp <andy.sharp@onstor.com>
To: Jan Seidel <jan.seidel@onstor.com>
Cc: Sandrine Boulanger <sandrine.boulanger@onstor.com>, Raj Kumar
 <raj.kumar@onstor.com>
Subject: Re: Problem with nfs client on SSC
Message-ID: <20090108175507.169dbe05@ripper.onstor.net>
In-Reply-To: <2779531E7C760D4491C96305019FEEB51762FDD26F@exch1.onstor.net>
References: <2779531E7C760D4491C96305019FEEB51762FDD26F@exch1.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

I don't know what "standalone" means in this context.  Standalone as
opposed to what?

Next time this happens, run these on the SSC while it is hung:

cat /proc/loadavg
cat /proc/meminfo
df / /var
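
If it is easier to grab everything in one go, something like the
following would do. This is only a rough sketch: the function name
collect_diag, the file name pattern, and the /tmp default are made up
for illustration. The one thing that matters is that the output goes to
local disk and not to the NFS mount, since writes to the hung mount
would block too.

```shell
#!/bin/sh
# Hypothetical helper: snapshot load, memory and local disk usage into
# one timestamped file. Output directory defaults to /tmp (an assumption;
# any local, non-NFS directory works).
collect_diag() {
    out_dir=${1:-/tmp}
    out="$out_dir/nfs-hang-$(date +%Y%m%d-%H%M%S).txt"
    {
        echo "== date ==";          date
        echo "== /proc/loadavg =="; cat /proc/loadavg
        echo "== /proc/meminfo =="; cat /proc/meminfo
        # Only local filesystems are listed on purpose: a bare "df"
        # would also stat the hung NFS mount and could block.
        echo "== df / /var ==";     df / /var
    } > "$out" 2>&1
    # Print the path so the caller knows which file to send.
    echo "$out"
}
```

Source the file and run "collect_diag" (or "collect_diag /some/local/dir")
as soon as the hang shows up, then send the file it names.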


On Thu, 8 Jan 2009 15:15:34 -0800 Jan Seidel <jan.seidel@onstor.com>
wrote:

> Hi Andy,
> 
> 
> 
> I've got a problem with the nfs client running on the SSC and
> Sandrine told me that you can maybe help with that:
> 
> I'm currently working on the ndmp automation where I use tail to
> follow 3 files during ndmp runs. I redirect the tail output to an
> nfs-mounted directory:
> 
> 10.3.0.222:/tfw/log on /mnt/ndmplogs type nfs
> (rw,hard,tcp,nfsvers=3,rsize=32768,wsize=32768,intr,timeo=600,addr=10.3.0.222)
> 
> 
> 
> tail -f /var/onstor/ndmpd.trace >& $log_dir/ndmpd.trace &
> 
> 
> 
> where $log_dir
> is /mnt/ndmplogs/users/jseidel/test-logs/t/all/features/ndmp/ndmplogs/
> 
> At the end of the run I terminate the processes running in the
> background:
> 
> "ps aux | grep \"tail -f $logfile\" | grep -v grep". ' | awk
> \' { print $2 } \' | xargs -r kill'
> 
> 
> 
> The script runs without problems when I run it standalone (I ran it
> in a loop 1000 times). But together with the ndmp tests after a while
> the nfs client on the ssc seems to hang itself up:
> 
> rpciod ends up in uninterruptible sleep and no nfs operation goes
> through any more (even a umount -fl fails).
> 
> root     26874  0.0  0.0      0     0 ?        D<   14:08   0:00
> [rpciod/0]
> 
> 
> 
> ps also hangs when it tries to look up the tail target:
> 
> g1r8:~# strace ps aux
> 
> [..]
> 
> open("/proc/29636/cmdline", O_RDONLY)   = 6
> 
> read(6, "tail\0-f\0/var/onstor/ndmpd.trace\0", 2047) = 32
> 
> close(6)                                = 0
> 
> stat("/dev/pts2", 0x7fb74eb0)           = -1 ENOENT (No such file or
> directory)
> 
> stat("/dev/pts", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> 
> readlink("/proc/29636/fd/2",
> "/mnt/ndmplogs/users/jseidel/test-logs/t/all/features/ndmp/ndmplogs/ndmpd.trace.10.2.8.1",
> 127) = 87
> 
> stat("/mnt/ndmplogs/users/jseidel/test-logs/t/all/features/ndmp/ndmplogs/ndmpd.trace.10.2.8.1",
> <unfinished ...>
> 
> 
> 
> After a while ps runs through again and the tail processes also show
> up as uninterruptible sleep:
> 
> root     29636  0.0  0.1   2016   472 ?        D    14:20   0:00 tail
> -f /var/onstor/ndmpd.trace
> 
> root     29637  0.0  0.1   2016   468 ?        D    14:20   0:00 tail
> -f /var/log/onstor/messages
> 
> root     29638  0.0  0.1   2016   468 ?        D    14:20   0:00 tail
> -f /var/log/messages
> 
> 
> 
> An ls of the /mnt/ndmplogs directory also doesn't work anymore:
> 
> g1r8:~# strace ls /mnt/ndmplogs
> 
> [..]
> 
> open("/proc/mounts", O_RDONLY|O_LARGEFILE) = 3
> 
> fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
> 
> old_mmap(NULL, 65536, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aac5000
> 
> read(3, "rootfs / rootfs rw 0 0\n/dev/root"..., 1024) = 599
> 
> read(3, "", 1024)                       = 0
> 
> close(3)                                = 0
> 
> munmap(0x2aac5000, 65536)               = 0
> 
> ioctl(1, TIOCNXCL, {B38400 opost isig icanon echo ...}) = 0
> 
> ioctl(1, 0x40087468, 0x7fc4fc20)        = 0
> 
> stat64(0x7fc4fe34, 0x45a0d8 <unfinished ...>
> 
> 
> 
> A ping and also a showmount of the server still works:
> 
> g1r8:~# showmount -e 10.3.0.222
> 
> Export list for 10.3.0.222:
> 
> /tfw      *
> 
> /tftpstor *
> 
> g1r8:~# ping 10.3.0.222
> 
> PING 10.3.0.222 (10.3.0.222) 56(84) bytes of data.
> 
> 64 bytes from 10.3.0.222: icmp_seq=1 ttl=254 time=4.64 ms
> 
> 64 bytes from 10.3.0.222: icmp_seq=2 ttl=254 time=4.83 ms
> 
> 
> 
> Do you have an idea why the nfs client might hang itself up in this
> case? As I said, running the script standalone works without problems.
> 
> Sorry for the long mail, but I tried to get everything in it that I
> found out about this problem :)
> 
> 
> 
> Any help would be greatly appreciated!
> 
> 
> 
> Thanks,
> 
> Jan
> 
> 
> 
> 
> 
> 
