Date: Fri, 9 Jan 2009 10:04:59 -0800
From: Andrew Sharp <andy.sharp@onstor.com>
To: Jan Seidel <jan.seidel@onstor.com>
Subject: Re: Problem with nfs client on SSC
Message-ID: <20090109100459.3ea81d2f@ripper.onstor.net>
In-Reply-To: <2779531E7C760D4491C96305019FEEB51762FDD272@exch1.onstor.net>
References: <20090108175507.169dbe05@ripper.onstor.net>
	<2779531E7C760D4491C96305019FEEB51762FDD272@exch1.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

OK, I'll have to take a look at it.  Let me know when it's in that
state.
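
In the meantime, a minimal sketch for spotting that state. It filters
"ps aux"-style output for processes in uninterruptible sleep (STAT
starting with D), which is how rpciod and the tail processes showed up
in the report quoted below. The function name d_state_procs is made up
for illustration, and the sketch assumes STAT is the eighth column, as
in the ps output quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read `ps aux`-style output on stdin and print
# only the processes whose STAT column (field 8) starts with D,
# i.e. uninterruptible sleep. The header line (NR == 1) is skipped.
d_state_procs() {
    awk 'NR > 1 && $8 ~ /^D/'
}

# Typical use on the affected box:
#   ps aux | d_state_procs
```

Note that ps itself may stall for a while in this state, as the strace
in the quoted report shows, so an empty result only means something
once ps has actually finished.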

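Separately, the teardown step quoted below finds the tail processes
with ps and grep, and ps is one of the commands that blocks once the
mount is wedged. A sketch of an alternative, assuming the script can
keep a small pid file on local disk: remember the PID when the tail is
started and kill exactly that PID later. The names start_follow and
stop_follow and the pid-file argument are hypothetical. This does not
address the hang itself, and a process already in D state will not
react to the signal.

```shell
#!/bin/sh
# Hypothetical helpers, not taken from the original script.

# Start a background tail and record its PID.
#   $1 = file to follow, $2 = output file, $3 = pid file (local disk)
start_follow() {
    tail -f "$1" > "$2" 2>&1 &
    echo $! > "$3"
}

# Stop the tail recorded in the pid file, without calling ps.
#   $1 = pid file
stop_follow() {
    pid=$(cat "$1")
    kill "$pid" 2>/dev/null || true
    # Reap the child if it was started from this same shell.
    wait "$pid" 2>/dev/null || true
    rm -f "$1"
}
```

For example: start_follow /var/onstor/ndmpd.trace "$log_dir/ndmpd.trace"
/tmp/ndmpd.tail.pid, and at the end of the run
stop_follow /tmp/ndmpd.tail.pid (the /tmp path is only an example).
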
On Thu, 8 Jan 2009 18:31:03 -0800 Jan Seidel <jan.seidel@onstor.com>
wrote:

> OK, now it happened again and here is the output for the 3 commands:
> 
> g1r8:~# cat /proc/loadavg
> 0.83 0.40 0.25 1/88 9258
> g1r8:~# cat /proc/meminfo
> MemTotal:       433692 kB
> MemFree:        331216 kB
> Buffers:          7364 kB
> Cached:          36408 kB
> SwapCached:          0 kB
> Active:          64776 kB
> Inactive:        21956 kB
> SwapTotal:       30232 kB
> SwapFree:        30232 kB
> Dirty:             316 kB
> Writeback:           4 kB
> AnonPages:       42972 kB
> Mapped:          13260 kB
> Slab:             9040 kB
> SReclaimable:     2376 kB
> SUnreclaim:       6664 kB
> PageTables:       2372 kB
> NFS_Unstable:        0 kB
> Bounce:              0 kB
> CommitLimit:    247076 kB
> Committed_AS:    99732 kB
> VmallocTotal: 1073741824 kB
> VmallocUsed:        20 kB
> VmallocChunk: 1073741804 kB
> g1r8:~#
> g1r8:~# df / /var
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/sdb1               701408    294944    406464  43% /
> /dev/sdb3               168871     79281     89590  47% /var
> 
> I don't see any problem in the output, but maybe you do...
> 
> Thanks for your help,
> Jan
> 
> -----Original Message-----
> From: Andy Sharp 
> Sent: Thursday, January 08, 2009 5:55 PM
> To: Jan Seidel
> Cc: Sandrine Boulanger; Raj Kumar
> Subject: Re: Problem with nfs client on SSC
> 
> I don't know what "standalone" means in this context.  Standalone as
> opposed to what?
> 
> Next time this happens, run these:
> 
> cat /proc/loadavg
> cat /proc/meminfo
> df / /var
> 
> 
> On Thu, 8 Jan 2009 15:15:34 -0800 Jan Seidel <jan.seidel@onstor.com>
> wrote:
> 
> > Hi Andy,
> > 
> > 
> > 
> > I've got a problem with the NFS client running on the SSC, and
> > Sandrine told me that you might be able to help:
> > 
> > I'm currently working on the ndmp automation where I use tail to
> > follow 3 files during ndmp runs. I redirect the tail output to an
> > nfs-mounted directory:
> > 
> > 10.3.0.222:/tfw/log on /mnt/ndmplogs type nfs
> > (rw,hard,tcp,nfsvers=3,rsize=32768,wsize=32768,intr,timeo=600,addr=10.3.0.222)
> > 
> > 
> > 
> > tail -f /var/onstor/ndmpd.trace >& $log_dir/ndmpd.trace &
> > 
> > 
> > 
> > where $log_dir
> > is /mnt/ndmplogs/users/jseidel/test-logs/t/all/features/ndmp/ndmplogs/
> > 
> > At the end of the run I terminate the processes running in the
> > background:
> > 
> > "ps aux | grep \"tail -f $logfile\" | grep -v grep". ' | awk
> > \' { print $2 } \' | xargs -r kill'
> > 
> > 
> > 
> > The script runs without problems when I run it standalone (I ran it
> > in a loop 1000 times). But together with the ndmp tests, after a
> > while the NFS client on the SSC seems to hang:
> > 
> > rpciod ends up in uninterruptible sleep and no NFS operation goes
> > through any more (even a umount -fl fails).
> > 
> > root     26874  0.0  0.0      0     0 ?        D<   14:08   0:00
> > [rpciod/0]
> > 
> > 
> > 
> > ps also hangs when it tries to look up the tail target:
> > 
> > g1r8:~# strace ps aux
> > 
> > [..]
> > 
> > open("/proc/29636/cmdline", O_RDONLY)   = 6
> > 
> > read(6, "tail\0-f\0/var/onstor/ndmpd.trace\0", 2047) = 32
> > 
> > close(6)                                = 0
> > 
> > stat("/dev/pts2", 0x7fb74eb0)           = -1 ENOENT (No such file or
> > directory)
> > 
> > stat("/dev/pts", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> > 
> > readlink("/proc/29636/fd/2",
> > "/mnt/ndmplogs/users/jseidel/test-logs/t/all/features/ndmp/ndmplogs/ndmpd.trace.10.2.8.1",
> > 127) = 87
> > 
> > stat("/mnt/ndmplogs/users/jseidel/test-logs/t/all/features/ndmp/ndmplogs/ndmpd.trace.10.2.8.1",
> > <unfinished ...>
> > 
> > 
> > 
> > After a while ps completes again, and the tail processes also show
> > up in uninterruptible sleep:
> > 
> > root     29636  0.0  0.1   2016   472 ?        D    14:20   0:00
> > tail -f /var/onstor/ndmpd.trace
> > 
> > root     29637  0.0  0.1   2016   468 ?        D    14:20   0:00
> > tail -f /var/log/onstor/messages
> > 
> > root     29638  0.0  0.1   2016   468 ?        D    14:20   0:00
> > tail -f /var/log/messages
> > 
> > 
> > 
> > An ls of the /mnt/ndmplogs directory also doesn't work anymore:
> > 
> > g1r8:~# strace ls /mnt/ndmplogs
> > 
> > [..]
> > 
> > open("/proc/mounts", O_RDONLY|O_LARGEFILE) = 3
> > 
> > fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
> > 
> > old_mmap(NULL, 65536, PROT_READ|PROT_WRITE,
> > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aac5000
> > 
> > read(3, "rootfs / rootfs rw 0 0\n/dev/root"..., 1024) = 599
> > 
> > read(3, "", 1024)                       = 0
> > 
> > close(3)                                = 0
> > 
> > munmap(0x2aac5000, 65536)               = 0
> > 
> > ioctl(1, TIOCNXCL, {B38400 opost isig icanon echo ...}) = 0
> > 
> > ioctl(1, 0x40087468, 0x7fc4fc20)        = 0
> > 
> > stat64(0x7fc4fe34, 0x45a0d8 <unfinished ...>
> > 
> > 
> > 
> > A ping and a showmount of the server still work:
> > 
> > g1r8:~# showmount -e 10.3.0.222
> > 
> > Export list for 10.3.0.222:
> > 
> > /tfw      *
> > 
> > /tftpstor *
> > 
> > g1r8:~# ping 10.3.0.222
> > 
> > PING 10.3.0.222 (10.3.0.222) 56(84) bytes of data.
> > 
> > 64 bytes from 10.3.0.222: icmp_seq=1 ttl=254 time=4.64 ms
> > 
> > 64 bytes from 10.3.0.222: icmp_seq=2 ttl=254 time=4.83 ms
> > 
> > 
> > 
> > Do you have any idea why the NFS client might hang in this case?
> > As I said, running the script standalone works without
> > problems.
> > 
> > Sorry for the long mail, but I tried to include everything that I
> > found out about this problem :)
> > 
> > 
> > 
> > Any help would be greatly appreciated!
> > 
> > 
> > 
> > Thanks,
> > 
> > Jan
