AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20090707195754.49627c76@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:mail.onstor.net
NSV:
SSH:
R:<sandrine.boulanger@onstor.com>,<maxim.kozlovsky@onstor.com>,<Yogesh.Sawant@onstor.com>,<Dilip.Jha@onstor.com>,<Sandeep.Chavan@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@exch1.onstor.net/INBOX	0	102AB4F33EBBDB4C91915B145C8E9FB31377A82E04@exch1.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Tue, 7 Jul 2009 19:58:38 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: Sandrine Boulanger <sandrine.boulanger@onstor.com>
Cc: Maxim Kozlovsky <maxim.kozlovsky@onstor.com>, Yogesh Sawant
 <Yogesh.Sawant@onstor.com>, Dilip Jha <Dilip.Jha@onstor.com>, Sandeep
 Chavan <Sandeep.Chavan@onstor.com>
Subject: Re: what is kswapd0 on Cougar? It takes too much cpu and slows down
 ssc on latest Cougar dev build (07/06/09)
Message-ID: <20090707195838.517273c1@ripper.onstor.net>
In-Reply-To: <102AB4F33EBBDB4C91915B145C8E9FB31377A82E04@exch1.onstor.net>
References: <102AB4F33EBBDB4C91915B145C8E9FB31377A82E02@exch1.onstor.net>
	<102AB4F33EBBDB4C91915B145C8E9FB31377A82E04@exch1.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Yup, it's out of memory.  Let me guess, you are testing a system with
5-million lun/path combinations?  Just kidding.  If you can log in, try
to capture the output of top -b -n 1 and then we can figure out what
process(es) is(are) sucking the life out of the thing.
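
A minimal sketch of how the captured output could be sifted once we have
it, assuming the stock procps column layout (PID USER PR NI VIRT RES SHR S
%CPU %MEM TIME+ COMMAND) seen in the top output quoted below.  The function
names and the SAMPLE text are illustrative only, not part of any existing
tool.  It ranks processes by resident memory (RES), because the default
top sort is by CPU and that can hide the real memory hog.

```python
# Rank processes from `top -b -n 1` output by resident memory (RES).
# Assumption: 12-column procps layout, with RES as the sixth column.

SAMPLE = """\
top - 18:14:33 up  5:52,  1 user,  load average: 10.41, 7.52, 5.75
Tasks:  70 total,   2 running,  68 sleeping,   0 stopped,   0 zombie

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   50 root      10  -5     0    0    0 D 21.5  0.0   1:11.95 kswapd0
  984 root      15   0 16780  444  184 S  7.5  0.1   0:38.44 pm
 7665 root      18   0 22188 1524  852 D  5.9  0.3   0:01.50 nfxsh
"""


def parse_size_kb(field):
    """Convert a top size field such as '16780', '1.5m' or '2g' to KiB."""
    units = {"k": 1, "m": 1024, "g": 1024 * 1024}
    suffix = field[-1].lower()
    if suffix in units:
        return int(float(field[:-1]) * units[suffix])
    return int(field)


def top_memory_users(top_output, count=5):
    """Return (res_kb, pid, command) tuples, largest RES first."""
    rows = []
    in_table = False
    for line in top_output.splitlines():
        fields = line.split()
        if not in_table:
            # Skip the summary area until the column header line.
            in_table = fields[:2] == ["PID", "USER"]
            continue
        if len(fields) < 12:
            continue  # blank or truncated line
        command = " ".join(fields[11:])
        rows.append((parse_size_kb(fields[5]), int(fields[0]), command))
    return sorted(rows, reverse=True)[:count]


for res_kb, pid, command in top_memory_users(SAMPLE, count=3):
    print(f"{res_kb:>8} KiB  pid {pid:<6} {command}")
```

To use it on a real capture, save the output with top -b -n 1 > top.txt
and pass the contents of that file to top_memory_users() in place of
SAMPLE.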

Cheers,

a


On Tue, 7 Jul 2009 18:57:09 -0700 Sandrine Boulanger
<sandrine.boulanger@onstor.com> wrote:

> I see those too on the console:
> 
> Out of memory: kill process 971 (pm) score 1062 or a child
> Killed process 983 (ncmd)
> 
> Out of memory: kill process 804 (exim4) score 2819 or a child
> Killed process 7699 (exim4)
> 
> I'm starting to wonder if we should put back a stable 4.0.2.x build
> on those systems to be able to use them for the test automation
> development...
> 
> _____________________________________________
> From: Sandrine Boulanger
> Sent: Tuesday, July 07, 2009 6:41 PM
> To: Sandrine Boulanger; Andy Sharp
> Cc: Jonathan Goldick; Maxim Kozlovsky; Yogesh Sawant; Dilip Jha;
> Sandeep Chavan
> Subject: RE: what is kswapd0 on Cougar? It takes too much cpu and slows
> down ssc on latest Cougar dev build (07/06/09)
> 
> Well, it can't be HW; now g9r204 is showing this too. No idea how to
> recover from this other than a power cycle, but we'll eventually end up
> there again. What's happening?
> 
> SiByte User Watchdog in danger of initiating system reset in 4.1 seconds
> SiByte User Watchdog in danger of initiating system reset in 4.1 seconds
> SiByte User Watchdog in danger of initiating system reset in 4.1 seconds
> SiByte User Watchdog in danger of initiating system reset in 4.1 seconds
> SiByte User Watchdog in danger of initiating system reset in 4.1 seconds
> SiByte User Watchdog in danger of initiating system reset in 4.1 seconds
> SiByte User Watchdog in danger of initiating system reset in 4.1 seconds
> SiByte User Watchdog in danger of initiating system reset in 4.1 seconds
> 
> _____________________________________________
> From: Sandrine Boulanger
> Sent: Tuesday, July 07, 2009 6:22 PM
> To: Andy Sharp
> Cc: Jonathan Goldick; Maxim Kozlovsky; Yogesh Sawant; Dilip Jha;
> Sandeep Chavan
> Subject: what is kswapd0 on Cougar? It takes too much cpu and slows
> down ssc on latest Cougar dev build (07/06/09)
> 
> top - 18:14:33 up  5:52,  1 user,  load average: 10.41, 7.52, 5.75
> Tasks:  70 total,   2 running,  68 sleeping,   0 stopped,   0 zombie
> Cpu(s):  1.6%us, 10.6%sy,  0.0%ni,  0.0%id,  7.2%wa, 79.4%hi,  1.2%si,  0.0%st
> Mem:    466460k total,   459428k used,     7032k free,      164k buffers
> Swap:    30232k total,    30232k used,        0k free,     5392k cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>    50 root      10  -5     0    0    0 D 21.5  0.0   1:11.95 kswapd0
>   984 root      15   0 16780  444  184 S  7.5  0.1   0:38.44 pm
>  7665 root      18   0 22188 1524  852 D  5.9  0.3   0:01.50 nfxsh
> 21206 root      10  -5 18680 1460  376 S  5.9  0.3   3:50.38 cluster_contrl
>  7668 root      18   0 22188 1544  868 R  5.6  0.3   0:01.46 nfxsh
> 21205 root      10  -5 18680 1352  284 S  5.6  0.3   0:16.33 cluster_contrl
> 
> I reconfigured the cluster g9r204/g5r204 again because I kept having
> cluster errors with any build. The cluster now seems stable, but
> executing anything on the SSC is super slow.
> 
> Brian, are you aware of g5r204 having HW issues? It is stuck with
> "SiByte User Watchdog in danger of initiating system reset in 8.2
> seconds" messages on the console, no way to interrupt and access the
> prompt.
> 
> The ssc consoles are 10.2.203.235 9039 for g8r204 and 9041 for g5r204.
> 
