X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C8942C.897CE1A0@onstor-exch02.onstor.net>; Tue, 1 Apr 2008 12:13:58 -0700
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Content-class: urn:content-classes:message
Subject: RE: Onstor Case 7430 - ndmp
Date: Tue, 1 Apr 2008 12:13:57 -0700
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E093216C5@onstor-exch02.onstor.net>
In-Reply-To: <01ac01c89429$a2d38400$0300a8c0@lab.css.glasshouse.com>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: Onstor Case 7430 - ndmp
Thread-Index: AciRHMs/klp4YZP2SzG5tY7AgpDFQQCQVv9wADLF1lAAAHv4IA==
References: <01ac01c89429$a2d38400$0300a8c0@lab.css.glasshouse.com>
From: "Narain Ramadass" <narain.ramadass@onstor.com>
To: "Fred McFadden (Glasshouse)" <fredm@css.glasshouse.com>,
	"dl-cstech" <dl-cstech@onstor.com>

Answering to the extent I know:

1. The CDB problem is probably a non-issue. It is possible for a DMA
(NetBackup in this case) to send pass-through CDBs to the tape drive at
the backend using NDMP. In this case, a CDB was sent that had an invalid
field and was therefore rejected by the tape drive or the robot.

2. Their DMA configuration is the problem here. Yes - a backup does seem
to begin as soon as the first one is supposed to end (I am not sure how
this was figured out, but I assume that somehow we know). However, there
is no internal mechanism to start backups. Backups can only be started
by a DMA, so their NetBackup configuration is the reason for this. We
will need the complete elogs and the ndmpd.trace file to ascertain what
is going on here.

3. The End-of-media condition is a way of telling the DMA that it needs
to change the tape on the drive it is using - the ONStor cannot
automatically do this as the index of the data is maintained by NBU -
not ONStor. Therefore, IMHO, that is normal too.

I think the configuration of the DMA needs to be set up properly in
this case.

Narain.

-----Original Message-----
From: Fred McFadden (Glasshouse)
Sent: Tuesday, April 01, 2008 11:53 AM
To: dl-cstech
Subject: FW: Onstor Case 7430 - ndmp

Can anyone answer customer questions?

-He has two IP addresses per virtual server.
-NDMP appears to contact the vsvr through the first IP configured on the
virtual server.
-NetBackup appears to be contacting it through the second IP configured.

-Fred

-----Original Message-----
From: Larry O'Donnell [mailto:larry.odonnell@Qsent.com]
Sent: Monday, March 31, 2008 12:15 PM
To: Call Center TSE
Subject: RE: Onstor Case 7430 - ndmp

Fred ;

Appreciate the response and your time in helping solve this.

I do have a question regarding the actual datagrams and control data
that NDMP uses to transfer data across the SAN fabric. But let's start
with a more detailed description of my environment.

Currently we have the Bobcat configured with 2 virtual servers, each of
which has an additional virtual IP on another subnet:

	Onstor-dat1     10.150.5.101   Primary IP
	                10.140.1.241   Virtual IP - onstor-dat11

	Onstor-dat2     10.150.5.102   Primary IP
	                10.140.1.242   Virtual IP - onstor-dat12

	Nbmstr          172.17.177.41
	                10.140.1.115   NAT address through the firewall

We currently have a firewall rule that allows access between the
NetBackup master and the 10.140.1.242 (onstor-dat12 or virtual IP) only,
not the primary IP of the virtual server. So Netbackup only has access
to the virtual IP(s).

Deep down in the log files on the Netbackup side we do see about the
time the error occurs that it is attempting to contact the 10.150.5.102
address and not the 10.140.1.242 which it has access to. We initiate the
backup job using the virtual IP address of 10.140.1.242.

========================================
It seems like bptm could not talk to this IP and then it failed with
status 25:
19:47:42.346 [22424] <2> NdmpMoverClient: ndmp_mover_listen returned addr 10.150.5.102 port 48203
19:51:27.017 [22424] <2> NdmpMoverClient: ConnectSocket - connect failed: 145 Connection timed out
19:51:27.017 [22424] <16> NdmpMoverClient: ERROR ConnectSocket failed.
19:51:27.017 [22424] <16> NdmpMoverClient: ERROR Start failed
19:51:39.210 [22424] <2> NdmpMoverClient: mover halted reason NDMP_MOVER_HALT_ABORTED
19:51:39.309 [22424] <2> NdmpMoverClient: ndmp_mover_abort status = 0
19:51:39.324 [22424] <2> NdmpMoverClient: ndmp_mover_stop status = 0
19:51:39.324 [22424] <2> NdmpMoverClient: Shutdown complete
19:51:39.324 [22424] <2> check_error_history: just tpunmount: called from bptm line 19244, EXIT_Status = 25
========================================

Is there somewhere within the NDMP data or control protocol where the IP
address is used instead of a node name? Should the backup be initiated
only at the primary IP address of the virtual server?
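
The bptm excerpt above suggests the answer to the first question is yes:
the reply to ndmp_mover_listen carries a literal address and port, and
the connect that follows times out. A minimal Python sketch for checking
this against a firewall rule is below. The helper names and the regular
expression are illustrative assumptions, not part of NetBackup or the
ONStor software; the allowed address is the one the firewall rule is
described as permitting.

```python
import re
from ipaddress import ip_address

# Hypothetical pattern for the bptm line shown in this thread; the
# fields may be wrapped across lines, so whitespace is matched loosely.
LISTEN_RE = re.compile(
    r"ndmp_mover_listen returned\s+addr\s+(\S+)\s+port\s+(\d+)"
)


def mover_listen_endpoints(log_text):
    """Return (address, port) pairs advertised by ndmp_mover_listen."""
    return [(ip_address(addr), int(port))
            for addr, port in LISTEN_RE.findall(log_text)]


def blocked_endpoints(endpoints, allowed_addresses):
    """Return endpoints whose address is not in the allowed set."""
    allowed = {ip_address(a) for a in allowed_addresses}
    return [(addr, port) for addr, port in endpoints if addr not in allowed]


if __name__ == "__main__":
    sample = (
        "19:47:42.346 [22424] <2> NdmpMoverClient: "
        "ndmp_mover_listen returned\naddr\n10.150.5.102 port 48203\n"
    )
    endpoints = mover_listen_endpoints(sample)
    # Assumption: only 10.140.1.242 is reachable through the firewall.
    for addr, port in blocked_endpoints(endpoints, ["10.140.1.242"]):
        print(f"mover listen address {addr}:{port} is not allowed")
```

If the advertised address is always the primary IP, either the firewall
rule or the address the mover listens on would need to change.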

Will give you a call in a few minutes.

Larry

-----Original Message-----
From: Call Center TSE [mailto:support@cc.onstor.com]
Sent: Friday, March 28, 2008 2:44 PM
To: Larry O'Donnell
Subject: Onstor Case 7430 - nmdp

Hi Larry

Terribly sorry it took so long to get back. I just now finished looking
at the logs.

Our log files show an issue at 19:31, which just confirms the time
differences.


First, in our messages file we have:
---------
Mar 26 17:48:53 onstor-dat0 : 0:0:tape-driver:ERROR: tape_proc_sdm_relay_rsp: status: 0xe0000002 sense: key:0x5 asc:0x24 ascQ:0x0
Mar 26 17:48:53 onstor-dat0 : 0:0:tape-driver:ERROR: tape_proc_sdm_relay_rsp: raw sense: f0 00 05 00 00 00 00 16
----------------------
This occurs regularly, but the dump continues to run fine. I think there
may be some timing issue, tape-not-ready issue, etc., that NDMP
encounters but works through. Since the dump keeps running I don't see
it as a problem.
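
For reference, the sense fields in the messages-file excerpt above can
be decoded with a short Python sketch. In the standard SCSI tables,
sense key 0x5 is ILLEGAL REQUEST and ASC/ASCQ 0x24/0x00 is INVALID
FIELD IN CDB. The lookup tables below are a small subset, and the
function names are illustrative, not ONStor code. Only the first 8 bytes
of the fixed-format sense buffer are logged, so the ASC/ASCQ values come
from the decoded fields on the first log line.

```python
# Subset of the standard SCSI sense key and ASC/ASCQ tables.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
}
ASC_ASCQ = {
    (0x20, 0x00): "INVALID COMMAND OPERATION CODE",
    (0x24, 0x00): "INVALID FIELD IN CDB",
}


def decode_fixed_sense_header(raw_hex):
    """Decode the first 8 bytes of fixed-format SCSI sense data."""
    data = bytes.fromhex(raw_hex)
    if len(data) < 8:
        raise ValueError("need at least 8 bytes of sense data")
    response_code = data[0] & 0x7F
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    return {
        "valid": bool(data[0] & 0x80),     # INFORMATION field valid bit
        "response_code": response_code,    # 0x70 = current error
        "sense_key": data[2] & 0x0F,
        "additional_length": data[7],      # bytes following byte 7
    }


def describe(sense_key, asc, ascq):
    """Return a readable name for a sense key and ASC/ASCQ pair."""
    key_name = SENSE_KEYS.get(sense_key, f"key 0x{sense_key:x}")
    detail = ASC_ASCQ.get((asc, ascq), f"asc 0x{asc:02x} ascq 0x{ascq:02x}")
    return f"{key_name}: {detail}"


if __name__ == "__main__":
    header = decode_fixed_sense_header("f0 00 05 00 00 00 00 16")
    print(header)
    print(describe(header["sense_key"], 0x24, 0x00))
```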
--------------
Next we have what I see as a problem. Note the time stamp. When the NDMP
session is supposed to have completed, we instead have an EOM (end of
media) error. Why are we getting an end of tape error?
This is from our NDMP logs, the ndmp.trace.1 file in your system get all.
--------------
Wed Mar 26 19:31:15 2008 (1206585075): Transmitted to 10.140.1.115;
Session:1201953768
vs:3
Message   : 0x305 (NDMP_TAPE_READ)
Timestamp : 1206585075
XSequence : 12
RSequence : 11
Error     : 0 (NDMP_NO_ERR)
	Error : 13 (NDMP_EOM_ERR !!!)
	Datain len 0
	Datain 0K <DATA>
---------------
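
The trace header above carries both a wall-clock time and an epoch
value, which can help when lining up ONStor and NetBackup timestamps. A
small Python sketch of the conversion follows. The UTC-7 offset is an
assumption taken from the mail headers in this thread, not something the
trace file states, and the function name is illustrative.

```python
from datetime import datetime, timedelta, timezone


def trace_time(epoch_seconds, utc_offset_hours=0):
    """Convert a trace epoch value to a timezone-aware datetime."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(epoch_seconds, tz=tz)


if __name__ == "__main__":
    # Epoch value from the NDMP_TAPE_READ trace entry.
    print(trace_time(1206585075).isoformat())
    # Same instant with the assumed UTC-7 local offset.
    print(trace_time(1206585075, -7).isoformat())
```

With the assumed offset, the converted value matches the 19:31:15
wall-clock time printed in the trace entry.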
Other team members have commented that the error code 25 is a pretty
common message. It has been suggested that maybe too many NDMP backup
jobs are set up and are overlapping. The logs do not seem to indicate
this, at least not the Onstor logs or the log you pasted in.
---------
What is the tcp fusion setting that you were asked to set supposed to
do? Please confirm whether you added it to /etc/system on the Solaris
server or on the Onstor filer.
----------
The last and biggest error that baffles me is the following, from our
messages logs:
----------


Mar 26 19:31:22 onstor-dat0 : 0:0:nfxsh:NOTICE: cmd[1]: vsvr stats -i 1 -c 1 : status[0]
Mar 26 19:31:23 onstor-dat0 : 0:0:tape-driver:NOTICE: tape_do_locate: dev 0x100ba000, cookie 0x100b3170
Mar 26 19:31:23 onstor-dat0 : 0:0:tape-driver:NOTICE: tape_do_locate: Current Partition 0 block number 1683795
Mar 26 19:31:32 onstor-dat0 : 0:0:tape-driver:NOTICE: tape_do_locate: dev 0x100ba000, cookie 0x100b3170
Mar 26 19:31:32 onstor-dat0 : 0:0:tape-driver:NOTICE: tape_do_locate: Current Partition 0 block number 1683794
Mar 26 19:31:32 onstor-dat0 : 1:3:ndmp:NOTICE: 55370: BACKUP Begins - path:  /, sessId 1201953770
Mar 26 19:31:46 onstor-dat0 : 1:3:efs:NOTICE: 55371: FS: tupa 0x5530000008f - dumpStart - snp - snap create complete for dump_54 id 1
------------------
It looks like the NDMP session is looping, or a new one is starting
("BACKUP Begins"), yet it has been indicated that this was where the
NDMP session completed. Was another backup session started?
-----------
I am continuing to investigate these issues on our side. Engineering may
consider this worthy of a defect report; I am checking on that. If you
are able to report these issues to Veritas, it would be interesting to
get their response.

thanks
Fred
866-726-3453





