X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C720A3.D209D99F@onstor-exch02.onstor.net>; Fri, 15 Dec 2006 15:50:37 -0800
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Content-class: urn:content-classes:message
Subject: RE: Updates
Date: Fri, 15 Dec 2006 15:50:37 -0800
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E0A9430@onstor-exch02.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E01C0B542@onstor-exch02.onstor.net>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: Updates
thread-index: Acce+b3Mfy+xIPS2TRi4/4armFj97QAAFnpQAAA/8cAAAA8pAAAAMVEwAAtKwPAAAB7EkAAAXasgAAF+nMAAG+bGsAAAEIdhABPlQMAAA51kuQAkXd7gAACp2IAAAHC+DAAAKMggAAMydfA=
From: "Larry Scheer" <larry.scheer@onstor.com>
To: "Paul Hammer" <paul.hammer@onstor.com>,
	"Ken Renshaw" <ken.renshaw@onstor.com>,
	"Eric Barrett" <eric.barrett@onstor.com>,
	"Tim Gardner" <tim.gardner@onstor.com>,
	"Sandrine Boulanger" <sandrine.boulanger@onstor.com>,
	"Ed Kwan" <ed.kwan@onstor.com>,
	"Charissa Willard" <charissa.willard@onstor.com>
Cc: "Kevin Matthews" <kevin.matthews@onstor.com>,
	"Brian Baker" <brian.baker@onstor.com>,
	"Vikas Saini" <vikas.saini@onstor.com>,
	"dl-Clio" <dl-Clio@onstor.com>

When dogfood was upgraded, 317 files were refreshed on the flash. Looking
at the order in which they were installed, this is what I see:

/usr/local/agile/lib/libucdagent.so            was file # 120 of 317
/usr/local/agile/lib/libucdmibs.so             was file # 121 of 317
/usr/local/agile/web/js/console/login.js       was file # 206 of 317
/usr/local/agile/web/js/console/view.js        was file # 207 of 317
/usr/local/agile/web/js/domain/util.js         was file # 211 of 317

Tar file's size: 41136086 bytes

-----Original Message-----
From: Paul Hammer
Sent: Friday, December 15, 2006 2:09 PM
To: Ken Renshaw; Larry Scheer; Eric Barrett; Tim Gardner; Sandrine
Boulanger; Ed Kwan; Charissa Willard
Cc: Kevin Matthews; Brian Baker; Vikas Saini; dl-Clio
Subject: RE: Updates

You and I both seem to be heading in the same direction and share the
same concerns/questions.

-----Original Message-----
From: Ken Renshaw
Sent: Friday, December 15, 2006 2:04 PM
To: Larry Scheer; Paul Hammer; Eric Barrett; Tim Gardner; Sandrine
Boulanger; Ed Kwan; Charissa Willard
Cc: Kevin Matthews; Brian Baker; Vikas Saini; dl-Clio
Subject: Re: Updates

Is that the sequence in which they get installed during sys upgrade, or
the order in which they are listed in the tarfile? I'm not sure the two
are the same; if they aren't, I'd be curious to know whether these files
fall towards the end of the install. We've seen some bizarre things
happen when the ramdisk created to hold the extracted tarfile during
upgrade gets full or munged somehow. I think a tarfile size of about
40 MB is where the ramdisk used to choke.
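
One way to test the ramdisk theory is to compare the total extracted size of the distribution against the ramdisk capacity. The sketch below is only an illustration: the function names and `RAMDISK_BYTES` are hypothetical, and the real ramdisk size is not stated anywhere in this thread.

```python
import tarfile

# Hypothetical ramdisk capacity; the real figure is not given in this thread.
RAMDISK_BYTES = 64 * 1024 * 1024


def extracted_bytes(tar_path):
    """Sum the sizes of all regular files in a (possibly compressed) tar archive."""
    with tarfile.open(tar_path, "r:*") as tar:
        return sum(member.size for member in tar if member.isfile())


def fits_in_ramdisk(tar_path, capacity=RAMDISK_BYTES):
    """Return True if the extracted regular files would fit in `capacity` bytes."""
    return extracted_bytes(tar_path) <= capacity
```

This counts file payload only; filesystem overhead on the ramdisk, and whether the compressed tarfile itself also lives there during upgrade, would have to be accounted for separately.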

Just a thought, and thanks for following through on this Larry.

-Ken


-----Original Message-----
From: Larry Scheer
To: Paul Hammer; Eric Barrett; Tim Gardner; Sandrine Boulanger; Ed Kwan;
Charissa Willard
CC: Kevin Matthews; Brian Baker; Vikas Saini; dl-Clio
Sent: Fri Dec 15 13:57:44 2006
Subject: RE: Updates

From the source distribution R2.1.0.0-120906.tar.gz used to upgrade
dogfood:

/usr/local/agile/lib/libucdagent.so            is file # 1647 of 4866
/usr/local/agile/lib/libucdmibs.so             is file # 1649 of 4866
/usr/local/agile/web/js/console/login.js       is file # 2478 of 4866
/usr/local/agile/web/js/console/view.js        is file # 2479 of 4866
/usr/local/agile/web/js/domain/util.js         is file # 2489 of 4866


Is this the information you were seeking?
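
For anyone who wants to reproduce the member positions listed above, here is a minimal sketch using Python's standard `tarfile` module. The function names are hypothetical, and it assumes every archive member (directories included) counts towards the position; the thread does not say how the numbers above were counted.

```python
import tarfile


def _norm(name):
    """Strip leading './' and '/' so '/usr/x', './usr/x' and 'usr/x' compare equal."""
    while name.startswith("./"):
        name = name[2:]
    return name.lstrip("/")


def member_positions(tar_path, wanted):
    """Return {name: (position, total)} for the wanted members.

    Positions are 1-based and follow the order stored in the archive.
    Assumption: all member types are counted, not just regular files.
    """
    targets = {_norm(w) for w in wanted}
    found = {}
    total = 0
    with tarfile.open(tar_path, "r:*") as tar:
        for total, member in enumerate(tar, start=1):
            name = _norm(member.name)
            if name in targets:
                found[name] = total
    return {name: (pos, total) for name, pos in found.items()}
```

Calling `member_positions` with the distribution tarball and the five paths in question would give each file's ordinal position and the member count, which can then be compared with the install order recorded in the upgrade log.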


________________________________

From: Paul Hammer
Sent: Friday, December 15, 2006 1:33 PM
To: Paul Hammer; Larry Scheer; Eric Barrett; Tim Gardner; Sandrine
Boulanger; Ed Kwan; Charissa Willard
Cc: Kevin Matthews; Brian Baker; Vikas Saini; dl-Clio
Subject: RE: Updates

Can I get a reply about where these files are located in the shipping
package?

________________________________

From: Paul Hammer
Sent: Thursday, December 14, 2006 8:11 PM
To: Larry Scheer; Eric Barrett; Tim Gardner; Sandrine Boulanger; Ed
Kwan; Charissa Willard
Cc: Kevin Matthews; Brian Baker; Vikas Saini; dl-Clio
Subject: RE: Updates

Good progress. It is hard to explain why these particular files were
affected. Since this seems to be the same problem we see in the field,
with the same manifestation, that would seem to rule out the flash. I am
curious where these files are located in the installation package: are
they near the end of the BOM?

________________________________

From: Larry Scheer
Sent: Thu 12/14/2006 7:44 PM
To: Paul Hammer; Eric Barrett; Tim Gardner; Sandrine Boulanger; Ed Kwan;
Charissa Willard
Cc: Kevin Matthews; Brian Baker; Vikas Saini; dl-Clio
Subject: RE: Updates

I spent some time investigating the upgrade problems that occurred on
dogfood. Of the list of files reported previously, these files had
problems:

/usr/local/agile/lib/libucdagent.so
/usr/local/agile/lib/libucdmibs.so
/usr/local/agile/web/js/console/login.js
/usr/local/agile/web/js/console/view.js
/usr/local/agile/web/js/domain/util.js

I found that although their file sizes were the same, their checksums
did not match the source files stored in the compressed tar file used
for the release distribution.

Examining the contents of the files I found that the files were
corrupted. They were not copies of the files from a previous release but
were versions of the files for 2.1.0.0 with either nulls or somewhat
random bits stored in the file. The files were unpacked by the upgrade
program from the source distribution and stored on the compact flash
incorrectly.
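
To characterize this kind of corruption, a good copy and a bad copy of the same size can be compared byte by byte, reporting each damaged region and whether it was overwritten with nulls. This is a minimal sketch with a hypothetical function name, not the procedure actually used on dogfood.

```python
def diff_runs(good: bytes, bad: bytes):
    """Return [(offset, length, all_nulls)] for each run of differing bytes.

    `all_nulls` is True when the bad copy holds only 0x00 bytes in that run.
    Both inputs must be the same length, as was the case for the files here.
    """
    if len(good) != len(bad):
        raise ValueError("files differ in size; compare sizes first")
    runs = []
    start = None
    for i, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            if start is None:
                start = i          # a new damaged run begins here
        elif start is not None:
            runs.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        runs.append((start, len(good)))  # run extends to end of file
    return [(s, e - s, not any(bad[s:e])) for s, e in runs]
```

If the damaged runs start and end on offsets that are multiples of a flash sector or filesystem block size, that would point towards the storage path; runs at arbitrary offsets would point elsewhere.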


I was able to replace these files with copies from the source
distribution without any problems. After copying good sources to the
same files on the compact flash, their checksums verified. I modified
the pmtab to start snmpd and rebooted dogfood. The system is running and
snmpd is staying up.

Although I was unable to determine the root cause, the most likely
reason for this type of problem is a degrading compact flash. I
recommended that Brian and Kevin replace this flash at their earliest
convenience. After talking to Andy Sharp and Tim, I ruled out network
and memory problems as a cause. If memory were an issue, the system
would have been failing in other, more obvious ways. If the network were
an issue, the entire upgrade would have failed in a more obvious manner.

I will add this information to the bug report.

Larry

________________________________

From: Paul Hammer
Sent: Thursday, December 14, 2006 8:58 AM
To: Eric Barrett; Tim Gardner; Sandrine Boulanger; Ed Kwan; Charissa
Willard
Cc: Kevin Matthews; Brian Baker; Vikas Saini; dl-Clio
Subject: RE: Updates

Can we run the tests some place other than on MD?

________________________________

From: Eric Barrett
Sent: Thu 12/14/2006 8:57 AM
To: Tim Gardner; Sandrine Boulanger; Ed Kwan; Charissa Willard; Paul
Hammer
Cc: Kevin Matthews; Brian Baker; Vikas Saini; dl-Clio
Subject: RE: Updates

I would also suggest comparison to mktg3 -- it's the other node in the
cluster.  It was not affected by the snmp issue despite also being
upgraded to Clio.


_____________________________________________
From: Tim Gardner
Sent: Wednesday, December 13, 2006 7:43 PM
To: Sandrine Boulanger; Ed Kwan; Charissa Willard; Paul Hammer
Cc: Kevin Matthews; Brian Baker; Eric Barrett; Vikas Saini; dl-Clio
Subject: RE: Updates

/mnt/usr/local/agile/lib/libucdagent.so
/mnt/usr/local/agile/lib/libucdmibs.so

These files are both used by snmp. We suspect that they may be the cause
of the snmp problem.

We need to do another upgrade to dogfood. But prior to the upgrade, we
should reinstall just the libraries that have checksum differences and
try to start snmp to verify that the libraries are the cause of the
problem.

The next issue will be to try to understand why system upgrade either
munged the libraries or simply failed to upgrade them. Before
overwriting them we should see if their checksums match the checksums of
the libraries from the previous release that was on dogfood.
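
The check described above (does the installed library match the new release, the previous release, or neither?) can be sketched as follows. The function names and return labels are hypothetical, and SHA-256 from Python's `hashlib` is swapped in here in place of the `cksum` CRC used elsewhere in this thread.

```python
import hashlib


def digest(path, chunk=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def classify(installed, new_release, previous_release):
    """Classify an installed file against the new and previous release copies.

    Returns "upgraded" if it matches the new release, "not upgraded" if it
    still matches the previous release, and "corrupted" if it matches neither.
    """
    d = digest(installed)
    if d == digest(new_release):
        return "upgraded"
    if d == digest(previous_release):
        return "not upgraded"
    return "corrupted"
```

A "not upgraded" result would mean the upgrade skipped the file, while "corrupted" would mean the file was written but damaged along the way.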


_____________________________________________
From: Sandrine Boulanger
Sent: Wednesday, December 13, 2006 6:56 PM
To: Sandrine Boulanger; Ed Kwan; Charissa Willard; Paul Hammer
Cc: Kevin Matthews; Brian Baker; Eric Barrett; Vikas Saini; dl-Clio
Subject: RE: Updates

I copied the upgrade log (which was on the other flash) to
~sandrineb/traces/sys_upgrade.log. It looks like those files were
supposed to be upgraded and they were; check the Installing part. I
don't know why the checksums would not match then...

_____________________________________________
From: Sandrine Boulanger
Sent: Wednesday, December 13, 2006 6:46 PM
To: Ed Kwan; Charissa Willard; Paul Hammer
Cc: Kevin Matthews; Brian Baker; Eric Barrett; Vikas Saini; dl-Clio
Subject: RE: Updates

I ran a system compare; here are the results. We should get the system
upgrade log file on Dogfood.

The following files are different:
/mnt/etc/newsyslog.conf => normal, always different
/mnt/usr/local/agile/lib/libucdagent.so
/mnt/usr/local/agile/lib/libucdmibs.so
/mnt/usr/local/agile/etc/pmtab => scary, did someone modify that one?
/mnt/usr/local/agile/web/js/console/login.js
/mnt/usr/local/agile/web/js/console/view.js
/mnt/usr/local/agile/web/js/domain/util.js
/mnt/var/log/sendmail.st
/mnt/version => normal, always different
Dogfood diag>

_____________________________________________
From: Ed Kwan
Sent: Wednesday, December 13, 2006 6:44 PM
To: Charissa Willard; Paul Hammer
Cc: Kevin Matthews; Brian Baker; Eric Barrett; Vikas Saini; dl-Clio
Subject: RE: Updates

Checksums of some libraries on dogfood don't match the release, e.g.

Dogfood# cksum libucdmibs.so
2089852042 3278706 libucdmibs.so

[edk@edk-linux lib]$ pwd
/n/build-trees/R2.1.0.0/R2.1.0.0-120906/nfx-tree/Build/ch/opt/lib
[edk@edk-linux lib]$ cksum  libucdmibs.so
4212168562 3278706 libucdmibs.so

Sandrine is running a "system compare" on dogfood.

_____________________________________________
From: Charissa Willard
Sent: Wednesday, December 13, 2006 1:20 PM
To: Paul Hammer
Cc: Kevin Matthews; Brian Baker; Ed Kwan; Eric Barrett; Vikas Saini;
dl-Clio
Subject: RE: Updates

Paul,

Ed's been looking into 16471 on MD. We still haven't been able to get a
stack trace.

-Charissa

===== State: Assigned by:edk at 12/12/2006 7:01:15 PM =====

Ran the unstripped version of snmpd in gdb on dogfood, and still didn't
get a good stack trace:

(gdb) bt
#0 0x5ffe31a0 in ?? ()
warning: Hit heuristic-fence-post without finding
warning: enclosing function for address 0x5ffe46c0

I set breakpoints at main, __main, __start and __init, and snmpd got
SIGSEGV before hitting the break points.

Ran "readelf snmpd" on my Linux box, and the executable instructions in
snmpd don't map to the pc (0x5ffe31a0).
There are 15 shared libraries used by snmpd, but gdb says they are not
even loaded at the time:

(gdb) info sharedlibrary
No shared libraries loaded at this time.

I played with ldd and the "LD_TRACE_LOADED_OBJECTS" environment
variables, but I can't get any useful info.

May need to compile all the libraries with "-g" and load them on
dogfood...

_____________________________________________
From: Paul Hammer
Sent: Wednesday, December 13, 2006 1:14 PM
To: Eric Barrett; Vikas Saini; dl-Clio
Cc: Kevin Matthews; Brian Baker
Subject: RE: Updates

This one should have been submitted separately against 2.1; we need core
Dev to have a look at it in Clio. I agree that it looks similar to the
one at Shopzilla. However, the version at the customer did not core on
MD, nor did it leave a core file; it appears that with 2.1 we now have a
core.


_____________________________________________
From: Eric Barrett
Sent: Wednesday, December 13, 2006 1:10 PM
To: Vikas Saini; Paul Hammer; dl-Clio
Cc: Kevin Matthews; Brian Baker
Subject: RE: Updates

snmpd issue is already filed as 16471.

_____________________________________________
From: Vikas Saini
Sent: Wednesday, December 13, 2006 1:08 PM
To: Paul Hammer; dl-Clio
Cc: Eric Barrett; Kevin Matthews; Brian Baker
Subject: RE: Updates
Importance: High

Clio is going fine. The first regression pass is complete. The second
regression pass is under way and should be complete soon. I just sent
the Clio Soak status.

There are still 2 defects in Dev's court that need resolution.

QA has verified all the defects that were targeted for Clio. Right now
we are verifying the defects that were not targeted for Clio but were
resolved in Clio.

I haven't seen any defect filed on the snmpd issue. I will file one
soon. Yes, that should be resolved.

Thanks
Vikas

_____________________________________________
From: Paul Hammer
Sent: Wednesday, December 13, 2006 1:01 PM
To: dl-Clio
Cc: Eric Barrett; Kevin Matthews; Brian Baker
Subject: Updates

Any updates on Clio status?

Soak Updates? Uptime numbers

Number of defects in Dev that we need to address still in Clio?

Defects in QA that we need to resolve?

I have not seen any update on the Defect found with SNMP on MD, assume
this is MF for Clio too.

-Paul

