X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C71FAB.E4A27B63@onstor-exch02.onstor.net>; Thu, 14 Dec 2006 10:15:53 -0800
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Content-class: urn:content-classes:message
Subject: RE: Notes from Our Discussion about 20% Effort on Internal/Improvement Projects
Date: Thu, 14 Dec 2006 10:15:53 -0800
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E01400EA9@onstor-exch02.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E01C0AB9B@onstor-exch02.onstor.net>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: Notes from Our Discussion about 20% Effort on Internal/Improvement Projects
thread-index: AccfGcgVq/bkmbtIR8OF5A6mFWjFTwAh7CHQAAImFHA=
From: "Wencheng Chai" <wencheng.chai@onstor.com>
To: "Maxim Kozlovsky" <maxim.kozlovsky@onstor.com>,
	"Jay Michlin" <jay.michlin@onstor.com>,
	"dl-Software" <dl-software@onstor.com>
Cc: "Ian Brown" <ian.brown@onstor.com>


    I second Max's comments regarding an error tracing mechanism in the
    code. Windows has a tracing mechanism called WPP tracing, which is a
    very useful debugging tool. It would be very helpful for
    troubleshooting if we had a similar tool.

    Wencheng


-----Original Message-----
From: Maxim Kozlovsky
Sent: Thursday, December 14, 2006 9:31 AM
To: Jay Michlin; dl-Software
Cc: Ian Brown
Subject: RE: Notes from Our Discussion about 20% Effort on
Internal/Improvement Projects

Here are some items - if you need more, just ask.

Eliminate sendAgile - should be at the top of the priority list. There
is no need to do the whole thing at once; just pick a piece of code and
make it use RMC.

Global memory statistics, runtime type information for FP/TXRX/FC - a
must-have to be able to solve memory leak problems seen in the field in a
timely manner, instead of working for months on a single problem like
Motorola's buffer leak.
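
To illustrate what per-type statistics buy us, here is a minimal sketch in
Python (not product code). It assumes every allocation carries a type tag;
the names AllocStats, growing_types and the tags used below are
hypothetical. Comparing two snapshots shows which type is leaking.

```python
from collections import Counter


class AllocStats:
    """Per-type live-object counters, updated on every alloc and free."""

    def __init__(self):
        self.live = Counter()

    def alloc(self, type_tag):
        # A real allocator would return the buffer; here we only count.
        self.live[type_tag] += 1

    def free(self, type_tag):
        self.live[type_tag] -= 1

    def snapshot(self):
        # Copy the counters so later allocations do not change the snapshot.
        return dict(self.live)


def growing_types(before, after, threshold=1):
    """Return {type_tag: growth} for tags whose live count grew by >= threshold."""
    growth = {}
    for tag, count in after.items():
        delta = count - before.get(tag, 0)
        if delta >= threshold:
            growth[tag] = delta
    return growth
```

Taking a snapshot before and after a workload and passing both to
growing_types() points at the type that is accumulating, which is the
information we did not have for the buffer leak.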

Support NFS mount of the EFS volumes from the SSC through the loopback -
should be easy to implement, and it takes care of a number of problems
because it provides the SSC with lots of disk storage. Examples:
1) Flash wear-out problem - logs can be stored directly on the
management volumes, so there is no need to write to the flash anymore
except during upgrades and boot-up.
2) Upgrades - fixes the failures caused by the lack of temporary storage.
3) Space to write coredumps without overflowing the /var partition.

Memory leak detector - find the slow memory leaks proactively instead of
waiting for customers to come up with a workload that makes slow leaks
fast.

Parallel make system - the builds take way too much time.

Allow separate reboot of FP/TxRx/FC on bobcat, make it the same as
cheetah - the compile/run cycle is way too long on bobcat.

Error traceability - in the current product it is sometimes hard to
trace an error reported by one of the daemons back to the origin of the
problem. Requests originated by clients or created internally should
have a unique identifier that is carried through the RPCs executed on
behalf of the request. The error logs and the CLI logs should include
that identifier so failures can be related to the command that failed.
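
A minimal sketch of the idea in Python (not product code), assuming the
identifier travels as a field in each RPC message and every log line is
stamped with it. All names here (request_id, rpc_call, volume_daemon,
run_cli_command) are hypothetical stand-ins, not existing interfaces.

```python
import contextvars
import io
import logging
import uuid

# Identifier of the request currently being served ("-" when there is none).
request_id = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Stamp every log record with the current request identifier."""

    def filter(self, record):
        record.request_id = request_id.get()
        return True


def make_logger(name, stream):
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(name)s [req=%(request_id)s] %(message)s"))
    handler.addFilter(RequestIdFilter())
    logger.addHandler(handler)
    return logger


def rpc_call(handler, payload):
    """Stand-in for an RPC: the identifier is carried in the message."""
    return handler({"request_id": request_id.get(), "payload": payload})


def volume_daemon(message, log):
    # The receiving side restores the identifier before doing any work.
    token = request_id.set(message["request_id"])
    try:
        log.error("cannot create volume: %s", message["payload"])
        return False
    finally:
        request_id.reset(token)


def run_cli_command(command, cli_log, daemon_log):
    # A new identifier is assigned where the request enters the system.
    token = request_id.set(uuid.uuid4().hex[:8])
    try:
        cli_log.info("command: %s", command)
        if not rpc_call(lambda m: volume_daemon(m, daemon_log), command):
            cli_log.error("command failed")
        return request_id.get()
    finally:
        request_id.reset(token)
```

With this in place, searching all logs for a single req= value gives the
CLI command and every daemon error it caused.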

Tracing of the internal messages - this will help in diagnosing
problems, performance tuning and understanding how the system works. For
example, the recent discussion about what happens in the system during
failover could have been much shorter if we had been able to look at
the message trace. Make Charissa do a nice GUI to display the traces.

FC coredump - or maybe wait for Cougar so the problem goes away.

> -----Original Message-----
> From: Jay Michlin
> Sent: Wednesday, December 13, 2006 4:50 PM
> To: dl-Software
> Cc: Maxim Kozlovsky; Ian Brown
> Subject: Notes from Our Discussion about 20% Effort on
> Internal/Improvement Projects
>
> Hello all,
>
> In our staff meeting today we brainstormed on projects we
> might do as part of the Delorean release (probably for March
> or April) to attend to areas of our code we think need
> attention. These projects won't likely add immediate features
> that are visible to customers, but rather will add strength,
> robustness or solid foundation to our overall product. The
> payoff is long term, but it's work that must go on constantly.
>
> Here (in no particular order) are the items we mentioned:
>
> * Encryption and/or compression for DM-IP
> * Improve upgrade time and upgrade reliability
> * Vol Create /8 LUNs
> * Eliminate SendAgile
> * Rewrite tpl/fp
> * Rewrite/clean up/refactor sanmd
> * eee buffer tagging/descriptor tagging
> * Mirror repair or resynchronize
> * Verify the file system log
> * Authentication for DM-IP
> * eek for clusDB
> * Shrink the clusDB
>
> Some of the best ideas come in a second round, after seeing a
> set of notes such as these. So please do review them and if
> you have thoughts, comments or new ideas, please send them to
> the entire list. As we develop the details of Delorean
> planning in the next 3 weeks, we will try to include some set
> of these projects along with the feature development and
> committed file system hardening.
>
> jay
>
