Date: Thu, 14 Dec 2006 11:04:07 -0800
From: Andrew Sharp <andy.sharp@onstor.com>
To: "Wencheng Chai" <wencheng.chai@onstor.com>
Cc: "Maxim Kozlovsky" <maxim.kozlovsky@onstor.com>, "Jay Michlin"
 <jay.michlin@onstor.com>, "dl-Software" <dl-software@onstor.com>, "Ian
 Brown" <ian.brown@onstor.com>
Subject: Re: Notes from Our Discussion about 20% Effort on
 Internal/Improvement Projects
Message-ID: <20061214110407.32da6557@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E01400EA9@onstor-exch02.onstor.net>
References: <BB375AF679D4A34E9CA8DFA650E2B04E01C0AB9B@onstor-exch02.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E01400EA9@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

What about just linking in a memory allocation debugging library, like
mpatrol?  With a little bit of work we were able to link with it on
my last embedded project, and it found all kinds of crazy errors that
probably never would have been found otherwise.

Obviously we would only use something like this at a "problem" site and
never in normal production code, but it's probably the easiest/closest
we will get to a useful tool given our context.
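
For anyone who hasn't used one of these libraries, here is a rough sketch
of the basic idea, not of mpatrol itself: wrap malloc/free so every live
allocation remembers where it came from, then dump whatever is still
outstanding.  The dbg_* names and macros are made up for illustration and
are not mpatrol's API; a real library does far more than this, and a real
version for our code would need locking and a faster lookup.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is NOT mpatrol's API. */
struct dbg_rec {
    void *ptr;              /* block handed to the caller */
    size_t size;            /* requested size in bytes */
    const char *file;       /* allocation site */
    int line;
    struct dbg_rec *next;
};

static struct dbg_rec *dbg_live;   /* list of outstanding allocations */
static size_t dbg_bad_frees;       /* frees of pointers we never tracked */

/* Allocate a block and record where the request came from. */
void *dbg_malloc(size_t size, const char *file, int line)
{
    void *p = malloc(size);
    struct dbg_rec *r;

    if (p == NULL)
        return NULL;
    r = malloc(sizeof *r);
    if (r == NULL) {
        free(p);
        return NULL;
    }
    r->ptr = p;
    r->size = size;
    r->file = file;
    r->line = line;
    r->next = dbg_live;
    dbg_live = r;
    return p;
}

/* Free a tracked block; count and report anything we do not recognize. */
void dbg_free(void *p, const char *file, int line)
{
    struct dbg_rec **pp;

    if (p == NULL)
        return;
    for (pp = &dbg_live; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->ptr == p) {
            struct dbg_rec *r = *pp;
            *pp = r->next;
            free(r);
            free(p);
            return;
        }
    }
    /* Double free or foreign pointer: report it, do not pass it to free(). */
    dbg_bad_frees++;
    fprintf(stderr, "%s:%d: free of untracked pointer\n", file, line);
}

/* Print every outstanding allocation; returns how many there are. */
size_t dbg_report(FILE *out)
{
    const struct dbg_rec *r;
    size_t n = 0;

    for (r = dbg_live; r != NULL; r = r->next) {
        fprintf(out, "leak: %lu bytes from %s:%d\n",
                (unsigned long)r->size, r->file, r->line);
        n++;
    }
    return n;
}

size_t dbg_bad_free_count(void)
{
    return dbg_bad_frees;
}

#define DBG_MALLOC(n) dbg_malloc((n), __FILE__, __LINE__)
#define DBG_FREE(p)   dbg_free((p), __FILE__, __LINE__)
```

In a debug build the allocation calls would go through DBG_MALLOC and
DBG_FREE, and dbg_report(stderr) could be hooked to a CLI command or run
at shutdown to list what is still outstanding and where it was allocated.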

Cheers,

a

 On Thu, 14 Dec 2006 10:15:53 -0800 "Wencheng Chai"
<wencheng.chai@onstor.com> wrote:

> 
>     I second Max's comments regarding an error tracing mechanism in the
>     code. Windows has a tracing mechanism called WPP tracing, which is a
>     very useful debugging tool. It would be very helpful for
>     troubleshooting if we had a similar tool.
> 
>     Wencheng
> 
> 
> -----Original Message-----
> From: Maxim Kozlovsky 
> Sent: Thursday, December 14, 2006 9:31 AM
> To: Jay Michlin; dl-Software
> Cc: Ian Brown
> Subject: RE: Notes from Our Discussion about 20% Effort on
> Internal/Improvement Projects
> 
> Here are some items - if you need more, just ask.
> 
> Eliminate sendAgile - should be at the top of the priority order.
> There is no need to do the whole thing at once, just pick a piece of
> code and make it use RMC. 
> 
> Global memory statistics, runtime type information for FP/TXRX/FC -
> a must-have to be able to solve memory leak problems happening in
> the field in a timely manner instead of working for months on a single
> problem like Motorola's buffer leak.
> 
> Support NFS mount of the EFS volumes from the SSC through the
> loopback - should be easy to implement, and it takes care of a
> number of problems, as it provides the SSC with lots of disk storage.
> Examples: 1) flash wear-out problem - logs can be stored
> directly on the management volumes, so there is no need to write to
> the flash anymore except during upgrades and boot-up 2) upgrades -
> fixes the failures from the lack of temporary storage 3) space to
> write coredumps without overflowing the /var partition
> 
> Memory leak detector - find the slow memory leaks proactively instead
> of waiting for customers to come up with a workload that makes
> slow leaks fast.
> 
> Parallel make system - the builds take way too much time.
> 
> Allow separate reboot of FP/TxRx/FC on bobcat, make it the same as
> cheetah - the compile/run cycle is way too long on bobcat.
> 
> Error traceability - in the current product it is sometimes hard to
> trace the error reported by various daemons to the origin of the
> problem. Requests originated by clients or created internally
> should have a unique identifier carried through the RPCs executed on
> behalf of the request, and the error logs and the CLI logs should
> include that identifier so failures can be related to the command
> that failed.
> 
> Tracing of the internal messages - this will help in diagnosing
> problems, performance tuning and understanding how the system works.
> For example, the recent discussion about what happens in the system
> during the failover could have been much shorter if we had been able to
> look at the message trace. Make Charissa do a nice GUI to display the
> traces.
> 
> FC coredump - or maybe wait for Cougar so the problem goes away.
>  
> 
> > -----Original Message-----
> > From: Jay Michlin 
> > Sent: Wednesday, December 13, 2006 4:50 PM
> > To: dl-Software
> > Cc: Maxim Kozlovsky; Ian Brown
> > Subject: Notes from Our Discussion about 20% Effort on 
> > Internal/Improvement Projects
> > 
> > Hello all,
> > 
> > In our staff meeting today we brainstormed on projects we 
> > might do as part of the Delorean release (probably for March 
> > or April) to attend to areas of our code we think need 
> > attention. These projects won't likely add immediate features 
> > that are visible to customers, but rather will add strength, 
> > robustness or solid foundation to our overall product. The 
> > payoff is long term, but it's work that must go on constantly.
> > 
> > Here (in no particular order) are the items we mentioned:
> > 
> > * Encryption and/or compression for DM-IP
> > * Improve upgrade time and upgrade reliability
> > * Vol Create /8 LUNs
> > * Eliminate SendAgile
> > * Rewrite tpl/fp
> > * Rewrite/clean up/refactor sanmd
> > * eee buffer tagging/descriptor tagging
> > * Mirror repair or resynchronize
> > * Verify the file system log
> > * Authentication for DM-IP
> > * eek for clusDB
> > * Shrink the clusDB
> > 
> > Some of the best ideas come in a second round, after seeing a 
> > set of notes such as these. So please do review them and if 
> > you have thoughts, comments or new ideas, please send them to 
> > the entire list. As we develop the details of Delorean 
> > planning in the next 3 weeks, we will try to include some set 
> > of these projects along with the feature development and 
> > committed file system hardening.
> > 
> > jay
> > 
