SE - Eric Crutchlow
DEV - Andy Sharp
We found ourselves in a building roughly the size of ours but turned over entirely to being a "machine room." It was filled with the usual equipment, except that, interestingly, there were none of the Intel-based servers from Dell, IBM, etc. that I would have expected to see there. It was all Sun equipment, and most of the newer servers were Sun's AMD Opteron-based systems.
The customer told us later that ONStor piqued their interest because they already had the storage; they just wanted to ditch the obsolete and featureless Novell NFS servers that fronted it. We found the two 2260 Bobcats mounted together at the bottom of a rack, completely wired and powered up. Apparently they had been sitting that way, otherwise untouched, since they were purchased, possibly as far back as September 2006. Yeesh.
Eric quickly whipped out his laptop and connected via serial cable to the front console port, all the while maintaining an easy banter with Patrick. He privately expressed concern to me about the machines having sat unconfigured for so long. Something about how things crash when that is allowed to happen. This is exactly the kind of lore the SEs carry around with them to try to accomplish their task as thoroughly as possible, and I find this sort of thing very annoying. We [Dev] need to make the product far less fragile so that this kind of thing can be relegated to the trash heap of distant memories.
Despite his concern, and despite the fact that the filers were running version 1.3.1.1, Eric went through the initial config menu on the first filer, relying heavily on the customer's pre-filled configuration worksheet, and replaced the secondary flash with a pristine 2.2.2.7 flash he had made himself. He then copied the newly created configuration to the secondary flash with the "system copy config" command, followed by a "system reboot -s". He repeated the process on the second filer.* Both rebooted without problems. I was impressed with the processes CS has built and runs around the customer configuration worksheet and related materials. Very smooth.
During this time Eric and I both asked Patrick a few questions here and there. We had at least an hour to kill while Eric went through the initial config menus on both nodes and rebooted them, especially since we had to wait several minutes for each filer's EMRS-related DNS queries to time out before the boot would proceed. Since DNS had not been configured, there is no way these queries could be resolved, and the boot process shouldn't be made to wait for them anyway. *Bug 19557 filed.
Two things are not handled in the initial config menu, and their absence makes life difficult for the installer: DNS and NTP should both absolutely be handled there. Because they aren't, the installer has to bring the system up and then configure them, and there are multiple infelicities associated with this. One problem is that NTP needs to be started correctly so that all the filers in a cluster have synchronised time; otherwise clusterdb and other cluster-related operations will screw up because of a heartbeat/time mismatch. Configuring NTP through the CLI does not do this, because the correct NTP startup sequence is performed only at boot time. Currently the installer either sits and waits for NTP to slowly sync itself, or reboots (again) to get the correct NTP startup sequence executed. More generally, any service that has a hostname as part of its configuration is trouble until DNS is configured. This would likely include NIS, email (sendmail), automount, NTP, and so on. The problem associated with EMRS has already been mentioned. Bug 19558 filed against NTP.
After the initial config phase was over, we returned to the original building and went to the desk of the sysadmin who would drive the WebUI for the rest of the configuration: vservers, volumes, shares, oh my. Thierry, a grumpy expatriate Frenchman judging by the 8-foot-high Zidane poster on the wall behind him, was nevertheless a very competent and knowledgeable sysadmin. After some initial confusion involving vservers, he ripped through the config and had NFS and CIFS shares up in relatively short order. The phrase "oh -- sweet" was heard to pass his lips more than once during the process.
Patrick mentioned that he wasn't sure all the network routes and related settings had been set up correctly, but guessed we would find out soon enough. Apparently there are multiple IT-related groups that sometimes have trouble talking to each other: storage, networking, lab manager(s), and then of course the end user. This can often result in the "ya can't get thar from hyar" syndrome: the average user or filer administrator can't actually reach the filer's management interface directly from his or her desktop. Our product design and our processes need to take this into consideration wherever possible, because I strongly suspect it is quite common, especially at overly large companies. For example, Thierry could not directly access the WebUI from his desktop: the two firewalls in between are run by two different groups, and the routes and pinholes would have to be coordinated, a multi-day task at the very least. He worked around this by logging into a machine that was halfway there and using a remote desktop tool. This had a very definite effect on the way the WebUI behaved. It made it look more sluggish than it really is, and certain components, like the animated mouse icon meant to indicate "processing...", didn't function properly. In fact it looked like a large fat hook following the mouse around. Obviously the ONStor employees knew what it was, but the customer had no idea at first.
While noticing the three NetApps in the nearby racks that seemed to be unused or rarely used (thanks, NetApp, for putting an ops/sec meter on the front bezel!), I realized that the Bobcat is a very decent product whose full potential, in terms of ease of use and reliability, we have not yet tapped. While that sounds like a back-handed compliment, the good news is that if we can execute on incrementally increasing ease of use and reliability over time, without backsliding, this product, and future products based on the same software, should be extremely tough competitors in the NAS market. That's an exciting feeling indeed.