Hardware Architecture of GeneNetwork (2007-now)
The GeneNetwork rack resides in 211 Link. It consists of:
- 5 Dell 860s (mirroring and testing)
- 2 Dell 1950s (main server and development)
- 1 Dell 2950 (backup and storage)
- 1 Tape drive (backup)
- 1 Cisco switch (managed by UT's IT department)
- 1 APC UPS
Hardware Architecture of GeneNetwork (2003-2007)
Most of GeneNetwork runs on a small computer cluster on the fifth floor of the Johnson Building at UTHSC. There are now five machines connected directly to the public network (100 Mbps connections) and eight Apache-Python compute nodes behind a 1 Gbps switch. The rack also has a 16-port Belkin KVM switch (Omniview Pro2) and an unused, now outdated SNAP NAS server from Quantum. The rack used by GN from 2003 to 2006 (001A) was originally bought for electrophysiology in 1989. See the attached static diagram of the GN cluster (late 2006) and this live diagram of MySQL components.
- WebqtlMachine (webqtl) is the main GN production site web server (1U HP Proliant DL360 Xeon system).
- WebtwoqtlMachine (web2qtl) is a development site server (1U HP Proliant DL360 Xeon system).
- Opteron is the production MySQL database server (2U dual processor AMD Opteron 280 from Monarch Computer with 16 GB RAM and 755 GB effective hard drive capacity).
- Opteron2 is a mirror of the production MySQL database server (2U dual processor AMD Opteron 280 from Monarch Computer with 1.9 TB effective hard drive capacity).
- GeneNetwork Headmaster node is our LVS and LDAP server (see GeneNetwork Application Architecture) and the head node of a small GN compute cluster (1U Intel P4 2.2 GHz system from Supermicro).
- GeneNetwork Cluster nodes (ClusterNode1, ClusterNode2, ClusterNode3, ClusterNode4, ClusterNode5, ClusterNode6, ClusterNode7, ClusterNode8) are production LVS nodes that run Python and Apache. All nodes are 1U Supermicro Intel P4 2.2 GHz computers.
The GeneNetwork Hardware Architecture Diagram attached to this page provides an overview of how these components are configured. (It does not show Opteron2 and the public switch.)
In addition, the QTLReaper cluster, shown below, is an Apple Workgroup Cluster for Bioinformatics. It consists of eight dual-processor G5s, each with 2 GB RAM. The system is equipped with an Asante GX5-800 Gb Ethernet switch and is mounted in an XtremeMac XRackPro (purchased by KFM early in 2004).
-- StephenPitts - 31 Jul 2006
- Bill Bug note (2006-09-03): The network diagram should be checked against what is in the 001A rack. In particular, I'm not certain what role the Asante 10/100 Mb/s switch plays.
Plans to Update Hardware (May 2007-Aug 2007)
Given the reliance of many users on GeneNetwork, our hardware, rack, and network infrastructure need to be improved. We now need better performance, reliability, extensibility, and interoperability. Key components are aging, and the layout of components and computers on the rack is disorganized and difficult to maintain and upgrade. The system is still fragile, and failure of a single component can take down the whole system. (For example, a failure of GN on June 23, 2007 was caused by a full hard drive on Opteron.)
On July 3, 2007 we (Hongqiang Li and Arthur Centeno) received a large order from Dell (see attachment 5: NewGNRackSystems.html). The order consists of 2x 42U racks (to be labeled "North" and "South") and a 24U rack that may eventually house test and development equipment. For the next month we will move servers into the 24U rack for initial setup and configuration. Once we have gained enough competence in rack setup we will fully configure one of the 42U racks. We will need to buy cable management paraphernalia (color-coded Velcro, coded Cat 5 cable, custom-length power cords, snap-in panels, cable management arms). We may want to install 2x 1U 24-port gigabit switches for redundancy (see post 17 of the TechRepublic.com thread "High density rack wiring - best practices?").
Kev Adler will be helping to set up the 24U rack.
Dr. Fan Zhang from the University of Surrey will be joining our GN group in summer 2007. In addition to his background in bioinformatics, Dr. Zhang has extensive background in cluster hardware configuration and operations. In anticipation of his joining the group, we are now planning a significant upgrade of the GN hardware and rack infrastructure. One goal is to move toward network equipment that is compliant with the Biomedical Informatics Research Network (BIRN) rack. In particular this involves use of a sophisticated managed switch (Cisco Catalyst 3750E).
RW Williams has attached an Excel spreadsheet with a list of proposed equipment to be purchased for two GN equipment racks. This spreadsheet also includes equipment for a proposed system for Dr. Guus Smit and the NeuroBSIK consortium in Amsterdam. Finally, this spreadsheet also includes information on the BIRN rack specifications.
RW Williams modified the final order to make it more compliant with equipment supported by BIRN. Compliance is not perfect, primarily because the equipment that we have purchased is about 1 year more modern than BIRN recommendations.
In brief, the new GN hardware will initially consist of two parallel and essentially identical racks of computers (GN West and GN East), both of which will be placed on the 2nd floor of the Link Building. Each rack will have:
- 1 file server (2x Quad Core Xeon E5345, 8 GB RAM, Intel PRO 1000PT Cu, dual port PCIe NIC, 6x 750 GB 7.2K RPM SATA2 drives, DRAC5 card, 24x CD-RW/DVD, redundant power supply) (supplied with 16 GB RAM, please move extra 8 GB to MySQL server below)
- 1 Apache server (2x Quad Core Xeon E5345, 8 GB RAM, Intel PRO 1000PT Cu, dual port PCIe NIC, 2x 300 GB 10K RPM SAS drives, DRAC5 card, 24x CD-RW/DVD, redundant power supply)
- 1 MySQL server (2x Quad Core Xeon E5345, 16 GB RAM, Intel PRO 1000PT Cu, dual port PCIe NIC, 2x 300 GB 10K RPM SAS drives, DRAC5 card, 24x CD-RW/DVD, redundant power supply)
- 5 compute nodes (PowerEdge 860 with a single Dual Core 3050 Xeon, 2 GB RAM, 80 GB SATA drives).
- Each 42U rack (Dell 4210) will be equipped with a 3000VA UPS, a 16 port KVM digital switch (2161DS/2 PowerEdge), and a 1U console tray computer with touchpad keyboard and 15 inch flat panel monitor.
I have attached the equipment list from Dell with prices.
Bill Bug comment 2007-May-12
Hi Rob,
I've had a chance to read through your machine specs. They sound fantastic to me. With Dr. Zhang on board, you should be able to pull this together quickly. The KEY will be how well the GN code has been refactored to separate out data from code, so that replication through the Subversion server can be made a trivial process.
I think it's a good idea to have the computational nodes configured as single CPU (quad core) machines. This way, you can add more nodes at less cost, should the computational demand increase. We should really investigate setting you up to use the generic pipeline wrappers we run from CRON, which make it very easy to assemble a loose confederation of processing nodes that can pick up jobs off a central queue as they (the computers) become available. One can think of this as a "poor man's" way of using a cluster, but it's a helluva lot easier than refactoring the underlying code to both be multi-threaded (in an intelligent way) AND to use something like MPI to pass thread jobs out to a cluster. Even when using the more advanced GRID APIs (such as Condor or Globus), you need a very sophisticated programmer working on this on a regular basis to make that effort pay off. With the approach we've taken, you stick with commodity machines - well provisioned with computational heft but not too pricey - then you just think about how to set up your computations so they run as fairly limited, small jobs. These jobs are placed in a queue, and with the generic scripts running on each node via CRON, when a computational node becomes ready, it picks up the next job on the queue, pulls the required data over from your NAS, runs, puts the results back on the NAS, then iterates to the next job on the queue. This way, if one node goes down, it doesn't compromise the overall processing farm throughput much. The only caveat is you need to be able to chop your computational task up into moderately distinct, uncoupled jobs, where there are no data dependencies between jobs.
Anyway - just a thought for the future.
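The queue-and-worker idea described above could be sketched roughly as follows. This is only an illustration, not the pipeline wrappers mentioned in the comment: it assumes the queue is a directory of small job files on storage shared by all nodes, that each job file holds a JSON list of command arguments, and that a rename on that storage is atomic. The directory names (queue, claimed, results) and the functions claim_next_job, run_job, and work_once are hypothetical.

```python
import json
import os
import subprocess
from pathlib import Path


def claim_next_job(queue_dir, claimed_dir, node):
    """Claim the oldest pending job file by renaming it, or return None."""
    for job in sorted(queue_dir.glob("*.job")):
        target = claimed_dir / f"{node}.{job.name}"
        try:
            # Assumption: rename is atomic on the shared storage,
            # so only one node can win a given job.
            os.rename(job, target)
        except FileNotFoundError:
            continue  # another node claimed this job first
        return target
    return None


def run_job(job_file, results_dir):
    """Run the command stored in the job file and save its standard output."""
    argv = json.loads(job_file.read_text())  # hypothetical format: JSON list
    done = subprocess.run(argv, capture_output=True, text=True)
    result = results_dir / (job_file.name + ".out")
    result.write_text(done.stdout)
    return result


def work_once(root, node):
    """One scheduled invocation: claim one job, run it, store the result."""
    root = Path(root)
    queue, claimed, results = (root / d for d in ("queue", "claimed", "results"))
    for d in (queue, claimed, results):
        d.mkdir(parents=True, exist_ok=True)
    job = claim_next_job(queue, claimed, node)
    return run_job(job, results) if job else None
```

Each node would call work_once from its scheduler with its own node name. Because a job is claimed by renaming its file, a node that goes down only strands the job it was holding, and the rest of the queue stays available to the other nodes.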
I would add that one of the things CentOS comes with is a package management system that deals with system library dependencies and with managing collections of machines (for easy mirroring and backup). The system is called YUM:
http://linux.duke.edu/projects/yum/
http://www.centos.org/docs/4/html/yum/
The next step, which I don't think has been addressed yet, is to determine where all the database calls are in the GN Python code. With a list like this in hand, you'd then be able to better evaluate where and when it would be appropriate to make changes to the data model to add more normalization and introduce other modules such as pieces of BioWarehouse or the GMOD Chado sub-schemas to better accommodate handling particular types of data (mapping info, raw expression data, etc.).
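As a first pass at such a list, a plain text scan of the source tree may be enough. The sketch below assumes the GN code issues its queries through Python DB-API style cursor methods (execute and executemany), which is an assumption about the code base and not something confirmed here. The function name find_db_calls is hypothetical, and a text match like this will miss queries built through other helpers.

```python
import re
from pathlib import Path

# Assumption: database access goes through DB-API style cursor methods.
DB_CALL = re.compile(r"\.(execute|executemany)\s*\(")


def find_db_calls(source_root):
    """Return (path, line number, stripped line) for each suspected DB call."""
    hits = []
    for path in sorted(Path(source_root).rglob("*.py")):
        # Read as text only, so the files need not parse under this interpreter.
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if DB_CALL.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Grouping the resulting hits by file would give a rough map of which modules touch the database most often, which is a reasonable starting point for deciding where schema changes would have the widest effect.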