GN (GeneNetwork) Restructuring Handbook

The purpose of GN restructuring is:

  • To formalize GN development by applying an appropriate project management strategy to the development team (including the console team and the remote team). The top priority is to isolate GN from personal influences and to ensure that development can carry on regardless of who is involved.

  • To develop a counterpart that works with the current BIRN grid and expands current capacity. (Waiting for maintenance on the BIRN rack, as of 7/30.)

  • To improve characteristics such as capacity, efficiency, robustness, security, integrity, and usability.

  • To promote GN as a shared resource for the community.

  • To leave room for potential future development involving AI.

Our current status is that we are moving toward modularizing the current Python code and/or rewriting it in Java (defunct as of 7/30). We now have a working Bundle Server setup (see GNBundleSetup) at http://132.192.47.28/ (this machine currently serves the GeneNetwork main site, 7/30). We also installed a monitoring system at http://132.192.47.21/zabbix/ (taken down, 7/30). Although the issue tracker has not been used much, we hope to install a more integrated management system in future.

FYI: the tools currently used to maintain the development infrastructure are collected in GNToolBox.

One of the main reasons for pursuing the restructuring is to cooperate with the BIRN infrastructure. The NIH/BIRN facility provides APIs and tools that let grid members integrate their own resources into BIRN so that those resources can be accessed publicly. At present BIRN provides storage and compute cycles at a technical level; ideally, in future, BIRN will provide services at a more abstract, conceptual level.

We are currently applying an ontology description (BIRNLex) to our database structure:

"Data will be discoverable via a concept-based query interface and further queried and explored via database-specific query interfaces. The integration of data across multiple sources and domains requires common vocabularies to locate relevant information, to associate similar data types, and to discover knowledge through a higher level semantic network. All data sources will be annotated using a common ontology, BIRNLex, to enable intelligent exploration of the BDR data resources." -----http://nbirn.net/bdr/bdr_preview_data.shtm


Around the time of my arrival (roughly 11/10/2007 to 17/10/2007), GN was hit by a lightning strike. The electrical surge hit the power supplies, which shut down to protect the system. We do not know exactly what happened, but the symptoms were:

  • Certain nodes were dead (dead NICs, kernel failures, I/O error and memory error messages).
  • The database server Opteron2 (formerly the backup machine, acting as the production database at the time) was dead. NFS was lost, and the NIC stayed dead even after reloading its module.
  • There was no backup of the database. Opteron2 had been the backup, but when Opteron (the main database) died, Opteron2 was used directly in production without a further backup being made.
  • Nodes could not ping each other, and 192.168.1.128 was not acting as the gateway as it was supposed to.

Rescuing GN after the disaster was chaotic. The difficulties included finding appropriate tools (hardware and software) to export the data and rebuilding an identical database on another machine; the rebuilt database server runs the Fedora Core 7 operating system.

Since then, the work of optimizing the old GN platform has been organized around the following concerns:

  1. Reliability
  2. Emergency recovery
  3. Robustness
  4. Capacity
  5. Integrity and Security
  6. Project Management, and Source Code Control
  7. Efficiency

Emergency Recovery

The first issue we need to sort out is the emergency recovery schedule. We are working on procedures for identifying problems, replacing faulty infrastructure components, and bringing the system back online quickly (more or less done, 7/30). The designed procedures are:

  1. Backup nodes and the working infrastructure should be physically located in different buildings. This is the ideal configuration for emergency recovery, but it usually cannot be satisfied because of lack of office space. We currently keep the working GN infrastructure on the fifth floor and the backup nodes on the second floor of the Link building.
  2. The GN infrastructure should be fault tolerant. There should be no super nodes, i.e. no single node whose failure makes the whole system unusable. The plan is to put load balance (LB) servers in front of bundle nodes so that an LB server never distributes requests to a dead node. (The reasons for configuring nodes as bundles are given later.)
  3. The backup bundle servers are kept on the second floor, ready to "plug and run" in an emergency. The reliability of the LB server is ensured by two cross-configured servers. (Everything is on the second floor now, 7/30.)
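The rule in item 2 (an LB server never sends a request to a dead node) can be sketched as a health-checked round robin. This is a minimal Python sketch under stated assumptions, not the actual LB configuration: the node names and the `is_alive` probe are hypothetical, and a real deployment would rely on the LB server's own health-check mechanism.

```python
import itertools
from typing import Callable, Iterable


class HealthAwareBalancer:
    """Round robin over bundle nodes, skipping nodes that fail a health check.

    `is_alive` is a hypothetical probe (for example an HTTP request to each
    node); it is passed in so the balancer does not depend on the transport.
    """

    def __init__(self, nodes: Iterable[str], is_alive: Callable[[str], bool]):
        self.nodes = list(nodes)
        if not self.nodes:
            raise ValueError("at least one bundle node is required")
        self.is_alive = is_alive
        self._cycle = itertools.cycle(self.nodes)

    def pick(self) -> str:
        # Visit each node at most once per request; a dead node is skipped,
        # never returned.
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if self.is_alive(node):
                return node
        raise RuntimeError("no live bundle node available")


# Example with hypothetical node names; bundle-b is down.
alive = {"bundle-a": True, "bundle-b": False, "bundle-c": True}
lb = HealthAwareBalancer(["bundle-a", "bundle-b", "bundle-c"], alive.get)
print([lb.pick() for _ in range(4)])
# prints ['bundle-a', 'bundle-c', 'bundle-a', 'bundle-c']
```

Dead nodes are skipped rather than removed from the list, so a node that comes back online receives requests again as soon as its health check passes.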

Reliability

Concerns about reliability include:
  • Reliability of hardware: includes hardware testing.
  • Reliability of infrastructure: the proposed infrastructure has not been theoretically proven; it is a design based on experience and current techniques, balanced against economic considerations.
  • Reliability of data resources: Arthur will be responsible for validating the authenticity of data resources (in future).
  • Reliability of algorithms: not decided yet.

Robustness

Robustness is a complicated concept; there is no practical definition of what makes a system robust.

Efficiency

So far, most work has focused on increasing the efficiency of GN calculations, including interval mapping and correlation maps. Interval mapping requires computation over thousands of database records, so optimizing database performance increases efficiency. After moving to the bundle server, most database access goes through the localhost socket, which is faster than communicating over the network.
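A minimal sketch of the local-socket choice, assuming the MySQLdb driver, whose `connect()` accepts `host`, `user`, `passwd`, `db`, `port` and `unix_socket` keywords. The helper name, the socket path `/var/lib/mysql/mysql.sock` (the usual default on Red Hat style systems such as CentOS and Fedora), and the example credentials are assumptions for illustration.

```python
import socket

# Assumed default MySQL socket path; check the `socket` setting in my.cnf.
DEFAULT_SOCKET = "/var/lib/mysql/mysql.sock"
# Host names treated as "this machine" (the bundle server itself).
LOCAL_NAMES = {"localhost", "127.0.0.1", socket.gethostname()}


def db_connect_params(host, user, passwd, db, port=3306, unix_socket=DEFAULT_SOCKET):
    """Build keyword arguments for MySQLdb.connect() (hypothetical helper).

    On a bundle server the database is local, so connect through the
    Unix-domain socket and skip the TCP/IP network stack.
    """
    params = {"user": user, "passwd": passwd, "db": db}
    if host in LOCAL_NAMES:
        # The MySQL client library uses the Unix socket when host is "localhost".
        params.update(host="localhost", unix_socket=unix_socket)
    else:
        # Remote database: fall back to TCP on the given port.
        params.update(host=host, port=port)
    return params
```

Usage would be along the lines of `MySQLdb.connect(**db_connect_params("localhost", "gn", "secret", "gn_db"))`, where the user, password and database name are placeholders.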

Capacity

With the bundle server, efficiency increases but the capacity of GN decreases (the old GN platform had six nodes, while the GN bundle has only one). If we introduce a load balance server in front of N bundle servers, capacity should increase dramatically.
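As a rough, idealized model (assuming identical bundle servers, requests spread evenly across them, and no bottleneck other than the load balancer), let r be the request rate one bundle server can sustain and C_LB the throughput limit of the load balancer; these symbols are introduced here for illustration only. The capacity of N bundle servers behind the load balancer, and the number needed to reach a target capacity, are then approximately:

```latex
C_N \approx \min\left(N\,r,\; C_{\mathrm{LB}}\right),
\qquad
N_{\text{needed}} = \left\lceil \frac{C_{\text{target}}}{r} \right\rceil
\quad \text{(only while } C_{\text{target}} \le C_{\mathrm{LB}}\text{)}
```

In practice uneven request costs (an interval mapping job is typically much heavier than a page view) make the real gain smaller than this bound.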

Project Management, and Source Code Control

We have not yet implemented project management or source code control systems. We are looking for the most appropriate software engineering management method, and for project management and source code control systems to support the development process. We agreed to split the current platform into development and production platforms and to update the production platform every month. Rob has several brilliant ideas on this issue, and we are looking for a tool that matches them. (Currently we use Subversion, hosted on tyche.utmem.edu, 7/30.)
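One way to implement the monthly production update with Subversion is to tag trunk once a month and move the production checkout to that tag. The sketch below only builds the `svn copy` command line; the helper name, the repository URL, the trunk/tags layout and the `release-YYYY-MM` naming scheme are assumptions, not the actual layout on tyche.utmem.edu.

```python
def monthly_release_tag_cmd(repo_root, year, month):
    """Return an `svn copy` command that snapshots trunk as a monthly tag.

    Hypothetical helper; assumes the conventional trunk/branches/tags layout.
    """
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    root = repo_root.rstrip("/")
    tag = f"release-{year:04d}-{month:02d}"
    return [
        "svn", "copy",
        f"{root}/trunk",          # development line
        f"{root}/tags/{tag}",     # frozen snapshot for production
        "-m", f"Tag {tag} for production",
    ]
```

The command can be run with `subprocess.run(cmd, check=True)`, after which the production working copy is moved to the new tag with `svn switch`. Since Subversion copies are cheap, every monthly tag stays available for rolling production back.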

Integrity and Security

This issue requires detailed knowledge of OS and service security. We have corrected several major security issues, such as a world-readable/writable HTML code directory and a database that anyone could read and write without presenting a username and password.
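Problems like the world-writable HTML code directory can be caught by a periodic audit. This is a minimal Python sketch of such a check; the function name is hypothetical, and the directory to scan (for example the web server's document root) depends on the installation.

```python
import os
import stat


def find_world_writable(root):
    """Return sorted paths under `root` (including root) writable by any user."""
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, n) for n in dirnames + filenames)
    hits = []
    for path in paths:
        mode = os.lstat(path).st_mode
        # Skip symlinks: their own permission bits are not enforced on Linux.
        if not stat.S_ISLNK(mode) and mode & stat.S_IWOTH:
            hits.append(path)
    return sorted(hits)
```

Offending entries can then be fixed with `chmod o-w`, or from Python with `os.chmod(path, mode & ~stat.S_IWOTH)`.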

We use MySQL tools and other open source tools to keep the databases synchronized and to enforce a single database input method, so that the database stays under control.
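To confirm that two copies of the database are still synchronized (for example after rebuilding a server or restoring a backup), per-table fingerprints can be compared. The sketch below uses Python's built-in `sqlite3` purely as a runnable stand-in for MySQL; the helper names and the `Probes` table are hypothetical. On MySQL itself, the built-in `CHECKSUM TABLE` statement serves a similar purpose.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table):
    """Row count plus an order-independent SHA-256 digest of a table."""
    # Table names cannot be bound as query parameters; pass trusted names only.
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    digest = hashlib.sha256()
    for row in sorted(repr(r) for r in rows):
        digest.update(row.encode("utf-8") + b"\n")
    return len(rows), digest.hexdigest()


def out_of_sync(primary, replica, tables):
    """Return the tables whose contents differ between two databases."""
    return [t for t in tables
            if table_fingerprint(primary, t) != table_fingerprint(replica, t)]


# Example: two in-memory copies that start identical.
a, b = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for conn in (a, b):
    conn.execute("CREATE TABLE Probes (Id INTEGER, Name TEXT)")
    conn.executemany("INSERT INTO Probes VALUES (?, ?)", [(1, "p1"), (2, "p2")])
print(out_of_sync(a, b, ["Probes"]))  # prints []
```

Sorting the row representations makes the digest independent of row order, which two servers holding the same data need not share.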


Dehardcoding

Dehardcoding the GN bundle code library (previously written by Jintao and others, maintained by HongQiang?) has been quite successful. The general purpose of this document is to provide instructions for installing the GN bundle on the CentOS 5 Linux operating system. For reasons not yet determined (possibly a mod_python session problem), the code does not work properly on Fedora Core 6 and 7. (See http://www.dscpl.com.au/wiki/ModPython/Articles/IssuesWithSessionObjects.)
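Dehardcoding usually means moving installation-specific values (hosts, database names, paths) out of the code and into a configuration file read at startup, for example a hypothetical `/etc/gn/gn.conf`. A minimal sketch with Python 3's standard `configparser`, assuming a hypothetical `[gn]` section and setting names; under Python 2 the module is named `ConfigParser`, with a similar but not identical API.

```python
import configparser

# Hypothetical defaults; each installation overrides them in its config file
# instead of editing hardcoded constants in the Python modules.
DEFAULTS = {
    "db_host": "localhost",
    "db_name": "gn_db",
    "html_root": "/var/www/html",
}


def load_settings(path=None, text=None):
    """Return the [gn] settings, falling back to DEFAULTS for missing keys."""
    parser = configparser.ConfigParser(defaults=DEFAULTS)
    if path is not None:
        parser.read(path)  # a missing file is silently skipped
    if text is not None:
        parser.read_string(text)
    if not parser.has_section("gn"):
        parser.add_section("gn")
    return parser["gn"]


# Example: override only the database host; other keys come from DEFAULTS.
settings = load_settings(text="[gn]\ndb_host = 192.168.1.10\n")
print(settings["db_host"], settings["db_name"])  # prints 192.168.1.10 gn_db
```

Keeping the defaults in one place means a new installation (CentOS 5 or otherwise) only has to supply the values that differ.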

Topic revision: r18 - 02 Sep 2008 - 20:06:57 - RobWilliams
GeneNetwork.GNReStructuring moved from GeneNetwork.GNResStructuring on 28 Nov 2007 - 22:02 by FanZhang - put it back
 