Conference Call 7/31/2006 Concerning Future of GeneNetwork
Participants: BillBug, RobWilliams, KenManly, StephenPitts, ZhaoHuiSun, HongQiangLi
The meeting agenda covered a high-level architectural overview and the codebase.
Action items are listed on the ActionItems page.
Architecture Overview (see GnHardware for more info)
HongQiangLi summarized the current system: eight nodes on GnCluster (ClusterNode1 to ClusterNode8), HeadmasterMachine (master node), WebqtlMachine, WebtwoqtlMachine, and OpteronMachine (database server). There was some confusion over the exact specifications of the machines.
RobWilliams commented that the nodes are only necessary when he teaches a GeneNetwork class, because of the load the class puts on the servers.
KenManly and RobWilliams noted that concurrent read requests (SELECT statements) tax the database, especially when generating correlation matrices.
RobWilliams, StephenPitts, and HongQiangLi noted that recent database updates have locked the Data table, which effectively locks the entire database because most queries read from the Data table.
BillBug suggested that MySQL could be reconfigured to use row-level locking. KenManly confirmed that the current implementation uses the ISAM backend, which locks whole tables rather than individual rows. The InnoDB backend, which supports row-level locking, might be a better choice.
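To illustrate why an update on the Data table stalls nearly every query, here is a toy Python model of lock granularity. It is a sketch, not MySQL's locking code: the class name and row numbers are hypothetical, and it treats any locked row as blocking reads. That is conservative for InnoDB, whose plain SELECTs use consistent (MVCC) snapshot reads and normally do not wait for row locks at all. In MySQL an existing table is converted with `ALTER TABLE Data ENGINE=InnoDB`.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

# Toy model only: it illustrates lock granularity, not MySQL internals.
LockKey = Tuple[str, Optional[int]]


@dataclass
class ToyLockManager:
    granularity: str  # "table" (ISAM/MyISAM-style) or "row" (InnoDB-style)
    held: Set[LockKey] = field(default_factory=set)

    def __post_init__(self) -> None:
        if self.granularity not in ("table", "row"):
            raise ValueError("granularity must be 'table' or 'row'")

    def _key(self, table: str, row: int) -> LockKey:
        # Table-level locking collapses every row of a table onto one lock.
        return (table, None) if self.granularity == "table" else (table, row)

    def lock_for_update(self, table: str, row: int) -> None:
        """Record a write lock taken by an UPDATE touching `row`."""
        self.held.add(self._key(table, row))

    def read_blocked(self, table: str, row: int) -> bool:
        """Would a (conservatively modeled) read of `row` have to wait?"""
        return self._key(table, row) in self.held


if __name__ == "__main__":
    for mode in ("table", "row"):
        locks = ToyLockManager(mode)
        locks.lock_for_update("Data", 5)  # one long-running update
        print(mode, "read of Data row 7 blocked:", locks.read_blocked("Data", 7))
```

With table-level locking the unrelated read of row 7 waits; with row-level locking it does not, which is the behavior the group hoped InnoDB would provide.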
A discussion ensued about the best option for improving database performance. RobWilliams agreed to look into buying a second Opteron server to complement the existing machine. JintaoWang may have investigated MySQLCluster in the past, but someone will investigate it again once the new machine arrives. The consensus was that a formal or informal requirements analysis should precede any architecture changes.
The MySQL query logs on OpteronMachine would be a good source of data for determining how the system is currently used.
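A first pass over the query logs could simply tally statement types, which shows how read-heavy the load is. The sketch below assumes MySQL's general query log layout of an optional timestamp, a connection id, a command column such as `Query` or `Connect`, and the statement text. That layout varies between MySQL versions, so the regular expression is an assumption to check against the real logs on OpteronMachine; the function name is hypothetical.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Assumed general-query-log layout: [timestamp] <connection id> <command> <statement>.
# Only lines whose command column is "Query" are counted; the first word of the
# statement (SELECT, UPDATE, ...) is used as its type.
QUERY_LINE = re.compile(r"\bQuery\s+([A-Za-z]+)")


def statement_mix(log_lines: Iterable[str]) -> Counter:
    """Count SQL statement types in query-log lines."""
    counts: Counter = Counter()
    for line in log_lines:
        match = QUERY_LINE.search(line)
        if match:
            counts[match.group(1).upper()] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical path): python statement_mix.py /path/to/query.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for verb, n in statement_mix(log).most_common():
            print(f"{verb:10s} {n}")
```

Multi-line statements are counted once, by their first line; a finer analysis (which tables each SELECT touches, time of day) could build on the same loop.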
Finally, all parties agreed that root passwords should be changed on all GN project machines as a procedural matter.
Codebase
StephenPitts provided an overview of the current codebase. All agreed that the best option is to treat the BetaBranch as the current codebase. Based on the advice in an email from BillBug, a Subversion repository will be set up for all of the code (see GnRepository) using three branches: BetaBranch, ProductionBranch, and StagingBranch. Development will proceed on the BetaBranch; periodically, beta copies will be tagged and rolled over to the StagingBranch, where, after internal testing, they will be released to the ProductionBranch.
Here is an excerpt from BillBug's email:
It should be possible to go to the "root" of the GeneNetwork code on any given implementation machine ("old", "beta", etc.) and issue an 'svn log' command to find out when the most recent file was committed to the repository in that installation. This will tell us which revision node in the single-path SVN tree is currently installed on each machine. With this knowledge, it should be possible to create "tagged" releases of each of those revisions that would be locked forever in time (by convention, tags are a separate root branch you create in an SVN project tree for fixed "version" releases). So, for instance, if the most recent SVN revision on the current "old production" machine is 434, we'd create a tagged release for rev 434 (calling it "gn-prod-v1.0", for instance). We could then completely reconstitute the "prod" configuration by simply checking out a working copy of that tagged release to a new machine. When a new "prod" release has been fully tested on the "dev" box, it could be created as a tagged release in SVN, which could then be merged with the previous "prod" working copy to bring it up to date. Merges can be a problem when multiple people are developing the same file simultaneously, but in the scenario given above they are quite trivial to carry out. Since the new release is intended to "replace" the old, if you run into conflicts during the merge, you simply overwrite with the new files.
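The tagging steps in the email can be sketched in Python. This is a hypothetical helper, not an existing GeneNetwork tool: it reads the output of `svn log --xml --limit 1` run at the root of an installation (on a working copy, `svn log` starts from the installed BASE revision) and builds the `svn copy` command that creates the tag. The repository URL and the `tags/` layout are assumptions; the peg revision (`@434`) pins the copy to the installed revision even if the source path changes later.

```python
import xml.etree.ElementTree as ET
from typing import List


def latest_revision(svn_log_xml: str) -> int:
    """Return the revision of the first <logentry> in `svn log --xml` output.

    Run `svn log --xml --limit 1` at the root of an installed working copy;
    the first entry is the most recent commit that copy contains.
    """
    entry = ET.fromstring(svn_log_xml).find("logentry")
    if entry is None:
        raise ValueError("svn log output contains no <logentry>")
    return int(entry.get("revision"))


def tag_command(repo_url: str, source_path: str, revision: int, tag_name: str) -> List[str]:
    """Build an `svn copy` command that freezes source_path@revision under tags/."""
    source = f"{repo_url}/{source_path}@{revision}"  # peg revision pins the source
    target = f"{repo_url}/tags/{tag_name}"
    message = f"Tag {tag_name} from {source_path} r{revision}"
    return ["svn", "copy", source, target, "-m", message]
```

The returned list could be passed to `subprocess.run(..., check=True)` once the URL and tag name have been checked by hand; building the command separately keeps the repository-changing step easy to review.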
Also, StephenPitts will attempt to document the GnCodebase at the file level. KenManly said that he is available to answer specific questions.
Additional Comments
BillBug spoke briefly about CentOS, an enterprise-class Linux distribution, as a platform for clustering. He also indicated that he would like to explore exporting GeneNetwork data in RDF format to take advantage of Semantic Web technology.
Post-Meeting Consultation
After the meeting, ZhaoHuiSun, StephenPitts, RobWilliams, and HongQiangLi discussed a specific timetable for implementing the changes.
StephenPitts will set up a SubversionRepository and reconfigure Apache to create three separate installations of GeneNetwork (beta, production, and staging) running on two different hosts.
HongQiangLi will install FedoraCore5 on ClusterNode1 to develop a new node image and on web2qtl to develop a new database image, documenting any changes from the base installation.
The first goal is to upgrade ClusterNode1 through ClusterNode8, HeadmasterMachine, and finally WebqtlMachine to Fedora Core 5 and use them as the new web servers. In conjunction with this, the second goal is to bring Web2qtlMachine online as a second database node running Fedora Core 5 and to use the documentation from its configuration to reconfigure OpteronMachine. A third database server, if purchased, could be used together with OpteronMachine to create a MySQL cluster. Thus, the new system would have three separate web code bases and two different database servers: one for production and one for beta.
Next, the group suggested a new FeatureRequest regarding precomputed CorrelationMatrices (RobWilliams, please write more on this).
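As background for the precomputed CorrelationMatrices request, the sketch below shows the basic idea: compute every pairwise Pearson correlation once and store it, so that repeated requests become lookups instead of fresh, SELECT-heavy computations. It is not GeneNetwork's implementation; it assumes each trait has values for the same strains in the same order with no missing values, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length vectors (no missing values)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have the same length, at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # correlation is undefined for a constant trait
    return sxy / math.sqrt(sxx * syy)


def precompute_correlations(traits: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Compute every pairwise correlation once so later requests are lookups."""
    names = sorted(traits)
    matrix: Dict[Tuple[str, str], float] = {}
    for i, a in enumerate(names):
        for b in names[i:]:  # upper triangle including the diagonal
            r = pearson(traits[a], traits[b])
            matrix[(a, b)] = r
            matrix[(b, a)] = r  # the matrix is symmetric, so mirror each value
    return matrix
```

Only about half of the pairs are computed because the matrix is symmetric. In a real deployment the results would more likely be written to a database table than held in a dictionary; the dictionary keeps the example self-contained.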
In addition, ZhaoHuiSun said that he would work on becoming familiar with the GeneNetwork system and on developing a set of use cases as part of a formal specification of
Finally, the group discussed backup options. At present, cron jobs on WebqtlMachine dump the data on OpteronMachine to WebqtlMachine and the QTLReaperCluster. The GeneNetwork codebase is backed up only to WebqtlMachine. It is unclear where home directory data is backed up, and it is also unclear what, if anything, is being backed up on the RetrospectServer.
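The minutes do not describe the existing cron scripts, so the following is only a sketch of one common pattern: give each nightly dump a timestamped name and keep a fixed number of recent dumps so the backup target does not fill up. The `gn-dump-` prefix, the `.sql.gz` suffix, and the retention count are hypothetical, and the dump itself would come from a separate `mysqldump` step that is not shown.

```python
import tempfile
from datetime import datetime
from pathlib import Path
from typing import List

DUMP_PREFIX = "gn-dump-"  # hypothetical naming convention
DUMP_SUFFIX = ".sql.gz"


def dump_name(when: datetime) -> str:
    """Timestamped file name; zero-padded fields make names sort chronologically."""
    return f"{DUMP_PREFIX}{when:%Y%m%d-%H%M}{DUMP_SUFFIX}"


def prune_dumps(directory: Path, keep: int) -> List[Path]:
    """Delete all but the `keep` newest dumps in `directory`; return what was removed."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    dumps = sorted(directory.glob(f"{DUMP_PREFIX}*{DUMP_SUFFIX}"))
    stale = dumps[:-keep]  # everything except the newest `keep` files
    for path in stale:
        path.unlink()
    return stale


if __name__ == "__main__":
    # Demo in a throwaway directory: four nightly dumps, keep the newest two.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        for day in (1, 2, 3, 4):
            (folder / dump_name(datetime(2006, 7, day, 2, 0))).touch()
        print([p.name for p in prune_dumps(folder, keep=2)])
```

Sorting by file name rather than modification time means a dump copied back from another machine still sorts by the date it was taken.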
Regarding off-site backup, RobWilliams suggested that the group purchase three 500 GB hard drives for off-site archiving.
StephenPitts, ZhaoHuiSun, and HongQiangLi will meet again on Thursday at 10 AM to discuss their progress.
--
StephenPitts - 31 Jul 2006