The hard coding in GN is a serious problem. Almost everything has been hard bound to the domain name www.genenetwork.org.
For example, www.systemsgenomics.com maps to www.genenetwork.org but doesn't work, since the GN bundle code allows only www.genenetwork.org to work properly. This setup really confuses me; there is no special benefit to doing it this way.
The dehardcoding procedures will modify the code to accomplish the following:
- The GN html and python code can be put in any directory. The default Apache directory on the latest Linux systems is "/var/www/html", and with the right httpd configuration the code can be put anywhere. The current GN setup only allows the bundle code to be put into /gnshare/gn/web/webqtl, which is not good. For example, in a modern server configuration the OS is usually put on / on RAID 1, and the working code base is put on RAID 10 if there is a big data requirement. Another benefit of this separation is that the data remain usable if the server crashes.
- Foreign databases will be synchronized to one server.
- Distributing GN mirrors requires that GN not be hard bound to a particular server or directory; otherwise every mirror setup will require a large amount of work.
Record of Dehardcoding
Kev has been working on commenting the code, and Fan has been working on pulling the database connection modules out of the code and replacing them with a unified connection that handles all DB I/O.
Most of the comments are on the machine 132.192.47.13. They're labeled Kev Adler, or just KA. For most files, I tried to summarize what they do and, in relevant cases, what they're called by. A few files were already well commented, but by and large that wasn't the case. Although I don't understand how all the functions work, I've tried to summarize those as well, in terms of what input/output a programmer can expect.
Taking off the 777 privileges
It is ridiculous to make the whole GN code base directory world-accessible! It is a miracle that the GN code base was not hacked or modified accidentally.
On the bundle exp machine the GN code base is set to 644, with access granted to the apache user. After the dehardcoding process is over, a privilege table will be presented.
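Finding what is still world-writable before tightening the permissions could be scripted. The following is a minimal sketch, not the procedure actually used on the bundle machine; the function names are made up for illustration and it assumes a Unix file system and Python 3.

```python
import os
import stat


def find_world_writable(root):
    """Return a sorted list of paths under root that anyone can write to."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                continue  # the mode of a symlink itself is not meaningful
            if mode & stat.S_IWOTH:
                hits.append(path)
    return sorted(hits)


def remove_world_write(paths):
    """Clear only the 'other write' bit, leaving all other bits as they are."""
    for path in paths:
        mode = stat.S_IMODE(os.lstat(path).st_mode)
        os.chmod(path, mode & ~stat.S_IWOTH)
```

Running find_world_writable on the code base directory first gives a list to review; remove_world_write can then be applied to that list. Setting files to exactly 644 would be a stricter step on top of this.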
Unbind code connection with the URL www.genenetwork.org
Almost all html code has a <base> tag forcing the base directory for all files to www.genenetwork.org. Currently, when a mirror is set up, a python script is run to replace the old URL with the new one. It would be better to comment the tag out; if the significant directories are structured appropriately, the default, machine-dependent base directory should work.
- SNAG-0002.jpg:
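Commenting the tag out, as suggested above, could be done with a small script. This is a minimal sketch under the assumption that the tag in question is the HTML <base> tag; the function name is hypothetical and this is not the existing mirror script.

```python
import re

# Matches a <base ...> tag, optionally preceded by an opening comment marker
# so that tags which are already commented out can be recognised.
BASE_TAG = re.compile(r"(<!--\s*)?<base\b[^>]*>", re.IGNORECASE)


def comment_out_base_tags(html):
    """Wrap every live <base> tag in an HTML comment."""
    def _comment(match):
        if match.group(1):
            return match.group(0)  # already commented out, leave untouched
        return "<!-- " + match.group(0) + " -->"
    return BASE_TAG.sub(_comment, html)
```

Skipping tags that are already commented means the script can be run over the same file twice without nesting comments.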
Taking out the hard link "/"
The "/" needs to be removed and the appropriate directories relocated to "DocumentRoot", defined in httpd.conf. By taking this out we allow the GN code base to be put in any directory that httpd can access.
- SNAG-0003.jpg:
Additionally, the folders /images, /genotypes and so on need to be taken care of, as well as all the /*.html files. A related problem is that the html is itself generated by code. There are some really strange html tags; those need to be replaced by the most common ones to maintain compatibility.
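For the static html files, taking out the leading "/" could be sketched as below. This is only an illustration: the function name is made up, it only looks at href and src attributes, and it assumes the page sits at the top level of the code base (pages in subdirectories would need "../" prefixes instead).

```python
import re

# href="/..." or src="/..." but not protocol-relative links such as "//host/x".
ROOT_LINK = re.compile(r"""\b(href|src)\s*=\s*(["'])/(?!/)""", re.IGNORECASE)


def strip_root_slash(html):
    """Turn root-anchored links into links relative to the current page."""
    return ROOT_LINK.sub(r"\1=\2", html)
```

Html that is generated by the python code is not covered by this; those strings would have to be changed in the code that emits them.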
Separate the Beta code from Production code
In the production code base the beta code is removed. Production code is only ever updated from beta code, and beta code will be kept only on an experimental platform.
For example, accountX.html has been removed.
dbdoc problem
Many files are not in the appropriate directory. For example, most of the links in http://www.genenetwork.org/advancedSearch3.html are not working. The info files need to be collected from the database and reproduced.
- SNAG-0004.jpg:
The hard link to genenetwork.org has also been removed.
blatinfo problem
This is either a hard coding problem or a serious missing-directory issue.
We need to find out exactly which it is.
- SNAG-0005.jpg:
Unified Glossary.html and References.html.
On a bundle server, the GeneNetwork references will be served from www.genenetwork.org/references.html; on any other server, a references.html is put in place instead, so that only one references.html needs to be updated and the change is synced worldwide.
- SNAG-0006.jpg:
Directory Setup
For example, images/upload is a directory for important genotype and phenotype information, while image/upload is a temp directory containing mid-process files. I reckon images/upload should be changed to another directory name that makes sense.
- SNAG-0007.jpg:
Dynamic Content Management
The following code shows that the selections in the GUI are hard coded in the html file, which makes them hard to extend. Dynamic content management would be the best choice, but it is a big job.
- SNAG-0008.jpg:
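As a small step towards dynamic content, a selection list could be generated from data instead of being typed into the html file. The sketch below is an assumption about how that might look, not existing GN code; the function name and the option values in the test are invented.

```python
from html import escape


def render_select(name, options, selected=None):
    """Build a <select> element from (value, label) pairs."""
    lines = ['<select name="%s">' % escape(name, quote=True)]
    for value, label in options:
        attr = " selected" if value == selected else ""
        lines.append('  <option value="%s"%s>%s</option>'
                     % (escape(value, quote=True), attr, escape(label)))
    lines.append("</select>")
    return "\n".join(lines)
```

The (value, label) pairs could come from a database query or a config file, so adding a new entry would no longer mean editing html by hand.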
List of base URLs that have been taken out:
- www.genenetwork.org
- www.webqtl.org
- web2qtl.utmem.edu
- webqtl.utmem.edu
In total, 613 hard links were removed from the GN code base.
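To check that no hard links creep back in, the code base can be scanned for the old hosts. This is a rough sketch, not the script that produced the count above; the function name, the file extensions and the assumption that links start with http:// or https:// are mine.

```python
import os
import re

OLD_HOSTS = [
    "www.genenetwork.org",
    "www.webqtl.org",
    "web2qtl.utmem.edu",
    "webqtl.utmem.edu",
]
HOST_PATTERN = re.compile(
    r"https?://(?:%s)" % "|".join(re.escape(host) for host in OLD_HOSTS),
    re.IGNORECASE,
)


def count_hard_links(root, extensions=(".html", ".py")):
    """Map each file under root to the number of hard-coded URLs it contains."""
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                found = len(HOST_PATTERN.findall(handle.read()))
            if found:
                counts[path] = found
    return counts
```

An empty result for the code base directory would mean none of the listed hosts are left in the scanned file types.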
Unifying DB Link to MySQL database
In the old code there are 5 types of connection to the MySQL database, which is totally unnecessary. Some of the connections do not even allow username and password authorization.
Here we unify all DB links into one and set it in the webqtlConfig file. In future we only need to modify the webqtlConfig file once if we need to change the database configuration.
- SNAG-0009.jpg:
- SNAG-0010.jpg:
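The idea of a single connection point can be sketched as follows. This is not the actual content of webqtlConfig: the setting names, the placeholder values and the get_connection function are assumptions for illustration, and MySQLdb is assumed as the driver.

```python
# Hypothetical settings; in GN these would live in the webqtlConfig file.
MYSQL_SETTINGS = {
    "host": "localhost",    # placeholder
    "user": "gn_user",      # placeholder
    "passwd": "change-me",  # placeholder
    "db": "gn_db",          # placeholder
}


def get_connection(connect=None, settings=MYSQL_SETTINGS):
    """Open a DB connection from the one shared set of settings.

    connect can be swapped out (for example in tests); by default the
    MySQLdb driver is imported only when a connection is really needed.
    """
    if connect is None:
        import MySQLdb  # assumed driver
        connect = MySQLdb.connect
    return connect(**settings)
```

Every module would then call get_connection() instead of opening its own link, so a change of server or password is made in one place.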
Analysis of .htaccess file
The python code requires detailed configuration of the code base directory. This section analyses the current .htaccess file and makes changes to it to make sure that the GN code base can be moved around.
Another purpose is to simplify the .htaccess settings.
In the home directory there is a .htaccess file like the following:
- SNAG-0000.png:
Options +Includes adds to the previous httpd.conf setting to enable server-side includes (SSI), including the exec element that lets an SSI page run CGI programs. The httpd.conf for the GN code base is complicated, and each directory is given its own permissions.
Options -Indexes disables indexing and directory browsing, which is useful.
XBitHack on: XBitHack tells Apache to parse files for SSI directives if they have the execute bit set. So, to add SSI directives to an existing page, rather than having to change the file name, you would just need to make the file executable using chmod. I have doubts about this particular setting.
The .htaccess file of the webqtl directory (the python code directory) is set as follows:
Each piece of python code (file) has to inform Apache of its PythonHandler; otherwise Apache does not know how to handle it (a bit immature).
- SNAG-0001.png:
- SNAG-0002.png:
PythonPath defines the library directories that mod_python searches. It will be replaced with a non-absolute path plus sys.path, like this:
PythonPath "['../webqtl'] + sys.path"
One requirement is that the required libraries be installed in a sys.path directory.
Then the GN code base can be moved around.
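The same effect can also be reached from inside the python code, by working out the library directory from the location of a file instead of writing a fixed path. A minimal sketch, with a made-up function name and "../webqtl" taken from the setting above:

```python
import os
import sys


def add_code_base_to_path(anchor_file, relative_dir="../webqtl"):
    """Put the code base directory on sys.path, relative to anchor_file."""
    here = os.path.dirname(os.path.abspath(anchor_file))
    base = os.path.abspath(os.path.join(here, relative_dir))
    if base not in sys.path:
        sys.path.insert(0, base)  # searched before the system libraries
    return base
```

A script would call add_code_base_to_path(__file__) once at start-up; wherever the code base is moved to, the path follows it.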