BDF2 genotype dataset (this can be used as a sample for adding new genotype data sets):

  1. InbredSet? Table: insert a record to explain the new dataset
  2. GenoFreeze? Table: insert a record to create the relationship between the Inbred and the Genotype file
  3. Strain Table: insert the strains
  4. StrainXRef? Table: insert the relationship of Strain and InbredSet?
  5. GenoXRef? Table: insert the relationship of GenoFreeze? , Genotype, and Data
  6. Data Table: insert the marker's value of each strain
  7. Create a Genotype file under /node_apps/www/html/webqtl/genotypes

-- HongqiangLi - 28 Sep 2006

Illumina microarray dataset (this is an example for adding a new type of microarray data set):

1. GeneChip? Table: insert a record to explain the new array type or chip

2. ProbeFreeze? Table: insert a record to explain the probe or probe set data set. Note: even we don't have probe data actually, we also need add these records (these records do not relate with real data, since we do not use the Illumina's single bead/probe level data)

3. ProbeSetFreeze? Table: insert a record to explain the Microarray data set (ProbeSetData? set), the ProbeFreezeId? item should be the value that step 1 inserted.

4. ProbeSetXRef? and Data Table: insert the expression values to Data Table and the statistic values(mean, lrs, pvalue, etc) in ProbeSetXRef? , the ProbeSetFreezeId? item should be the value that step 2 inserted.

Ensembl Probe Location Tables:

At first, download Ensembl's dataset (it should be mus_musculus_core_*, * means the version), insert them to the 'test' database using the included sql script files), then use the follow scripts to create the Tables in 'db_webqtl' database

1. use db_webqtl;
2. create table EnsemblChip (Id int not NULL, ProbeSetSize int not NULL, Name varchar(40) not NULL, Type enum('AFFY', 'OLIGO'));
3. insert into EnsemblChip (select oligo_array_id, probe_setsize, name, type from test.oligo_array);
4. create table EnsemblProbe (Id int not NULL, ChipId int not NULL, ProbeSet varchar(40), Name Varchar(40), length int not NULL);
5. insert into EnsemblProbe (select oligo_probe_id, oligo_array_id, probeset, name, length from test.oligo_probe);
6. create table EnsemblProbeLocation (Chr char(2) not NULL, Start int not NULL, End int not NULL, 
                                     Strand int not NULL, MisMataches int , ProbeId int not NULL);
7. insert into EnsemblProbeLocation(select name, seq_region_start, seq_region_end, seq_region_strand, mismatches, oligo_probe_id 
                                   from test.seq_region as seq_region, test.oligo_feature as oligo_feature 
                                   where seq_region.coord_system_id=1 and seq_region.seq_region_id=oligo_feature.seq_region_id);
8. update EnsemblProbe 
    set name=concat(right(concat('XX', left(name,instr(name,':')-1)),3), 
                    right(concat('YY', mid(name, instr(name, ':')+1, instr(name, ';')-instr(name, ':')-1)),3)) 
    where length(left(name,instr(name,':')-1))<=3 and 
          length(mid(name, instr(name, ':')+1, instr(name, ';')-instr(name, ':')-1))<=3;
9. update EnsemblProbe 
    set name=concat(left(name,instr(name,':')-1), 
                    right(concat('YY', mid(name, instr(name, ':')+1, instr(name, ';')-instr(name, ':')-1)),3)) 
    where length(left(name,instr(name,':')-1))>3;
10. update EnsemblProbe 
    set name=concat(right(concat('XX', left(name,instr(name,':')-1)),3), 
                    mid(name, instr(name, ':')+1, instr(name, ';')-instr(name, ':')-1)) 
    where length(mid(name, instr(name, ':')+1, instr(name, ';')-instr(name, ':')-1))>3;
11. create table GeneChipEnsemblXRef (GeneChipId int not null, EnsemblChipId int not null);
12. insert into GeneChipEnsemblXRef values(1,3);
13. insert into GeneChipEnsemblXRef values(2,11);
14. insert into GeneChipEnsemblXRef values(2,12);

Note: this script is based on ensembl's dataset version 41_36b, for the other version, maybe the table name and fields be changed.

A sample script of getting Ensembl Probe Location:

SELECT EnsemblProbeLocation.*
FROM EnsemblProbeLocation, EnsemblProbe, EnsemblChip, GeneChipEnsemblXRef, ProbeFreeze
WHERE EnsemblProbeLocation.ProbeId=EnsemblProbe.Id and 
           EnsemblProbe.ChipId=GeneChipEnsemblXRef.EnsemblChipId and
           GeneChipEnsemblXRef.GeneChipId=ProbeFreeze.ChipId and 
           EnsemblProbe.Name=%s1 and 
           EnsemblProbe.ProbeSet=%s2 and
           ProbeFreeze.Name=%s3 
           group by Chr, Start, End

Note:
%s1: the value match the field 'Name' of the table Probe 
%s2: the value match the field 'Name' of the table ProbeSet, 
         if the name formats '*****_at_A' or '*****_at_B' (ProbeSet is M430AB), 
         the value should cut '_A'/'_B', it would just be '*****_at'

A The scripts of triggers on table DBList, let it match the updating of table ProbeSetFreeze?

delimiter |

CREATE TRIGGER DBListInsert BEFORE INSERT ON ProbeSetFreeze
  FOR EACH ROW BEGIN
    INSERT INTO DBList SET DBTypeId=1, FreezeId=NEW.Id, Name=NEW.Name, Code=NEW.Name;
  END;
|

CREATE TRIGGER DBListDelete BEFORE DELETE ON ProbeSetFreeze
  FOR EACH ROW BEGIN
    DELETE FROM DBList where DBTypeId=1 and FreezeID=OLD.Id;
  END;
|

CREATE TRIGGER DBListUpdate BEFORE UPDATE ON ProbeSetFreeze
  FOR EACH ROW BEGIN
    UPDATE DBList set FreezeId=NEW.Id, Name=NEW.Name, Code=NEW.Name 
    where DBTypeId=1 and FreezeID=OLD.Id;
  END;
|

delimiter ;

Note: the scripts only on local machine and root privilege

-- HongqiangLi - 24 DEC 2007

Strain name correction

  1. Data table, update old id to new id
  2. SE table, update old id to new id
  3. StrainXRef? table, update old id to new id if new id doesn't exist, delete old id if new id exist

old name old id new name new id
BTBR T+ tf/J 1872 BTBRT<+>tf/J 940
BTBR_T+_tf/J 2155 BTBRT<+>tf/J 940
CAST/Ei 914 CAST/EiJ 947
BALB/cBy 347 BALB/cByJ 913
C57BL? /6By 1890 C57BL? /6ByJ 348
ILSXISSF1 2156 LSF1 1098
ISSXILSF1 2157 SLF1 1099
ILS/Bg 1930 ILS 369
ISS/Bg 1931 ISS 370
B6D2F1? /J 1864 B6D2F1? 1
AXB14 164 AXB13a 164

Add new strain, BXD101, BXD102, BXD103

-- HongqiangLi - Jan04 2008

Automatically update

1. GeneRIF? , run the scripts file gn/scripts/generif/addRIF.py periodicly to update the content.

-- Main.Hongqiang Li

Upload Human Data (this can be used as a sample for adding new kinkd of data sets):

  1. GeneChip? Table: insert a record to descript the Chip platform
  2. InbredSet? Table: insert a record to descript the Inbred Set
  3. Strain Table: insert the strains
  4. StrainXRef? Table: insert the relationship of Strain and InbredSet?
  5. ProbeSet Table: insert the ProbeSets
  6. ProbeSetXRef? Table: insert the summary of traits
  7. Data Table: insert the expression values
  8. Create a Genotype file under gn/web/webqtl/genotypes

-- HongqiangLi - Oct 17, 2008

Topic revision: r12 - 17 Oct 2008 - 16:48:13 - HongqiangLi
 
This site is powered by the TWiki collaboration platformCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback