Data files to integrate

All data required to build MalariaMine is included in trunk/bio/tutorial/malariamine/malaria-data.tar.gz. Copy the archive to a directory of your choice (called my_data_dir here) and extract it:

cp bio/tutorial/malariamine/malaria-data.tar.gz my_data_dir
cd my_data_dir
tar -zxvf malaria-data.tar.gz

Edit 'malariamine/project.xml' so that all occurrences of my_data_dir point to the location you chose. For example:

  <sources>
    <source name="malaria-gff" type="malaria-gff">
      <property name="gff3.taxonId" value="36329"/>
      <property name="src.data.dir" location="my_data_dir/malaria/malaria-genome/gff"/>
    </source>
    ...
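If project.xml contains the literal placeholder my_data_dir, as the example above suggests, the substitution can be scripted instead of done by hand. The following is a minimal sketch under that assumption; the function name set_data_dir is hypothetical and not part of InterMine, and the sed expression assumes the target path contains no '|' or '&' characters.

```shell
# Hypothetical helper (not part of InterMine): replace every occurrence of the
# placeholder "my_data_dir" in a project.xml with a real directory path.
set_data_dir() {
    project_xml=$1   # path to the project.xml to edit
    data_dir=$2      # directory the archive was extracted into

    # '|' is used as the sed delimiter because file paths contain '/'.
    # -i.bak edits in place and keeps the original as project.xml.bak.
    sed -i.bak "s|my_data_dir|${data_dir}|g" "$project_xml"
}
```

For example, `set_data_dir malariamine/project.xml /home/me/data` would rewrite the src.data.dir location shown above to /home/me/data/malaria/malaria-genome/gff. Check the result against the .bak copy before building.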

The archive contains the following data:

/malaria-genome

The malaria genome in GFF3 and FASTA format, originally downloaded from (...)

/uniprot

UniProt XML with protein information and sequences from Swiss-Prot and TrEMBL. Downloaded from: http://www.ebi.ac.uk/uniprot/database/download.html and filtered on taxon ID 36329.

/psi/intact

All protein interaction data available from IntAct in PSI format. Downloaded from: ftp://ftp.ebi.ac.uk/pub/databases/intact/current/psi1/species.

/inparanoid

InParanoid orthologues between P. falciparum and S. pombe. Downloaded from: http://inparanoid.cgb.ki.se/download/current/sqltables/.

/gene_ontology

The Gene Ontology structure. Downloaded from: http://www.geneontology.org/.

/go_annotation

GO term assignments for P. falciparum and S. pombe. Downloaded from: http://www.geneontology.org/.
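
After extracting the archive, it can be useful to confirm that the directories listed above are present before starting a build. The sketch below assumes they sit under a top-level malaria directory (inferred from the example path in project.xml) and that their names match the listing exactly; the function name check_malaria_data is hypothetical.

```shell
# Hypothetical helper: report any expected data directory that is missing.
# Returns 0 if all directories exist, 1 otherwise.
check_malaria_data() {
    root=$1      # e.g. my_data_dir/malaria
    missing=0

    # Directory names taken from the listing above.
    for sub in malaria-genome uniprot psi/intact inparanoid gene_ontology go_annotation; do
        if [ ! -d "$root/$sub" ]; then
            echo "missing: $root/$sub" >&2
            missing=1
        fi
    done

    return "$missing"
}
```

For example, `check_malaria_data my_data_dir/malaria && echo "data layout looks complete"`.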