Tutorial: MalariaMine

If you have any problems running this tutorial please contact support [at] flymine.org.

This tutorial explains how to make a new biological data mine using the InterMine.bio system, taking Plasmodium falciparum (MalariaMine) as the example. It shows how to configure existing InterMine.bio sources for a new organism and for experimental data. The configuration files and data required to build MalariaMine are all in trunk/bio/tutorial/malariamine. The tutorial steps through copying files from there and explains the purpose of each one. Alternatively, you can copy the whole trunk/bio/tutorial/malariamine directory to trunk.

For an example of the completed MalariaMine see http://www.flymine.org/malaria.

One - Building the data warehouse

  • If you have checked out the trunk directory, you will see (at least) these directories:
       bio/
       flymine/
       imbuild/
       intermine/
       stemcellmine/
       testmodel/
    
  • Create a trunk/malariamine directory to run this tutorial from (i.e. at the same level as flymine/).
  • Create PostgreSQL databases for temporary items and for the final production database (as specified in the malariamine.properties file):
       createdb common-src-items
       createdb common-tgt-items
       createdb production-malaria
    
  • Create directories for the sub-projects in the new mine that are required for building the data warehouse. These sub-projects are:
    • dbmodel - deals with merging model additions from selected sources and creating the production database schema
    • integrate - runs targets to build the data warehouse from source data
    • postprocess - operations to run on the completed data warehouse, such as setting sequences for genome features
      malariamine/dbmodel/
      malariamine/dbmodel/lib/
      malariamine/dbmodel/resources/
      malariamine/dbmodel/src/
      malariamine/integrate/
      malariamine/integrate/lib/
      malariamine/integrate/resources/
      malariamine/integrate/src/
      malariamine/postprocess/
      malariamine/postprocess/lib/
      malariamine/postprocess/resources/
      malariamine/postprocess/src/
      
  • Create a genomic_priorities.properties file that describes how to resolve conflicting data when integrating. Copy the example from bio/tutorial/malariamine/dbmodel/resources to malariamine/dbmodel/resources.
  • Create a genomic_keyDefs.properties file that lists the identifiers used when integrating new data. Copy the example from bio/tutorial/malariamine/dbmodel/resources to malariamine/dbmodel/resources.
  • Make a new, empty InterMine objectstore. The PostgreSQL database to use is specified in the malariamine.properties file and will need to be created first (see above). This will remove any existing data from production-malaria and needs to be done each time the integration is started from scratch.
       # in malariamine/dbmodel:
       ant clean
       ant build-db
    

This step reads the list of sources from 'malariamine/dbmodel/build.xml' and merges their model additions (specified in '*_additions.xml' files) into the core data model. Each source can add classes and fields to the model.
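The set-up steps above (sub-project directories, the two property files and the objectstore build) can be collected into one script so that a from-scratch rebuild is repeatable. This is a minimal sketch, not part of InterMine: it assumes it is run from trunk/ (or given the trunk path), that the PostgreSQL databases already exist and that Ant is on the PATH. The --yes switch is a hypothetical safeguard added here because ant build-db removes any existing data from production-malaria.

```shell
#!/bin/sh
# Sketch of the Part One set-up steps (not part of InterMine itself).
# Usage: sh setup-malariamine.sh [TRUNK_DIR] [--yes]
# Assumes the PostgreSQL databases already exist and Ant is on the PATH.

# Create malariamine/{dbmodel,integrate,postprocess}/{lib,resources,src}
# and copy the example integration config from the tutorial directory.
setup_subprojects() {
  for project in dbmodel integrate postprocess; do
    for sub in lib resources src; do
      mkdir -p "$1/malariamine/$project/$sub" || return 1
    done
  done
  tutorial_res="$1/bio/tutorial/malariamine/dbmodel/resources"
  if [ -d "$tutorial_res" ]; then
    cp "$tutorial_res/genomic_priorities.properties" \
       "$tutorial_res/genomic_keyDefs.properties" \
       "$1/malariamine/dbmodel/resources/" || return 1
  else
    echo "warning: $tutorial_res not found; copy the .properties files by hand" >&2
  fi
}

# ant build-db removes any existing data from production-malaria, so this
# sketch only runs it when the (hypothetical) --yes switch is given.
rebuild_objectstore() {
  if [ "${2:-}" != "--yes" ]; then
    echo "skipping ant build-db (it empties production-malaria); pass --yes to run it" >&2
    return 0
  fi
  (
    cd "$1/malariamine/dbmodel" || exit 1
    ant clean && ant build-db
  )
}

trunk="${1:-.}"
setup_subprojects "$trunk" && rebuild_objectstore "$trunk" "${2:-}"
```

Run without --yes first to create the skeleton and copy the property files; running it again with `. --yes` (from trunk/) then performs the destructive ant clean / ant build-db step.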

  • Run the postprocessing steps on the integrated database. NOTE: this is done automatically after integration if you use project_build. Postprocessing operations are performed on the integrated data before a webapp is released: for example, setting the sequences of LocatedSequenceFeatures, filling in additional references and collections, or retrieving publication details from PubMed.

Two - Deploying the web application

See also InterMine webapp documentation.

  • Copy the malariamine webapp configuration from bio/tutorial/malariamine/webapp to malariamine/webapp. You should get these files and directories:
      malariamine/webapp/build.xml
      malariamine/webapp/lib
      malariamine/webapp/project.properties
      malariamine/webapp/resources
      malariamine/webapp/src
    
  • Create a properties file in your home directory that configures webapp settings and deployment. Copy bio/tutorial/malariamine/build.properties.malariamine to your home directory and edit it with the details of your local Tomcat installation (see the file for details).
  • Create a PostgreSQL userprofile database. This database is used while the webapp is running to store templates, saved queries and login information. The database name is configured in malariamine.properties.
    createdb userprofile-malaria
    
  • Create tables in the userprofile database, load some example template queries and create the superuser login. In malariamine/webapp run:
    ant build-db-userprofile
    
  • Compile and build the webapp .war file. This fetches the model from the database, compiles the model Java code and summarises the contents of the database. Summarising runs several queries, for example to find the total for each class, find empty fields and calculate the values that appear in dropdowns, so it can take some time. In malariamine/webapp run:
    ant
    
  • If the database is modified in any way, you should force a resummarisation on the next webapp compilation. To do this, run:
    cd malariamine/postprocess
    ant -Daction=summarise-objectstore
    
  • Deploy the webapp to Tomcat at the path defined in build.properties.malariamine. If a webapp has already been deployed to that URL, you will need to remove it first. In malariamine/webapp:
    ant remove-webapp    # only if a webapp is already deployed at that URL
    ant release-webapp
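After the one-time steps (creating userprofile-malaria and running ant build-db-userprofile), the build-and-deploy commands above are the part you repeat after each change. The sketch below wraps that cycle in a shell script run from trunk/. DRY_RUN, RESUMMARISE and REMOVE_OLD are hypothetical environment switches for this sketch, not InterMine or Ant properties; by default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: rebuild and redeploy the MalariaMine webapp, run from trunk/.
# DRY_RUN, RESUMMARISE and REMOVE_OLD are hypothetical switches for this
# sketch, not InterMine or Ant properties.
DRY_RUN="${DRY_RUN:-1}"         # 1 = print the commands only (default)
RESUMMARISE="${RESUMMARISE:-0}" # 1 = force resummarisation first
REMOVE_OLD="${REMOVE_OLD:-0}"   # 1 = remove an already deployed webapp

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

redeploy_webapp() {
  if [ "$RESUMMARISE" = "1" ]; then
    (run cd malariamine/postprocess && run ant -Daction=summarise-objectstore) || return 1
  fi
  (
    run cd malariamine/webapp || exit 1
    run ant || exit 1                  # build the .war file
    if [ "$REMOVE_OLD" = "1" ]; then
      run ant remove-webapp || exit 1
    fi
    run ant release-webapp             # deploy to the path in build.properties.malariamine
  )
}

redeploy_webapp
```

For example, `DRY_RUN=0 REMOVE_OLD=1 sh redeploy.sh` (the file name is illustrative) rebuilds the .war, removes the previously deployed webapp and releases the new one. Each step runs in a subshell so the `cd` does not leak into the caller.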
    

  • Test the released webapp by accessing tomcat_server:port/malariamine, e.g. localhost:8080/malariamine. Note that the first request after deployment is slower than normal because the JSP pages are being compiled.
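To check the deployment from the command line instead of a browser, a small smoke test can request the page and inspect the HTTP status. This sketch assumes curl is installed and uses localhost:8080/malariamine only because it is the example URL above. Treating anything other than a final 200 response (after following redirects) as a failure is an assumption about how the front page responds.

```shell
#!/bin/sh
# Smoke test for the deployed webapp. Assumes curl is installed; the
# default URL is the example from the text, adjust it to your Tomcat.
check_webapp() {
  url="${1:-http://localhost:8080/malariamine}"
  # Generous timeout: the first request after deployment compiles the JSPs.
  code=$(curl -sS -L -o /dev/null -w '%{http_code}' --max-time 120 "$url") || {
    echo "could not reach $url" >&2
    return 1
  }
  if [ "$code" = "200" ]; then
    echo "OK: $url returned HTTP $code"
  else
    echo "unexpected HTTP status $code from $url" >&2
    return 1
  fi
}

check_webapp "$@"
```

Pass a different URL as the first argument if Tomcat runs elsewhere; a non-zero exit status means the webapp is unreachable or did not return 200.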

See also: GettingStarted, MineHowTo