Tutorial: MalariaMine
If you have any problems running this tutorial please contact support [at] flymine.org.
This tutorial explains how to make a new biological data mine using the InterMine.bio system. The example used is MalariaMine, a mine for Plasmodium falciparum. It shows how to configure existing InterMine.bio sources for a new organism and experimental data. The configuration files and data required to build MalariaMine are all found in trunk/bio/tutorial/malariamine. The tutorial steps through copying files from there one at a time in order to explain the purpose of each; alternatively, you could copy the whole trunk/bio/tutorial/malariamine directory to trunk.
For an example of the completed MalariaMine see http://www.flymine.org/malaria.
One - Building the data warehouse
- Check out a copy of InterMine and FlyMine
- If you have checked out the trunk directory, you will see (at least) these directories:
bio/ flymine/ imbuild/ intermine/ stemcellmine/ testmodel/
- Create a trunk/malariamine directory to run this tutorial from (i.e. at the same level as flymine/).
- Add some default InterMine properties. Copy bio/tutorial/malariamine/default.intermine.integrate.properties into your malariamine directory.
- Create a properties file in your home directory with PostgreSQL database login details. Copy the bio/tutorial/malariamine/malariamine.properties file to your home directory and edit the serverName, user and password properties to match your PostgreSQL login details.
- Create PostgreSQL databases for temporary items and for the final production database (as specified in the malariamine.properties file):
createdb common-src-items
createdb common-tgt-items
createdb production-malaria
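The two steps above (editing the login details and creating the databases) can be scripted. What follows is a minimal sketch, not part of InterMine. `set_prop_suffix` is a hypothetical helper that rewrites every property whose key ends in `.serverName`, `.user` or `.password`; it assumes the keys in malariamine.properties are dot-separated and end in those names, so check your copy of the file before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: set every property whose key ends in ".SUFFIX" to VALUE.
# Usage: set_prop_suffix FILE SUFFIX VALUE
# Limitation: VALUE must not contain '|', '&' or '\' (they are special to sed here).
set_prop_suffix() {
    file=$1 suffix=$2 value=$3
    tmp="$file.tmp.$$"
    # Comment lines (starting with '#') are left untouched.
    sed "s|^\([^#=]*\.${suffix}\)=.*|\1=${value}|" "$file" > "$tmp" && mv "$tmp" "$file"
}

# Create the three databases named in this tutorial, stopping at the first failure.
create_databases() {
    for db in common-src-items common-tgt-items production-malaria; do
        createdb "$db" || return 1
    done
}

# Example use (values are placeholders):
#   set_prop_suffix "$HOME/malariamine.properties" serverName localhost
#   set_prop_suffix "$HOME/malariamine.properties" user myuser
#   set_prop_suffix "$HOME/malariamine.properties" password secret
#   create_databases
```

The helper writes to a temporary file and then moves it into place, which avoids relying on the non-portable `sed -i` option.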
- Create directories for the sub-projects in the new mine that are required for building the data warehouse. These sub-projects are:
- dbmodel - deals with merging model additions from selected sources and creating the production database schema
- integrate - runs targets to build the data warehouse from source data
- postprocess - operations to run on the completed data warehouse, such as setting sequences for genome features
malariamine/dbmodel/
malariamine/dbmodel/lib/
malariamine/dbmodel/resources/
malariamine/dbmodel/src/
malariamine/integrate/
malariamine/integrate/lib/
malariamine/integrate/resources/
malariamine/integrate/src/
malariamine/postprocess/
malariamine/postprocess/lib/
malariamine/postprocess/resources/
malariamine/postprocess/src/
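The directory layout listed above is regular (three sub-projects, each with lib, resources and src), so it can be created in one go. This is a small sketch; the function name `make_mine_dirs` is made up for illustration, and the argument is the path to your malariamine directory.

```shell
#!/bin/sh
# Create the dbmodel, integrate and postprocess sub-project directories,
# each with lib/, resources/ and src/ inside it.
# Usage: make_mine_dirs [MINE_DIR]   (MINE_DIR defaults to ./malariamine)
make_mine_dirs() {
    mine_dir=${1:-malariamine}
    for project in dbmodel integrate postprocess; do
        for sub in lib resources src; do
            # -p also creates the parent directories and is safe to re-run.
            mkdir -p "$mine_dir/$project/$sub" || return 1
        done
    done
}

# Example use, run from trunk/:
#   make_mine_dirs malariamine
```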
- Create a project.xml file in malariamine. Copy the example file from bio/tutorial/malariamine/.
- Create a genomic_priorities.properties that describes how to resolve conflicting data when integrating. Copy the example from bio/tutorial/malariamine/dbmodel/resources to malariamine/dbmodel/resources.
- Create a genomic_keyDefs.properties that lists the identifiers used when integrating new data. Copy the example from bio/tutorial/malariamine/dbmodel/resources to malariamine/dbmodel/resources.
- Create a properties file to be used when summarising the ObjectStore. Copy bio/tutorial/malariamine/dbmodel/resources/objectstoresummary.config.properties to your malariamine/dbmodel directory.
- Make a new, empty InterMine objectstore. The PostgreSQL database to use is specified in the malariamine.properties file and will need to be created first (see above). This will remove any existing data from production-malaria and needs to be done each time the integration is started from scratch.
# in malariamine/dbmodel:
ant clean
ant build-db
This step reads the list of sources from 'malariamine/dbmodel/build.xml' and merges the model additions (specified in '*_additions.xml' files) into the core data model. Each source can add classes and fields to the model.
- Build MalariaMine from source data files. This is done by running ant in the integrate directory. On a dual Xeon Linux machine with 4 GB of RAM (running PostgreSQL and Java on the same machine) this takes about 90 minutes to complete. Alternatively, use the project_build script.
- Dump the integrated database (unless the project_build script has already done so).
- Run postprocessing steps on the integrated database. NOTE - this is done automatically after integration if using project_build. Postprocessing operations are those performed on the integrated data before releasing a webapp. For example, setting sequences of LocatedSequenceFeatures, filling in additional references and collections or retrieving publication details from PubMed.
- Dump the database again once postprocessing has finished (unless the project_build script has already done so).
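The last four steps (integrate, dump, postprocess, dump) can be chained as below. This is only a sketch of the manual route and is not the project_build script. Several details are assumptions: running plain `ant` in the postprocess directory to perform all postprocessing steps, the dump file names, and the choice of `pg_dump` custom format (`-Fc`). By default the commands are printed rather than run; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Run a command inside a directory, or just print it when DRY_RUN is not 0.
# Usage: run_in DIR COMMAND [ARGS...]
run_in() {
    dir=$1; shift
    if [ "${DRY_RUN:-1}" = 0 ]; then
        ( cd "$dir" && "$@" )
    else
        echo "(cd $dir && $*)"
    fi
}

# Integrate, dump, postprocess, dump again; stop at the first failure.
# Usage: build_mine MINE_DIR DUMP_DIR
build_mine() {
    mine_dir=$1 dump_dir=$2
    run_in "$mine_dir/integrate" ant || return 1
    # Dump file names below are hypothetical.
    run_in "$dump_dir" pg_dump -Fc -f production-malaria.integrated.dump production-malaria || return 1
    # Assumption: plain "ant" here runs every configured postprocess step.
    run_in "$mine_dir/postprocess" ant || return 1
    run_in "$dump_dir" pg_dump -Fc -f production-malaria.final.dump production-malaria || return 1
}

# Example use:
#   build_mine "$HOME/trunk/malariamine" "$HOME/dumps"             # print only
#   DRY_RUN=0 build_mine "$HOME/trunk/malariamine" "$HOME/dumps"   # really run
```

Keeping a dump from after integration means a failed postprocessing run can be retried without repeating the roughly 90 minute integration.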
Two - Deploying the web application
See also InterMine webapp documentation.
- Copy the malariamine webapp configuration from bio/tutorial/malariamine/webapp to malariamine/webapp. You should get these files and directories:
malariamine/webapp/build.xml
malariamine/webapp/lib
malariamine/webapp/project.properties
malariamine/webapp/resources
malariamine/webapp/src
- Add some default InterMine properties for the webapp. Copy bio/tutorial/malariamine/default.intermine.webapp.properties into your malariamine directory.
- Create a properties file in your home directory that configures webapp settings and deployment. Copy bio/tutorial/malariamine/build.properties.malariamine to your home directory. Edit this with the details of your local Tomcat installation (see the comments in the file).
- Create a PostgreSQL userprofile database. This database is used while the webapp is running to store templates, saved queries and login information. The database name is configured in malariamine.properties.
createdb userprofile-malaria
- Create tables in the userprofile database, load some example template queries and create the superuser login. In malariamine/webapp run:
ant build-db-userprofile
- Compile and build the webapp .war file. This fetches the model from the database, compiles the model Java code and summarises the contents of the database. Summarising runs several queries to, for example, find totals for each class, find empty fields and calculate the values that appear in dropdowns. This can take some time. In malariamine/webapp run:
ant
- If the database is modified in any way, you should force a resummarisation on the next webapp compilation. Do this by:
cd malariamine/postprocess
ant -Daction=summarise-objectstore
- Deploy the webapp to Tomcat at the path defined in build.properties.malariamine. If a webapp has already been deployed to that URL, you will need to remove it first. In malariamine/webapp:
ant remove-webapp    # optional: only if a webapp is already deployed at that path
ant release-webapp
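Because remove-webapp is only needed when something is already deployed, a redeploy script has to decide what to do when it fails. The sketch below treats a failed removal as non-fatal and carries on to release-webapp. That behaviour is an assumption about how the ant target reacts when nothing is deployed, and `redeploy_webapp` is a name made up for this example.

```shell
#!/bin/sh
# Remove any previously deployed webapp, then release the new one.
# Usage: redeploy_webapp [WEBAPP_DIR]   (defaults to malariamine/webapp)
redeploy_webapp() {
    webapp_dir=${1:-malariamine/webapp}
    (
        cd "$webapp_dir" || exit 1
        # Assumption: a failure here most likely means nothing was deployed yet,
        # so report it on stderr and continue.
        ant remove-webapp || echo "remove-webapp failed; continuing" >&2
        ant release-webapp
    )
}

# Example use, run from trunk/:
#   redeploy_webapp malariamine/webapp
```

The function returns the exit status of release-webapp, so a calling script can still detect a failed deployment.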
- Test the released webapp by accessing tomcat_server:port/malariamine, e.g. localhost:8080/malariamine. Note that the first request after deployment is slower than normal because the JSP pages are being compiled.
- Take a tour of the webapp to become familiar with the available functionality.
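The access test above can also be run from the command line. This is a minimal sketch using curl; the function names and the 120 second default timeout are choices made for this example, with the long timeout allowing for JSP compilation on the first request.

```shell
#!/bin/sh
# Build the webapp URL from host, port and context path.
# Usage: webapp_url [HOST] [PORT] [PATH]
webapp_url() {
    echo "http://${1:-localhost}:${2:-8080}/${3:-malariamine}"
}

# Request the webapp and succeed only on HTTP 200.
# -L follows redirects; TIMEOUT (seconds) is an optional environment variable.
check_webapp() {
    url=$(webapp_url "$@")
    status=$(curl -s -o /dev/null -L --max-time "${TIMEOUT:-120}" -w '%{http_code}' "$url")
    echo "$url -> HTTP $status"
    [ "$status" = 200 ]
}

# Example use:
#   check_webapp                      # localhost:8080/malariamine
#   check_webapp myserver 8080 malariamine
```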
See also: GettingStarted, MineHowTo
