How to make a Mine

Making a Mine

A "Mine" in InterMine terminology is a complete data warehousing system, including a user interface (a webapp) and the data integration system. The first step in making a new mine is to check out a copy of the FlyMine and InterMine code.

If you have checked out the trunk directory, you will see (at least) these directories:

   bio/
   flymine/
   imbuild/
   intermine/
   sources/
   stemcellmine/
   testmodel/
   www/

InterMine

The core ObjectStore, data integration and webapp code for all mines comes from the InterMine project in the intermine directory. The imbuild (InterMine build) directory contains the framework used for compiling code, running targets and controlling dependencies between different parts of the build system.

Projects

Most parts of InterMine and FlyMine are structured as small subprojects, each with its own Ant build.xml file.

Sources for a Mine

The sources directory contains the code and configuration for each possible integration source. A source in InterMine is potentially a source of code, data and parts of the final model.

Examples of possible sources are fasta, go, interpro and uniprot.

A mine is made by integrating the data from a selection of the sources available in the sources directory. The list of sources is configured in the mine's project.xml file; malariamine/project.xml and flymine/project.xml are examples. See also the MalariaMine project.xml example for an explanation.
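As a rough illustration of how a source list could drive a build, the sketch below parses a small project.xml-like document and prints its sources in the order they are listed. The element and attribute names (sources, source, name, type) and the sample entries are assumptions made for this example only; consult malariamine/project.xml for the real layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical project.xml content. The element and attribute names are
# assumed for illustration and may not match a real mine's file.
PROJECT_XML = """
<project>
  <sources>
    <source name="uniprot-malaria" type="uniprot"/>
    <source name="go" type="go"/>
    <source name="interpro" type="interpro"/>
  </sources>
</project>
"""


def list_sources(xml_text):
    """Return (name, type) pairs for each source, in document order."""
    root = ET.fromstring(xml_text)
    return [(s.get("name"), s.get("type")) for s in root.findall("./sources/source")]


if __name__ == "__main__":
    for name, source_type in list_sources(PROJECT_XML):
        print(f"{name} (type: {source_type})")
```

Keeping the order of the list matters because sources are loaded in the order in which they appear in project.xml.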

Properties for a mine

Integration

Each mine should have an integrate project (see trunk/malariamine/integrate for an example). Running ant in an integrate project merges the models and loads and integrates the data sources. Each configured data source is loaded in the order given in project.xml and merged with previously loaded data.
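The idea of loading sources in order and merging each one with what is already stored can be pictured with a toy sketch. This is not InterMine's integration code: the integrate function, the record layout (dicts with an "id" key) and the merge rule (the first value seen for a field wins) are all assumptions chosen to keep the example short, and the sample records are made up.

```python
def integrate(sources):
    """Load sources in order, merging each record into what is already stored.

    `sources` is a list of (source_name, records) pairs, where each record is
    a dict with an "id" key. Toy merge rule (an assumption, not InterMine's):
    keep the first value seen for a field, and only fill in missing fields.
    """
    store = {}
    for _source_name, records in sources:
        for record in records:
            merged = store.setdefault(record["id"], {})
            for field, value in record.items():
                merged.setdefault(field, value)
    return store


if __name__ == "__main__":
    # Made-up sample data for two hypothetical sources.
    sources = [
        ("uniprot", [{"id": "P1", "name": "protein one"}]),
        ("go", [{"id": "P1", "annotation": "term A"},
                {"id": "P2", "annotation": "term B"}]),
    ]
    for identifier, fields in integrate(sources).items():
        print(identifier, fields)
```

Under this rule, the load order decides which source supplies a field when two sources disagree, which is one reason the order in project.xml is significant.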

Postprocessing

Some operations, known as postprocessing, are performed on the integrated data before the webapp is released. Examples include setting the sequences of LocatedSequenceFeatures, filling in additional references and collections, and retrieving publication details from PubMed. There are several postprocessing operations you may wish to reuse.

Running a Build

To run a complete build, including integration and postprocessing, use the project_build script. It makes dumps during the process and allows restarting from those dumps after any problems.
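The dump-and-restart behaviour can be sketched generically as a checkpointed pipeline. The code below does not reproduce project_build: the run_build function, the stage names and the JSON dump files are hypothetical, and stand in for whatever dumps the real script makes.

```python
import json
import os
import tempfile


def run_build(stages, dump_dir):
    """Run stages in order, dumping state after each one.

    If a dump for a stage already exists (from an earlier, interrupted run),
    the state is restored from it and the stage is skipped.
    Returns the final state and the names of the stages actually executed.
    """
    state = {}
    executed = []
    for name, stage in stages:
        dump_path = os.path.join(dump_dir, f"{name}.json")
        if os.path.exists(dump_path):
            with open(dump_path) as handle:
                state = json.load(handle)
            continue
        state = stage(state)
        executed.append(name)
        with open(dump_path, "w") as handle:
            json.dump(state, handle)
    return state, executed


if __name__ == "__main__":
    # Hypothetical stages standing in for integration and postprocessing.
    stages = [
        ("integrate", lambda s: {**s, "integrated": True}),
        ("postprocess", lambda s: {**s, "postprocessed": True}),
    ]
    with tempfile.TemporaryDirectory() as dump_dir:
        print(run_build(stages, dump_dir))  # first run executes every stage
        print(run_build(stages, dump_dir))  # second run restores from dumps
```

Dumping after every stage trades disk space for the ability to resume a long build from the last completed step rather than from the beginning.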

Tutorial: MalariaMine

The MalariaMine tutorial is a step-by-step guide using an example organism.

See: IntegratingData