How to make a Mine
Making a Mine
A "Mine" in InterMine terminology is a complete data warehousing system, including a user interface (a webapp) and the data integration system. The first step in making a new mine is to check out a copy of the FlyMine and InterMine code.
If you have checked out the trunk directory, you will see (at least) these directories:
bio/ flymine/ imbuild/ intermine/ sources/ stemcellmine/ testmodel/ www/
InterMine
The core ObjectStore, data integration and webapp code for all mines comes from the InterMine project in the intermine directory. The imbuild (InterMine build) directory contains the framework used for compiling code, running targets and controlling dependencies between different parts of the build system.
Projects
Most parts of InterMine and FlyMine are structured as small subprojects, each with its own Ant build.xml file.
Sources for a Mine
The sources directory contains the code and configuration for each possible integration source. A source in InterMine is potentially a source of code, data and parts of the final model.
Examples of possible sources are fasta, go, interpro and uniprot.
A mine is made by integrating data from a selection of the sources available in the sources directory. The list of sources is configured in the mine's project.xml file; malariamine/project.xml and flymine/project.xml are examples. See also the MalariaMine project.xml example for an explanation.
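As a rough illustration of a project.xml that lists sources, the sketch below parses a small, made-up project file and prints its sources in the order they appear. The element and attribute names (`project`, `sources`, `source`, `name`, `type`) are assumptions made for this example, and the file content is hypothetical; consult a real file such as malariamine/project.xml for the actual layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical project.xml content. The element and attribute names are
# assumptions for illustration only; a real project.xml may differ.
PROJECT_XML = """
<project type="bio">
  <sources>
    <source name="uniprot" type="uniprot"/>
    <source name="go" type="go"/>
    <source name="interpro" type="interpro"/>
  </sources>
</project>
"""


def list_sources(xml_text):
    """Return (name, type) pairs for each <source>, in document order."""
    root = ET.fromstring(xml_text)
    return [(s.get("name"), s.get("type")) for s in root.iter("source")]


if __name__ == "__main__":
    for name, source_type in list_sources(PROJECT_XML):
        print(f"{name} (type: {source_type})")
```

Reading the list in document order matters because the position of a source in the file is meaningful when the sources are later loaded.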
- See: How to make a source
Properties for a mine
Integration
Each mine should have an integrate project (see trunk/malariamine/integrate for an example). Running ant in an integrate project merges the models and loads and integrates the data sources. Each configured data source is loaded in the order given in project.xml and merged with previously loaded data.
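The idea of loading sources in order and merging each with what is already loaded can be pictured with a toy sketch. This is not InterMine's integration code: the record layout, the `id` key and the "first value loaded wins" rule are all assumptions chosen only to show why load order can matter.

```python
def integrate(sources):
    """Load sources in the given order, merging records that share an id.

    `sources` is a list of (source_name, records) pairs, where each record
    is a dict with an "id" key. This layout is hypothetical.
    """
    warehouse = {}
    for _source_name, records in sources:
        for record in records:
            merged = warehouse.setdefault(record["id"], {})
            for field, value in record.items():
                # Toy conflict rule (an assumption): keep the value from
                # whichever source was loaded first.
                merged.setdefault(field, value)
    return warehouse


if __name__ == "__main__":
    # Made-up records from two made-up sources.
    sources = [
        ("uniprot", [{"id": "X1", "name": "protein one"}]),
        ("go", [{"id": "X1", "name": "other name", "term": "T1"},
                {"id": "X2", "term": "T2"}]),
    ]
    for identifier, fields in integrate(sources).items():
        print(identifier, fields)
```

With this rule, swapping the two sources would change the `name` stored for `X1`, which is the kind of order dependence the paragraph above describes.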
- See: Running integrate
Postprocessing
Some operations, known as postprocessing, are performed on the integrated data before the webapp is released. Examples include setting the sequences of LocatedSequenceFeatures, filling in additional references and collections, and retrieving publication details from PubMed. Several postprocessing operations are available for re-use.
Running a Build
To run a complete build, including integration and postprocessing, use the project_build script. It makes dumps during the process, so a build can be restarted from a dump after a problem.
- See: Running a build
Tutorial: MalariaMine
The MalariaMine tutorial is a step-by-step guide using an example organism.
- See: IntegratingData
