Frequently Asked Questions

Below are our most frequently asked questions. We also have a list of the most common errors and their fixes.

If you still don't find what you need, please contact us.

  1. Data warehouse
    1. My data takes too long to load into the database. How long should it …
    2. Where can I find a listing of all the existing data formats that can be …
    3. What other models do we have besides the genomic model and how …
    4. How do you define a primary key for a model?
    5. When we define a new model (e.g., MY-NEW_model.xml), in which directory …
    6. Once a new model is defined, how do we include it in InterMine and use it?
    7. How and where can I set information for an organism?
    8. Since FASTA sequences can be either nucleotide or protein, is there a …
    9. Besides 'protein', what other values can be assigned to …
    10. There are several post processing tasks listed, what do they do?
    11. Do we have an ant build-all target that does build-db, integrates all the …
    12. What database schema is used for InterMine?
  2. Webapp
    1. How do I make templates and lists show up on the templates/lists page?
    2. How do I make a public template or list show up on the homepage?
    3. How can I set which fields are links/used to create bags on the results …
    4. How can I customise how data is displayed on the report page?
    5. How can I add my own logo and change the colour scheme?
    6. If I rebuild a mine, all user profiles and their saved info (queries, …
    7. Where can I set the list of default templates?
    8. What arguments does the quick search expect?
    9. Is summarisation necessary if we don't create a webapp? It took about 6 …
    10. How can I customise the data categories on the main page?
    11. Where can I set the password for the superuser.account?
  3. IQL
    1. My fields have been renamed to 'intermine_from' and 'intermine_to' in the …
    2. Is the order important in WHERE clause in IQL?
    3. My query is taking too long. How long should queries take? How can I …

See also: GettingStarted, InterMineOverview, FlyMineFAQ


Data warehouse

My data takes too long to load into the database. How long should it take? How can I make it faster?

There are improvements you can make. The main one concerns "ignoreDuplicates=true": this setting switches off a lot of performance enhancements that are not compatible with it, which makes the build run much slower. So if possible, make sure that there are no duplicated objects at all in each data source, and then switch off "ignoreDuplicates". It is fine for objects to be duplicated across data sources, because those objects will then be merged, but each object must appear only once within each data source. The new release branch will contain code that tells you if it sees any duplicated objects, and which objects they are.
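
Until that check is available, a rough pre-check along the following lines could flag repeated objects before loading. This is only a sketch: find_duplicates, the record layout and the identifiers are hypothetical, and it is not part of InterMine.

```python
from collections import Counter


def find_duplicates(records, key_fields):
    """Return the key tuples that occur more than once within one data source."""
    counts = Counter(tuple(rec.get(f) for f in key_fields) for rec in records)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical records parsed from a single data source.
genes = [
    {"identifier": "gene-1", "symbol": "a"},
    {"identifier": "gene-2", "symbol": "b"},
    {"identifier": "gene-1", "symbol": "a"},
]

# Lists the keys that would need de-duplicating before
# "ignoreDuplicates" can be switched off for this source.
print(find_duplicates(genes, ["identifier"]))
```

The check is per data source on purpose: the same object appearing in two different sources is fine, because those copies are merged.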

As far as Postgres settings go, we have a set of settings that seem to serve us pretty well. You don't say what version of Postgres you are using, but we would recommend version 8.2, as it contains features that help quite a bit. Some of the settings we change are:

  • shared_buffers: Set to around 150MB
  • temp_buffers: Set to around 80MB
  • work_mem: Set to around 1500MB
  • maintenance_work_mem: Set to around 500MB
  • default_statistics_target: Set to around 250
  • random_page_cost: Set to around 2.0, rather than 4.0
  • effective_cache_size: Set to about 2/3 the amount of RAM in the computer

These settings should be adjusted to the amount of RAM in the computer. In particular, work_mem shouldn't be more than about a third of the RAM.
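
As a rough illustration of how the values above might be scaled to a given machine, here is a small sketch. The function names and the decision to leave the other values fixed are assumptions; only the starting values, the two-thirds rule and the one-third cap come from the list above.

```python
def suggest_settings(ram_mb):
    """Return setting name -> value for a machine with ram_mb megabytes of RAM."""
    # Start from the suggested work_mem and cap it at roughly a third of RAM.
    work_mem_mb = min(1500, ram_mb // 3)
    return {
        "shared_buffers": "150MB",
        "temp_buffers": "80MB",
        "work_mem": f"{work_mem_mb}MB",
        "maintenance_work_mem": "500MB",
        "default_statistics_target": "250",
        "random_page_cost": "2.0",
        # About two thirds of the RAM in the computer.
        "effective_cache_size": f"{ram_mb * 2 // 3}MB",
    }


def to_conf_lines(settings):
    """Format the settings as 'name = value' lines for postgresql.conf."""
    return [f"{name} = {value}" for name, value in settings.items()]


if __name__ == "__main__":
    for line in to_conf_lines(suggest_settings(ram_mb=8192)):
        print(line)
```

Treat the output as a starting point to review by hand rather than something to paste in unchecked.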

Where can I find a listing of all the existing data formats that can be loaded into InterMine?

BioSources gives an overview of the data formats we already have parsers for. Each format is loaded by a 'source'; see bio/sources. Many of these can easily be re-used for other organisms and data files. There isn't a document yet that lists the properties that each source takes, but you can see how they are used in the FlyMine project.xml.

What other models do we have besides the genomic model and how would I use them?

Currently all biological mines call their model "genomic". That's a bit confusing but it's necessary, because the model name is used to create the Java package name and we need to have the same package in all mines so that we can reuse code.

We do have one non-"genomic" model that might be useful for you to look at. It's called "testmodel" and as expected it's used for testing. It's defined in this file: testmodel_model.xml

Unlike the biological mines, we define the testmodel in that one file, rather than having many additions.xml files.

How do you define a primary key for a model?

Currently, if you're building on the "genomic" model (i.e. you have

<property name="target.model" value="genomic"/>

in your project.xml), all primary keys are defined in the file: genomic_keyDefs.properties

We realise that having all keys in one place isn't very scalable but it's the only solution we have at the moment.

You would need to add a line like: Staff.key_identifier=identifier or: Staff.key_name=name (or both) to that file.

There can be multiple primary keys for each class (examples with many keys are Gene and Protein) so each source must configure which key to use when merging. As an example you could put Staff=key_identifier in your oicr_keys.properties file.
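
To make the relationship between the two files concrete, the sketch below resolves a source's key choice to the fields it names. It only illustrates the lookup and is not InterMine's own code; the helper names, and the idea that keys and fields may be comma-separated, are assumptions.

```python
def parse_properties(text):
    """Parse simple 'name=value' lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        props[name.strip()] = value.strip()
    return props


def key_fields_for(class_name, key_defs, source_keys):
    """Return one list of field names per key the source selects for class_name."""
    selected = source_keys.get(class_name, "")
    result = []
    for key_name in selected.split(","):  # comma-separated keys: an assumption
        key_name = key_name.strip()
        if key_name:
            definition = key_defs[f"{class_name}.{key_name}"]
            result.append([f.strip() for f in definition.split(",")])
    return result


# Lines as they might appear in genomic_keyDefs.properties.
key_defs = parse_properties("Staff.key_identifier=identifier\nStaff.key_name=name\n")
# Line as it might appear in the source's own keys file.
source_keys = parse_properties("Staff=key_identifier\n")

print(key_fields_for("Staff", key_defs, source_keys))
```

Here the class has two keys defined, but this source merges Staff objects on identifier only.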

See: PrimaryKeys

When we define a new model (e.g., MY-NEW_model.xml), in which directory should we put it? In bio/sources/MY-NEW?

Do you mean a new source? If so, then bio/sources/MY-NEW is correct.

When you say "define a new model" do you mean that you would like a complete new data model (ie. without Gene, Protein etc. but with your classes) or you would like to add to/modify the existing model?

Starting from scratch will take a lot of work. All of the mines we work on are based on the model in bio/core/core.xml and bio/core/genomic_additions.xml which define basic classes like "Organism" and "Chromosome". We recommend that you build on those to make your model.

All of the mines call their model by the same name "genomic", which is specified in the project.xml using the target.model property. We suggest you name your model "genomic" too because a lot of code (eg. in the bio/sources directory) expects the Java package for the generated model code to be org.flymine.model.genomic.

See: AnatomyOfAModel, AnatomyOfASource

Once a new model is defined, how do we include it in InterMine and use it?

Have you created your own mine? If so there is only one thing you need to do to use your new model additions - add the additions file to dbmodel/build.xml and run ant build-db in the dbmodel directory.

The new line in dbmodel/build.xml would be something like:

  <merge-additions file="bio/sources/newsource/newsource_additions.xml"/> 

The order in which the additions files appear in dbmodel/build.xml isn't usually important, but adding yours at the end is probably best.

See: SourceHowto

How and where can I set information for an organism?

There is a source called entrez-organism. This looks for all organism taxon ids in the database and contacts the NCBI web service to fill in the rest of the information. This is why we just use taxon ids in all sources.

Just run the source last and it should get filled in.

See: BioSources

Since FASTA sequences can be either nucleotide or protein, is there a way that I can set this?

Yes, there is a property that can be passed to the fasta source - fasta.sequenceType. The default is dna, but it can be set to protein. Here's an example:

    <source name="flybase-dmel-translation-fasta" type="fasta">
      <property name="fasta.taxonId" value="7227"/>
      <property name="fasta.className" value="org.flymine.model.genomic.Translation"/>
      <property name="fasta.classAttribute" value="organismDbId"/>
      <property name="fasta.includes" value="dmel-all-translation-*.fasta"/>
      <property name="fasta.sequenceType" value="protein"/>
      <property name="src.data.dir" location="/shared/data/flybase/dmel/release_5_1/fasta"/>
    </source>

Besides 'protein', what other values can be assigned to fasta.sequenceType?

The InterMine fasta loader uses the fileToBiojava() method of the BioJava SeqIOTools class. It looks like the options are "dna", "rna" or "protein".

There are several post processing tasks listed, what do they do?

See: PostProcessing

Do we have an ant build-all target that does build-db, integrates all the data sources, does build-db-userprofile, creates the war file, removes the war file, and deploys the war file?

Sorry, there's no target that does all that. A small script would probably do the trick.
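
One possible shape for such a script is sketched below. The ant targets for dbmodel and webapp are the ones mentioned elsewhere in this FAQ; the "integrate" directory, its bare ant call and the build_all helper are assumptions that you would need to adapt to your mine.

```python
import subprocess

# (sub-directory, ant targets), in the order the question lists them.
# ASSUMPTION: the integration step is a bare `ant` run in an "integrate" directory.
BUILD_STEPS = [
    ("dbmodel", ["build-db"]),
    ("integrate", []),
    ("webapp", ["build-db-userprofile"]),
    ("webapp", ["default", "remove-webapp", "release-webapp"]),
]


def build_all(mine_dir, run=subprocess.check_call):
    """Run every step in order; check_call raises on the first failing step."""
    executed = []
    for subdir, targets in BUILD_STEPS:
        cmd = ["ant"] + targets
        run(cmd, cwd=f"{mine_dir}/{subdir}")
        executed.append((subdir, cmd))
    return executed
```

Calling build_all("/path/to/mymine") would run the steps one after another and stop at the first failure. Note that build-db-userprofile discards saved user data, so drop that step if you want to keep existing profiles.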

What database schema is used for InterMine?

InterMine is an object-based query system that uses a PostgreSQL database. InterMine can work with any data model; for FlyMine we have a biological model that is based on the Sequence Ontology.

InterMine is designed to allow you to adapt the data model easily for your own data. It also includes code to import data from several standard bioinformatics file formats and will soon be compatible with the chado schema.

Webapp

How do I make templates and lists show up on the templates/lists page?

  1. Log into your site's super user account.
  2. Tag the template or list as "im:public".

See: AboutTagging

How do I make a public template or list show up on the homepage?

  1. Log into your site's super user account.
  2. Tag the template or list as "im:frontpage".

See: AboutTagging

How can I set which fields are links/used to create bags on the results …

See: Webapp Configuration

How can I customise how data is displayed on the report page?

See: Adding Long Displayers

How can I add my own logo and change the colour scheme?

See: Webapp branding

If I rebuild a mine, all user profiles and their saved info (queries, lists, etc.) associated with that mine are deleted. Is this the case? If so, how can I save the profiles and their info and import them into the newly rebuilt mine?

No, all the data will be saved unless you run ant build-db-userprofile in the webapp directory. However, saved lists work with internal ids, which change when a new build of the mine is done. To solve this, write the userprofile to XML before the rebuild and re-import it afterwards.

  1. While you still have your old build, run ant write-userprofile-xml in webapp and copy the userprofile.xml file somewhere.
  2. When the new build is ready, copy userprofile.xml back into the build directory.
  3. Run ant read-userprofile-xml to read it back in; this should run queries to update the lists to the new ids.

I would check this works before you risk losing a userprofile database. Of course, if you only have a couple of lists you can just re-import them from the webapp.

Where can I set the list of default templates?

See: User Profile

What arguments does the quick search expect?

At the moment the quick search is configured to run a particular template query. We use a query called A_IdentifierSynonym_Object; this is configured in webapp/resources/web.properties.

All biological feature classes in the model have a collection of Synonym objects to represent alternative identifiers. We also create synonyms for each object identifier, e.g. for a Gene with identifier 'eve' we also create a Synonym with value 'eve'. This means we can just search through the synonym table to find any feature type.

Is summarisation necessary if we don't create a webapp? It took about 6 hours.

If you are not releasing a webapp, you don't need to run summarise.

How can I customise the data categories on the main page?

  1. Customise your categories in MODEL_NAME/webapp/resources/webapp/WEB-INF/aspects.xml
  2. Run ant default remove-webapp release-webapp to update the customised categories. The "default" target forces a rebuild of the WAR file before releasing.

See: DataCategories

Where can I set the password for the superuser.account?

I'm not sure there is a way to do that outside the webapp. In the past I've used the "Forgotten password" link on the login page to retrieve the default password, then "Change Password" on the MyMine page to set a new one (after logging in).

IQL

My fields have been renamed to 'intermine_from' and 'intermine_to' in the database. Why?

"from" is a reserved word in IQL. You need to surround it with double quotes, in a rather bizarre manner, like this:

SELECT object."from" FROM object
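
If you generate IQL strings from a script, a small helper can apply the quoting for you. This is a sketch only: quote_field and select_field are hypothetical names, and the reserved-word set contains just the one word named above, so it would need extending with the rest of IQL's reserved words.

```python
# ASSUMPTION: incomplete set; only "from" is confirmed by this FAQ entry.
RESERVED_WORDS = {"from"}


def quote_field(name):
    """Wrap a field name in double quotes if it is a known reserved word."""
    return f'"{name}"' if name.lower() in RESERVED_WORDS else name


def select_field(alias, field, class_name):
    """Build a one-field IQL SELECT in the shape shown in the example above."""
    return f"SELECT {alias}.{quote_field(field)} FROM {class_name}"


print(select_field("object", "from", "object"))
```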

See: QueryPackage

Is the order important in WHERE clause in IQL?

No.

See: IQL

My query is taking too long. How long should queries take? How can I make the queries faster?

The database sits on top of Postgres, and the methods by which Postgres answers queries are deep magic that can cause all sorts of unexpected timing phenomena.

Adding a constraint to reduce the number of results can make the query slower, because Postgres may have to read just as much data from the database, but it has to do more work to filter the results by the constraint. On the other hand, an extra constraint could also make the query faster, if it allows the database to make use of an index or choose a faster algorithm.

If there are a lot of rows in the results, then it is worth trying to use a large batch size on the Results object, if you are running this from Java, by calling Results.setBatchSize(10000) or so. I believe 1000 is the default batch size, so 10000 should help a bit.
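
To see why a larger batch size can help, the sketch below counts how many separate fetches a result set needs at different batch sizes. It is a simplified model written in Python, not the InterMine Java API, and it assumes one fetch per batch.

```python
import math


def round_trips(total_rows, batch_size):
    """Number of fetches needed if each fetch returns at most batch_size rows."""
    return math.ceil(total_rows / batch_size)


def batches(rows, batch_size):
    """Yield successive slices of rows, each at most batch_size long."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


# One million rows: compare the believed default of 1000 with the suggested 10000.
print(round_trips(1_000_000, 1000), round_trips(1_000_000, 10000))
```

Fewer fetches means less per-batch overhead, at the cost of holding more rows in memory at once.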

Lastly, make sure the database has been analysed properly (which should be done automatically as part of the build process).