Post-processing

These are steps that run after data loading has completed. They are used to calculate and set fields that are difficult to set during data loading, or that require multiple sources to be loaded first.

Common post-processing tasks

create-chromosome-locations-and-lengths

For genome features, this sets the chromosome, chromosomeLocation and length fields, which are added to make querying more convenient. Some parts of the webapp that are specific to genome features expect chromosomeLocation to be filled in.

Should I use it? - Yes, if you have loaded genome annotation.
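
The idea behind this step can be pictured with a minimal Python sketch. It is not InterMine's (Java) implementation. It assumes each feature carries a list of locations, that one of them is on a Chromosome, and that coordinates are 1-based and inclusive, so length is end - start + 1. The Feature and Location classes and the function name are hypothetical stand-ins for the InterMine model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Location:
    located_on: str      # identifier of the object this location is on
    on_chromosome: bool  # True if located_on is a Chromosome (hypothetical flag)
    start: int           # assumed 1-based, inclusive
    end: int             # assumed 1-based, inclusive


@dataclass
class Feature:
    identifier: str
    locations: List[Location] = field(default_factory=list)
    chromosome: Optional[str] = None
    chromosome_location: Optional[Location] = None
    length: Optional[int] = None


def set_chromosome_fields(features: List[Feature]) -> None:
    """Fill chromosome, chromosome_location and length from a chromosome Location."""
    for feature in features:
        # Use the first location that is on a chromosome; ignore e.g. contig locations.
        loc = next((c for c in feature.locations if c.on_chromosome), None)
        if loc is None:
            continue  # feature is not located on a chromosome; leave fields unset
        feature.chromosome = loc.located_on
        feature.chromosome_location = loc
        # With inclusive coordinates, a feature at 100..199 is 100 bases long.
        feature.length = loc.end - loc.start + 1
```

Features with no chromosome location are left untouched rather than given a partial value.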

transfer-sequences

Where a Chromosome has a sequence, this finds the genome features located on it that don't have a sequence set, then calculates and sets the sequence for those features. This will no longer be needed when InterMine works correctly with clobs.

Should I use it? - Yes, if you have loaded genome annotation without a sequence set for every feature.
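
A minimal sketch of the per-feature calculation, assuming 1-based inclusive coordinates and a strand value of -1 for reverse-strand features; the function name is hypothetical and this is not InterMine's code. Reverse-strand features take the reverse complement of the chromosome residues. A step like this would only call it for features that have a chromosome location but no sequence.

```python
# Complement table for DNA residues (N stays N); other characters pass through.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def feature_sequence(chromosome_seq: str, start: int, end: int, strand: int) -> str:
    """Return the residues for a feature at start..end (1-based, inclusive).

    A feature on the reverse strand (strand == -1) gets the reverse complement.
    """
    if not 1 <= start <= end <= len(chromosome_seq):
        raise ValueError(f"location {start}..{end} is outside the chromosome")
    # Convert 1-based inclusive coordinates to a Python slice.
    residues = chromosome_seq[start - 1:end]
    if strand == -1:
        residues = residues.translate(COMPLEMENT)[::-1]
    return residues
```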

create-references

Create shortcut references/collections to make querying more obvious. We are trying to eliminate the need to use this.

Should I use it? - Yes, for the moment, if you are using standard InterMine sources.

create-intergenic-region-features

Looks at gene locations on chromosomes and calculates new IntergenicRegion features to represent the intergenic regions. These are useful in combination with overlaps, e.g. for finding binding sites that overlap the upstream intergenic region of a gene. Each Gene gets a reference to its upstream and downstream intergenic regions.

Should I use it? - Yes, if you have loaded genome annotation and think IntergenicRegions sound useful.
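
A minimal sketch of the calculation for one chromosome, not InterMine's own code. It assumes 1-based inclusive (start, end) gene locations, merges overlapping or adjacent genes so no region covers part of a gene, and does not create regions before the first gene or after the last one (the real step may differ here). The function names are hypothetical. Upstream and downstream depend on strand: for a reverse-strand gene, upstream lies at higher coordinates.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # (start, end), assumed 1-based and inclusive


def intergenic_regions(genes: Dict[str, Interval]) -> List[Interval]:
    """Return the gaps between genes on one chromosome."""
    merged: List[List[int]] = []
    for start, end in sorted(genes.values()):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous block of genes: extend that block.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    # Each gap runs from just after one block to just before the next.
    return [(a[1] + 1, b[0] - 1) for a, b in zip(merged, merged[1:])]


def flanking_regions(
    gene: Interval, strand: int, regions: List[Interval]
) -> Tuple[Optional[Interval], Optional[Interval]]:
    """Return (upstream, downstream) intergenic regions for a gene."""
    start, end = gene
    left = next((r for r in regions if r[1] == start - 1), None)
    right = next((r for r in regions if r[0] == end + 1), None)
    # On the reverse strand, "upstream" is the region at higher coordinates.
    return (left, right) if strand == 1 else (right, left)
```

A gene nested inside another gene gets no flanking regions in this sketch, because no gap touches it.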

create-overlap-relations-flymine

Searches through genome features and finds any that overlap one another, e.g. to find P-element insertions overlapping exons or binding sites overlapping upstream intergenic regions. Creates OverlapRelation objects between the features that overlap. The classes to include in the calculation are specified in postprocess/resources/overlap.config.

Should I use it? - Yes, if you have genome annotation and would like to query overlaps.
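
The core of an overlap search can be sketched as a sweep over features sorted by chromosome and start. This is a Python illustration, not InterMine's implementation; it assumes 1-based inclusive coordinates, uses hypothetical tuple-based features and function names, and ignores the class filtering that overlap.config provides.

```python
from typing import List, Optional, Tuple

Feature = Tuple[str, str, int, int]  # (identifier, chromosome, start, end), inclusive


def find_overlaps(features: List[Feature]) -> List[Tuple[str, str]]:
    """Return pairs of feature identifiers whose chromosome locations overlap."""
    pairs: List[Tuple[str, str]] = []
    active: List[Feature] = []  # earlier features that may still overlap
    current_chrom: Optional[str] = None
    for feat in sorted(features, key=lambda f: (f[1], f[2])):
        ident, chrom, start, end = feat
        if chrom != current_chrom:
            active, current_chrom = [], chrom  # new chromosome: nothing is open
        # Drop features that end before this one starts; since later features
        # start at or after `start`, they cannot overlap those either.
        active = [a for a in active if a[3] >= start]
        pairs.extend((a[0], ident) for a in active)
        active.append(feat)
    return pairs
```

Sorting first means each feature is compared only with features still open at its start, instead of with every other feature.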

do-sources

This searches through all sources included in project.xml and runs their post-processing steps, if any exist. It looks for the property postprocessor.class in the project.properties of each source; the class specified should be a subclass of org.intermine.postprocess.PostProcessor.

Should I use it? - Yes, if you are using standard InterMine sources, as they may have post-processing steps.
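
The discovery part of this step, finding which sources declare postprocessor.class, can be sketched as below. This is a simplified Python illustration, not InterMine's code: it reads only plain key=value lines (no line continuations or ':' separators, which full Java properties syntax allows), and the function names, source names and class name in the test are hypothetical. The real step would then instantiate and run each class found.

```python
from typing import Dict, Optional


def postprocessor_class(properties_text: str) -> Optional[str]:
    """Return the value of postprocessor.class from simple key=value text."""
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if sep and key.strip() == "postprocessor.class":
            return value.strip()
    return None


def sources_with_postprocessors(sources: Dict[str, str]) -> Dict[str, str]:
    """Map source name -> post-processor class, for sources that declare one."""
    found: Dict[str, str] = {}
    for name, text in sources.items():
        cls = postprocessor_class(text)
        if cls:
            found[name] = cls
    return found
```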

create-intron-features

If you have loaded genome annotation that includes exons but does not specify introns, this will create Intron objects and name them appropriately.

Should I use it? - Yes, if the genome annotation you have loaded does not include introns.
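
Intron locations for one transcript are the gaps between its exons. A minimal Python sketch, assuming 1-based inclusive exon coordinates on a single chromosome; the function name is hypothetical, and naming the resulting Intron objects is left out because the source does not describe the naming scheme.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), assumed 1-based and inclusive


def intron_locations(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between the exons of one transcript.

    Overlapping or touching exons leave no gap, so no intron is produced there.
    """
    introns: List[Interval] = []
    ordered = sorted(exons)
    if not ordered:
        return introns
    covered_to = ordered[0][1]  # rightmost position covered by exons so far
    for start, end in ordered[1:]:
        if start > covered_to + 1:
            introns.append((covered_to + 1, start - 1))
        covered_to = max(covered_to, end)
    return introns
```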

synonym-update

Sets the isPrimary flag on Synonyms. This distinguishes Synonym entries that are current identifiers of an object from those that are 'real' synonyms.

Should I use it? - Yes, if you want list upload and LOOKUP constraints to work as they do in FlyMine and you haven't set isPrimary when creating Synonyms.
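
A minimal Python sketch of the flag-setting logic, assuming that a synonym counts as primary when its value equals one of its object's current identifiers. Which identifier fields count is an assumption here, so they are passed in as a plain mapping; the Synonym class and function name are hypothetical, not InterMine's model or code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Synonym:
    value: str
    subject: str                      # identifier of the object it belongs to
    is_primary: Optional[bool] = None


def set_is_primary(synonyms: List[Synonym], identifiers: Dict[str, Set[str]]) -> None:
    """Mark a synonym primary if its value is a current identifier of its subject.

    `identifiers` maps each subject to the set of its current identifier values
    (assumed here; e.g. a primary identifier and a symbol).
    """
    for syn in synonyms:
        syn.is_primary = syn.value in identifiers.get(syn.subject, set())
```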

create-attribute-indexes

Create indexes on all attributes to help speed up queries.

Should I use it? - Always. It should be run last of all post-processing steps.

summarise-objectstore

Counts the number of objects of each class and, for class fields that have a small number of values, records a list of those values. See ObjectStoreSummaryProperties for more information.

Should I use it? - Always.
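
The kind of summary this step produces can be sketched in Python over in-memory objects. This is an illustration, not InterMine's implementation: objects are hypothetical (class name, field dict) pairs, and `max_values` is a hypothetical threshold for what counts as "a small number of values".

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List, Set, Tuple


def summarise(
    objects: List[Tuple[str, Dict[str, Any]]], max_values: int = 3
) -> Tuple[Dict[str, int], Dict[Tuple[str, str], List[Any]]]:
    """Count objects per class and list the values of low-cardinality fields."""
    class_counts = Counter(cls for cls, _ in objects)
    seen: Dict[Tuple[str, str], Set[Any]] = defaultdict(set)
    for cls, fields in objects:
        for name, value in fields.items():
            if value is not None:  # ignore unset fields
                seen[(cls, name)].add(value)
    # Keep a value list only for fields with few distinct values in their class.
    field_values = {
        key: sorted(values)
        for key, values in seen.items()
        if len(values) <= max_values
    }
    return dict(class_counts), field_values
```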

create-autocomplete-index

Creates the indexes for the fields set to be autocompleted in the ObjectStoreSummaryProperties file.

Should I use it? - Yes, if you have a webapp.
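
To show what an autocomplete index provides, here is a tiny case-insensitive prefix index built on a sorted list and Python's `bisect` module. This is a swapped-in technique for illustration only, not how InterMine builds its index; the class name and `limit` parameter are hypothetical.

```python
import bisect
from typing import List


class PrefixIndex:
    """A tiny case-insensitive prefix index over the values of one field."""

    def __init__(self, values: List[str]) -> None:
        # Keep (lower-cased key, original value) pairs sorted by key.
        self._entries = sorted((v.lower(), v) for v in set(values))
        self._keys = [key for key, _ in self._entries]

    def complete(self, prefix: str, limit: int = 10) -> List[str]:
        """Return up to `limit` values starting with `prefix`, in sorted order."""
        p = prefix.lower()
        # All keys starting with p form a contiguous run beginning here.
        i = bisect.bisect_left(self._keys, p)
        results: List[str] = []
        while i < len(self._keys) and self._keys[i].startswith(p) and len(results) < limit:
            results.append(self._entries[i][1])
            i += 1
        return results
```

Because the keys are sorted, all matches for a prefix sit next to each other, so a lookup is one binary search plus a short scan.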

Post-processing steps that you probably don't need

make-spanning-locations

When, e.g., Exons are located on a Chromosome but their parent Transcript is not, this calculates a location for the Transcript from the start of the first Exon to the end of the last Exon.

Should I use it? - Probably not.
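
The spanning calculation itself is small. A Python sketch, assuming all the exons are on the same chromosome and use (start, end) coordinates; the function name is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) on one chromosome


def spanning_location(exons: List[Interval]) -> Interval:
    """Location from the start of the first exon to the end of the last exon."""
    if not exons:
        raise ValueError("no located exons to span")
    return min(s for s, _ in exons), max(e for _, e in exons)
```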

calculate-locations

When genome annotation is loaded with e.g. features located on contigs and contigs located on chromosomes, this will calculate chromosome locations for the features. This used to be the case for data loaded from Ensembl.

Should I use it? - Probably not.
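
The coordinate transfer from a contig to a chromosome can be sketched as follows. This is a Python illustration under stated assumptions, not InterMine's code: locations are 1-based, inclusive (start, end, strand) tuples with strand 1 or -1, and a contig on strand -1 is taken to be reverse-complemented relative to the chromosome. The function name is hypothetical.

```python
from typing import Tuple

Location = Tuple[int, int, int]  # (start, end, strand); 1-based inclusive, strand 1 or -1


def to_chromosome(feature_on_contig: Location, contig_on_chromosome: Location) -> Location:
    """Convert a feature location on a contig into a location on the chromosome."""
    f_start, f_end, f_strand = feature_on_contig
    c_start, c_end, c_strand = contig_on_chromosome
    if c_strand == 1:
        # Contig position p sits at chromosome position c_start + p - 1.
        return c_start + f_start - 1, c_start + f_end - 1, f_strand
    # Reverse-strand contig: position p sits at c_end - p + 1, so the two ends
    # swap roles and the feature's strand flips.
    return c_end - f_end + 1, c_end - f_start + 1, -f_strand
```

For example, a feature at 11..20 on a forward-strand contig placed at 1001..2000 lands at 1011..1020 on the chromosome.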

create-utr-references

Creates references between mRNAs and UTRs when loading genome annotation from FlyBase (GFF). This should be moved to a specific source.

Should I use it? - Probably not.


See: RunningABuild