Data Integration

InterMine is designed to integrate multiple types of data into a single data warehouse. Each type of data to be loaded is defined as a 'source'. A source is a directory that contains everything needed to parse and integrate a particular type of data. Common sources are provided for several biological data types, and you can create your own.

Sources are loaded in the order in which they appear in the project.xml file. If an object is loaded and an object representing the same entity already exists in the database, the two are merged. Each class can be configured with one or more integration keys, which define how objects are matched for merging. Where multiple sources provide values for the same field, priorities must be defined to determine which value is kept when they conflict.
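
The merging behaviour described above can be illustrated with a small conceptual sketch. This is not InterMine code and does not use InterMine's configuration formats: the `Warehouse` class, the `INTEGRATION_KEYS` and `PRIORITIES` tables, and the source names `source-a` and `source-b` are all hypothetical, chosen only to show how a key decides which objects are the same entity and how a priority list could settle a conflicting field. Raising an error when no priority is defined is an assumption of the sketch.

```python
# Conceptual sketch only: all names below are hypothetical, not InterMine APIs.

# Integration keys: class name -> fields that identify an entity.
INTEGRATION_KEYS = {"Gene": ("primaryIdentifier",)}

# Priorities: (class, field) -> source names, highest priority first.
PRIORITIES = {("Gene", "symbol"): ["source-a", "source-b"]}


class Warehouse:
    """Toy in-memory store that merges objects sharing the same key values."""

    def __init__(self):
        # (class, key values) -> {field: (value, source that supplied it)}
        self.objects = {}

    def store(self, cls, source, fields):
        """Store an object from `source`, merging it with any existing match."""
        key = (cls, tuple(fields[f] for f in INTEGRATION_KEYS[cls]))
        record = self.objects.setdefault(key, {})
        for field, value in fields.items():
            if field not in record:
                # No earlier value for this field: simply fill it in.
                record[field] = (value, source)
                continue
            old_value, old_source = record[field]
            if old_value == value:
                continue  # Both sources agree, so there is nothing to resolve.
            order = PRIORITIES.get((cls, field), [])
            if source not in order or old_source not in order:
                # Assumption of this sketch: an unresolved conflict is an error.
                raise ValueError(
                    f"Conflicting values for {cls}.{field} from "
                    f"{old_source!r} and {source!r}, and no priority is defined"
                )
            if order.index(source) < order.index(old_source):
                # The newly loaded source outranks the one already stored.
                record[field] = (value, source)

    def get(self, cls, *key_values):
        """Return the merged field values for the object with the given key."""
        record = self.objects[(cls, tuple(key_values))]
        return {field: value for field, (value, _) in record.items()}
```

For example, if `source-b` is loaded first with `{"primaryIdentifier": "G1", "symbol": "abc"}` and `source-a` then supplies `{"primaryIdentifier": "G1", "symbol": "ABC1", "length": 1200}`, both objects share the same key value and are merged into one. The `symbol` conflict is settled in favour of `source-a` because it is listed first, and `length` is added because no earlier value existed.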

See Also: PrimaryKeys, PriorityConfig