This page describes how to load Ensembl into your InterMine-bio database.
1 Get the Ensembl database (Optional)
Ensembl data are distributed as MySQL databases. Ensembl provides a publicly available MySQL database you can use, which is why this step is optional. If you expect to retrieve a lot of data from Ensembl, or if reliability is very important, you will probably be better off with a local database.
The following are instructions on how to load a local copy of an Ensembl database. You must have MySQL installed and correctly configured.
1.1 Download the Ensembl MySQL database
- ftp://ftp.ensembl.org/pub/current_mysql/
- Or you can use our Perl script, get_ensembl_mysql, which downloads and unzips the data.
- To use this script, you need to install the appropriate Perl modules. See InterMinePerl.
- The script requires three parameters: download_directory, organism_name and which_database. For example:
# in bio/scripts
./get_ensembl_mysql /MY_DATA_DIR/ensembl homo_sapiens core
1.2 Create the database
# in mysql
create database homo_sapiens_core_59_37d;
1.3 Load the database structure
mysql -h HOST -u USERNAME -p homo_sapiens_core_59_37d < /MY_DATA_DIR/ensembl/homo_sapiens/homo_sapiens_core_59_37d/homo_sapiens_core_59_37d.sql
1.4 Load the data
Run this command in the same directory as the data you just downloaded:
mysqlimport -h HOST -u USERNAME -p homo_sapiens_core_59_37d -L *.txt
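Steps 1.2 to 1.4 can be collected into one shell function. The following is only a sketch under stated assumptions: the function name load_ensembl_db and the DRY_RUN variable are hypothetical and not part of InterMine, and the dump directory is assumed to contain both DB_NAME.sql and the *.txt data files. By default it only prints the commands so you can check them first.

```shell
#!/bin/sh
# Sketch only: load_ensembl_db and DRY_RUN are hypothetical names.
# Usage: load_ensembl_db DB_NAME DUMP_DIR HOST USERNAME
load_ensembl_db() {
    db_name=$1     # e.g. homo_sapiens_core_59_37d
    dump_dir=$2    # directory holding DB_NAME.sql and the *.txt files
    host=$3
    user=$4

    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run (the default): print the commands instead of running them.
        printf 'mysql -h %s -u %s -p -e "create database %s;"\n' "$host" "$user" "$db_name"
        printf 'mysql -h %s -u %s -p %s < %s/%s.sql\n' "$host" "$user" "$db_name" "$dump_dir" "$db_name"
        printf '(cd %s && mysqlimport -h %s -u %s -p %s -L *.txt)\n' "$dump_dir" "$host" "$user" "$db_name"
        return 0
    fi

    # Step 1.2: create the database.
    mysql -h "$host" -u "$user" -p -e "create database $db_name;" || return 1
    # Step 1.3: load the database structure.
    mysql -h "$host" -u "$user" -p "$db_name" < "$dump_dir/$db_name.sql" || return 1
    # Step 1.4: load the data, running mysqlimport inside the dump directory.
    (cd "$dump_dir" && mysqlimport -h "$host" -u "$user" -p "$db_name" -L *.txt)
}
```

Calling load_ensembl_db homo_sapiens_core_59_37d /MY_DATA_DIR/ensembl/homo_sapiens/homo_sapiens_core_59_37d HOST USERNAME prints the three commands. Setting DRY_RUN=0 runs them; because each command uses -p, MySQL will prompt for the password three times.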
See also: MySQL
2 Install Perl modules
InterMine's Ensembl converter uses Ensembl's Perl API. Follow Ensembl's instructions for how to install the necessary Perl modules:
You will also need to install InterMine's Perl modules. Follow these instructions:
3 Update properties files
3.1 Update <MINE>.properties
You'll need one set of entries for every organism. The Perl script run in Step 4.1 uses these entries to find the databases. For example:
# core database
db.ensembl.9606.core.datasource.serverName=SERVER_NAME
db.ensembl.9606.core.datasource.databaseName=homo_sapiens_core_59_37d
db.ensembl.9606.core.datasource.species=homo_sapiens
db.ensembl.9606.core.datasource.user=DB_USER
db.ensembl.9606.core.datasource.password=DB_PASSWORD

# variation database
db.ensembl.9606.variation.datasource.serverName=SERVER_NAME
db.ensembl.9606.variation.datasource.databaseName=homo_sapiens_variation_59_37d
db.ensembl.9606.variation.datasource.species=homo_sapiens
db.ensembl.9606.variation.datasource.user=DB_USER
db.ensembl.9606.variation.datasource.password=DB_PASSWORD
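Since each database needs the same five keys, a quick check before running the Perl script can catch a missing entry. The sketch below is an illustration, not an InterMine tool: check_ensembl_props is a hypothetical helper, and you pass it the path to your own properties file, the taxon ID and the database type (core or variation).

```shell
#!/bin/sh
# Sketch only: check_ensembl_props is a hypothetical helper.
# Usage: check_ensembl_props PROPERTIES_FILE TAXON_ID DB_TYPE
check_ensembl_props() {
    props_file=$1
    taxon_id=$2
    db_type=$3     # core or variation
    missing=0

    for key in serverName databaseName species user password; do
        full_key="db.ensembl.$taxon_id.$db_type.datasource.$key"
        # Compare the text before the first '=' (trimmed of whitespace)
        # with the expected key, so dots are matched literally.
        if ! awk -F= -v k="$full_key" '
                { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name) }
                name == k { found = 1 }
                END { exit !found }' "$props_file"; then
            echo "missing: $full_key"
            missing=1
        fi
    done
    return "$missing"
}
```

The function returns 0 when all five keys are present and prints one "missing:" line per absent key otherwise. It checks only that the keys exist, not that the values point at a reachable database.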
3.2 Add Ensembl to the list of datasources to be integrated
The list of datasources is in the project.xml file. The Ensembl entry should look something like:
<source name="ensembl" type="ensembl">
<property name="src.data.dir" location="/MY_DATA_DIR/ensembl"/>
</source>
When you run a database build, every XML file in the src.data.dir directory will be loaded into the database. Currently FlyMine loads Ensembl data for Anopheles gambiae. See FlyMine's project.xml.
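Because the build loads every XML file it finds there, it can help to list the contents of the directory before building. This is a minimal sketch: list_source_xml is a hypothetical helper, and it assumes only the top level of the directory matters.

```shell
#!/bin/sh
# Sketch only: list_source_xml is a hypothetical helper.
# Usage: list_source_xml SRC_DATA_DIR
list_source_xml() {
    src_dir=$1
    count=0
    for f in "$src_dir"/*.xml; do
        # If nothing matches, the pattern is left as literal text; skip it.
        [ -e "$f" ] || continue
        echo "$f"
        count=$((count + 1))
    done
    # Succeed only if at least one XML file was found.
    [ "$count" -gt 0 ]
}
```

For example, list_source_xml /MY_DATA_DIR/ensembl prints each XML file in that directory and returns a non-zero status if there are none.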
4 Load data
4.1 Generate XML file
- Run this command in /bio/scripts
./ensemblAPI.pl MINE_NAME TAXONID /MY_DATA_DIR/ensembl
For example:
./ensemblAPI.pl flymine 7165 /data/ensembl/current
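If your mine loads several organisms, the command can be repeated once per taxon ID. The sketch below assumes that ensemblAPI.pl handles one taxon ID per invocation, as in the usage shown above. The function name generate_ensembl_xml and the DRY_RUN variable are hypothetical. By default the commands are only printed.

```shell
#!/bin/sh
# Sketch only: generate_ensembl_xml and DRY_RUN are hypothetical names.
# Run from bio/scripts.
# Usage: generate_ensembl_xml MINE_NAME DATA_DIR TAXONID [TAXONID ...]
generate_ensembl_xml() {
    mine_name=$1
    data_dir=$2
    shift 2

    for taxon_id in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run (the default): show the command without running it.
            echo "./ensemblAPI.pl $mine_name $taxon_id $data_dir"
        else
            # Stop at the first organism that fails.
            ./ensemblAPI.pl "$mine_name" "$taxon_id" "$data_dir" || return 1
        fi
    done
}
```

For example, generate_ensembl_xml flymine /data/ensembl/current 7165 9606 prints one ensemblAPI.pl command per taxon ID. Each taxon ID needs matching entries in the properties file (see Step 3.1).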
4.2 Load XML file into database
- Run a build. The entry in project.xml instructs the build process to load the XML files you created in Step 4.1 into the database. For example, run this command in MINE_NAME/integrate:
ant -v -Dsource=ensembl
