Last modified on 25/06/09 14:32:07
The fasta source loads features and their sequences: it creates one feature for each entry in a fasta file and sets that feature's sequence.
To configure a fasta source, add an entry to the project.xml file, like so:
<!-- example from flymine/project.xml -->
<source name="uniprot-fasta" type="fasta">
  <property name="fasta.taxonId" value="7227 7237 6239 7165 7460 4932 9606 10090"/>
  <property name="fasta.className" value="org.intermine.model.bio.Protein"/>
  <property name="fasta.classAttribute" value="primaryAccession"/>
  <property name="fasta.dataSetTitle" value="UniProt data set"/>
  <property name="fasta.dataSourceName" value="UniProt"/>
  <property name="src.data.dir" location="/data/uniprot/current"/>
  <property name="fasta.includes" value="uniprot_sprot_varsplic.fasta"/>
  <property name="fasta.sequenceType" value="protein"/>
  <property name="fasta.loaderClassName"
            value="org.intermine.bio.dataconversion.UniProtFastaLoaderTask"/>
</source>
| attribute | content | purpose |
| taxonId | space-delimited list of taxonIds | only features with the listed taxonIds will be loaded |
| className | fully-qualified class name | determines which type of feature will be created for each entry |
| classAttribute | identifier field from className | determines which field of the feature will be set from the entry's identifier |
| dataSetTitle | name of dataset | determines name of dataset object |
| dataSourceName | name of datasource | determines name of datasource object |
| src.data.dir | directory containing the fasta data files | data files are read from this directory |
| includes | name of data file | this data file will be loaded into the database |
| sequenceType | sequence type, e.g. protein | type of sequence to be loaded |
| loaderClassName | fully-qualified name of the Java class that will process the fasta files | only use if you have created a custom fasta loader |
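As a rough illustration of the table above, a source that uses the default loader could leave out loaderClassName, since that attribute is only needed for a custom loader. This is a sketch, not an entry taken from a real project.xml: the source name, data set title, data source name, directory, and file name are all hypothetical placeholders, and the remaining values are reused from the FlyMine example.

```xml
<!-- hypothetical sketch: names, paths, and titles are placeholders -->
<source name="example-protein-fasta" type="fasta">
  <!-- a single taxonId; only features for this organism are loaded -->
  <property name="fasta.taxonId" value="9606"/>
  <property name="fasta.className" value="org.intermine.model.bio.Protein"/>
  <property name="fasta.classAttribute" value="primaryAccession"/>
  <property name="fasta.dataSetTitle" value="Example protein data set"/>
  <property name="fasta.dataSourceName" value="Example source"/>
  <!-- directory holding the fasta file named in fasta.includes -->
  <property name="src.data.dir" location="/data/example/current"/>
  <property name="fasta.includes" value="example_proteins.fasta"/>
  <property name="fasta.sequenceType" value="protein"/>
  <!-- no fasta.loaderClassName: no custom loader is assumed here -->
</source>
```

As in the FlyMine example, every attribute from the table except src.data.dir is written with the fasta. prefix.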
datasets and datasources
Proteins, genes, and chromosomes have a datasets collection. A dataset is a set of results or data from a datasource. Each dataset has a reference to its datasource, which is the organisation the data came from.
See FlyMine's project.xml file for more examples.
