Perl Item XML generation modules

Introduction

In the source:trunk/intermine/perl directory we provide a Perl library for creating files in InterMine "Item XML" format. Files in this format can be loaded into an InterMine database by creating a "source".

The Perl code depends on XML::Parser::PerlSAX for parsing the model file and on XML::Writer for output. Install both with: sudo cpan -i XML::Parser::PerlSAX XML::Writer

See also: ItemsAPIJava

Usage

Most code using these modules will follow this pattern:

  1. make a model:
     my $model = InterMine::Model->new(file => $model_file);

  2. make an item factory:
     my $factory = InterMine::ItemFactory->new(model => $model);

  3. make an item:
     my $gene = $factory->make_item("Gene");

  4. set some attributes, references and/or collections:
     $gene->set("identifier", "CG10811");

  5. repeat steps 3 and 4 as necessary
  6. write the items out with XML::Writer

FlyMine example

Example using the FlyMine model:

  use strict;
  use warnings;

  use XML::Writer;
  use InterMine::Model;
  use InterMine::Item;
  use InterMine::ItemFactory;

  # the first argument is the path to the model file
  my $model_file = $ARGV[0];
  die "usage: $0 model_file\n" unless defined $model_file;
  my $model = InterMine::Model->new(file => $model_file);
  my $factory = InterMine::ItemFactory->new(model => $model);

  my $gene = $factory->make_item("Gene");
  # set an attribute
  $gene->set("identifier", "CG10811");

  my $organism = $factory->make_item("Organism");
  $organism->set("taxonId", 7227);

  # set a reference
  $gene->set("organism", $organism);

  my $pub1 = $factory->make_item("Publication");
  $pub1->set("pubMedId", 11700288);
  my $pub2 = $factory->make_item("Publication");
  $pub2->set("pubMedId", 16496002);

  # set a collection
  $gene->set("publications", [$pub1, $pub2]);

  # write as InterMine Items XML (XML::Writer prints to STDOUT by default)
  my @items_to_write = ($gene, $organism, $pub1, $pub2);
  my $writer = XML::Writer->new(DATA_MODE => 1, DATA_INDENT => 3);
  $writer->startTag("items");
  for my $item (@items_to_write) {
    $item->as_xml($writer);
  }
  $writer->endTag("items");
  # finish the document; end() checks that all tags are closed
  $writer->end();

Output:

  <items>
     <item id="0_1" class="" implements="http://www.flymine.org/model/genomic#Gene">
        <attribute name="identifier" value="CG10811" />
        <collection name="publications">
           <reference ref_id="0_3" />
           <reference ref_id="0_4" />
        </collection>
        <reference name="organism" ref_id="0_2" />
     </item>
     <item id="0_2" class="" implements="http://www.flymine.org/model/genomic#Organism">
        <attribute name="taxonId" value="7227" />
     </item>
     <item id="0_3" class="" implements="http://www.flymine.org/model/genomic#Publication">
        <attribute name="pubMedId" value="11700288" />
     </item>
     <item id="0_4" class="" implements="http://www.flymine.org/model/genomic#Publication">
        <attribute name="pubMedId" value="16496002" />
     </item>
  </items>
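
In the output above, references and collections point at other items through ref_id values that match the id of an item. The following is a minimal sketch, not part of the InterMine library, of a sanity check that every ref_id in a generated file resolves to an item in the same file. The name find_dangling_refs is hypothetical, and the regular expressions assume the attribute layout shown above; a more robust checker would use a real XML parser.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the sorted list of ref_id values that do not
# match the id of any <item> element in the given Items XML string.
sub find_dangling_refs {
    my ($xml) = @_;

    # Collect the id of every <item ...> element (\b keeps <items> out).
    my @ids = $xml =~ /<item\b[^>]*?\bid="([^"]*)"/g;
    my %item_ids = map { $_ => 1 } @ids;

    # Collect every ref_id used by a <reference ...> element.
    my @refs = $xml =~ /<reference\b[^>]*?\bref_id="([^"]*)"/g;
    my %ref_ids = map { $_ => 1 } @refs;

    my @dangling = grep { !$item_ids{$_} } keys %ref_ids;
    @dangling = sort @dangling;
    return @dangling;
}

# Usage: the Gene refers to 0_2 (present) and 0_3 (missing from the file).
my $xml = <<'XML';
<items>
   <item id="0_1" class="" implements="http://www.flymine.org/model/genomic#Gene">
      <collection name="publications">
         <reference ref_id="0_3" />
      </collection>
      <reference name="organism" ref_id="0_2" />
   </item>
   <item id="0_2" class="" implements="http://www.flymine.org/model/genomic#Organism">
      <attribute name="taxonId" value="7227" />
   </item>
</items>
XML

print "dangling ref_id: $_\n" for find_dangling_refs($xml);
# prints "dangling ref_id: 0_3"
```

A common cause of a dangling ref_id is an item that was created and referenced but left out of the list of items to write.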

Longer example

In source:trunk/bio/scripts there is a longer example: intermine_items_example.pl

The script takes three arguments, in this order:

  • a string describing a DataSet
  • a taxon id
  • the path to a genomic model file

If you install XML::Writer, the script should run as is from the bio/scripts/ directory.

Example command line:

 ./intermine_items_example.pl "FlyMine" 5833 ../../flymine/dbmodel/build/model/genomic_model.xml
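
The script itself is not reproduced here. As a minimal sketch of how a script taking the same three arguments might validate them before building any items, assuming nothing about the internals of intermine_items_example.pl (the parse_args name, the hash keys and the error messages are all hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from intermine_items_example.pl.
# Expects, in order: a DataSet description, a taxon id, a model file path.
sub parse_args {
    my @args = @_;
    die "usage: $0 data_set_title taxon_id model_file\n" unless @args == 3;

    my ($data_set_title, $taxon_id, $model_file) = @args;
    die "taxon id must be a positive integer, got '$taxon_id'\n"
        unless $taxon_id =~ /\A[1-9][0-9]*\z/;
    die "cannot read model file '$model_file'\n" unless -r $model_file;

    return (
        data_set_title => $data_set_title,
        taxon_id       => $taxon_id,
        model_file     => $model_file,
    );
}

# Only validate when arguments were actually given on the command line.
if (@ARGV) {
    my %opts = parse_args(@ARGV);
    print "DataSet: $opts{data_set_title}, taxon id: $opts{taxon_id}\n";
}
```

Failing early with a usage message is cheaper than discovering a missing model file after items have already been created.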