Introduction | Description | Tests | Summary | References | FAQ | Program Interface

DESCRIPTION OF THE PROGRAM

System requirements and inputs

This application accepts as input a multiple alignment file saved in Clustal (14), NEXUS (15), Emboss (16), PHYLIP (17) or numerous other alignment formats. The software employs a BioPerl-based executable file, which runs as a typical CGI script on an Apache-based web server. The requirements for running the application are a standard Perl 5.8.0 installation, a small number of Comprehensive Perl Archive Network (CPAN) Perl modules, the BioPerl 1.4.0 set of modules and version 0.9 of the Primer3 software (12).

The interface is designed to be straightforward. Users are prompted to upload a file and specify the format of the alignment it contains (Fig. 1). The user then sets three main parameters that control the generation of primers. First is the maximum number of degenerate bases allowed in each primer, which can be set from zero to five. Second, the program can be instructed to ignore a specified number of gapped sequence lines in the alignment file. If Primaclade finds a primer that meets all of the other user criteria but lies in an area of the alignment with gaps in one or more sequences, the primer can be retained by adjusting this parameter. Primers that bind to areas of the alignment with multiple gaps in multiple sequences, however, are undesirable. Third, the user may specify a single region of the alignment to exclude. This feature is most useful for excluding areas that are so conserved that they would be shared by many paralogous genes, as in the MADS box example below.
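As an illustration only, the three user parameters could be held in a small validated structure such as the Python sketch below. Primaclade itself is written in Perl; the class name, field names, the 1-based inclusive column convention and the rule that any overlap with the excluded region rejects a primer are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SearchOptions:
    """Hypothetical container for the three user-set parameters."""
    max_degenerate: int = 0            # allowed degenerate bases per primer (0 to 5)
    max_gapped_sequences: int = 0      # gapped sequence lines that may be ignored
    excluded_region: Optional[Tuple[int, int]] = None  # assumed 1-based, inclusive columns

    def __post_init__(self) -> None:
        if not 0 <= self.max_degenerate <= 5:
            raise ValueError("max_degenerate must be between 0 and 5")
        if self.max_gapped_sequences < 0:
            raise ValueError("max_gapped_sequences cannot be negative")
        if self.excluded_region is not None:
            start, end = self.excluded_region
            if start < 1 or end < start:
                raise ValueError("excluded_region must satisfy 1 <= start <= end")

    def excludes(self, first_col: int, last_col: int) -> bool:
        """Return True if a primer spanning these columns overlaps the excluded region."""
        if self.excluded_region is None:
            return False
        start, end = self.excluded_region
        return first_col <= end and last_col >= start
```

For example, `SearchOptions(2, 1, (10, 20)).excludes(18, 35)` is true because columns 18 to 20 fall inside the excluded region.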

The Primaclade web application is written in Perl and utilizes a number of standard CPAN Perl modules. Specifically, the script uses the CPAN CGI modules for generating the HTML forms and the CPAN File and Sort modules for internal file and data manipulation. The BioPerl 1.4.0 bundle of modules is used for formatting and processing the end-user alignment files. Although a BioPerl module is currently available for running Primer3 (12) and parsing its output, we encountered a number of software weaknesses while working with this module and did not use it.

Structure of the program

To determine a set of primers for an aligned clade, the user alignment file is read and split into individual sequences. To find as many unique primers as possible, the script runs Primer3 (12) eleven times for each sequence of the alignment, starting with a search for an 18-mer primer (PRIMER_OPT_SIZE=18) and incrementing the optimal size by one bp each time up to a 28-mer. The output file from each run of Primer3 is then parsed, and both the left-handed and right-handed primers are saved into a unique array for each line of sequence data in the alignment. Each run of Primer3 is also set to return up to 20 primers (PRIMER_NUM_RETURN=20), resulting in a list of up to 11x20 (220) possible primers for each line of sequence data. The Primaclade script sorts the array of primers and removes any duplicates that might have been generated. The final array of possible sequence primers is then saved in a data hash of arrays. The melting temperatures and percent GC content that the user specified on the Primaclade main page are also passed to Primer3 for each run; otherwise the main page default values are used (PRIMER_MIN_TM=55, PRIMER_OPT_TM=60, PRIMER_MAX_TM=65, PRIMER_MAX_GC=80, PRIMER_MIN_GC=20, PRIMER_OPT_GC_PERCENT=50).
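The size sweep described above can be sketched as follows. This is a minimal Python illustration, not the Perl code used by Primaclade. It builds one Primer3 input record (Boulder-IO format: TAG=VALUE lines closed by a lone "=") per optimal size from 18 to 28, using only the tags quoted in the text. The function names are hypothetical, the name of the sequence tag depends on the Primer3 version (assumed here to be SEQUENCE_TEMPLATE; older releases used SEQUENCE), and size bounds such as PRIMER_MIN_SIZE and PRIMER_MAX_SIZE are not given in the text and are therefore omitted.

```python
from typing import Dict, Iterator, List

# Default thresholds quoted in the text; user-supplied values would replace them.
DEFAULT_TAGS: Dict[str, int] = {
    "PRIMER_MIN_TM": 55,
    "PRIMER_OPT_TM": 60,
    "PRIMER_MAX_TM": 65,
    "PRIMER_MIN_GC": 20,
    "PRIMER_OPT_GC_PERCENT": 50,
    "PRIMER_MAX_GC": 80,
    "PRIMER_NUM_RETURN": 20,
}


def boulder_records(aligned_seq: str, min_size: int = 18, max_size: int = 28,
                    sequence_tag: str = "SEQUENCE_TEMPLATE") -> Iterator[str]:
    """Yield one Boulder-IO record per optimal primer size (hypothetical helper)."""
    # Assumption: alignment gaps are stripped before the sequence is given to Primer3.
    ungapped = aligned_seq.replace("-", "")
    for opt_size in range(min_size, max_size + 1):
        tags = {sequence_tag: ungapped, "PRIMER_OPT_SIZE": opt_size, **DEFAULT_TAGS}
        lines = [f"{tag}={value}" for tag, value in tags.items()]
        lines.append("=")  # a lone "=" terminates a Boulder-IO record
        yield "\n".join(lines) + "\n"


def unique_sorted(primers: List[str]) -> List[str]:
    """Sort the pooled primers and drop duplicates found by more than one run."""
    return sorted(set(primers))
```

Each record would be passed to the Primer3 executable in turn, and the primers parsed from all runs for one sequence pooled through `unique_sorted` before being stored under that sequence's key.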

Once each line of sequence data in the alignment file has been iteratively run through Primer3, the program checks each primer in the hash of primer arrays. Gaps in the alignment file cause the positions of the primers, as reported by Primer3, to differ from their actual column positions in the alignment. Therefore, the location at which a primer will bind in the alignment has to be calculated separately for every individual sequence. The primer starting location and length are calculated, and the primer sequence is compared to the corresponding nucleotides in the alignment consensus sequence. If the corresponding area of the consensus sequence contains the correct number of degenerate nucleotides, the primer is saved for further analysis; otherwise the primer is discarded. The primers that pass the test for degeneracy are then screened to determine the number of gapped sequences that occur at their positions within the alignment. The alignment area defined by the primer is reviewed to ensure that the appropriate number of sequences are complete or have gaps. Primers having both the correct number of degenerate nucleotides and the corresponding appropriate (non-gapped) areas of the alignment are saved into a final results array. The array is sorted, any duplicates are removed, and a final results HTML document is generated.
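The checks described above can be illustrated with a short Python sketch. It is not the Primaclade implementation. It assumes that gaps are written as "-", that the consensus line uses IUPAC ambiguity codes for degenerate positions, that positions are 0-based, and that "the correct number" means no more than the user's maximum. All function names are hypothetical.

```python
from typing import List, Tuple

GAP = "-"
UNAMBIGUOUS = set("ACGT")


def alignment_column(aligned_seq: str, ungapped_pos: int) -> int:
    """Map a 0-based position in the ungapped sequence to its 0-based alignment column."""
    seen = -1
    for column, char in enumerate(aligned_seq):
        if char != GAP:
            seen += 1
            if seen == ungapped_pos:
                return column
    raise ValueError("position lies beyond the end of the sequence")


def primer_span(aligned_seq: str, ungapped_start: int, length: int) -> Tuple[int, int]:
    """Return the half-open range of alignment columns covered by a primer."""
    first = alignment_column(aligned_seq, ungapped_start)
    last = alignment_column(aligned_seq, ungapped_start + length - 1)
    return first, last + 1


def count_degenerate(consensus: str, start: int, end: int) -> int:
    """Count consensus positions in the span that are not a plain A, C, G or T."""
    window = consensus[start:end].upper()
    return sum(1 for base in window if base not in UNAMBIGUOUS and base != GAP)


def count_gapped_sequences(alignment: List[str], start: int, end: int) -> int:
    """Count sequences that contain at least one gap inside the span."""
    return sum(1 for seq in alignment if GAP in seq[start:end])


def keep_primer(alignment: List[str], consensus: str, seq_index: int,
                ungapped_start: int, length: int,
                max_degenerate: int, max_gapped: int) -> bool:
    """Apply the degeneracy test and the gap test to one candidate primer."""
    start, end = primer_span(alignment[seq_index], ungapped_start, length)
    return (count_degenerate(consensus, start, end) <= max_degenerate
            and count_gapped_sequences(alignment, start, end) <= max_gapped)
```

With the toy alignment `["AC--GTAC", "ACTTGTAC", "ACTTGAAC"]` and consensus `"ACTTGWAC"`, a 3-mer that Primer3 reports at position 2 of the first sequence actually occupies alignment columns 4 to 6, where the consensus contains one degenerate base (W), so it is kept only if at least one degenerate base is allowed.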

Outputs

A typical Primaclade output page contains the original alignment file displayed horizontally across the top of the page (Fig. 2). This is followed by a single line showing the consensus sequence. The consensus sequence is color-coded: highly conserved regions are shown as colored capital letters (A, C, G and T), while less conserved areas are displayed in black text. At the bottom of the page, the generated primers are printed at their correct positions within the alignment display. The output list of primers is also color-coded, with green indicating a primer with no degenerate bases, orange indicating a primer with one or two degenerate bases, and red indicating a primer with three or more degenerate bases. The reverse complement of each 3' primer is provided so that primers can be ordered directly. The alignment and primer HTML page can be saved in plain text format and edited using a text manipulation program, such as BBEdit.
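The color rule and the reverse complement step can be sketched in a few lines of Python. This is an illustration rather than the Primaclade code; it assumes degenerate positions are written with IUPAC ambiguity codes, and the function names are hypothetical.

```python
# Complements of the IUPAC nucleotide codes (e.g. R = A/G pairs with Y = C/T).
IUPAC_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W",
    "K": "M", "M": "K", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}


def reverse_complement(primer: str) -> str:
    """Return the reverse complement, preserving degenerate positions."""
    return "".join(IUPAC_COMPLEMENT[base] for base in reversed(primer.upper()))


def primer_color(primer: str) -> str:
    """Classify a primer by its number of degenerate bases, as on the output page."""
    degenerate = sum(1 for base in primer.upper() if base not in "ACGT")
    if degenerate == 0:
        return "green"
    if degenerate <= 2:
        return "orange"
    return "red"
```

For instance, `reverse_complement("ACGTR")` gives `"YACGT"`, and `primer_color("ACRT")` gives `"orange"`.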

The output of Primaclade displays Tm and %GC; these are taken directly from a single primer predicted by Primer3 that corresponds to that area of the alignment. Because they are based on a single sequence, these numbers provide only estimates if the number of degenerate bases in the primer is greater than zero (that is, for any primer color-coded as either orange or red).
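To see why a single-sequence value is only an estimate, the following Python sketch expands a degenerate primer into every concrete sequence it represents and reports the range of %GC across them. Primaclade does not do this; the example only illustrates the point, again assuming IUPAC ambiguity codes and using hypothetical function names.

```python
from itertools import product
from typing import List, Tuple

# Bases represented by each IUPAC nucleotide code.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(primer: str) -> List[str]:
    """List every non-degenerate sequence encoded by a degenerate primer."""
    choices = (IUPAC_BASES[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


def gc_percent(sequence: str) -> float:
    """Percentage of G and C bases in a non-degenerate sequence."""
    return 100.0 * sum(base in "GC" for base in sequence) / len(sequence)


def gc_range(primer: str) -> Tuple[float, float]:
    """Lowest and highest %GC among the sequences a primer can represent."""
    values = [gc_percent(seq) for seq in expand(primer)]
    return min(values), max(values)
```

A primer with no degenerate bases has a single %GC value, whereas `gc_range("ARGT")` spans 25% (AAGT) to 50% (AGGT). With at most five degenerate bases per primer, the expansion never exceeds 4^5 = 1024 sequences.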

The BioPerl-based program should run on any web server that meets the requirements listed above, although we have not tested the portability of the application.
