SyMAP is a system for computing, displaying, and analyzing syntenic alignments between genomes.
Its features include the following (for a pictorial introduction, see the
GUI manager to run synteny computations and view results.
Multiple display modes (dot plot, circular, side-by-side, closeup, 3D).
Draft sequence ordering by synteny.
Construction of cross-species gene families.
Complete annotation-based queries.
All displays accessible through the web.
Can align FPC maps to sequenced genomes.
The back-end processing of SyMAP, including the synteny block
and anchor filtering algorithms, is described in the following
two publications. A sketch of the algorithms is also provided here.
C. Soderlund, W. Nelson, A. Shoemaker and A. Paterson (2006)
SyMAP: A System for Discovering and Viewing Syntenic Regions of FPC maps
Genome Research 16:1159-1168.
C. Soderlund, M. Bomhoff, and W. Nelson (2011)
SyMAP: A turnkey synteny system with application to plant genomes.
Nucleic Acids Research 39(10):e68.
SyMAP is freely distributed software; however,
if you use SyMAP results in published research, you must cite one
or both of these articles.
To run SyMAP, the only things you must have are:
A Mac or Linux computer with 1 GB of RAM.
The SyMAP downloadable package.
Sequence files, in FASTA format.
Follow the steps below to get started with SyMAP.
If you are working with FPC files, see
also here; for problems, see
troubleshooting. Note that
each SyMAP application window has a separate Help
button providing further details on the use of its functions.
Use a Linux or Mac machine.
It needs to have Java v1.6 or later, and sufficient processing power.
See system requirements.
Prepare sequences and annotation.
Sequences can be in one or many files and can be masked or unmasked.
See preparing the sequences.
Annotation format is gff3; see annotation files.
Installation is a simple unzip. See
Run the demo.
Highly recommended. See
running the demo.
Set up MySQL.
SyMAP can run on its own but we recommend
that you use a separate MySQL installation, which can be
on any machine. See SyMAP and MySQL.
Import your files.
The Manager interface makes this easy; see
creating a new project.
Compute alignments and synteny.
This is also easy through the Manager interface. See
runtime and memory.
These functions are located on the Manager interface. A detailed
description of the user interface is in the User Guide.
Share through the web.
A web install script is provided;
see web display.
SyMAP runs primarily on Intel systems with Linux or Mac OSX (tested on OSX 10.6.8 and 10.8.3).
Windows is supported for viewing and querying only.
For performing large alignments (e.g. genomes of 1 Gb or more), it is essential
to have multiple CPUs, a 64-bit computer, and at least 5 GB of RAM
for each CPU that you intend to use. (Note that you can set the
number of CPUs for SyMAP to use.)
For viewing alignments, CPU and memory needs are typically negligible, unless
you are performing queries on more than 4-5 genomes at once.
Installation simply consists of unzipping the download package, using
> tar -xzvf symap_40.tar.gz
This can be done anywhere and creates a directory called symap_40. You can
move this directory later if desired.
To run SyMAP, change into the symap_40 directory and run the command
Running the Demo
If you have not used SyMAP before, it is essential to run the demos.
You can run them immediately after unzipping the package;
it is not necessary to install MySQL because a Java-controlled version of the database
is included in the package.
Change into the symap_40 directory.
- If you have MySQL on your machine, then edit symap.config and enter database
and host information (see SyMAP and MySQL).
- Otherwise, just enter ./symap
Many lines of text will print to the console, as SyMAP launches the
supplied MySQL database. When this is done,
the Project Manager window opens, shown at right.
There are four demo projects listed on the left. Check "Demo-Seq" and "Demo-Seq2".
A link "Load All Projects" will be displayed in the top of the right panel; select it to
load the projects, which will take several minutes; when done, the Manager will look as
shown in the image.
In the "Available Syntenies" table, click the cell for the
"Demo_Seq2" row and the "Demo_Seq" column. If you have two CPUs
available, set the "CPUs" setting to "2". Then click the
"Selected Pair" button to start the alignment.
The alignment will take about 30 minutes (15 with two CPUs).
When done, the table will have a checkbox, signifying that
the synteny is available for viewing. Click the cell
again, which will enable the viewing buttons, e.g. "Dot Plot".
Click "Dot Plot" and you will see the dot plot shown
here. By clicking and/or selecting regions you can
zoom into certain regions and bring up detailed views
of the alignments. The Help button (question mark)
provides full information on the functions.
Return to the Manager, and click the "Chromosome Explorer"
button. This brings up the Explorer, shown at right.
Here you can pick different
sets of chromosomes, using the small icons at left, and see them in different views.
At first you only see the reference chromosome, which is initially chr3 from Demo-Seq
(the reference has a red box around its number).
Click on the icons for Demo-Seq2 chr1 and chr3, to see the 3D view at right.
The ribbons represent synteny blocks (green for inverted).
Click the "Circle" button to see a Circos-style [3] display
of the same chromosomes:
Click the 2D button to see a side-by-side view of
the same chromosomes. Note that the reference is in the middle.
Brown lines show the individual anchors (see
how symap works).
Selecting a region on one of the sequence tracks using the
mouse zooms to that region.
Now the annotation icons (blue) for individual genes can be seen
in the center of each sequence track.
If you zoom in even closer, you can click the "Sequence Filter"
button at the top of a sequence track (or right-click in the sequence)
and check "Show Descriptions
for Annotations"; you will then see the annotation text
for each gene.
Returning to the Manager, click the "SyMAP Queries" button. This brings up
the SyMAP Query window in Overview mode. Click the "Query Setup" option
on the left-hand side and you will see the Query Setup window:
The query does two basic things:
A. Locate syntenic regions based on annotation
B. Create putative gene families across the
species by grouping the genes (or regions) which
are connected by anchors.
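Grouping genes that are connected by anchors amounts to finding connected components in a graph whose edges are the anchors. The Python sketch below illustrates that idea only; it is not SyMAP's code, and the gene labels and the function name group_gene_families are hypothetical.

```python
from collections import defaultdict

def group_gene_families(anchors):
    """Group genes into putative families (connected components of the anchor graph).

    `anchors` is an iterable of (gene_a, gene_b) pairs. The identifiers are
    illustrative labels, not SyMAP's internal IDs.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in anchors:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    families = defaultdict(set)
    for gene in list(parent):
        families[find(gene)].add(gene)
    # Sorted member lists, largest family first, for stable output.
    return sorted((sorted(members) for members in families.values()),
                  key=lambda fam: (-len(fam), fam))

anchors = [("A:g1", "B:g7"), ("B:g7", "C:g2"), ("A:g5", "B:g9")]
families = group_gene_families(anchors)
```

Here the first two anchors share gene "B:g7", so they fall into one family of three genes, while the third anchor forms a family of its own.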
Enter "glycosyl" for the "Annotation String Search" and press "Do Search".
55 results are returned. Click the "PgFSize" column header to sort by
this column, giving the table at right:
Each row is an anchor connecting two of the chromosomes.
At the top of the table are 7 anchors grouped into a putative
family (PgeneF=25 in the image, but it may
be numbered differently when you run it).
The rows with a non-empty "BlockNum"
are anchors involved in synteny blocks. You could restrict the query
to only these anchors, if desired. Synteny anchors are more likely
to represent a true ancestral relationship; however, synteny blocks
cannot always be detected in sparser regions.
If you query with more than two species, you can ask interesting
questions such as "show me the glycosyl-related gene families which
are present in species A and B but not in species C".
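A presence/absence question of this kind reduces to simple set operations once each family is known by the species of its members. The following Python sketch shows the logic under that assumption; the data layout and the function name are hypothetical and do not reflect how SyMAP stores or queries families.

```python
def families_present_absent(families, present, absent):
    """Return IDs of families found in every `present` species and in no `absent` species.

    `families` maps a family ID to a list of (species, gene) members
    (an assumed layout, for illustration only).
    """
    selected = []
    for fam_id, members in families.items():
        species = {sp for sp, _gene in members}
        if set(present) <= species and species.isdisjoint(absent):
            selected.append(fam_id)
    return sorted(selected)

families = {
    1: [("A", "g1"), ("B", "g2")],
    2: [("A", "g3"), ("B", "g4"), ("C", "g5")],
    3: [("A", "g6")],
}
hits = families_present_absent(families, present=["A", "B"], absent=["C"])
```

Only family 1 qualifies here: family 2 has a member in species C, and family 3 has no member in species B.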
The remainder of the demo relates to draft sequence, i.e. unanchored
shotgun sequence. If you aren't working with draft sequence,
you can skip to Creating a New Project.
Return to the Manager and load the Demo-Draft project, following
the same steps used to load the previous two projects.
On the Summary List, under the Demo-Draft listing, you will see
the parameter "Order Against: demo_seq". With this setting,
the Demo-Draft contigs will be ordered using synteny to
Demo-Seq, as soon as that alignment is run.
Use the "Selected Pair" button to align Demo-Draft and Demo-Seq,
as before. The alignment will take about 20 minutes.
When done, open the dot plot for this pair and you will
see that the draft contigs have been ordered and oriented
to agree with the Demo-Seq. (For real projects the agreement will
not be so good!)
The ordering on Demo-Draft is in the database only; it does
not change the sequence files on disk.
However, SyMAP does write out ordered "pseudomolecule" files
created from the draft contigs. These are put in the form
of a new SyMAP project which can be loaded and aligned. You
will see this new project in the Projects panel, as shown
A text file showing the ordering information is also written to
the Demo_Draft project directory, which you can find under the
data/pseudo subdirectory of the symap directory.
This is the end of the demo.
At this point you will probably want to proceed by reading the next
section to learn how to create your own project in SyMAP.
Creating a New Project
If MySQL is installed, edit symap.config as described in SyMAP and MySQL.
To create a new project, start SyMAP (i.e. ./symap) and press the "Add Project" button
at the lower left. Enter the name and type of the project. The Help
button on the dialog provides further information.
After saving the new project, it appears in the Projects list on the left, but
it is still an empty shell. Check its box and it will appear in the Summary
section (right hand side) where you will then click the "Parameters" link
to open the Parameters window.
On the Parameters window you will add the filenames for the sequences and annotations for the project,
as well as setting other parameters if desired. The Help button on this window
provides the necessary details.
After setting parameters, the project is ready to be loaded and aligned using
the same steps as in the demos.
Preparing the Sequences
The first decision with whole-genome sequence is whether or not to apply
repeat masking. Masking reduces alignment time and false-positive hits,
but also runs a risk of concealing true hits due to inaccurate masking.
Masking also requires considerable time.
Masking is not really necessary unless the genome is highly repetitive
and those repeats are shared with other genomes being aligned.
(Repeats cause particular trouble for self-alignments,
see self-alignments in SyMAP).
Another masking option which is available if you have gene
annotation is to mask out everything but the annotated genes. You
can enable the "mask_genes" option on the Parameters window
for your project (turn it on before doing
Note that sequence files should be in FASTA format and the name of a sequence
is the string immediately following the ">", e.g.
>chr3 oryza sativa
Here the sequence name is "chr3", and the sequence follows. The additional information
"oryza sativa" is ignored.
Two things are important in naming sequences for SyMAP:
A. The sequence names must exactly match those
used in the annotation files (first column), or the
annotations will not be loaded.
B. If possible, use a consistent prefix such as "chr" for all sequences, followed
by a short number.
If you are able to meet condition B, then you should set the parameter
"grp_prefix" to the prefix of the sequences. SyMAP will then remove
the prefix and use the shortened names in the displays, saving space. (If
you set "grp_prefix", and some sequences don't have the prefix, they
will keep their full names.)
Sequence names can contain only letters, numbers, and underscores.
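The naming rules in this section can be checked before loading a project. The Python sketch below is not part of SyMAP: it reads names from FASTA header lines, tests them against the letters/numbers/underscores rule (assuming ASCII letters), and mimics the effect of "grp_prefix" on display names. The function names are hypothetical.

```python
import re

# Assumption: "letters" means ASCII letters.
VALID_NAME = re.compile(r"[A-Za-z0-9_]+")

def fasta_sequence_names(lines):
    """Names are the text right after '>' up to the first whitespace."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            names.append(fields[0] if fields else "")
    return names

def is_valid_name(name):
    """True if the name contains only letters, numbers, and underscores."""
    return VALID_NAME.fullmatch(name) is not None

def display_name(name, grp_prefix):
    """Strip grp_prefix when present; names without the prefix keep their full name.

    Keeping a name that equals the prefix exactly is a choice made in this
    sketch to avoid an empty display name.
    """
    if grp_prefix and name.startswith(grp_prefix) and len(name) > len(grp_prefix):
        return name[len(grp_prefix):]
    return name

names = fasta_sequence_names([">chr3 oryza sativa", "ACGTACGT", ">scaffold-12"])
invalid = [n for n in names if not is_valid_name(n)]
```

In this example "chr3" passes, while "scaffold-12" is reported as invalid because of the hyphen.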
Annotation files should be in GFF3 format.
The first column (seqid) must exactly match the sequence names in
the fasta files. The third column (type) determines how SyMAP
uses the entry. Types "gene", "exon", "CDS", "centromere", and "gap"
are recognized (other entries are ignored). "CDS" and "exon" are
treated equivalently in SyMAP.
The last column (attributes) contains "tag=value" pairs describing
the annotation. You can set which attributes to use, or use all
those occurring more than a certain number of times (open the
Parameters window for the project, look for parameter "annot_keywords").
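To make the column roles concrete, here is a small Python sketch that reads one annotation line, keeps only the recognized types, and splits the attributes column into "tag=value" pairs. It also counts how often each attribute tag occurs, in the spirit of the "annot_keywords" threshold. This is an illustration under the assumption of tab-separated, nine-column GFF3 lines; it is not SyMAP's loader, and the function names are hypothetical.

```python
from collections import Counter

RECOGNIZED_TYPES = {"gene", "exon", "CDS", "centromere", "gap"}

def parse_gff3_line(line):
    """Return a dict for a recognized entry, or None for anything else."""
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9 or cols[2] not in RECOGNIZED_TYPES:
        return None
    attributes = {}
    for pair in cols[8].split(";"):
        if "=" in pair:
            tag, value = pair.split("=", 1)
            attributes[tag.strip()] = value.strip()
    return {"seqid": cols[0], "type": cols[2],
            "start": int(cols[3]), "end": int(cols[4]),
            "strand": cols[6], "attributes": attributes}

def frequent_keywords(records, threshold):
    """Attribute tags that occur more than `threshold` times."""
    counts = Counter(tag for rec in records for tag in rec["attributes"])
    return sorted(tag for tag, n in counts.items() if n > threshold)

gene = parse_gff3_line("chr3\tsrc\tgene\t100\t900\t.\t+\t.\tID=g1;Note=glycosyl hydrolase")
skipped = parse_gff3_line("chr3\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=m1")
```

The first line is kept as a "gene" entry on sequence "chr3"; the second is ignored because "mRNA" is not one of the recognized types.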
Working with FPC Files
SyMAP can also align genomes represented by an FPC physical
map, by first aligning the BACs using BAC-end sequences or marker
sequences. If you plan to work with FPC alignments, the first
step is to run the provided demo "Demo-FPC". Align it to "Demo-Seq2"
using the same steps as described above, and
explore the various displays.
Creating an FPC project is the same as for a
sequence project except that you choose
the type "fpc", and then the Project Parameters window
has some different parameters. The Parameters window is where
you will enter the FPC file, and your fasta files of marker
and BAC-end sequences.
Note that the BAC-end sequence names must be exactly the clone
names used in FPC, with extension "r" or "f" labeling the
strand. In other words if the FPC map has a clone
"a0435B26" then the BES for that clone can be named
"a0435B26f" or "a0435B26r".
The BES and marker alignments in an FPC project are performed using
BLAT [1], in contrast
to MUMmer [2] for sequence projects. The running time is typically
several times longer than that of MUMmer (described here), but
the memory usage is much lower.
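The BAC-end naming rule described in this section (clone name plus a trailing "r" or "f") is easy to verify before loading an FPC project. The Python sketch below is a hypothetical pre-check, not part of SyMAP; it assumes you already have the list of clone names from the FPC map.

```python
def split_bes_name(bes_name):
    """Split a BAC-end sequence name into (clone_name, end_label)."""
    if len(bes_name) < 2 or bes_name[-1] not in ("r", "f"):
        raise ValueError(f"BES name must end in 'r' or 'f': {bes_name!r}")
    return bes_name[:-1], bes_name[-1]

def unmatched_bes(bes_names, fpc_clones):
    """Return BES names that are malformed or refer to no clone in the FPC map."""
    clones = set(fpc_clones)
    problems = []
    for name in bes_names:
        try:
            clone, _end = split_bes_name(name)
        except ValueError:
            problems.append(name)
            continue
        if clone not in clones:
            problems.append(name)
    return problems

problems = unmatched_bes(
    ["a0435B26f", "a0435B26r", "b0001A01f", "a0435B26"],
    fpc_clones=["a0435B26"],
)
```

Here "b0001A01f" is flagged because its clone is not in the map, and "a0435B26" is flagged because it lacks the "r" or "f" extension.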
SyMAP and MySQL
SyMAP stores all data in a MySQL server. The package comes with a
version of MySQL that is suitable for the demo or a small project, but
not for large projects or for web display. For any substantial
work we recommend using a standalone MySQL installation.
(Note, if you are using MacOSX 10.7+,
the pre-packaged version of MySQL does not work, and cannot be
upgraded since the Java/MySQL module is no longer supported. You will
need to install MySQL; see below.)
The MySQL installation does not need to be on the machine where
you will do the computations or view the results, as long as
it is on an accessible network. Once the server is ready, fill out
the database parameters in the "symap.config" file in the
main SyMAP directory, as described here.
Note that it is a good idea to have separate
admin and client usernames, where the client has read-only
access. If you set up the web displays, they will use
the client username. The "admin" user needs to have sufficient
permissions to create a new database.
Note that the default settings of MySQL are poorly suited for large-scale
data storage. You will want to adjust the parameters
innodb_buffer_pool_size and innodb_flush_log_at_trx_commit as described
MySQL on the Mac:
Download and install the "MySQL Community Server" from
Also install the Preferences Panel, and use that to start the server.
In symap.config, use
db_name = symap
db_server = localhost
db_adminuser = root
db_clientuser = root
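Before launching SyMAP it can be useful to confirm that symap.config contains the database settings listed above. The Python sketch below is not part of SyMAP. It assumes the file consists of "name = value" lines like those shown, and that lines starting with "#" are comments (an assumption about the file format); the function names are hypothetical.

```python
REQUIRED_DB_PARAMS = ["db_name", "db_server", "db_adminuser", "db_clientuser"]

def parse_symap_config(text):
    """Parse 'name = value' lines into a dict."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and lines starting with '#' carry no settings.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params

def missing_db_params(params):
    """Names from REQUIRED_DB_PARAMS that are absent or empty."""
    return [name for name in REQUIRED_DB_PARAMS if not params.get(name)]

config_text = """\
db_name = symap
db_server = localhost
db_adminuser = root
db_clientuser = root
"""
params = parse_symap_config(config_text)
missing = missing_db_params(params)
```

To check a real file, read it with open("symap.config").read() and pass the text to parse_symap_config.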
For large projects you will want to adjust the parameters as described
Runtime and Memory
The largest component of SyMAP execution time is in running
MUMmer [2] (or BLAT [1]). The typical runtime is one CPU-hour per
target sequence (or group of chromosomes, since SyMAP
will group shorter sequences together for efficiency).
For example, aligning maize (10 chromosomes, 2 Gb) to
rice (12 chromosomes, 370 Mb) required 1 hour, 3 minutes using 8 CPUs
running at 2.3 GHz. SyMAP grouped the shorter rice chromosomes
into 8 groups and each processor handled one.
The memory usage of MUMmer is typically 5 GB per CPU; however, it can
be as high as 10 GB for very long or repetitive chromosomes.
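These rules of thumb can be turned into a rough planning calculation. The Python sketch below is only back-of-the-envelope arithmetic based on the figures in this section (about one CPU-hour per target group, and 5 to 10 GB of memory per CPU); it assumes groups are processed in rounds of one group per CPU, and actual runtimes will vary with the genomes involved.

```python
import math

def max_cpus_for_memory(ram_gb, cpus_available, gb_per_cpu=5.0):
    """Largest CPU count whose combined MUMmer memory fits in ram_gb."""
    return max(0, min(cpus_available, int(ram_gb // gb_per_cpu)))

def estimated_wall_hours(target_groups, cpus, hours_per_group=1.0):
    """Rough wall-clock estimate, assuming each CPU handles one group at a time."""
    if cpus < 1:
        raise ValueError("at least one CPU is required")
    return math.ceil(target_groups / cpus) * hours_per_group

cpus_typical = max_cpus_for_memory(32, cpus_available=8)            # 5 GB per CPU
cpus_cautious = max_cpus_for_memory(32, cpus_available=8, gb_per_cpu=10.0)
hours = estimated_wall_hours(target_groups=8, cpus=8)
```

With 8 target groups and 8 CPUs the estimate is about one hour, in line with the maize-to-rice example above.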
SyMAP's Java-based architecture makes it easy to share results
over the web. There are three steps:
Create directories under the htdocs and cgi-bin roots where
SyMAP files will be placed.
Fill out the parameters in the symap.config file.
Run the supplied install script.
Run the install script from the main SyMAP directory using the command
> perl scripts/install.pl
Note that the web directories must be accessible from the machine
where you run the install script. If you've been working on a machine
different from the web host, all you need to do is unzip a new copy
of the SyMAP distribution on the web host, fill out the symap.config,
and run the install script there. Then you can delete those SyMAP
distribution files, if desired.
If you need to re-do the web install, simply delete the SyMAP html and cgi-bin
directories (or their contents) and run the install script again.
Important: SyMAP uses an unsigned applet by default. If your
web host and MySQL host are different machines, then you will need to
use the signed applet. You will find this in the SyMAP htdocs directory
and you just need to rename it to "symap.jar".
The default web page simply shows a list of all projects in the database;
however, you can easily customize this. For an example,
Note that the web install includes some perl CGI scripts which
need several perl packages (CPAN modules) installed on the web server:
Most of the perl script functionality is now included in the Java applet,
so you can, if you wish, modify the web display to leave out the
CGI display options.
Database and Web Parameters
Parameters for accessing the MySQL database, and for setting up the web displays,
should be set in the "symap.config" file in the main symap directory, as follows:
Name of the MySQL database. SyMAP will create it, if it does not exist yet.
If it does exist, it should either already be a SyMAP database, or it should be empty.
The machine hosting the MySQL database, e.g. "myserver.myschool.edu".
MySQL username of a user with sufficient privileges to create a database. Needed for
creating and updating alignments.
Password of the admin user.
MySQL username of a user with read-only access. Used in the web displays, if installed, or
if symap is launched with the "-r" parameter.
Password of the client user.
Web Install Parameters - Necessary
Directory location of the html directory where SyMAP files will be served from.
Directory location of the cgi directory where SyMAP CGI scripts will be served from.
URL for the SyMAP HTML files (i.e., the URL which accesses the
directory specified by "html_path").
URL for the SyMAP CGI files (i.e., the URL which accesses the
directory specified by "cgi_path").
Web Install Parameters - Optional
Logo to use on the web page. You can specify a full URL, or an image file
which should be placed in the "html_path" directory.
URL to your main homepage.
URL to a search page for your website.
URL for a contact information page.
Email address for user feedback.
Self Alignments and SyMAP
Because of its reliance on MUMmer, SyMAP has to follow a slightly different
procedure in carrying out a self-alignment. MUMmer ordinarily
seeds its alignments with unique matches, which eliminates the possibility
of off-diagonal seeds in the alignment of a chromosome to itself. To overcome
this problem, SyMAP v4.0 runs the individual chromosome self-alignments using
the MUMmer parameter -maxmatch, which removes the uniqueness requirement at
the cost of greatly increased noise. The extra noise is then filtered to
a large extent by the default SyMAP filters, but the diagonal squares of
the dot plot will still have more noise visible than the off-diagonal.
An example of such chromosome self-synteny blocks
can be seen on chromosome 12 of soybean in the
fabaceae section of SymapDB.
Arabidopsis chromosome 1 also has good examples.
How SyMAP Works
This section provides a brief overview of the SyMAP processing steps; for
more, see the SyMAP published papers [4,5]. The processing
has four phases:
The sequences are written to disk*, with gene-masking
if desired. In the alignment, one species is "query" and the other is
"target". If one project is FPC, that is the query; if both are sequence,
the query is the one whose name comes first alphabetically. The query sequences
are written into one large file, while smaller target sequences are grouped
into larger fasta files of size up to 70Mb, for more efficient processing.
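The query/target rule and the grouping of target sequences can be sketched in Python as follows. This is an illustration, not SyMAP's code: the project type labels ("fpc", "seq"), the plain case-sensitive name comparison, the reading of "70Mb" as 70,000,000 bases, and the first-fit-decreasing packing strategy are all assumptions made for the example.

```python
def choose_query(project_a, project_b):
    """Pick the query project; each project is a (name, type) tuple.

    An FPC project is always the query. Otherwise the project whose name
    sorts first is used (plain string comparison, an assumption).
    """
    a_is_fpc = project_a[1] == "fpc"
    b_is_fpc = project_b[1] == "fpc"
    if a_is_fpc != b_is_fpc:
        return project_a if a_is_fpc else project_b
    return min(project_a, project_b, key=lambda p: p[0])

def group_targets(seq_lengths, max_group_bp=70_000_000):
    """Pack sequences into groups of at most max_group_bp (first-fit decreasing).

    A sequence longer than the limit ends up in a group of its own.
    """
    groups = []  # each entry: [total_bp, [names]]
    ordered = sorted(seq_lengths.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, length in ordered:
        for group in groups:
            if group[0] + length <= max_group_bp:
                group[0] += length
                group[1].append(name)
                break
        else:
            groups.append([length, [name]])
    return [names for _total, names in groups]

query = choose_query(("rice", "seq"), ("maize", "seq"))
groups = group_targets({"chr1": 43_000_000, "chr2": 36_000_000,
                        "chr3": 30_000_000, "chr4": 25_000_000})
```

In this example "maize" is the query because it sorts before "rice", and the four target chromosomes are packed into two groups that each stay under the size limit.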
The raw anchor set consists of the hits found by MUMmer or BLAT.
These are first clustered into gene, or putative-gene hits. This is
done by clustering the hit regions on each sequence, and then defining new "gene"
hits which connect these regions. For example if three separate
exons hit between two genes, they will be clustered into one "gene"
hit having a combined score equal to the sum of the raw hit scores.
Clustering is by gene if the hits overlap annotation; otherwise, it uses
a maximum separation of 1 kb, creating "putative gene" regions.
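The "putative gene" case can be illustrated with a short Python sketch that merges hit regions on one sequence whenever they lie within 1 kb of each other, summing their scores. This is a simplification for illustration: it handles a single sequence only and ignores annotation, whereas SyMAP clusters on both sequences and then connects the resulting regions. The function name is hypothetical.

```python
def cluster_regions(hits, max_gap=1000):
    """Merge hit intervals separated by at most max_gap bases.

    `hits` is a list of (start, end, score) tuples on one sequence.
    Returns merged (start, end, summed_score) tuples in coordinate order.
    """
    clusters = []
    for start, end, score in sorted(hits):
        if clusters and start - clusters[-1][1] <= max_gap:
            prev_start, prev_end, prev_score = clusters[-1]
            clusters[-1] = (prev_start, max(prev_end, end), prev_score + score)
        else:
            clusters.append((start, end, score))
    return clusters

clusters = cluster_regions([(100, 200, 50), (900, 1000, 30), (5000, 5100, 40)])
```

The first two hits are 700 bases apart and merge into one region with a combined score of 80; the third hit is 4000 bases further on and stays separate.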
The clustered "gene anchors" are now filtered using a version of
reciprocal-best filtering which is adapted for retaining duplications and
gene families. For each pair of genes (or putative genes) which is
connected by a clustered anchor, the retained anchors must be among
the top two anchors by score on both sides (top-2 allows for one
ancestral whole-genome duplication). An anchor will also be retained if its
score is at least 80% of that of the 2nd-best anchor on each side (this
allows for retention of gene family anchors). These filter parameters
may be adjusted through the Alignment & Synteny Parameters window.
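One way to read this filter is as a score cutoff on each side: an anchor is kept if its score is at least 80% of the second-best score among the anchors touching its gene, and this must hold for both of its genes. Any top-two anchor passes automatically. The Python sketch below implements that reading; it is not SyMAP's code, and its handling of ties and of genes with a single anchor is an assumption.

```python
from collections import defaultdict

def filter_anchors(anchors, top_n=2, keep_fraction=0.8):
    """Keep anchors that pass the top-N / fraction rule on both sides.

    `anchors` is a list of (gene_a, gene_b, score) tuples.
    """
    scores_a, scores_b = defaultdict(list), defaultdict(list)
    for gene_a, gene_b, score in anchors:
        scores_a[gene_a].append(score)
        scores_b[gene_b].append(score)

    def cutoff(scores):
        # Score of the N-th best anchor (or the weakest one if fewer than N),
        # scaled by keep_fraction.
        ranked = sorted(scores, reverse=True)
        nth_best = ranked[min(top_n, len(ranked)) - 1]
        return keep_fraction * nth_best

    cut_a = {gene: cutoff(s) for gene, s in scores_a.items()}
    cut_b = {gene: cutoff(s) for gene, s in scores_b.items()}
    return [(a, b, s) for a, b, s in anchors
            if s >= cut_a[a] and s >= cut_b[b]]

anchors = [
    ("a1", "b1", 100), ("a1", "b2", 90), ("a1", "b3", 75), ("a1", "b4", 40),
    ("a3", "b5", 200), ("a4", "b5", 190), ("a5", "b5", 100),
]
kept = filter_anchors(anchors)
```

For gene "a1" the second-best score is 90, so the cutoff is 72: the anchor scoring 75 is kept as a possible gene family member, while the one scoring 40 is dropped. The anchor from "a5" to "b5" is dropped because it fails the cutoff on the "b5" side, even though it is the only anchor for "a5".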
Synteny Block Detection:
After the clustered anchors are loaded into the database, the
synteny block algorithm runs. This algorithm looks for approximately-collinear
sequences of anchors, subject to several parameters including A) Number
of anchors; B) Collinearity of the anchors; C) Amount of "noise" in the
surrounding region (to help reject false-positive chains). Criterion A can
be adjusted in the Alignment & Synteny Parameters window.
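Criteria A and B can be illustrated with a simple acceptance test for a candidate chain of anchors: require a minimum number of anchors and a high absolute Pearson correlation between their positions on the two sequences. The Python sketch below is not SyMAP's block algorithm; the use of Pearson correlation as the collinearity measure and both threshold values are hypothetical, and criterion C (surrounding noise) is not modeled.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length coordinate lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def looks_like_block(chain, min_anchors=7, min_abs_corr=0.9):
    """Accept a chain of (pos_on_seq1, pos_on_seq2) anchors as a candidate block.

    Both thresholds are illustrative values, not SyMAP defaults. The absolute
    correlation is used so that inverted blocks are accepted too.
    """
    if len(chain) < min_anchors:
        return False
    xs = [x for x, _y in chain]
    ys = [y for _x, y in chain]
    return abs(pearson(xs, ys)) >= min_abs_corr

forward = [(i * 1000, i * 1000 + 50) for i in range(8)]
inverted = [(i * 1000, 9000 - i * 1000) for i in range(8)]
scattered = [(0, 5000), (1000, 0), (2000, 7000), (3000, 1000),
             (4000, 6000), (5000, 2000), (6000, 4000), (7000, 3000)]
```

Under these assumptions the forward and inverted chains are accepted, while the scattered chain and any chain with fewer than seven anchors are rejected.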
* Note that the sequences are re-written from the database to the
disk for three reasons: A) To allow re-grouping for efficiency; B) To ensure elimination
of invalid characters; C) To mask non-gene regions, if desired. This also ensures that
sequence names will match those in the database, and prevents problems caused by
moving the source sequences on disk.
[1] Kent, J. (2002) BLAT--the BLAST-like alignment tool. Genome Research 12:656-64.
[2] Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M., Antonescu, C., Salzberg, S.L.
(2004) Versatile and open software for comparing large genomes. Genome Biology 5:R12.
[3] Krzywinski, M., Schein, J., Birol, I., Connors, J., Gascoyne, R., Horsman, D., Jones, S., Marra, M. (2009)
Circos: An information aesthetic for comparative genomics. Genome Research doi:10.1101/gr.092759.109.
[4] Soderlund, C., Nelson, W., Shoemaker, A., and Paterson, A. (2006)
SyMAP: A system for discovering and viewing syntenic regions of FPC maps.
Genome Research 16:1159-1168.
[5] Soderlund, C., Bomhoff, M., and Nelson, W. (2011)
SyMAP: A turnkey synteny system with application to multiple large duplicated plant sequenced genomes.
Nucleic Acids Research 39(10):e68.