Multi-species TCW (mTCW) is the comparative module of the Transcriptome Computational Workbench.
Note: familiarity with singleTCW (sTCW) is essential, as multiTCW
projects are created by merging existing sTCW projects.
- Two or more singleTCW databases (nucleotide or protein). The
sequences, their annoDB hits, RPKM and DE p-values are imported to the multiTCW database.
- For nucleotide singleTCWs, either input the corresponding translated sequences (i.e. from
runSingleTCW) or a SMAT file so TCW can run
ESTScan [3] to generate the translated sequences.
- (Optional) A file of clusters.
- The results are best if:
- The singleTCW databases are annotated the same.
- The condition names are exactly the same (when applicable). For example, if two species both
have counts for the tissue type 'leaf', the condition name provided in runSingleTCW must be the same for both (e.g. leaf), though the name is case-insensitive.
- The DE column names are exactly the same (when applicable), as in the previous point.
- Compare the sequences using BLAST [1,2].
- Compute one or more sets of clusters using the BBH (bi-directional best hit), Transitive,
or user-supplied clusters.
- View and Query the results.
- If the annotations are the same, then clusters can be annotated with the majority hit.
- If the condition names are the same, then a given condition can be compared across species, i.e. the Pearson's Correlation Coefficient is computed using RPKM values of conditions with the same name.
- If the DE names are the same, then DE values can easily be compared across datasets.
- Run pairwise alignment or
MUSCLE [5] to view the alignment of the clusters.
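The cross-species comparison of conditions can be illustrated with a minimal Python sketch. It is not TCW's implementation: the function names, the dictionary layout (condition name to RPKM value for one sequence) and the example values are assumptions made for illustration. It matches condition names case-insensitively and computes Pearson's correlation coefficient over the shared conditions.

```python
import math

def shared_conditions(rpkm_a, rpkm_b):
    """Pair up condition names that occur in both datasets, ignoring case."""
    lower_a = {name.lower(): name for name in rpkm_a}
    lower_b = {name.lower(): name for name in rpkm_b}
    common = sorted(lower_a.keys() & lower_b.keys())
    return [(lower_a[key], lower_b[key]) for key in common]

def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length value lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for constant values
    return cov / math.sqrt(var_x * var_y)

def compare_sequences(rpkm_a, rpkm_b):
    """Correlate two sequences over the conditions they have in common."""
    pairs = shared_conditions(rpkm_a, rpkm_b)
    xs = [rpkm_a[name_a] for name_a, _ in pairs]
    ys = [rpkm_b[name_b] for _, name_b in pairs]
    return pearson(xs, ys)

# Hypothetical RPKM values for one sequence from each of two datasets.
seq_tra = {"Tip": 10.0, "Zone": 20.0, "Root": 30.0}
seq_pro = {"tip": 1.0, "zone": 2.0, "root": 3.0, "Sanger": 5.0}
r = compare_sequences(seq_tra, seq_pro)
```

Only the conditions Tip, Zone and Root enter the calculation here, since Sanger has no counterpart in the first dataset.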
The multiTCW was installed when you installed TCW (see Installation).
Software Requirements and Installation
|For protein or transcript comparison:
||Used for annotation and assembly. Legacy BLAST+ may be used.|
|Additional packages (supplied with TCW; see note 1):
|2.0||OrthoMCL||Clusters orthologous transcripts. Uses Perl and MySQL, which requires DBI::mysql.|
|3.0.3||ESTScan||Extracts protein sequences from transcripts.|
|3.8.31||Muscle||Multiple alignment of transcript cluster.|
Note 1: Linux binaries are supplied under the external directory and Mac binaries under the external_osx directory. You do not need to do anything special, as TCW will find the packages there if
they are not in your path. We cannot guarantee the packages will work on every Linux or Mac machine; if they do not,
you will need to replace them.
This section is essential for learning how to use
runMultiTCW, and describes how to make
a multiTCW project starting from the three singleTCW demos which are included in the package.
Using runSingleTCW, create the pre-configured
sTCW_demoTra, sTCW_demoPro and sTCW_demoAsm, i.e.
- Select the project, then execute Load Data, Instantiate and Annotate Sequences.
- (Optional) All three have the conditions Tip and Zone, and demoPro and demoTra have the condition Root.
Using runDE, compute pairwise differential expression.
Building a multiTCW database is as simple as creating the demo described in this section. The following are some details:
To create the multiTCW project, start by running the mTCW Manager (runMultiTCW).
This brings up the Manager interface, shown at right.
Click Add Project to create a new project. Enter
"demo" in the entry box, and click ok.
The Manager interface will have DB Name filled in as mTCW_demo. This
will be the MySQL database name.
Step1: Click the Add button
next to Single TCW databases. This brings up the sTCW Selection dialog on the lower right.
Clicking Select sTCW Database produces a popup of existing sTCW databases (you'll have to click on the
server name first, probably "localhost"). Choose sTCW_demoTra.
MultiTCW uses proteins for its alignments, hence you will need to generate
the amino acid sequences. For this demo:
- Select Generate protein file
- Select "..." and the file chooser will take you to external/ESTScan/smat directory,
which contains the file embr.smat -- select it.
If you use this option, the results will come out slightly different from those below.
Follow the same process to add the demoAsm and demoPro projects, though
demoPro is a protein database, so no SMAT file or translated file is necessary. The resulting
Manager interface is shown on the right.
Select Build Database to create the mTCW database. This takes a few minutes,
as sequences and annotation are transferred from the sTCW databases to the new mTCW
database and ESTScan is run to create the amino acid sequences.
Note: only a limited number of hits are transferred for each sequence. The number is generally the 3 top hits,
though TCW does ensure that the 'Best Annotation' is transferred.
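The hit-transfer rule in the note can be pictured with the following Python sketch. It is only an illustration under stated assumptions: the function name `select_hits`, the (hit ID, e-value) tuples, and the choice to rank "top" hits by e-value are hypothetical and are not taken from the TCW code.

```python
def select_hits(hits, best_id, limit=3):
    """Keep the `limit` top hits and make sure the best annotation is included.

    hits    -- list of (hit_id, evalue) tuples for one sequence (assumed layout)
    best_id -- ID of the hit flagged as the 'Best Annotation', or None
    limit   -- number of top hits to keep (3, per the note above)
    """
    # Assumption: "top" means lowest e-value.
    ranked = sorted(hits, key=lambda hit: hit[1])
    kept = ranked[:limit]
    if best_id is not None and all(hit[0] != best_id for hit in kept):
        # The best annotation fell outside the top hits, so add it explicitly.
        extra = next((hit for hit in ranked[limit:] if hit[0] == best_id), None)
        if extra is not None:
            kept.append(extra)
    return kept

# Hypothetical hits for one sequence.
example_hits = [("A", 1e-50), ("B", 1e-40), ("C", 1e-30), ("D", 1e-20)]
transferred = select_hits(example_hits, best_id="D")
```

In this example four hits are transferred, because the hit flagged as best annotation is not among the three with the lowest e-values.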
Many messages print to the console during loading, ending with a summary, shown below:
Project: demo multiTCW 1.6.8
Sequence datasets: 3 Created 03-Jan-17
         #aaSeq    #ntSeq  #annoSeq   #annoDB  Created    Remark
asm         104       104       102         7  03-Jan-17  assembly demo
tra         211       211       208         7  03-Jan-17  transcript demo
pro         128         0       128         7  03-Jan-17  protein demo
TOTAL       443       315       438
         Sanger       Tip      Zone      Root
asm          98   120,361    45,776        --
tra          --   473,150   210,795   971,049
pro          --    22,235    21,355    12,355
Differential Expression (number with p-value < 0.05:
           TiZo      RoTi      RoZo
asm           0         0         0
tra           1        41        52
pro          31        37         6
Complete mTCW_demo at 03-Jan-17 19:34:42 Elapse time 0m:07s
Step2: Select Run Blast. The step blasts the amino acid sequences against each other
to find the basic similarities used for clustering.
Step3: You can create multiple sets of clusters with different
methods or parameters.
Click Add in section "Cluster Methods" to add a new clustering method; this brings up the
Method dialog, shown at right.
- Select Add; BBH is the default, so select Keep and control will return to the main panel.
- Then select Add again; in the dropdown beside Method, select Transitive, then select Keep.
- Select Add again, select OrthoMCL, then select Keep.
Select the Add New Clusters button to perform all clustering.
It will run
OrthoMCL, which is supplied with TCW, and run the built-in
transitive method. Status messages are printed to
the console, ending with the summary shown below.
Method  =2  3-5  6-10  11-20  21-30  31-40  41-50  >50  Total  #Seqs
BB      62    0     0      0      0      0      0    0     62    124
TR      59    9     0      0      0      0      0    0     68    145
OM      55   17    11      4      2      0      1    0     90    413
Complete adding 3 methods for mTCW_demo at 03-Jan-17 19:41:47 Elapse time 0m:03s
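How the size columns of this summary relate to the Total and #Seqs columns can be seen in a short Python sketch. This is an illustration only, not TCW code: the function name `summarize` and the input (a plain list of cluster sizes) are assumptions, while the bin labels are copied from the summary header.

```python
# (label, smallest size, largest size); None means no upper limit.
BINS = [
    ("=2", 2, 2), ("3-5", 3, 5), ("6-10", 6, 10), ("11-20", 11, 20),
    ("21-30", 21, 30), ("31-40", 31, 40), ("41-50", 41, 50), (">50", 51, None),
]

def summarize(cluster_sizes):
    """Return (count per size bin, number of clusters, number of sequences)."""
    counts = {label: 0 for label, _, _ in BINS}
    for size in cluster_sizes:
        if size < 2:
            raise ValueError("a cluster must contain at least 2 sequences")
        for label, low, high in BINS:
            if size >= low and (high is None or size <= high):
                counts[label] += 1
                break
    return counts, len(cluster_sizes), sum(cluster_sizes)

# 62 clusters of size 2, as in the BB row of the summary.
bb_counts, bb_total, bb_seqs = summarize([2] * 62)
```

For the BB row this gives 62 clusters and 124 sequences, which is consistent with BBH clusters always having exactly two members.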
You can add more clusters using different parameters: select Add, change the parameters, make the prefix unique and select Keep; then select
'Add New Clusters'. You may also remove clusters by selecting the name in the cluster table, followed by Remove; the first Remove removes
the clusters from the database, and a second Remove removes the entry from the table.
Transcripts to proteins
If your sTCW databases were created with protein sequences, skip this section.
- You may supply your own protein sequences, where the sequences identifiers (i.e.
on the ">" description line) must correspond to the sequence identifiers in the sTCW
database. runSingleTCW produces a file in the project directory called
ORFbestTranslated.fasta, which can be used as input (see TCW ORF finder).
- You may have TCW run
ESTScan, but you must supply the SMAT file (or use the existing
one, which is very old). Go to estscan.sourceforge.net for the source code and instructions for building a SMAT file.
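If you supply your own protein file, it can help to check beforehand that its identifiers correspond to those in the sTCW database. The Python sketch below shows one way to do this. It assumes the identifier is the first whitespace-delimited token after ">", and it assumes you have exported the sTCW sequence identifiers to a list yourself; the function names are hypothetical and are not part of TCW.

```python
def fasta_ids(lines):
    """Collect the identifier from every '>' description line."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.append(fields[0])  # assumption: ID is the first token
    return ids

def check_ids(fasta_lines, db_ids):
    """Return (IDs missing from the FASTA file, IDs unknown to the database)."""
    found = set(fasta_ids(fasta_lines))
    expected = set(db_ids)
    return sorted(expected - found), sorted(found - expected)

# Hypothetical protein file content and database identifiers.
protein_lines = [
    ">tra_001 translated ORF",
    "MKTLLVAG",
    ">tra_002",
    "MSSPLQ",
    ">tra_999",
    "MAAA",
]
missing, unknown = check_ids(protein_lines, ["tra_001", "tra_002", "tra_003"])
```

Here tra_003 has no protein sequence and tra_999 does not occur in the database, so both would need attention before building the mTCW database.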
Using the Settings, you may request that TCW run a self-blast of the nucleotide sequences in the database; the results can be used
by the BBH or Transitive algorithm.
You may also request that it filter the Blast file to remove near-identical sequences. The Settings "Help" page provides more detail.
The Help page for clustering provides more detail, but the following is an overview.
BBH finds the reciprocal best match based on the blast e-value. Hence, it makes clusters strictly of size 2.
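The reciprocal best match idea can be sketched in Python as follows. This is a generic illustration of bi-directional best hits, not the TCW implementation: the function names and the (query, subject, e-value) tuples are assumptions, and details such as tie handling or restricting pairs to different datasets are left out.

```python
def best_hits(hits):
    """Map each query to the subject with the lowest e-value.

    hits -- iterable of (query, subject, evalue) tuples (assumed layout)
    """
    best = {}
    for query, subject, evalue in hits:
        if query == subject:
            continue  # ignore self hits
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}

def bbh_pairs(hits):
    """Return the pairs whose members are each other's best hit."""
    best = best_hits(hits)
    pairs = set()
    for query, subject in best.items():
        if best.get(subject) == query:
            pairs.add(tuple(sorted((query, subject))))
    return sorted(pairs)

# Hypothetical blast hits between two datasets.
example_hits = [
    ("tra_001", "pro_010", 1e-80),
    ("tra_001", "pro_020", 1e-20),
    ("pro_010", "tra_001", 1e-75),
    ("pro_020", "tra_001", 1e-20),
]
pairs = bbh_pairs(example_hits)
```

In the example pro_020 is left out: its best hit is tra_001, but the best hit of tra_001 is pro_010, so the match is not reciprocal.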
Transitive builds nearest-neighbor clusters (i.e., "transitive closure"),
based on blast alignment parameters such as %similarity and bases of overlap.
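A transitive closure over blast hits can be sketched with a union-find structure, as in the Python example below. It is a generic illustration rather than TCW's code: the function name, the (query, subject, %similarity, overlap) tuples and the threshold defaults are hypothetical values chosen for the example.

```python
def transitive_clusters(hits, min_sim=80.0, min_overlap=100):
    """Group sequences linked, directly or indirectly, by passing hits.

    hits -- iterable of (query, subject, percent_similarity, overlap) tuples
    The thresholds are illustrative defaults, not TCW's settings.
    """
    parent = {}

    def find(seq):
        parent.setdefault(seq, seq)
        while parent[seq] != seq:
            parent[seq] = parent[parent[seq]]  # path compression
            seq = parent[seq]
        return seq

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for query, subject, sim, overlap in hits:
        if query != subject and sim >= min_sim and overlap >= min_overlap:
            union(query, subject)

    groups = {}
    for seq in list(parent):
        groups.setdefault(find(seq), []).append(seq)
    return sorted(sorted(group) for group in groups.values() if len(group) >= 2)

# Hypothetical hits: a-b and b-c pass, c-d fails similarity, e-f fails overlap.
example_hits = [
    ("a", "b", 90.0, 200),
    ("b", "c", 85.0, 150),
    ("c", "d", 50.0, 300),
    ("e", "f", 95.0, 20),
]
clusters = transitive_clusters(example_hits)
```

Sequences a and c end up in the same cluster although they have no direct hit, which is what distinguishes this method from BBH.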
OrthoMCL requires numerous steps to run, and uses a temporary MySQL database; TCW organizes all these details.
OrthoMCL produces the following message, but it works anyway:
Error: acquiring genes from Combined.fasta
For user-supplied clusters, you create a file specifying the groupings, and the interface simply uploads that file. Blast results
are not used. The group file has the following format:
D26: tra|tra_030 tra|tra_184 tra|tra_094 pro|pro_100
D27: tra|tra_045 tra|tra_209 pro|pro_011
Each line starts with "DN", where N is the group number, and then has a space-separated list of the
sequences in the group, prefixed by the project prefix that you entered when you set up the mTCW.
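Before uploading a group file, its format can be checked with a small Python sketch such as the one below. It follows the format described above ("DN:" followed by space-separated `prefix|name` entries); the function names are hypothetical, and the sketch does not check that the prefixes or sequence names exist in your mTCW project.

```python
def parse_group_line(line):
    """Parse one line such as 'D27: tra|tra_045 pro|pro_011'."""
    label, _, members = line.partition(":")
    label = label.strip()
    if not label.startswith("D") or not label[1:].isdigit():
        raise ValueError("line must start with 'DN:': " + line)
    sequences = []
    for token in members.split():
        prefix, sep, name = token.partition("|")
        if not sep or not prefix or not name:
            raise ValueError("expected prefix|name, got: " + token)
        sequences.append((prefix, name))
    return int(label[1:]), sequences

def parse_groups(lines):
    """Return {group number: [(project prefix, sequence name), ...]}."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        number, sequences = parse_group_line(line)
        groups[number] = sequences
    return groups

# The two example lines from the format description.
example_lines = [
    "D26: tra|tra_030 tra|tra_184 tra|tra_094 pro|pro_100",
    "D27: tra|tra_045 tra|tra_209 pro|pro_011",
]
groups = parse_groups(example_lines)
```

A line without the "DN:" label, or an entry without the project prefix, raises a ValueError that names the offending text.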
A directory is created for the mTCW project called projcmp/<project name>.
A file called mTCW.cfg is created that contains all the information about the project,
and will be reloaded into
runMultiTCW when you select the project.
runMultiTCW is not very forgiving if datasets or cluster methods are entered incorrectly.
It is easiest to just Remove the offending dataset or cluster and re-enter it.
A file called mTCW.error.log is created if there is an error. If it is not clear how to
fix the problem, send the file to email@example.com.
The clusters can be viewed in any of the following ways:
- Click the Launch viewMultiTCW button.
- Execute './viewMultiTCW' and a window of existing mTCW databases will be displayed, where
databases can be selected for display.
- Execute './viewMultiTCW <database name>', e.g. ./viewMultiTCW demo
displays the window on the right.
There is Help on all the
viewMultiTCW views, and Tour shows snapshots of some of them.
The following can be used to display viewMultiTCW on the web with a given TCW database:
Running viewMultiTCW (please wait)
<param name="ASSEMBLY_DB" value="mTCW_your_db_name">
<param name="DB_URL" value="www.your URL">
<param name="DB_USER" value="your username (read-only)">
<param name="DB_PASS" value="your password">
Unable to display the viewMultiTCW applet. Please verify your Java installation.
Applet users may have to adjust their applet memory; see instructions
in the Troubleshooting Guide.
1. Zhang, Z., Schwartz, S., Wagner, L. and Miller, W. (2000) A greedy algorithm for aligning DNA sequences. J Comput Biol 7:203-214.
2. Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., Madden, T. (2009) BLAST+: architecture and applications. BMC Bioinformatics 10:421.
3. Iseli, C., Jongeneel, C.V. and Bucher, P. (1999) ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences. Proc Int Conf Intell Syst Mol Biol, 138-148.
4. Li, L., Stoeckert, C.J., Jr. and Roos, D.S. (2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13:2178-2189.
5. Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792-1797.
6. UniProt Consortium (2007) The Universal Protein Resource (UniProt). Nucleic Acids Res 35:D193-197.