MultiTCW (mTCW) is the comparative module of Transcriptome Computational Workbench.
This module takes as input two or more singleTCW databases.
Note, familiarity with singleTCW is essential, as MultiTCW
projects are created by merging existing sTCW projects.
|mTCW|| multiTCW database
|sTCW|| singleTCW database
|NT|| Nucleotide (transcript, gene)
|AA|| Amino acid (translated ORF, protein)
|NT-sTCW|| singleTCW created from NT sequences.
|AA-sTCW|| singleTCW created from AA sequences.
|NT-mTCW|| multiTCW built from only NT-sTCW.
|AA-mTCW|| multiTCW built from only AA-sTCW or a mix of AA-sTCW and NT-sTCW.
- Two or more sTCW databases. The
sequences, their annoDB hits, RPKM and DE p-values are imported to the multiTCW database.
- For NT-sTCW, the translated sequences need to be input, using one of the following two approaches:
- A best frame translated ORF file was output during
annotation and was written to the projcmp/AAfiles directory.
- Alternatively, you can have ESTscan
generate the translated sequences, but you have to supply a SMAT file (see the ESTscan documentation).
- (Optional) A file of clusters.
- The results are the best if:
- The sTCW databases are annotated the same.
- The condition names are exactly the same (when applicable). For example, if two species both
have counts for the tissue type 'leaf', the condition name provided in runSingleTCW must be the same for both (e.g. leaf), though the comparison is case-insensitive.
- The DE column names are exactly the same (when applicable), as in the previous point.
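The naming requirement above can be checked before building the project. The following is a minimal sketch, not part of TCW: the function name and the condition lists are hypothetical, and it only assumes what is stated above, namely that names are compared case-insensitively.

```python
def shared_conditions(conditions_by_db):
    """Return the condition names found in every database, compared case-insensitively."""
    lowered = [{name.lower() for name in names} for names in conditions_by_db.values()]
    if not lowered:
        return set()
    return set.intersection(*lowered)


# Hypothetical condition names for the two demo databases.
demo = {
    "sTCW_exBar": ["Leaf", "root", "stem"],
    "sTCW_exFoo": ["leaf", "Root", "flower"],
}
print(sorted(shared_conditions(demo)))
```

Conditions missing from the result (here stem and flower) are not shared by all of the databases.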
- Compare the sequences using BLAST [1,2].
- Compute one or more sets of clusters using the BBH (bi-directional best hit), Closure,
or user-supplied clusters.
- For an NT-mTCW database created from only NT-sTCW databases,
statistics such as Ka/Ks [5], synonymous codons, etc. are computed.
The multiTCW executables were installed when you installed TCW (see Installation).
Software Requirements and Installation
|Required for protein or transcript comparison:
||Used for self-blast of the protein sequences in |
Used for blast with
|Optional package for transcript comparison:
(google for other download sites).
|For KaKs analysis for |
|Additional packages (supplied with TCW)1:
||Extracts protein sequences from transcripts for |
||Clusters orthologous proteins in |
Uses Perl and MySQL, which requires DBI::mysql
||Multiple alignment of protein cluster in |
1 Linux binaries are supplied under the external directory, and Mac binaries under the external_osx directory. You do not need to do anything special, as TCW will find the packages there if
they are not in your path. We cannot guarantee that the packages will work on every Linux or Mac machine; if they do not,
you will need to replace them.
This section is essential for learning how to use
and describes how to make a multiTCW project starting from the two singleTCW demos (exBar and exFoo)
which are included in the package.
runSingleTCW, create the datasets
sTCW_exBar and sTCW_exFoo, as follows:
- Create the annotation files:
- Fill out the panel as shown on the right, and select Tax Download;
when that completes, select GO Download.
- Create sTCW_exBar:
- Select "exBar" from the Project dropdown; the panel will be populated with the data
to annotate with the UniProt and GO you just created. You only need to execute the
three steps to load data, instantiate and annotate.
- Create sTCW_exFoo (same steps as for exBar).
To create the multiTCW project, start by running the mTCW Manager.
This brings up the Manager interface, shown on the lower right (though all fields will be blank).
The following continues to use exBar and exFoo as examples of sTCW databases, but will work for
any set of sTCW databases.
Click Add Project to create a new project. Enter
"ex" in the entry box, and click ok.
The Manager interface will have mTCW database filled in as mTCW_ex,
which will be the MySQL database name.
The following are the steps to take, where more detail is provided below.
The overview will look similar to this Overview.
- Using the Add beside the "single TCW databases":
- Add the exBar
database with its translated ORF file (projcmp/AAfiles/exBar_aaORFs.fasta)
- Add exFoo database with its
translated ORF file (projcmp/AAfiles/exFoo_aaORFs.fasta).
- Select Build Database.
- Select Run Blast/Filter.
- Select Add Pairs from Blast.
- Using the Add button beside the "Cluster Methods" table, add the three
different methods with default parameters. The Cluster Methods table should look like it does on the left.
- Select Add New Clusters. Once they are added,
runMultiTCW will look like the image on the right;
the added methods in the table will be italicized. You may
add additional methods at any time.
- Select Run Stats. When it completes, the label will change to "No action selected".
runMultiTCW in order to run
- Download the
- Change directory to projcmp. Put the
KaKs_calculator executable in this
- Change directory to ex/KaKs. Execute "sh runKaKs".
- Start up
runMultiTCW again. The label on the 4th section should say
"Read KaKs", if it does not, select Settings and select it. Then Run Stats.
- Select Launch viewMultiTCW to query the results.
The log file in projcmp/ex/log will look similar to this log file.
Top three rows
|Add Project||A popup window will appear where you enter the project name. On 'OK', the
(1) A project directory will be created under projcmp with the project name.
(2) A file called mTCW.cfg is created and written to the project directory.
(3) The database
will be the same name with the prefix 'mTCW_' added.
|Help||A pop-up window that provides similar information to this UserGuide.
|Project||The drop-down lists all sub-directories under projcmp. When you
select one, the project's mTCW.cfg file will be read and values entered into the interface.
|Save||Everything you enter gets saved into mTCW.cfg every time you make
a change. However, you can initiate the save with this button if you want to be sure the
new information is saved.
|Overview||Once you have selected a project, you can select 'Overview' to see
|mTCW database||By default, the MySQL name will be mTCW_<project-name>. You
can change the name, though it must start with mTCW_.
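The naming rules in the table above can be summarized in a few lines. This is only an illustrative sketch: the helper names are hypothetical and TCW itself does not expose them. It assumes nothing beyond what is stated here (a project directory under projcmp, an mTCW.cfg file inside it, and a database name carrying the mTCW_ prefix).

```python
from pathlib import PurePosixPath

DB_PREFIX = "mTCW_"


def project_layout(project_name, projcmp="projcmp"):
    """Derive the default directory, config file and database name for a project."""
    project_dir = PurePosixPath(projcmp) / project_name
    return {
        "project_dir": str(project_dir),
        "config_file": str(project_dir / "mTCW.cfg"),
        "database": DB_PREFIX + project_name,
    }


def is_valid_db_name(name):
    """A renamed database must still start with the mTCW_ prefix."""
    return name.startswith(DB_PREFIX)


print(project_layout("ex"))
print(is_valid_db_name("mTCW_ex_v2"), is_valid_db_name("ex_v2"))
```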
Remove: Select one or more options. When you select 'Ok', you will be prompted to verify each removal.
|Pairs.. from database|| Removes the pairs and clusters so you
can start over without running Build Database again,
e.g. if you want to use a different blast file, create all new clusters, etc. Once the pairs are removed,
you can change the blast settings (e.g. use filtered blast) and then re-add the Pairs.
|mTCW database|| Remove the database but leave the project on disk.
|Pairs and method files|| If you remove the pairs and clusters from
database, it is a good idea to remove all associated files from disk using this option.
|Blast files|| If you recreate the database and you think there may
be changes to the sequences in it, you definitely want to remove the blast files so that
it allows you to re-blast. Or, if you want to re-run blast, you need to first remove the blast files using
|All files|| If you no longer are using the project, you can delete
the database (above) and all the relevant files here.
1. single TCW databases
Click the Add button next to Single TCW databases.
This brings up the sTCW selection panel shown on the right.
Clicking Select sTCW Database produces a popup of existing sTCW databases.
MultiTCW uses proteins for its alignments.
When you select Keep, it will take you back to the main panel and
this database will be shown in the "single TCW databases" table.
- Select Use the existing protein file.
- Select "...", which will take you to the directory AAfiles in
the projcmp directory. The file exBar_aaORFs.fasta (translated ORFs) was
created during the
runSingleTCW; select it followed by "open".
- If you want to use ESTscan to generate the translated ORFs, see Transcripts to proteins.
Repeat the above step for sTCW_exFoo.
The Run Blast puts the results in blastAA.tab and blastNT.tab;
these names cannot be changed.
Typically, the Run Blast/Filter step is run without changing the settings.
By default, blast is run with "-use_sw_tback" for the protein self-blast,
which is very important as it runs dynamic programming (Smith-Waterman) for the final score.
Once blast is run, you can no longer change the parameters; if you want to change the
parameters and re-run blast, first remove the blast files using the main panel Remove....
Filtering is only required if you think you have many transcripts that are basically the same;
the filtering removes them from the blast result file, but not from the database. When you
change the parameters %Identity and Max non-align, the name of the filter file
will automatically change to reflect the parameters. See the
Help for a description of
these two parameters. If blast is already run but the pairs are not loaded, you can change these
and rerun the Run Blast/Filter, which will only generate the new filter file.
Add Pairs from blast loads the pairs from the blast result file or filtered file.
3. Cluster Methods
You can create multiple sets of clusters with different
methods or parameters.
Click Add in section "Cluster Methods" to add a new clustering method; this brings up the
Method panel, shown at right.
- BBH is the default, select Keep; control will return to the main panel.
- Select Add again, select Closure in the Method dropdown, select Keep.
- Select Add again, select OrthoMCL in the Method dropdown, select Keep.
When the multiTCW database is created from nucleotide sTCW databases, it is
advantageous to have BBH be at least one of the methods as these pairs are used for the overall
summary and for the KaKs pairs.
More information is provided in the section Cluster Methods.
4. Run Stats
The statistics are broken into three sections:
- Run on all pairs in the database (e.g. 459):
- The PCC (Pearson Correlation Coefficient) is only relevant if there are
shared conditions, as it is used to determine how similar the RPKM values of the
conditions are. It is run on all pairs in the database.
- This is relevant only for an mTCW database created from only nucleotide sTCW databases:
- BBH pairs only (e.g. 233 pairs):
- The summary statistics shown on the Overview for "Pairs".
- Outputs the Ka/Ks files for input into
- For all pairs in clusters (e.g. 292):
- Synonymous codons, nonsynonymous codons, %match, #gaps, GC content, etc.
- Only if Ka/Ks input files exist.
- Run Stats writes files for input to
- Run the
KaKs_calculator from a terminal window.
- Execute Run Stats again.
- See KaKs_calculator for more details.
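For reference, the PCC reported for pairs with shared conditions is the standard Pearson correlation coefficient. The sketch below computes it for the RPKM values of two sequences over the same conditions. It is a plain textbook implementation with hypothetical values; how TCW prepares the RPKM values before computing the PCC is not described here, so treat this only as an illustration of the statistic.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length value lists; None if either is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined when one profile does not vary
    return cov / math.sqrt(var_x * var_y)


# Hypothetical RPKM values for one pair over three shared conditions.
rpkm_seq1 = [1.0, 2.0, 3.0]
rpkm_seq2 = [2.0, 4.0, 6.0]
print(pearson(rpkm_seq1, rpkm_seq2))
```

A value near 1 means the two sequences have similar expression profiles across the shared conditions, and a value near -1 means opposite profiles.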
Nucleotide and/or protein singleTCW databases as input
SingleTCW databases can be created from nucleotide (NT-sTCW) or proteins (AA-sTCW). A multiTCW
database can be created with a mix of NT-sTCW and AA-sTCW databases. If the multiTCW is created
with only AA-sTCW or a mix, only the PCC statistics are available.
For a NT-sTCW, the nucleotide sequences are loaded into the mTCW database. This requires a file
of translated ORFs (amino acid sequences) corresponding with the sequences in the database.
There does not have to be a translated ORF for every NT sequence in the sTCW database; in fact, the file
can be blank (you can use nucleotide blast for clustering). However, since singleTCW provides
the translated ORF files, there is no reason not to use them.
Transcripts to proteins
If your sTCW databases were created with protein sequences, skip this section.
The Help page for clustering provides more detail, but the following is an overview.
- You may supply your own protein sequences, where the sequence identifiers (i.e.
on the ">" description line) must correspond to the sequence identifiers in the sTCW
runSingleTCW produces a translated ORF file called
and puts a copy in projcmp/AAfiles,
which can be used as input (see TCW ORF finder).
- You may have ESTscan generate the protein sequences,
but you must supply the SMAT file (or use the existing
one, which is very old).
Go to estscan.sourceforge.net for the source code and instructions for building a SMAT file.
Put the resulting SMAT file in external/ESTscan/smat (or external_osx on Mac).
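If you supply your own protein file, it can help to confirm beforehand that its identifiers match those in the sTCW database. The following is a minimal sketch under one assumption that is not stated in this guide: the identifier is taken to be the first whitespace-delimited token after ">". The function names and the sequence identifiers are hypothetical.

```python
def fasta_ids(lines):
    """Yield the identifier of each FASTA record (first token after '>')."""
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                yield tokens[0]


def unmatched_ids(fasta_lines, stcw_ids):
    """Return FASTA identifiers that are not among the sTCW sequence identifiers."""
    known = set(stcw_ids)
    return [seq_id for seq_id in fasta_ids(fasta_lines) if seq_id not in known]


# Hypothetical protein file content and sTCW identifiers.
protein_file = [">bar_001 translated ORF", "MKVLA", ">bar_999", "MAST"]
stcw_sequence_ids = {"bar_001", "bar_002"}
print(unmatched_ids(protein_file, stcw_sequence_ids))
```

An empty list means every protein in the file corresponds to a sequence in the database. For a real file, pass the open file handle as `fasta_lines`.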
All methods need a unique prefix, which is used to prefix the cluster names, e.g.
a method with prefix "BB8" will have cluster names BB8_00001, BB8_00002, etc. The
prefix can only be 3 characters, but make it a meaningful 3 characters.
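The naming scheme can be illustrated as follows. This is a sketch, not TCW code: the function name is hypothetical, the five-digit zero padding is inferred from the BB8_00001 example, and the check reads "can only be 3 characters" as exactly three characters, which is an assumption.

```python
def cluster_name(prefix, number):
    """Build a cluster name such as BB8_00001 from a 3-character method prefix."""
    if len(prefix) != 3:
        raise ValueError("the method prefix is assumed to be exactly 3 characters")
    return f"{prefix}_{number:05d}"


print(cluster_name("BB8", 1))
print(cluster_name("BB8", 2))
```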
The BBH finds the bi-directional best hit based on blast e-value.
Hence, it makes clusters strictly of size 2. It uses the blast hits that were loaded
into the database with Add Pairs from Blast. There are 3 parameters:
- Amino acid or nucleotide (for NT-mTCW only).
- %Similarity - the Blast similarity (Identity).
- %Overlap - the alignment length is divided by the length of the sequence times 100 to get the
%Overlap for each sequence of the pair (Olap1 and Olap2). You can choose "Either", which requires
that either Olap1>%Overlap OR Olap2>%Overlap. If you choose "Both", then Olap1>%Overlap AND Olap2>%Overlap.
For example, for an alignment:
This will pass the filter %Overlap>=80 if "Either" is selected, but not "Both".
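The BBH rule and its filters can be sketched as below. This is an illustration of the general technique, not the TCW implementation: the `Hit` record, the function names and the example values are hypothetical. Applying the filters before picking the best hit, and using ">=" for the thresholds, are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    query: str
    subject: str
    evalue: float
    identity: float  # percent identity reported by blast
    align_len: int
    query_len: int
    subject_len: int


def passes_overlap(hit, min_overlap, mode="Either"):
    """Olap1 and Olap2 are the alignment length as a percentage of each sequence length."""
    olap1 = 100.0 * hit.align_len / hit.query_len
    olap2 = 100.0 * hit.align_len / hit.subject_len
    if mode == "Both":
        return olap1 >= min_overlap and olap2 >= min_overlap
    return olap1 >= min_overlap or olap2 >= min_overlap


def bbh_pairs(hits, min_identity, min_overlap, mode="Either"):
    """Return pairs whose members are each other's best (lowest e-value) hit."""
    best = {}
    for hit in hits:
        if hit.query == hit.subject:
            continue  # ignore self hits from the self-blast
        if hit.identity < min_identity or not passes_overlap(hit, min_overlap, mode):
            continue
        current = best.get(hit.query)
        if current is None or hit.evalue < current.evalue:
            best[hit.query] = hit
    pairs = set()
    for query, hit in best.items():
        back = best.get(hit.subject)
        if back is not None and back.subject == query:
            pairs.add(tuple(sorted((query, hit.subject))))
    return sorted(pairs)


# Hypothetical hits: bar_1/foo_1 align over 90% of bar_1 but only 60% of foo_1.
hits = [
    Hit("bar_1", "foo_1", 1e-50, 90.0, 90, 100, 150),
    Hit("foo_1", "bar_1", 1e-50, 90.0, 90, 150, 100),
    Hit("bar_1", "foo_2", 1e-10, 85.0, 95, 100, 100),
    Hit("foo_2", "bar_1", 1e-10, 85.0, 95, 100, 100),
]
print(bbh_pairs(hits, 80.0, 80.0, "Either"))
print(bbh_pairs(hits, 80.0, 80.0, "Both"))
```

With "Either", the bar_1/foo_1 hit survives the overlap filter and wins on e-value. With "Both", it is filtered out and bar_1/foo_2 becomes the reciprocal best pair.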
Closure has the following requirements: (1) All sequences in a cluster must have a blast hit
with all other sequences in the cluster. (2) Each sequence must pass the filters with at least
one other sequence in the cluster, where the filter parameters are exactly as described for BBH.
The algorithm also uses the blast hits from the database.
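The two Closure requirements can be expressed as a check on a candidate cluster. The sketch below only verifies the requirements; it does not reproduce how TCW builds the clusters. The function name and the inputs are hypothetical, and hits are treated as undirected pairs, which is an assumption.

```python
from itertools import combinations


def satisfies_closure(members, hit_pairs, passing_pairs):
    """Check requirements (1) and (2) for one candidate cluster.

    hit_pairs: frozenset pairs that have any blast hit.
    passing_pairs: frozenset pairs whose hit also passes the filters.
    """
    members = sorted(set(members))
    # (1) every sequence must have a blast hit with every other sequence.
    for a, b in combinations(members, 2):
        if frozenset((a, b)) not in hit_pairs:
            return False
    # (2) every sequence must pass the filters with at least one other member.
    for a in members:
        if not any(frozenset((a, b)) in passing_pairs for b in members if b != a):
            return False
    return True


# Hypothetical example with three sequences.
all_hits = {frozenset(p) for p in [("A", "B"), ("A", "C"), ("B", "C")]}
good_hits = {frozenset(p) for p in [("A", "B"), ("B", "C")]}
print(satisfies_closure(["A", "B", "C"], all_hits, good_hits))
```

Here A and C have a hit that does not pass the filters, but the cluster is still valid because each of them passes with B.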
OrthoMCL requires numerous steps to run, and uses a temporary MySQL database;
TCW organizes all these details.
OrthoMCL uses the blast file blastAA.tab. It does not guarantee that
all sequences in a cluster have a blast hit with each other.
OrthoMCL produces the following message, but it works anyway:
Error: acquiring genes from Combined.fasta
OrthoMCL occasionally fails; every time this has happened to me, rerunning it has worked.
For user-supplied clusters, you create a file specifying the groupings, and the interface simply uploads that file. Blast results
are not used. The group file has the following format:
D26: tra|tra_030 tra|tra_184 tra|tra_094 pro|pro_100
D27: tra|tra_045 tra|tra_209 pro|pro_011
Each line starts with "DN", where N is the group number, and then has a space-separated list of the
sequences in the group, prefixed by the project prefix that you entered when you set up the mTCW.
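Before uploading a group file, its format can be checked with a small parser. This is a sketch based only on the format described above; the function names are hypothetical and the error handling is a choice made here, not TCW behavior.

```python
def parse_group_line(line):
    """Parse 'D26: tra|tra_030 pro|pro_100' into (26, [('tra', 'tra_030'), ('pro', 'pro_100')])."""
    label, colon, rest = line.partition(":")
    label = label.strip()
    if not colon or not label.startswith("D") or not label[1:].isdigit():
        raise ValueError(f"bad group label in line: {line!r}")
    members = []
    for token in rest.split():
        prefix, bar, seq_id = token.partition("|")
        if not bar or not prefix or not seq_id:
            raise ValueError(f"expected prefix|sequence, got {token!r}")
        members.append((prefix, seq_id))
    return int(label[1:]), members


def parse_groups(lines):
    """Return {group number: members} for all non-blank lines."""
    groups = {}
    for line in lines:
        if line.strip():
            number, members = parse_group_line(line)
            groups[number] = members
    return groups


example = [
    "D26: tra|tra_030 tra|tra_184 tra|tra_094 pro|pro_100",
    "D27: tra|tra_045 tra|tra_209 pro|pro_011",
]
for number, members in parse_groups(example).items():
    print(number, members)
```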
Details on running
After the KaKs files have been created using Run Stats:
runMultiTCW is not very forgiving if datasets or cluster methods are entered wrong.
It's easiest just to Remove the offending dataset or cluster and re-enter it.
A file called mTCW.error.log is created if there is an error. If it's not clear how to
fix the problem, send the file to email@example.com.
The clusters can be viewed in any of the following ways:
- Click the Launch viewMultiTCW button in the
- Execute './viewMultiTCW' and a window of existing mTCW databases will be displayed, where
databases can be selected for display.
- Execute './viewMultiTCW <database name>', e.g. ./viewMultiTCW demo
displays the window on the right.
There is Help on all the viewMultiTCW views, and Tour shows snapshots of some of the
viewMultiTCW can be used as an applet, as follows:
- The mtcw.jar file must be signed with a code-signing certificate, e.g. using Digicert (which is easy, but costs money), or your university may have an account with InCommon, which provides code-signing certificates.
- You need mysql-connector-java-5.0.5-bin.jar (contact tcw at agcol.arizona.edu if you would like us to provide you a copy).
- Modify the following code as appropriate, and put it in your cgi-bin.
Running viewMultiTCW (please wait)
<param name="ASSEMBLY_DB" value="mTCW_your_db_name">
<param name="DB_URL" value="www.your URL">
<param name="DB_USER" value="your username (read-only)">
<param name="DB_PASS" value="your password">
Unable to display the viewMultiTCW applet. Please verify your Java installation.
Applet users may have to adjust their applet memory; see instructions
in the Troubleshooting Guide.
1. Zhang, Z., Schwartz, S., Wagner, L. and Miller, W. (2000) A greedy algorithm for aligning DNA sequences. J Comput Biol 7:203-214.
2. Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., Madden, T. (2009) BLAST+: architecture and applications. BMC Bioinformatics 10:421.
3. Iseli, C., Jongeneel, C.V. and Bucher, P. (1999) ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences. Proc Int Conf Intell Syst Mol Biol, 138-148.
4. Li, L., Stoeckert, C.J., Jr. and Roos, D.S. (2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13:2178-2189.
5. Zhang Z, Li J, Xiao-Qian Z, Wang J, Wong, G, Yu J (2006) KaKs_Calculator: Calculating Ka and Ks through model selection and model averaging. Geno. Prot. Bioinfo. Vol 4 No 4. 259-263.
6. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32: 1792-1797.
7. UniProt Consortium (2007) The Universal Protein Resource (UniProt). Nucleic Acids Res 35: D193-197.