MultiTCW (mTCW) is the comparative module of Transcriptome Computational Workbench.
This module takes as input two or more singleTCW databases (sTCWdb). It has been tested with four sTCWdbs (134k total sequences);
it can probably handle more input, but the
viewMultiTCW queries become very slow.
Note, familiarity with singleTCW is essential, as MultiTCW
projects are created by merging existing sTCW projects.
|mTCW|| multiTCW database
|sTCW|| singleTCW database
|NT|| Nucleotide (transcript, gene)
|AA|| Amino acid (translated ORF, protein)
|NT-sTCW|| singleTCW created from NT sequences.
|AA-sTCW|| singleTCW created from AA sequences.
|NT-mTCW|| multiTCW built from only NT-sTCW.
|AA-mTCW|| multiTCW built from only AA-sTCW or a mix of AA-sTCW and NT-sTCW.
- Two or more sTCW databases. The
sequences, their annoDB hits, RPKM and DE p-values are imported to the multiTCW database.
- For NT-sTCW, the mTCWdb will contain for each seqID the nucleotide, CDS, and protein sequences. The protein
sequence is created from the CDS sequence, which is created from the TCW computed ORF.
- (Optional) A file of clusters.
- The results are the best if:
- The sTCW databases are annotated the same.
- The condition names are the same (when applicable). For example, if two species both
have counts for the tissue type 'leaf', the condition name provided in runSingleTCW must be the same for both (e.g. leaf); the comparison is case-insensitive.
- The DE column names are exactly the same (when applicable), as in the previous point.
- Compare the AA sequences using one of the supported Search programs, and the NT sequences with blastn.
- Compute one or more sets of clusters using the BBH (bi-directional best hit), Closure,
or user-supplied clusters.
- For an NT-mTCW database (created from only NT-sTCW databases),
statistics such as Ka/Ks (Zhang et al. 2006), synonymous codons, etc. are computed.
The multiTCW executables were installed when you installed TCW (see Installation).
Software Requirements and Installation
|Optional package for transcript comparison:
(google for other download sites).
|For KaKs analysis for |
|Additional packages (supplied with TCW)+:
||Clusters orthologous proteins in |
Uses Perl and MySQL, which requires the Perl DBI module with the DBD::mysql driver
||For computing multiple alignment of clusters in
runMultiTCW for scoring and viewing
||For scoring multiple alignment of clusters
||Multiple alignment of protein cluster in |
+ Linux binaries are supplied under the external directory and Mac binaries under the external_osx directory. You do not need to do anything special, as TCW will find the packages there if
they are not in your path. We cannot guarantee the packages will work on every Linux/Mac machine; if they do not,
you will need to replace them.
This section is essential for learning how to use
and describes how to make a multiTCW project starting from the two singleTCW demos. You can use demoTra, demoAsm, demoPro
for a three way comparison that includes an AA-sTCW. However, the three 'ex' demos, which are included in the package, have more homology and so make a better example; they are what this guide uses.
Create sTCW_exFoo as above. If you want to include a 3rd sTCW, create sTCW_exFly.
Using runSingleTCW, create sTCW_exBar as follows:
- Select "exBar" from the Project dropdown.
- The 'ex' demos use the same UniProt_demo and GO database as the 'demo' examples. To quickly create these three databases,
- Only sp_invertebrates and tr_invertebrates are checked.
- You may want to select the search program
diamond using the "Edit" button.
- You may want to set the GO database to "none" in "Options" -- it can be added later.
- Execute the three steps to load data, instantiate, and annotate.
Run runDE to add the differential expression p-value for each pair of conditions.
To create the multiTCW project, start by running the mTCW Manager,
This brings up the Manager interface, shown on the lower right (though all fields will be blank).
The following continues to use exBar and exFoo as examples of sTCW databases, but will work for
any set of sTCW databases.
Click Add Project to create a new project. Enter
"ex" in the entry box, and click ok.
The Manager interface will have mTCW database filled in as mTCW_ex,
which will be the MySQL database name.
The following are the steps to take, where more detail is provided below.
The overview will look similar to this Overview.
- Using the Add beside the "single TCW databases":
Add the exBar, exFoo and exFly databases.
- Select Build Database.
- Select Run Search.
- Select Add Pairs from Hits.
- Using the Add button beside the "Cluster Methods" table, add one or more methods.
Execute "Add New Clusters";
the added methods in the table will be italicized. You may
add additional methods at any time.
- Select Run Stats. When it completes, the label will change to "No action selected".
runMultiTCW in order to run
- Download the
- Change directory to projcmp. Put the
KaKs_calculator executable in this
- Change directory to ex/KaKs. Execute "sh runKaKs".
- Start up
runMultiTCW again. The label on the 4th section should say
"Read KaKs"; if it does not, select Settings and choose it. Then select Run Stats.
- Select Launch viewMultiTCW to query the results.
Top three rows
|Add Project||A popup window will appear where you enter the project name. On 'OK', the
(1) A project directory will be created under projcmp with the project name.
(2) A file called mTCW.cfg is created and written to the project directory.
(3) The database
will be the same name with the prefix 'mTCW_' added.
|Help||A pop-up window that provides similar information to this UserGuide.
|Project||The drop-down lists all sub-directories under projcmp. When you
select one, the project's mTCW.cfg file will be read and its values entered into the interface.
|Save||Everything you enter gets saved into mTCW.cfg every time you make
a change. However, you can initiate the save with this button if you want to be sure the
new information is saved.
|Overview||Once you have selected a project, you can select 'Overview' to see
|mTCW database||By default, the MySQL name will be mTCW_<project-name>. You
can change the name, though it must start with mTCW_.
Remove: Select one or more options. When you select 'Ok', you will be prompted to verify each removal.
|Pairs.. from database|| Removes the pairs and clusters so you
can start over without running Build Database again,
e.g. if you want to use a different hit file, create all new clusters, etc. Once the pairs are removed,
you can change the settings and then re-add the Pairs.
|mTCW database|| Remove the database but leave the project on disk.
|Pairs and method files|| If you remove the pairs and clusters from the
database, it is a good idea to remove all associated files from disk using this option.
|Hit files|| If you recreate the database and the sequences in it may
have changed, remove the hit files so that
the search can be re-run. Likewise, if you want to re-run a search program, you need to first remove the hit files using
|All files|| If you are no longer using the project, you can delete
the database (above) and all the relevant files here.
1. single TCW databases
SingleTCW databases can be created from nucleotide (NT-sTCW) or proteins (AA-sTCW). A multiTCW
database can be created with a mix of NT-sTCW and AA-sTCW databases. If the multiTCW is created
with only AA-sTCW or a mix, only the PCC statistics are available (see Step 4. Pair Statistics).
Click the Add button next to Single TCW databases.
This brings up the sTCW selection panel shown on the right.
Clicking Select sTCW Database produces a popup of existing sTCW databases.
Choose the sTCW from the list.
The 'prefix' is only used in the Method files, so it does not matter what it is as long as it is unique.
The remark can be anything, and can be added/changed after the database is created. Avoid special characters such as quotes.
When you select Keep, it will take you back to the main panel and
this database will be shown in the "single TCW databases" table.
Repeat to add all the sTCWdbs you want to compare.
2. Compare sequences
The Run Search puts the results in hitsAA.tab and hitsNT.tab;
these names cannot be changed.
On the "Settings" page, you can change the search program (see Search).
By default, blast is run with "-use_sw_tback" for the nucleotide self-blast,
which is very important as it runs dynamic programming for the final score.
Once the search is run, you can no longer change the parameters; if you want to change the
parameter and re-run the search, remove the hit files using the main panel Remove....
Add Pairs from hits loads the pairs from the hits result file.
3. Cluster Methods
Click Add in section "Cluster Methods" to add a new clustering method; this brings up the
Method panel. The drop-down beside "Method" shows BBH, Closure, Ortholog, and User defined.
You can add any number of cluster methods. You can add the same method multiple times with different parameters,
where only the "Prefix" has to be different.
All methods need a unique prefix, which is used to prefix the cluster names, e.g.
a method with prefix "BB8" will have cluster names BB8_00001, BB8_00002, etc. The
prefix can only be 5 characters, but make it a meaningful 5 characters.
When the multiTCW database is created from nucleotide sTCW databases, it is
advantageous to have BBH as at least one of the methods, as these pairs are used for the overall
summary and for the KaKs pairs.
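The naming rule above can be illustrated with a small sketch. The function name cluster_names is hypothetical and is not part of TCW; the 5-digit zero padding is inferred from the BB8_00001 example, and the length check assumes "5 characters" means at most 5.

```python
def cluster_names(prefix, count):
    """Return cluster names such as BB8_00001 for a method prefix.

    Hypothetical helper for illustration only; not TCW code.
    """
    # Assumption: a prefix may be 1 to 5 characters long.
    if not prefix or len(prefix) > 5:
        raise ValueError("prefix must be 1 to 5 characters")
    # Assumption: the numeric part is zero-padded to 5 digits, as in BB8_00001.
    return [f"{prefix}_{i:05d}" for i in range(1, count + 1)]


print(cluster_names("BB8", 2))
```

Because every method has its own prefix, cluster names from different methods (or from the same method run with different parameters) cannot collide.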
The Help page for clustering provides more detail, but the following is an overview.
The BBH finds the bi-directional best hit based on hit e-value.
It uses the hits that were loaded
into the database with Add Pairs from Hits. The following explains the parameters:
- Amino acid or nucleotide (for NT-mTCW only).
- %Similarity - the hit similarity (Identity).
- %Overlap - the alignment length is divided by the length of the sequence times 100 to get the
%Overlap for each sequence of the pair (Olap1 and Olap2). You can choose "Either", which requires
that either Olap1>%Overlap OR Olap2>%Overlap. If you choose "Both", then Olap1>%Overlap AND Olap2>%Overlap.
For example, for an alignment:
This will pass the filter %Overlap>=80 if "Either" is selected, but not "Both".
- The "Select sTCWdbs" will only be present if there are more than two sTCWdbs loaded into the mTCWdb.
The rules are as follows:
- Select two sTCWdbs for the standard BBH of one pair per cluster.
- Select N (N>2) sTCWdbs, and clusters of exactly size N will be created, where each pair in the cluster is a BBH pair.
- Do not select any sTCWdbs, and one cluster set will be created from all pairs of sTCWdbs.
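The BBH selection and the %Overlap filter described above can be sketched as follows. This is a minimal illustration, not the TCW implementation: the names Hit, overlap_ok and bbh_pairs are hypothetical, the hits are assumed to come from exactly two sTCWdbs, the filters are assumed to be applied after the reciprocal best hits are found, and the thresholds are compared with >=.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    query: str         # sequence ID, e.g. "bar_001"
    subject: str       # sequence ID in the other sTCWdb
    evalue: float
    similarity: float  # percent identity of the alignment
    align_len: int     # alignment length


def overlap_ok(hit, lengths, min_olap, mode="Either"):
    """Apply the %Overlap rule: Olap = alignment length / sequence length * 100."""
    olap1 = 100.0 * hit.align_len / lengths[hit.query]
    olap2 = 100.0 * hit.align_len / lengths[hit.subject]
    if mode == "Both":
        return olap1 >= min_olap and olap2 >= min_olap
    return olap1 >= min_olap or olap2 >= min_olap


def bbh_pairs(hits, lengths, min_sim=0.0, min_olap=0.0, mode="Either"):
    """Return sorted pairs whose members are each other's best hit by e-value."""
    best = {}  # query ID -> its hit with the lowest e-value
    for h in hits:
        if h.query == h.subject:
            continue  # ignore self hits
        cur = best.get(h.query)
        if cur is None or h.evalue < cur.evalue:
            best[h.query] = h
    pairs = set()
    for q, h in best.items():
        back = best.get(h.subject)
        reciprocal = back is not None and back.subject == q
        if reciprocal and h.similarity >= min_sim and overlap_ok(h, lengths, min_olap, mode):
            pairs.add(tuple(sorted((q, h.subject))))
    return sorted(pairs)


lengths = {"bar_001": 100, "bar_002": 150, "foo_001": 200}
hits = [
    Hit("bar_001", "foo_001", 1e-50, 90.0, 85),
    Hit("foo_001", "bar_001", 1e-50, 90.0, 85),
    Hit("bar_002", "foo_001", 1e-20, 80.0, 60),
]
print(bbh_pairs(hits, lengths, min_olap=80, mode="Either"))
print(bbh_pairs(hits, lengths, min_olap=80, mode="Both"))
```

In this made-up data the alignment covers 85% of bar_001 but only 42.5% of foo_001, so the pair passes %Overlap>=80 with "Either" and is rejected with "Both".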
Closure uses the hits from the database and has the following requirements:
- All sequences in a cluster must have a hit
with all other sequences in the cluster.
- Each sequence must pass the filters with at least
one other sequence in the cluster, where the filter parameters are exactly as described for BBH.
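The two Closure requirements can be expressed as a check on a candidate cluster. This sketch only tests whether a given set of sequences satisfies the requirements; it does not show how TCW builds the clusters. The function name is_closed_cluster and the two input sets are hypothetical: hit_pairs holds every pair with a hit in the database, and passing_pairs holds the pairs that also pass the similarity and overlap filters.

```python
from itertools import combinations


def is_closed_cluster(members, hit_pairs, passing_pairs):
    """Check the two Closure requirements for one candidate cluster.

    hit_pairs and passing_pairs are sets of frozenset({id1, id2}).
    """
    members = list(members)
    if len(members) < 2:
        return False
    # Requirement 1: every sequence has a hit with every other sequence.
    for a, b in combinations(members, 2):
        if frozenset((a, b)) not in hit_pairs:
            return False
    # Requirement 2: every sequence passes the filters with at least one other member.
    for m in members:
        if not any(frozenset((m, o)) in passing_pairs for o in members if o != m):
            return False
    return True


hit_pairs = {frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]}
passing_pairs = {frozenset(("a", "b")), frozenset(("b", "c"))}
print(is_closed_cluster(["a", "b", "c"], hit_pairs, passing_pairs))
print(is_closed_cluster(["a", "b", "c", "d"], hit_pairs, passing_pairs))
```

The first candidate is accepted because all three pairs have hits and each member passes the filters with at least one other member; the second is rejected because "a" and "d" have no hit.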
OrthoMCL requires numerous steps to run, and uses a temporary MySQL database;
TCW organizes all these details.
OrthoMCL uses the hit file hitsAA.tab. It does not guarantee that
all sequences in a cluster have a hit with each other.
OrthoMCL produces the following message, but it works anyway:
Error: acquiring genes from Combined.fasta
OrthoMCL occasionally fails; every time this has happened to me, rerunning it has worked.
For user-defined clusters, you create a file specifying the groupings, and the interface simply uploads that file. Hit results
are not used. The group file has the following format:
D26: tra|tra_030 tra|tra_184 tra|tra_094 pro|pro_100
D27: tra|tra_045 tra|tra_209 pro|pro_011
Each line starts with "DN", where N is the group number, and then has a space-separated list of the
sequences in the group, prefixed by the project prefix that you entered when you set up the mTCW.
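Before uploading a group file, it can be useful to check that it follows this format. The following is a minimal sketch of such a check; the function parse_groups is hypothetical and is not part of TCW, and it assumes one group per line with no extra header or comment lines.

```python
import re

# "D", a group number, a colon, then the space-separated members.
LINE_RE = re.compile(r"^D(\d+):\s*(.*)$")


def parse_groups(text):
    """Parse group-file text into {group number: [(project prefix, sequence ID), ...]}."""
    groups = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: expected 'DN: member member ...'")
        members = []
        for token in match.group(2).split():
            prefix, sep, seq_id = token.partition("|")
            if not sep or not prefix or not seq_id:
                raise ValueError(f"line {lineno}: bad member '{token}'")
            members.append((prefix, seq_id))
        groups[int(match.group(1))] = members
    return groups


example = """D26: tra|tra_030 tra|tra_184 tra|tra_094 pro|pro_100
D27: tra|tra_045 tra|tra_209 pro|pro_011
"""
for number, members in parse_groups(example).items():
    print(number, members)
```

Each member is split at the "|" into the project prefix and the sequence ID, so a typo in either part is reported with its line number.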
4. Run Stats
After adding clusters and running stats, you can add more clusters. In order to update the stats after adding more clusters:
The statistics are broken into four sections:
- Run on the total hit pairs in the database:
- The PCC (Pearson Correlation Coefficient) is only relevant if there are
shared conditions, as it is used to determine how similar the RPKM values of the
conditions are. It is run on all pairs in the database.
- Alignment of hit pairs in clusters.
This is only relevant for an mTCW database created from only nucleotide sTCW databases:
- For each alignment, the following is performed:
- Synonymous codons, nonsynonymous codons, %match, #gaps, GC content, etc.
- The summary statistics shown on the Overview for "Pairs".
- Outputs the Ka/Ks files for input into
- Only if Ka/Ks input files exist.
- Run Stats with Write selected to output the files for input to
- Run the
KaKs_calculator from a terminal window.
- Execute Run Stats again with Read selected.
- Multiple alignment of clusters.
- Align all clusters using MAFFT (Katoh and Standley 2013).
- Compute consensus length, standard deviation of length, sum-of-pairs score, and Trident score using MstatX (Collet 2012).
- Selecting Compute Statistics aligns any new unaligned pairs in clusters and updates the summary.
- Selecting KaKs Write aligns ALL pairs in clusters and updates the summary.
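The PCC from the first statistics section can be sketched for one hit pair as follows. This is an illustration only, not the TCW code: the functions pearson and pair_pcc and the RPKM values are made up, and condition names are matched case-insensitively as described for the input requirements.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists (None if undefined)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # constant values: the correlation is undefined
    return sxy / math.sqrt(sxx * syy)


def pair_pcc(rpkm1, rpkm2):
    """PCC of two sequences over their shared conditions (names compared case-insensitively)."""
    a = {name.lower(): value for name, value in rpkm1.items()}
    b = {name.lower(): value for name, value in rpkm2.items()}
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        return None  # not enough shared conditions
    return pearson([a[c] for c in shared], [b[c] for c in shared])


seq1 = {"Leaf": 10.0, "root": 20.0, "stem": 30.0}
seq2 = {"leaf": 1.0, "ROOT": 2.0, "stem": 3.0, "flower": 9.0}
print(pair_pcc(seq1, seq2))
```

Only the conditions present in both sequences contribute, which is why the PCC is only relevant when the sTCWdbs share condition names.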
Details on running
After the KaKs files have been created using Run Stats:
The following times are from the log files for building an mTCW database with three NT-sTCWdbs.
|Build Database||5h:0m:36s||138,907 sequences
|Add Pairs||2h:3m:04s||454,568 pairs
|Add New Clusters||1h:23m:05s||46,831 clusters
|Run Stats||1h:33m:15s||116,109 alignments
The longest task is Add GOs (timing not shown); this task can be done at any time, so it is recommended to wait until everything else is finalized before adding the GOs.
The search program (e.g. blast) runs on the number of CPUs given by #CPU, but all other mTCW tasks use only one CPU.
runMultiTCW is not very forgiving if datasets or cluster methods are entered incorrectly.
It is easiest to just Remove the offending dataset or cluster and re-enter it.
A file called mTCW.error.log is created if there is an error. If it is not clear how to
fix the problem, send the file to firstname.lastname@example.org.
The clusters can be viewed by either:
- Click the Launch viewMultiTCW button in the
- Execute './viewMultiTCW' and a window of existing mTCW databases will be displayed, where
databases can be selected for display.
- Execute './viewMultiTCW <database name>', e.g. ./viewMultiTCW demo
displays the window on the right.
There is Help on all the
viewMultiTCW views, and Tour shows snapshots of some of the
viewMultiTCW can be used as an applet, as follows:
- The mtcw.jar file must be signed with a code signing certificate, e.g. using Digicert (which is easy, but costs money), or your university may have an account with InCommon, which provides code-signing certificates.
- You need mysql-connector-java-5.0.5-bin.jar (contact tcw at agcol.arizona.edu if you would like us to provide you a copy).
- Modify the following code as appropriate, and put it in your cgi-bin.
Running viewMultiTCW (please wait)
<param name="ASSEMBLY_DB" value="mTCW_your_db_name">
<param name="DB_URL" value="www.your URL">
<param name="DB_USER" value="your username (read-only)">
<param name="DB_PASS" value="your password">
Unable to display the viewMultiTCW applet. Please verify your Java installation.
Applet users may have to adjust their applet memory; see instructions
in the Troubleshooting Guide.
- Li, L., Stoeckert, C.J., Jr. and Roos, D.S. (2003)
OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res, 13, 2178-2189.
- Zhang Z, Li J, Zhao XQ, Wang J, Wong GK, Yu J (2006) KaKs_Calculator: Calculating Ka and Ks through model selection and model averaging. Geno. Prot. Bioinfo. Vol 4 No 4. 259-263.
- Katoh K, Standley DM (2013) MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Molecular Biology and Evolution Vol 30, Issue 4. 772-780.
- Guillaume Collet (2012) https://github.com/gcollet/MstatX.
- Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32: 1792-1797.