The University of Arizona
runMultiTCW Guide

MultiTCW (mTCW) is the comparative module of Transcriptome Computational Workbench. This module takes as input two or more singleTCW databases.

Note, familiarity with singleTCW is essential, as MultiTCW projects are created by merging existing sTCW projects.

Overview

Common abbreviations:
mTCW   multiTCW database
sTCW   singleTCW database
NT   Nucleotide (transcript, gene)
AA   Amino acid (translated ORF, protein)
NT-sTCW   singleTCW created from NT sequences.
AA-sTCW   singleTCW created from AA sequences.
NT-mTCW   multiTCW built from only NT-sTCW databases.
AA-mTCW   multiTCW built from multiple AA-sTCW or a mix of AA-sTCW and NT-sTCW.

runMultiTCW

  1. Input:
    1. Two or more sTCW databases. The sequences, their annoDB hits, RPKM and DE p-values are imported to the multiTCW database.
    2. For NT-sTCW, either input the corresponding translated sequences (i.e. from runSingleTCW) or a SMAT file so TCW can run ESTSCAN3 to generate the translated sequences.
    3. (Optional) A file of clusters.
  2. Results are best if:
    • The sTCW databases are annotated the same way.
    • The condition names match (when applicable). For example, if two species both have counts for the tissue type 'leaf', the condition name provided in runSingleTCW must be the same for both (e.g. leaf); the comparison is case-insensitive.
    • The DE column names match (when applicable), as in the previous point.
  3. Computation:
    • Compare the sequences using BLAST [1,2].
    • Compute one or more sets of clusters using BBH (bi-directional best hit), Closure, OrthoMCL [4], or user-supplied clusters.
    • For an NT-mTCW database (created from only NT-sTCW databases), statistics such as Ka/Ks [5], synonymous codons, etc. are computed.
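The case-insensitive matching of condition names can be pictured with a short sketch. It is only an illustration: the function name shared_conditions and the example condition lists are hypothetical, and it makes no claim about how TCW stores or compares conditions internally.

```python
def shared_conditions(conditions_by_project):
    """Return the condition names (lower-cased) present in every project.

    conditions_by_project maps a project name to its list of condition
    names. Names are compared case-insensitively, as the guide describes.
    """
    normalized = [
        {name.lower() for name in names}
        for names in conditions_by_project.values()
    ]
    if not normalized:
        return set()
    return set.intersection(*normalized)


# Hypothetical condition names for the two demo projects.
example = {
    "exBar": ["Leaf", "root", "Stem"],
    "exFoo": ["leaf", "ROOT", "flower"],
}
print(sorted(shared_conditions(example)))
```

Conditions that appear in only one project (here "Stem" and "flower") are not shared, so they would not contribute to cross-project comparisons.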

Software Requirements and Installation

The multiTCW executables were installed when you installed TCW (see Installation).
Software | Version | Source | Additional info

Required for protein or transcript comparison:
BLAST | 2.x | NCBI | Used for self-blast of the protein sequences in runMultiTCW and for blast with viewMultiTCW. Legacy BLAST (and Megablast) or BLAST+ may be used.

Optional package for transcript comparison:
KaKs_Calculator | v.1.2 | KaKs_calculator (google for other download sites) | For KaKs analysis in runMultiTCW.

Additional packages (supplied with TCW) (1):
ESTScan | 3.0.3 | ESTScan | Extracts protein sequences from transcripts for runMultiTCW.
OrthoMCL | 2.0 | OrthoMCL | Clusters orthologous proteins in runMultiTCW. Uses Perl and MySQL, which requires DBI::mysql.
Muscle | 3.8.31 | Muscle | Multiple alignment of protein clusters in viewMultiTCW.

(1) Linux binaries are supplied under the external directory and Mac binaries under the external_osx directory. You do not need to do anything special, as TCW will find the packages there if they are not in your path. We cannot guarantee that the packages will work on every Linux or Mac machine; if they do not, you will need to replace them.

Running the Demo

This section is essential for learning how to use runMultiTCW. It describes how to make a multiTCW project starting from the two singleTCW demos that are included in the package.
Using runSingleTCW, create the datasets sTCW_exBar and sTCW_exFoo, as follows:
  • Create the annotation files:
    1. ./runAS
    2. Set UniProt_<date> to UniProt_ex. Check the boxes for "Swiss" and "invertebrate". Select Tax Download.
    3. Set the GO_tmp<date> to GO_tmpEx and go_<date> to go_ex. Select GO Download.
  • Create sTCW_exBar:
    1. ./runSingleTCW
    2. Select "exBar" from the Project dropdown. Run Steps 1-3.
  • Create sTCW_exFoo (same steps as for exBar).

To create the multiTCW project, start by running the mTCW Manager:

./runMultiTCW
This brings up the Manager interface, shown on the lower right (though all fields will be blank).
Click Add Project to create a new project. Enter "ex" in the entry box, and click ok. The Manager interface will have mTCW database filled in as mTCW_ex, which will be the MySQL database name.

The following are the steps to take, where more detail is provided below.

  1. Using the Add beside the "single TCW databases", add the exBar database with its translated ORF file, and the exFoo database with its translated ORF file.
  2. Select Build Database.
  3. Select Run Blast/Filter.
  4. Select Add Pairs from Blast.
  5. Using the Add button beside the "Cluster Methods" table, add the three different methods with default parameters. The Cluster Methods table should look like it does on the left.
  6. Select Add New Clusters. Once they are added, runMultiTCW will look like the image on the right; the added methods in the table will be italicized. You may add additional methods at any time.
  7. Select Run Stats. When it completes, the label will change to "No action selected".
  8. Exit runMultiTCW in order to run KaKs_calculator.
    1. Download the KaKs_calculator.
    2. Change directory to projcmp. Put the KaKs_calculator executable in this directory.
    3. Change directory to ex/KaKs. Execute "sh runKaKs".
    4. Start up runMultiTCW again. The label on the 4th section should say "Read KaKs"; if it does not, select Settings and select it. Then select Run Stats.
    5. Select Launch viewMultiTCW to query the results.
The overview will look similar to this Overview.
The log file in projcmp/ex/log will look similar to this log file.

The four steps

The following continues to use exBar and exFoo as examples of sTCW databases, but will work for any set of sTCW databases.

Top three rows

Add Project: A popup window will appear where you enter the project name. On 'OK', the following occurs:
(1) A project directory is created under projcmp with the project name.
(2) A file called mTCW.cfg is created and written to the project directory.
(3) The database is given the same name with the prefix 'mTCW_' added.
Help: A pop-up window that provides similar information to this User Guide.
Project: The drop-down lists all sub-directories under projcmp. When you select one, the project's mTCW.cfg file is read and its values are entered into the interface.
Save: Everything you enter is saved into mTCW.cfg every time you make a change. However, you can initiate the save with this button if you want to be sure the new information is saved.
Remove: See below.
Overview: Once you have selected a project, you can select 'Overview' to see its state.
mTCW database: By default, the MySQL name will be mTCW_<project-name>. You can change the name, though it must start with mTCW_.
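The naming rules above can be summarized in a few lines. This is a minimal sketch of the conventions as described in this guide (project directory under projcmp, mTCW.cfg inside it, database name prefixed with mTCW_); the function names are hypothetical and are not part of TCW.

```python
from pathlib import PurePosixPath

DB_PREFIX = "mTCW_"          # prefix stated in this guide
PROJECT_ROOT = "projcmp"     # directory that holds all mTCW projects


def default_db_name(project_name):
    """Default database name for a project, e.g. 'ex' -> 'mTCW_ex'."""
    return DB_PREFIX + project_name


def config_path(project_name):
    """Relative location of the project's configuration file."""
    return str(PurePosixPath(PROJECT_ROOT) / project_name / "mTCW.cfg")


def is_valid_db_name(db_name):
    """A user-chosen database name must still start with the prefix."""
    return db_name.startswith(DB_PREFIX)


print(default_db_name("ex"))
print(config_path("ex"))
```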

Remove: Select one or more options. When you select 'Ok', you will be prompted to verify each removal.

Pairs.. from database: Removes the pairs and clusters so you can start over without running Build Database again, e.g. if you want to use a different blast file, create all new clusters, etc. Once the pairs are removed, you can change the blast settings (e.g. use filtered blast) and then re-add the pairs.
mTCW database: Removes the database but leaves the project on disk.
From disk:
Pairs and method files: If you remove the pairs and clusters from the database, it is a good idea to also remove all associated files from disk using this option.
Blast files: If you recreate the database and the sequences in it may have changed, remove the blast files so that you can re-blast. Likewise, if you want to re-run blast for any other reason, first remove the blast files using this option.
All files: If you are no longer using the project, you can delete the database (above) and all the relevant files here.

1. single TCW databases

Click the Add button next to Single TCW databases. This brings up the sTCW selection panel shown on the right.

Clicking Select sTCW Database produces a popup of existing sTCW databases. Choose sTCW_exBar.

MultiTCW uses proteins for its alignments.

  • Select Use the existing protein file.
  • Select "...", which will take you to the directory AAfiles in the projcmp directory. The file exBar_aaORFs.fasta (translated ORFs) was created by runSingleTCW; select it, followed by "open".
  • If you want to use ESTScan to generate the translated ORFs, see transcripts to proteins.
When you select Keep, it will take you back to the main panel and this database will be shown in the "single TCW databases" table.

Repeat the above step for sTCW_exFoo.

2. Blast

Run Blast puts the results in blastAA.tab and blastNT.tab; these names cannot be changed.

Typically, the Run Blast/Filter step is run without changing the settings. By default, the protein (amino acid) self-blast is run with "-use_sw_tback", which is very important because it uses Smith-Waterman dynamic programming to compute the final score.

Once blast is run, you can no longer change the parameters; if you want to change the parameters and re-run blast, first remove the blast files using Remove... on the main panel.

Filtering is only required if you think you have many transcripts that are basically the same; the filtering removes them from the blast result file, but not from the database. When you change the parameters %Identity and Max non-align, the name of the filter file will automatically change to reflect the parameters. See the runMultiTCW Help for a description of these two parameters. If blast has already been run but the pairs are not loaded, you can change these parameters and rerun Run Blast/Filter, which will only generate the new filter file.

Add Pairs from blast loads the pairs from the blast result file or filtered file.

3. Cluster Methods

You can create multiple sets of clusters with different methods or parameters.

Click Add in section "Cluster Methods" to add a new clustering method; this brings up the Method panel, shown at right.

  1. BBH is the default; select Keep, and control will return to the main panel.
  2. Select Add again, select Closure in the Method dropdown, select Keep.
  3. Select Add again, select OrthoMCL in the Method dropdown, select Keep.

When the multiTCW database is created from nucleotide sTCW databases, it is advantageous to have BBH as at least one of the methods, as these pairs are used for the overall summary and for the KaKs pairs.

More information is provided in the section Cluster Methods.

4. Run Stats

The statistics are broken into three sections:
  1. Run on all pairs in the database (e.g. 459):
    • The PCC (Pearson Correlation Coefficient) is only relevant if there are shared conditions, as it is used to determine how similar the RPKM values of the conditions are. It is run on all pairs in the database.
  2. Relevant only for an mTCW database created from only nucleotide sTCW databases:
    • BBH pairs only (e.g. 233 pairs):
      • The summary statistics shown on the Overview for "Pairs".
      • Outputs the Ka/Ks files for input into KaKs_calculator.
    • For all pairs in clusters (e.g. 292) :
      • Synonymous codons, nonsynonymous codons, %match, #gaps, GC content, etc.
  3. Only if Ka/Ks input files exist.
    • Run Stats writes files for input to KaKs_calculator.
    • Run the KaKs_calculator from a terminal window.
    • Execute Run Stats again.
    • See KaKs_calculator for more details.
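To make the PCC statistic concrete, the following is a minimal sketch of a Pearson correlation between the RPKM values of two sequences over their shared conditions. It is a textbook formulation, not TCW's own code; the function name and the RPKM numbers are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length value lists.

    Returns NaN when either list has zero variance, since the
    coefficient is undefined in that case.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical RPKM values for one pair over shared conditions
# (e.g. leaf, root, stem), listed in the same order for both sequences.
rpkm_seq1 = [12.0, 3.5, 8.1]
rpkm_seq2 = [10.2, 4.0, 9.3]
print(round(pearson(rpkm_seq1, rpkm_seq2), 3))
```

A value near 1 means the two sequences have similar expression profiles across the shared conditions; without shared conditions there is nothing to correlate, which is why the PCC is only relevant in that case.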

Additional details


Nucleotide and/or protein singleTCW databases as input

SingleTCW databases can be created from nucleotide sequences (NT-sTCW) or protein sequences (AA-sTCW). A multiTCW database can be created with a mix of NT-sTCW and AA-sTCW databases. If the multiTCW is created with only AA-sTCW databases or a mix, only the PCC statistics are available.

For an NT-sTCW, the nucleotide sequences are loaded into the mTCW database. This requires a file of translated ORFs (amino acid sequences) corresponding to the sequences in the database. There does not have to be a translated ORF for every NT sequence in the sTCW database; in fact, the file can be blank (you can use nucleotide blast for clustering). However, since singleTCW provides the translated ORF files, there is no reason not to use them.

Transcripts to proteins

If your sTCW databases were created with protein sequences, skip this section.
  1. You may supply your own protein sequences, where the sequence identifiers (i.e. on the ">" description line) must correspond to the sequence identifiers in the sTCW database. runSingleTCW produces a translated ORF file called <project_name>_aaORFs.fasta and puts a copy in projcmp/AAfiles, which can be used as input (see TCW ORF finder).

  2. You may have runMultiTCW run ESTScan, but you must supply the SMAT file (or use the existing one, which is very old). Go to estscan.sourceforge.net for the source code and instructions for building a SMAT file. Put the resulting SMAT file in external/ESTscan/smat (or external_osx on Mac).
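If you supply your own protein file, it can be worth checking the identifiers before loading. The sketch below is one way to do that under two assumptions that are not stated in this guide: the identifier is taken to be the first whitespace-delimited token on the ">" line, and the list of sTCW sequence identifiers is obtained separately (here it is a hypothetical hard-coded list).

```python
def fasta_ids(lines):
    """Collect identifiers from FASTA '>' description lines.

    Assumption: the identifier is the first token after '>'.
    """
    ids = []
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                ids.append(header.split()[0])
    return ids


def compare_ids(protein_ids, database_ids):
    """Report identifiers that do not line up between file and database."""
    protein = set(protein_ids)
    database = set(database_ids)
    return {
        # In the protein file but unknown to the sTCW database: a problem.
        "not_in_database": sorted(protein - database),
        # In the database without a translated ORF: allowed by the guide.
        "without_protein": sorted(database - protein),
    }


# Hypothetical protein file content and database identifiers.
example_fasta = """>bar_001 type=complete
MKV
>bar_002
MAL
>bar_999
MQQ
"""
database_ids = ["bar_001", "bar_002", "bar_003"]
report = compare_ids(fasta_ids(example_fasta.splitlines()), database_ids)
print(report)
```

For a real file, pass an open file handle to fasta_ids instead of the example string.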

Clustering Methods

The Help page for clustering provides more detail, but the following is an overview.

All methods need a unique prefix, which is used to prefix the cluster names, e.g. a method with prefix "BB8" will have cluster names BB8_00001, BB8_00002, etc. The prefix can only be 3 characters, but make it a meaningful 3 characters.
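As a small illustration of the naming scheme, the sketch below generates cluster names from a prefix. The five-digit zero padding follows the BB8_00001 example above; requiring exactly three characters is one reading of the rule, and the function name is hypothetical.

```python
def cluster_names(prefix, count):
    """Generate cluster names such as BB8_00001, BB8_00002, ...

    Assumption: the prefix must be exactly 3 characters long.
    """
    if len(prefix) != 3:
        raise ValueError("prefix must be 3 characters")
    return [f"{prefix}_{number:05d}" for number in range(1, count + 1)]


print(cluster_names("BB8", 3))
```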

BBH

The BBH finds the bi-directional best hit based on blast e-value. Hence, it makes clusters strictly of size 2. It uses the blast hits that were loaded into the database with Add Pairs from Blast. There are 3 parameters:
  1. Amino acid or nucleotide (for NT-mTCW only).
  2. %Similarity - the Blast similarity (Identity).
  3. %Overlap - the alignment length is divided by the length of the sequence times 100 to get the %Overlap for each sequence of the pair (Olap1 and Olap2). You can choose "Either", which requires that either Olap1>%Overlap OR Olap2>%Overlap. If you choose "Both", then Olap1>%Overlap AND Olap2>%Overlap. For example, for an alignment:
    Seq1    -------------------------------
    Seq2     ----------	
    
    This will pass the filter %Overlap>=80 if "Either" is selected, but not "Both".
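The %Overlap filter and the bi-directional best hit idea can be sketched as follows. This is an illustration under stated assumptions, not TCW's implementation: the cutoff is applied as ">=" (as in the example above), hits are plain (query, subject, e-value) tuples, self-hits are ignored, ties keep the first hit seen, and all function and sequence names are hypothetical.

```python
def percent_overlap(align_len, seq_len):
    """Alignment length divided by sequence length, times 100."""
    return 100.0 * align_len / seq_len


def passes_overlap(align_len, len1, len2, cutoff, mode):
    """Apply the %Overlap filter in 'Either' or 'Both' mode."""
    olap1 = percent_overlap(align_len, len1)
    olap2 = percent_overlap(align_len, len2)
    if mode == "Either":
        return olap1 >= cutoff or olap2 >= cutoff
    if mode == "Both":
        return olap1 >= cutoff and olap2 >= cutoff
    raise ValueError("mode must be 'Either' or 'Both'")


def best_hits(hits):
    """Map each query to its subject with the lowest e-value."""
    best = {}
    for query, subject, evalue in hits:
        if query == subject:
            continue  # assumption: self-hits are not candidates
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def bbh_pairs(hits):
    """Return pairs where each sequence is the other's best hit."""
    best = best_hits(hits)
    pairs = set()
    for query, subject in best.items():
        if best.get(subject) == query:
            pairs.add(tuple(sorted((query, subject))))
    return sorted(pairs)


# The alignment drawn above: Seq1 has length 31, Seq2 has length 10,
# and the alignment covers all 10 positions of Seq2.
print(passes_overlap(10, 31, 10, 80, "Either"))
print(passes_overlap(10, 31, 10, 80, "Both"))

# Hypothetical blast hits between two datasets.
hits = [
    ("bar_1", "foo_1", 1e-50),
    ("bar_1", "foo_2", 1e-10),
    ("foo_1", "bar_1", 1e-48),
    ("foo_2", "bar_1", 1e-9),
    ("bar_2", "foo_2", 1e-5),
]
print(bbh_pairs(hits))
```

Because each sequence has a single best hit, every resulting cluster has exactly two members, which matches the description of BBH above.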

Closure

Closure has the following requirements: (1) All sequences in a cluster must have a blast hit with all other sequences in the cluster. (2) Each sequence must pass the filters with at least one other sequence in the cluster, where the filter parameters are exactly as described for BBH. The algorithm also uses the blast hits from the database.
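The two Closure requirements can be expressed as a check on a candidate cluster. The sketch below only verifies the requirements for a given set of members; it does not reproduce how TCW builds the clusters. Treating blast hits as undirected pairs is an assumption, and the function and sequence names are hypothetical.

```python
from itertools import combinations


def satisfies_closure(members, hit_pairs, passing_pairs):
    """Check the two Closure requirements for a candidate cluster.

    hit_pairs:     pairs of sequences that have a blast hit.
    passing_pairs: pairs that also pass the similarity/overlap filters.
    """
    members = sorted(set(members))
    if len(members) < 2:
        return False
    hits = {frozenset(pair) for pair in hit_pairs}
    passing = {frozenset(pair) for pair in passing_pairs}

    # (1) Every sequence has a blast hit with every other sequence.
    for a, b in combinations(members, 2):
        if frozenset((a, b)) not in hits:
            return False

    # (2) Every sequence passes the filters with at least one other member.
    for a in members:
        if not any(frozenset((a, b)) in passing for b in members if b != a):
            return False
    return True


# Hypothetical hits among three sequences.
all_hits = [("A", "B"), ("A", "C"), ("B", "C")]
print(satisfies_closure(["A", "B", "C"], all_hits, [("A", "B"), ("B", "C")]))
print(satisfies_closure(["A", "B", "C"], all_hits, [("A", "B")]))
```

In the second call, sequence C has hits with both A and B but passes the filters with neither, so requirement (2) fails.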

OrthoMCL

OrthoMCL requires numerous steps to run, and uses a temporary MySQL database; TCW organizes all these details.

OrthoMCL uses the blast file blastAA.tab. It does not guarantee that all sequences in a cluster have a blast hit with each other.

OrthoMCL produces the following message, but it works anyway: Error: acquiring genes from Combined.fasta

OrthoMCL occasionally fails; every time this has happened to me, rerunning it has worked.

User-defined clusters

For this you create a file specifying the groupings, and the interface simply uploads that file. Blast results are not used. The group file has the following format:
...
D26: tra|tra_030 tra|tra_184 tra|tra_094 pro|pro_100
D27: tra|tra_045 tra|tra_209 pro|pro_011
...
Each line starts with "DN:", where N is the group number, followed by a space-separated list of the sequences in the group. Each sequence name is prefixed by the project prefix that you entered when you set up the mTCW, separated from the name by "|".
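Before uploading a group file, it may help to check that every line follows the format described above. The parser below is a minimal sketch based only on the example lines shown; it is not TCW's own reader, and the function names are hypothetical.

```python
def parse_group_line(line):
    """Parse one line such as 'D27: tra|tra_045 pro|pro_011'.

    Returns (group_number, [(project_prefix, sequence_name), ...]).
    """
    label, colon, rest = line.partition(":")
    label = label.strip()
    if not colon or not label.startswith("D") or not label[1:].isdigit():
        raise ValueError(f"bad group label in line: {line!r}")
    members = []
    for token in rest.split():
        prefix, bar, name = token.partition("|")
        if not bar or not prefix or not name:
            raise ValueError(f"bad sequence entry: {token!r}")
        members.append((prefix, name))
    return int(label[1:]), members


def parse_groups(text):
    """Parse all non-blank lines into {group_number: members}."""
    groups = {}
    for line in text.splitlines():
        if line.strip():
            number, members = parse_group_line(line)
            groups[number] = members
    return groups


example = """D26: tra|tra_030 tra|tra_184 tra|tra_094 pro|pro_100
D27: tra|tra_045 tra|tra_209 pro|pro_011
"""
groups = parse_groups(example)
print(len(groups[26]), len(groups[27]))
```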

Details on running KaKs_calculator

After the KaKs files have been created using Run Stats:
  • Exit runMultiTCW or use a different terminal window.
  • Change directories to projcmp. Put the KaKs_calculator executable in this directory.
  • Change directories to projcmp/<project-name>/KaKs. There will be multiple files with the name oTCWn.awt where n starts at '1'. They each have pairs of aligned sequences minus the gaps. There is also a file called runKaKs which has the commands to run KaKs_calculator on each file, e.g.
    ../../KaKs_calculator -i oTCW1.awt -o iTCW1.xls -m YN &
    ../../KaKs_calculator -i oTCW2.awt -o iTCW2.xls -m YN &
    ../../KaKs_calculator -i oTCW3.awt -o iTCW3.xls -m YN &
    
    If you prefer to use a different method than 'YN', edit this file to change it to whatever method you want (see the KaKs_calculator documentation).
  • From the command line, type sh runKaKs.
  • Read the KaKs files:
    1. If you exited runMultiTCW and started it again, the label beside the pair statistics Settings will say "Read KaKs". Just select Run Stats and the files will be read.
    2. If you did not exit runMultiTCW but ran runKaKs from another terminal window, the label will still say "No action to be performed". Select Settings, select "Read KaKs files", then Keep. Now select Run Stats.
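The runKaKs file is written by Run Stats, so you normally only edit it. For readers who want to see its pattern spelled out, the sketch below builds the same kind of command lines as the example above. It is only an illustration of the file's structure: the function name is hypothetical, and any method name other than YN should be taken from the KaKs_calculator documentation.

```python
def kaks_commands(n_files, method="YN", executable="../../KaKs_calculator"):
    """Build one background command per oTCWn.awt input file.

    File names follow the example in this guide: input oTCWn.awt,
    output iTCWn.xls, with n starting at 1.
    """
    return [
        f"{executable} -i oTCW{n}.awt -o iTCW{n}.xls -m {method} &"
        for n in range(1, n_files + 1)
    ]


for command in kaks_commands(3):
    print(command)
```

Each command ends with "&", so the files are processed in parallel when the script is run with sh.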

Troubleshooting

runMultiTCW is not very forgiving if datasets or cluster methods are entered incorrectly. It is easiest to just Remove the offending dataset or cluster method and re-enter it.

A file called mTCW.error.log is created if there is an error. If it is not clear how to fix the problem, send the file to tcw@agcol.arizona.edu.

View/Query with viewMultiTCW

The clusters can be viewed in any of the following ways:
  1. Click the Launch viewMultiTCW button in the runMultiTCW interface.
  2. Execute './viewMultiTCW' and a window of existing mTCW databases will be displayed, where databases can be selected for display.
  3. Execute './viewMultiTCW <database name>', e.g. ./viewMultiTCW demo displays the window on the right.
There is Help on all the viewMultiTCW views, and Tour shows snapshots of some of the viewMultiTCW windows.

Running as an applet

The following can be used to display viewMultiTCW on the web with a given mTCW database:
    <HTML>
    <HEAD>
    <TITLE>viewMultiTCW</TITLE>
    </HEAD>
    <BODY>
    Running viewMultiTCW  (please wait)
      <applet
        CODE="cmp.viewer.MTCWApplet"
        ARCHIVE="location_of_your_jars/mtcw.jar,jars/mysql-connector-java-5.0.5-bin.jar"
        width=0 height=0
        MAYSCRIPT>
      <param name="ASSEMBLY_DB" value="mTCW_your_db_name">
      <param name="DB_URL"      value="www.your URL">
      <param name="DB_USER"     value="your username (read-only)">
      <param name="DB_PASS"     value="your password">
      <hr>
      Unable to display the viewMultiTCW applet. Please verify your Java installation.
      <hr>
      </applet>
    </BODY>
    </HTML>
		
Applet users may have to adjust their applet memory; see instructions in the Troubleshooting Guide.

References

  1. Zhang, Z., S. Schwartz, L. Wagner, and W. Miller (2000) A greedy algorithm for aligning DNA sequences. J Comput Biol 7:203-214.
  2. Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., Madden, T. (2009) BLAST+: architecture and applications. BMC Bioinformatics 10:421.
  3. Iseli, C., Jongeneel, C.V. and Bucher, P. (1999) ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences. Proc Int Conf Intell Syst Mol Biol, 138-148.
  4. Li, L., Stoeckert, C.J., Jr. and Roos, D.S. (2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res, 13, 2178-2189.
  5. Zhang Z, Li J, Xiao-Qian Z, Wang J, Wong, G, Yu J (2006) KaKs_Calculator: Calculating Ka and Ks through model selection and model averaging. Geno. Prot. Bioinfo. Vol 4 No 4. 259-263.
  6. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32: 1792-1797.
  7. UniPROT Consortium (2007) The Universal Protein Resource (UniProt). Nucleic Acids Res 35: D193-197.

Email Comments To: tcw@agcol.arizona.edu