The University of Arizona
Tour: runMultiTCW
runMultiTCW

runMultiTCW takes as input two or more singleTCW (sTCW) databases and builds a multiTCW (mTCW) database of clustered sequences.

Main panel

1. Build database. Define the input sTCW databases (see panel #1), which can be created from nucleotide sequences (NT-sTCW) and/or protein sequences (AA-sTCW). For an NT-sTCW, translated ORFs must be provided; the ORFs translated by runSingleTCW may be used. Then press Build database, which builds a database of all sequences, RPKM values, DE values, annotations and GO terms.

2. Compare sequences. This creates a file of all sequences and runs a heuristic search [1] of that file against itself to find similar sequences. The search program and its parameters can be changed (see panel #2).

Add Pair from Hits. All pairs from the Hit file are entered into the database.
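Loading pairs from a self-search amounts to collapsing the hits into unique sequence pairs. The sketch below only illustrates that idea and is not runMultiTCW's code. It assumes the hit file uses the 12-column tabular layout written by BLAST (`-outfmt 6`) and DIAMOND, with the bit score in the last column; the function name `pairs_from_hits` is hypothetical. Self-hits are dropped, and the hits A to B and B to A are merged into one pair that keeps the higher bit score.

```python
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def pairs_from_hits(lines: Iterable[str]) -> Dict[Pair, float]:
    """Collapse tabular hits into unique, unordered sequence pairs.

    Assumes BLAST-style tabular rows: query id in column 1, subject id in
    column 2 and bit score in column 12 (tab-separated).
    """
    best: Dict[Pair, float] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query == subject:
            continue  # a self-search always hits itself; ignore those rows
        # A-B and B-A describe the same pair, so store it in sorted order
        pair = (query, subject) if query < subject else (subject, query)
        if bitscore > best.get(pair, float("-inf")):
            best[pair] = bitscore
    return best
```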

3. Add new clusters. Add clusters (see panel #3) using the following methods:

  1. BBH - Bi-directional Best Hit: a TCW algorithm for N sTCW databases, where each resulting cluster has N sequences (one per database) and every pair in the cluster is a BBH.
  2. Closure - a TCW algorithm for determining clusters, where each sequence in a cluster has a hit, good similarity and good overlap with every other sequence in the cluster.
  3. OrthoMCL [2] - runMultiTCW executes the OrthoMCL scripts and then loads the results into the mTCW database.
  4. User Defined - load a file of clusters in the OrthoMCL format.
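A minimal sketch of the BBH idea, assuming each hit pair has a single symmetric score (for example a bit score) and that every sequence is mapped to its sTCW database. It seeds from the first database, takes each seed's best hit in every other database, and keeps the group only if every pair in it is a bi-directional best hit. The function and argument names are hypothetical, ties go to whichever hit is seen first, and TCW's own algorithm may seed and break ties differently.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple


def bbh_clusters(dataset_of: Dict[str, str],
                 scores: Dict[Tuple[str, str], float]) -> List[Tuple[str, ...]]:
    """Return clusters with one sequence per dataset where all pairs are BBH."""
    # best[(seq, dataset)] = (score, hit): top-scoring hit of seq in that dataset
    best: Dict[Tuple[str, str], Tuple[float, str]] = {}
    for (a, b), s in scores.items():
        for q, h in ((a, b), (b, a)):
            if dataset_of[q] == dataset_of[h]:
                continue  # BBH is only defined between different datasets
            key = (q, dataset_of[h])
            if key not in best or s > best[key][0]:
                best[key] = (s, h)

    def best_hit(q: str, ds: str) -> Optional[str]:
        hit = best.get((q, ds))
        return hit[1] if hit else None

    def is_bbh(a: str, b: str) -> bool:
        return best_hit(a, dataset_of[b]) == b and best_hit(b, dataset_of[a]) == a

    datasets = sorted(set(dataset_of.values()))
    first, others = datasets[0], datasets[1:]
    clusters: List[Tuple[str, ...]] = []
    for seed in sorted(s for s, d in dataset_of.items() if d == first):
        members = [seed] + [best_hit(seed, d) for d in others]
        if None in members:
            continue  # seed has no hit in at least one dataset
        if all(is_bbh(a, b) for a, b in combinations(members, 2)):
            clusters.append(tuple(members))
    return clusters
```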
One or more clusters can be added with each execution of Add New Cluster, and any cluster can be removed with Remove.
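The Closure criterion (every member hits every other member with good similarity and overlap) means each cluster is a clique in the graph of good pairs. Below is a greedy sketch of that idea, not TCW's implementation. The cutoffs `min_sim` and `min_olap` and their defaults are hypothetical; pairs are visited from highest to lowest score, a sequence joins a cluster only if it is linked to every current member, and two existing clusters are never merged.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical per-pair record: (percent similarity, percent overlap, score)
PairInfo = Tuple[float, float, float]


def closure_clusters(pairs: Dict[Tuple[str, str], PairInfo],
                     min_sim: float = 60.0,
                     min_olap: float = 80.0) -> List[Set[str]]:
    """Greedily build clusters in which every member is linked to all others."""
    # Keep only pairs that pass both (hypothetical) cutoffs
    good = {p: info for p, info in pairs.items()
            if info[0] >= min_sim and info[1] >= min_olap}
    neighbours: Dict[str, Set[str]] = {}
    for a, b in good:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    cluster_of: Dict[str, int] = {}
    clusters: List[Set[str]] = []
    # Visit the highest-scoring pairs first
    for (a, b), _ in sorted(good.items(), key=lambda kv: -kv[1][2]):
        ca: Optional[int] = cluster_of.get(a)
        cb: Optional[int] = cluster_of.get(b)
        if ca is None and cb is None:
            clusters.append({a, b})
            cluster_of[a] = cluster_of[b] = len(clusters) - 1
        elif ca is not None and cb is None:
            if clusters[ca] <= neighbours[b]:  # b is linked to every member
                clusters[ca].add(b)
                cluster_of[b] = ca
        elif cb is not None and ca is None:
            if clusters[cb] <= neighbours[a]:  # a is linked to every member
                clusters[cb].add(a)
                cluster_of[a] = cb
        # if both sequences are already clustered, nothing changes
    return clusters
```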

4. Add Stats. Add statistics (see panel #4), as follows.

  1. PCC (Pearson Correlation Coefficient): This is only relevant if the sTCW databases share conditions, as it measures how similar the RPKM values of the two sequences in a pair are across those shared conditions. It is run on all pairs in the database.
  2. The following two are only relevant for mTCW databases built only from NT-sTCW databases, as they use the nucleotide coding regions.
    1. Summary statistics and KaKs files: This uses all pairs from BBH clusters for pairwise alignments. The KaKs files are written for input to KaKs_Calculator [3], along with a script to run it from the terminal.
    2. Coding region statistics: These are computed for all pairs that occur in any cluster. The statistics are those shown in the summary statistics (see Example summary statistics below).
  3. Read KaKs files: This is only relevant after KaKs_Calculator has been run on the KaKs files; it reads the results into the database.
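For PCC, the RPKM values of the two sequences over their shared conditions can be treated as two vectors and correlated. A sketch, assuming each sequence's RPKM values are given as a dict from condition name to value; returning `None` when fewer than two conditions are shared or a vector is constant is a choice made here, not documented TCW behaviour.

```python
import math
from typing import Dict, Optional


def rpkm_pcc(rpkm_a: Dict[str, float], rpkm_b: Dict[str, float]) -> Optional[float]:
    """Pearson correlation of two sequences' RPKM values over shared conditions."""
    shared = sorted(set(rpkm_a) & set(rpkm_b))
    if len(shared) < 2:
        return None  # correlation needs at least two shared conditions
    xs = [rpkm_a[c] for c in shared]
    ys = [rpkm_b[c] for c in shared]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return None  # constant expression: correlation is undefined
    return cov / (sx * sy)
```

Conditions present in only one of the two sequences (such as `seed` below) are ignored.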

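The KaKs files hold pairwise coding-region alignments. The sketch below assumes the AXT layout that KaKs_Calculator reads (a name line, the two aligned sequences, then a blank line between records). The `id_a-id_b` record naming and the function `write_axt` are hypothetical, and the files runMultiTCW writes may differ in detail.

```python
from typing import Iterable, Tuple


def write_axt(pairs: Iterable[Tuple[str, str, str, str]]) -> str:
    """Format aligned CDS pairs (id_a, id_b, cds_a, cds_b) as AXT text."""
    blocks = []
    for id_a, id_b, cds_a, cds_b in pairs:
        if len(cds_a) != len(cds_b):
            raise ValueError(f"{id_a}/{id_b}: aligned sequences differ in length")
        if len(cds_a) % 3 != 0:
            raise ValueError(f"{id_a}/{id_b}: alignment length is not a multiple of 3")
        # name line, first sequence, second sequence
        blocks.append(f"{id_a}-{id_b}\n{cds_a}\n{cds_b}\n")
    # a blank line separates consecutive records
    return "\n".join(blocks)
```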
Additional panels

  1. Add single TCW database
  2. Run Search
  3. Add a cluster method
  4. Add statistics (before adding statistics)

Example summary statistics

PAIRS: 920
     AA   Dataset: Diff    820, Same     85   Similarity 74.3%   Overlap 79.7%   diamond
     NT   Dataset: Diff    663, Same      3   Similarity 87.0%   Overlap 63.6%   blastn

    Aligned pairs   CDS: 701 (941k)   5-UTR: 502 (228k)   3-UTR: 458 (263k)
                                                                               
      Codons                   Amino Acids                 Nucleotides         
      Exact          60.2%     Exact             88.2%     CDS  gap      9.4%  
      Synonymous     28.0%     Substitution  >0   5.6%     CDS  diff    22.7%  
        Fourfold     14.4%     Substitution <=0   6.2%     5UTR diff    62.5%  
        Twofold      11.0%                                 3UTR diff    70.0%  
      Nonsynonymous  11.8%                                                     

      CDS base substitutions: 154.8k               Content: By Nucleotide  By Codon
                    Pos1  Pos2  Pos3  Total                    GC    CpG    CpG  
      Transition     9.0   4.4  38.5   52.0         Both    41.3%   2.2%   1.2%  
      Transversion  10.3   6.8  28.6   45.8         Either  54.1%   5.5%   2.9%  
      ts/tv         0.80  0.63  1.20   1.13         B/E     0.774  0.442  0.478  

   KaKs method: YN    Pairs: 679

      Ka/Ks          Quartiles             P-value       
      0-0.5  660     Q1(Lower)   0.016     <1e-100  415  
      0.5-1   13     Q2(Median)  0.036     <1e-10    95  
      1-1.5    3     Q3(Upper)   0.089     <0.001    51  
      >=1.5    3                           Other    118  
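Several rows above (Codons: Exact, Synonymous, Nonsynonymous; Transition and Transversion by codon position; ts/tv) follow from comparing aligned codons. A sketch using the standard genetic code and textbook definitions (a transition is purine to purine or pyrimidine to pyrimidine). Codons with gaps or ambiguous bases are skipped, and the Fourfold/Twofold split, the amino-acid and UTR columns and the CpG/GC content are not reproduced. TCW's exact counting rules are not given here, so its numbers may differ.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AAS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
PURINES = set("AG")


def codon_stats(cds_a: str, cds_b: str) -> dict:
    """Classify aligned codons and count base substitutions by codon position."""
    if len(cds_a) != len(cds_b):
        raise ValueError("aligned coding regions must have equal length")
    counts = Counter()
    for p in range(0, len(cds_a) - 2, 3):
        ca, cb = cds_a[p:p + 3].upper(), cds_b[p:p + 3].upper()
        if ca not in CODE or cb not in CODE:
            continue  # skip codons with gaps or ambiguous bases
        counts["codons"] += 1
        if ca == cb:
            counts["exact"] += 1
            continue
        same_aa = CODE[ca] == CODE[cb]
        counts["synonymous" if same_aa else "nonsynonymous"] += 1
        for pos, (x, y) in enumerate(zip(ca, cb), start=1):
            if x != y:
                # transition: both purines or both pyrimidines
                kind = "ts" if (x in PURINES) == (y in PURINES) else "tv"
                counts[f"{kind}{pos}"] += 1
    n = counts["codons"]
    ts = sum(counts[f"ts{i}"] for i in (1, 2, 3))
    tv = sum(counts[f"tv{i}"] for i in (1, 2, 3))

    def pct(key: str) -> float:
        return 100.0 * counts[key] / n if n else 0.0

    return {
        "exact%": pct("exact"),
        "synonymous%": pct("synonymous"),
        "nonsynonymous%": pct("nonsynonymous"),
        "ts": ts,
        "tv": tv,
        "ts/tv": ts / tv if tv else None,  # undefined without transversions
    }
```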

References

  1. Supported search programs: any of the following programs can be used for the AA search, and blastn is used for the NT search.
    • BLAST: Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389-3402.
    • Diamond: Buchfink B, Xie C, Huson D (2015) Fast and sensitive protein alignment using DIAMOND. Nature Methods 12: 59-60. doi:10.1038/nmeth.3176.
    • Usearch: Edgar RC (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26: 2460-2461.
  2. Li L, Stoeckert CJ Jr, Roos DS (2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13: 2178-2189.
  3. Zhang Z, Li J, Zhao XQ, Wang J, Wong GK, Yu J (2006) KaKs_Calculator: calculating Ka and Ks through model selection and model averaging. Genomics Proteomics Bioinformatics 4: 259-263.
Email Comments To: tcw@agcol.arizona.edu