The University of Arizona
Tour: runMultiTCW
runMultiTCW

runMultiTCW takes as input two or more singleTCW (sTCW) databases and builds a multiTCW (mTCW) database of clustered sequences. It walks the user through the steps, as shown in the image on the lower right.

Main panel

1. Build database. First define the input sTCW databases (see panel #1), which can be created from nucleotide sequences (NT-sTCW) and/or protein sequences (AA-sTCW). For NT-sTCW, translated ORFs must be provided; the translated ORFs from runSingleTCW may be used, or ESTScan [1] may be run. Then press Build database, which builds a database of all sequences together with their RPKM, DE, annotation and GO data.

2. Run BLAST/Filter. This creates a file of all sequences and runs BLAST [2] on the file against itself to find similar sequences. The user has the option of filtering the blast results to remove very similar sequences from the same dataset (see panel #2).

Add Pair from Blast. All pairs from the blast file are entered into the database.

3. Add new clusters. Add clusters (see panel #3), where the methods are:

  1. BBH - Bi-directional Best Hit TCW algorithm.
  2. Closure - TCW algorithm for determining clusters, where each sequence in a cluster (1) has a blast hit with all other sequences in the cluster, and (2) has a user-supplied %Overlap and %Similarity with at least one other sequence in the cluster.
  3. OrthoMCL [3] - runMultiTCW executes the OrthoMCL scripts and then loads the results into the mTCW database.
  4. User Defined - load a file of clusters where the file format is the orthoMCL format.
One or more clusters can be added with each execution of Add New Cluster, and any cluster can be removed with Remove. This example shows that three sets of clusters have been loaded (a cluster name in italics means that the set is in the database).
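The bi-directional best hit idea behind the BBH method can be sketched as follows. This is an illustration only, not the TCW implementation. It assumes that hits arrive as (query, subject, bit score) tuples, that a sequence ID carries its dataset name as a prefix (a hypothetical `dataset|name` convention), that only hits between different datasets are considered, and that the best hit is the one with the highest bit score.

```python
def dataset_of(seq_id):
    """Hypothetical ID convention: 'dataset|name'."""
    return seq_id.split("|", 1)[0]


def best_hits(hits):
    """Map (query, subject dataset) to the highest-scoring (subject, score)."""
    best = {}
    for query, subject, score in hits:
        if dataset_of(query) == dataset_of(subject):
            continue  # assumption: only cross-dataset hits are paired
        key = (query, dataset_of(subject))
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    return best


def bidirectional_best_hits(hits):
    """Return sorted pairs whose members are each other's best hit."""
    best = best_hits(hits)
    pairs = set()
    for (query, _), (subject, _) in best.items():
        back = best.get((subject, dataset_of(query)))
        if back is not None and back[0] == query:
            pairs.add(tuple(sorted((query, subject))))
    return sorted(pairs)


# Made-up hits for two datasets, A and B.
hits = [
    ("A|s1", "B|t1", 250.0),
    ("A|s1", "B|t2", 100.0),
    ("B|t1", "A|s1", 248.0),
    ("B|t2", "A|s1", 100.0),
    ("A|s2", "B|t2", 90.0),
    ("A|s1", "A|s2", 300.0),  # same dataset, ignored
]
bbh_pairs = bidirectional_best_hits(hits)
print(bbh_pairs)
```

In this toy input only `A|s1` and `B|t1` choose each other, so they form the single pair; `B|t2` prefers `A|s1`, but the preference is not returned.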

4. Add Stats. Add statistics (see panel #4), as follows.

  1. PCC (Pearson Correlation Coefficient): This is only relevant if there are shared conditions, as it is used to determine how similar the RPKM values of the conditions are. It is run on all pairs in the database.
  2. The following two are only relevant for mTCW databases that are built with only NT-sTCW databases, as they use the nucleotide coding regions.
    1. Summary statistics and KaKs files: The BBH clusters must exist, because this step uses pairwise alignments. The KaKs files are written for input to KaKs_Calculator [4], along with a script to run it from the terminal.
    2. Coding region statistics: Run on all pairs that exist in any cluster. The statistics are those shown in the summary statistics (see Example summary statistics).
  3. Read KaKs files: This is only relevant if KaKs_Calculator has been executed on the KaKs files. It reads the results into the database.
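For the PCC statistic in item 1 above, the following sketch shows how a Pearson correlation could be computed for one pair of sequences. It is not the TCW code. It assumes that each sequence has one RPKM value per shared condition, listed in the same condition order; the function name and the sample values are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists of RPKM values.

    Returns None when either list has zero variance, because the
    coefficient is undefined in that case.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


# Made-up RPKM values over four shared conditions.
rpkm_seq1 = [1.0, 2.0, 3.0, 4.0]
rpkm_seq2 = [2.0, 4.0, 6.0, 8.0]
pcc = pearson(rpkm_seq1, rpkm_seq2)
print(round(pcc, 3))
```

A value near 1 means the two sequences rise and fall together across the shared conditions, a value near -1 means they move in opposite directions, and a value near 0 means there is no linear relationship.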

Additional panels

1. Add single TCW database
2. Run Blast and optional filter
3. Add a cluster method
4. Add statistics (before adding statistics)

Example summary statistics

PAIRS: 459
      Blast Diff dataset: AA      349      NT      250
      Blast Same dataset: AA      104      NT        1

  Summary of BBH coding sequences: 233
                                                                      
      Codon                 Amino Acid              Codon Gap         
      Exact          49.4%  Exact            79.7%  Open        1.1%  
      Synonymous     30.3%  Substitution      6.4%  Indel (3)   1.2%  
        Fourfold     15.8%  Nonsubstitution  13.9%  1-2 Gap     1.4%  
        Twofold      11.8%                                            
      Nonsynonymous  20.3%                                            

                             Pos1           Pos2           Pos3  
      Transition      7kb (12.5%)    5kb ( 8.7%)   20kb (32.8%)  
      Transversion    7kb (11.8%)    5kb ( 8.5%)   16kb (25.2%)  
      ts/tv                  1.06           1.03           1.30  

                     %Substitutions    
                     Mean     SE     Kb    ts/tv    %GC   %CpG
    Nonsynonymous    49.9     7.8     65    0.89   65.0   17.2 
      Non-CpG        48.7     7.5     54    0.91   60.6      - 
      CpG            56.2    15.5     11    0.82      -      - 

    Synonymous       34.9     1.0     97    1.53   65.4   15.9 
      Non-CpG        34.6     1.0     82    2.03   61.3      - 
      CpG            37.2     4.2     15    0.39      -      - 

    Fourfold (4d)    33.3     0.0     50    0.60   77.3   24.7 
      Non-CpG        33.3     0.0     38    0.65   74.0      - 
      CpG            33.3     0.0     12    0.49      -      - 

   Summary of BBH Nucleotide  5-UTR: 204  CDS: 233  3-UTR: 202

                    %Differences              %Substitutions       
                     Mean     SE     Kb        Mean      SE     Kb   ts/tv    %GC   %CpG
    5-UTR            40.7     9.6     42       30.2     8.0     36    0.62   50.6    4.8
      Non-CpG        40.1     9.8     40       29.5     8.0     34    0.61   48.1      -
      CpG            62.8    24.3      2       57.2    25.8      2    0.67      -      -

    CDS              20.1     4.5    328       18.9     3.7    322    1.18   56.4    6.3
      Non-CpG        19.5     4.3    307       18.2     3.5    302    1.29   53.4      -
      CpG            30.2     9.7     21       29.3     9.4     20    0.53      -      -

    3-UTR            45.9     7.7     41       34.6     7.5     33    0.50   49.2    4.2
      Non-CpG        45.2     7.9     40       33.7     7.5     31    0.49   46.9      -
      CpG            67.6    21.5      2       61.1    24.3      1    0.61      -      -

  KaKs method: YN                                                  
      Ka/Ks       P-val        Quartiles          
      0-0.5  161  <1e-05  167  Q1(Lower)   0.011  
      0.5-1    4  <1e-04    6  Q2(Median)  0.020  
      1-1.5    5  <1e-03    0  Q3(Upper)   0.050  
      >=1.5   12  <1e-02   12                     
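The ts/tv values in the summary above are ratios of transitions (purine to purine, or pyrimidine to pyrimidine) to transversions (purine to pyrimidine, or the reverse). As a rough illustration of how such counts could be gathered per codon position, here is a minimal sketch. It is not how TCW computes them. It assumes two aligned coding sequences of equal length that start in frame, and it skips gaps and ambiguous bases; all function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(a, b):
    """Return 'ts', 'tv', or None for identical, gap, or ambiguous bases."""
    a, b = a.upper(), b.upper()
    valid = PURINES | PYRIMIDINES
    if a == b or a not in valid or b not in valid:
        return None
    same_class = (a in PURINES) == (b in PURINES)
    return "ts" if same_class else "tv"


def ts_tv_by_position(seq1, seq2):
    """Count transitions and transversions at codon positions 1, 2 and 3."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    counts = [{"ts": 0, "tv": 0} for _ in range(3)]
    for i, (a, b) in enumerate(zip(seq1, seq2)):
        kind = classify(a, b)
        if kind is not None:
            counts[i % 3][kind] += 1
    return counts


def ratio(count):
    """ts/tv ratio, or None when there are no transversions."""
    return count["ts"] / count["tv"] if count["tv"] else None


# Made-up aligned coding sequences, three codons each.
counts = ts_tv_by_position("ATGGCACGT", "ATGACCCAT")
for pos, count in enumerate(counts, start=1):
    print(f"Pos{pos}", count, ratio(count))
```

The same relationship can be read from the percentages in the summary: for Pos3, 32.8% transitions over 25.2% transversions gives the reported ts/tv of 1.30.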

References

  1. Iseli C, Jongeneel CV, Bucher P (1999) ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences. Proc Int Conf Intell Syst Mol Biol: 138-148.
  2. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389-3402.
  3. Li L, Stoeckert CJ, Jr., Roos DS (2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13: 2178-2189.
  4. Zhang Z, Li J, Zhao XQ, Wang J, Wong GK, Yu J (2006) KaKs_Calculator: calculating Ka and Ks through model selection and model averaging. Genomics Proteomics Bioinformatics 4: 259-263.
Email Comments To: tcw@agcol.arizona.edu