runMultiTCW takes as input two or more singleTCW (sTCW) databases
and builds a multiTCW (mTCW) database of clustered sequences.
1. Build database. Define the input sTCW databases
(see panel #1), which can be created from
nucleotide sequences (NT-sTCW) and/or protein sequences (AA-sTCW).
For NT-sTCW, translated ORFs must be provided; the translated ORFs from
may be used.
Then press Build database, which builds a database of all sequences, RPKM, DE, annotations and GOs.
2. Compare sequences. This creates a file of all sequences and runs a heuristic search [1] of that file against itself
to determine similar sequences. The search program and parameters can be changed (see panel #2).
Add Pair from Hits. All pairs from the Hit file are entered into the database.
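
As a rough illustration of turning a self-search hit file into pairs, the sketch below reads tab-separated hit lines and keeps one entry per unordered pair. It assumes BLAST-style tabular output whose first three columns are query id, subject id and percent identity; the function name `pairs_from_hits` and the sequence ids are hypothetical, and this is not TCW's actual loader.

```python
import csv
import io

def pairs_from_hits(lines):
    """Collect unique unordered sequence pairs from tabular hit lines.

    Assumption: tab-separated columns beginning with query id, subject id,
    percent identity (as in BLAST-style tabular output).
    """
    pairs = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 3 or row[0].startswith("#"):
            continue  # skip blank, short, or comment lines
        query, subject, identity = row[0], row[1], float(row[2])
        if query == subject:
            continue  # in a self-search every sequence hits itself
        key = tuple(sorted((query, subject)))
        # keep the highest identity seen for the pair in either direction
        if key not in pairs or identity > pairs[key]:
            pairs[key] = identity
    return pairs

# Hypothetical hit lines: one self-hit, one pair seen in both directions.
example = io.StringIO(
    "A_001\tA_001\t100.0\n"
    "A_001\tB_007\t82.5\n"
    "B_007\tA_001\t81.0\n"
    "B_007\tC_003\t64.2\n"
)
hit_pairs = pairs_from_hits(example)
for (seq1, seq2), identity in sorted(hit_pairs.items()):
    print(seq1, seq2, identity)
```

Self-hits are dropped and the two directions of a hit collapse into a single pair, which is why the four input lines yield two pairs.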
3. Add new clusters. Add clusters (see panel #3). One or more clusters can be added with each
execution of Add New Cluster, and any cluster can be removed with Remove.
The methods are:
- BBH - Bi-directional Best Hit TCW algorithm between N sTCWdbs, where each resulting cluster has N sequences and all pairs are BBH.
- Closure - TCW algorithm for determining clusters, where each sequence in a cluster has
a hit, good similarity and good overlap with all other sequences in the cluster.
- OrthoMCL [3] - runMultiTCW executes the orthoMCL scripts and then loads the results into the mTCW database.
- User Defined - load a file of clusters where the file format is the orthoMCL format.
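
The BBH rule described above (N datasets, N sequences per cluster, every pair a bi-directional best hit) can be sketched as follows. This is a minimal illustration and not the TCW implementation: it assumes hits are given as `(query, subject, score)` tuples and that the dataset of a sequence is the id prefix before the first underscore, both of which are hypothetical conventions chosen for the example.

```python
from itertools import combinations

def dataset_of(seq_id):
    # Assumption: ids look like "A_1", where "A" names the dataset.
    return seq_id.split("_")[0]

def best_hits(hits):
    """Map (query, target dataset) -> best-scoring subject in that dataset."""
    best = {}
    for query, subject, score in hits:
        if dataset_of(query) == dataset_of(subject):
            continue  # only hits between different datasets are considered
        key = (query, dataset_of(subject))
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    return {key: subject for key, (subject, _) in best.items()}

def bbh_clusters(hits, datasets):
    """Return clusters with one sequence per dataset where all pairs are BBH."""
    best = best_hits(hits)

    def is_bbh(a, b):
        return (best.get((a, dataset_of(b))) == b
                and best.get((b, dataset_of(a))) == a)

    first, others = datasets[0], datasets[1:]
    seeds = sorted({q for (q, _) in best if dataset_of(q) == first})
    clusters = []
    for seed in seeds:
        members = [seed] + [best.get((seed, d)) for d in others]
        if None in members:
            continue  # the seed has no hit in at least one dataset
        if all(is_bbh(a, b) for a, b in combinations(members, 2)):
            clusters.append(members)
    return clusters

# Hypothetical scores: A_1/B_1/C_1 are mutually best; B_2 prefers C_3 over C_2.
hits = [
    ("A_1", "B_1", 200), ("B_1", "A_1", 200),
    ("A_1", "C_1", 180), ("C_1", "A_1", 180),
    ("B_1", "C_1", 190), ("C_1", "B_1", 190),
    ("A_2", "B_2", 150), ("B_2", "A_2", 150),
    ("A_2", "C_2", 140), ("C_2", "A_2", 140),
    ("B_2", "C_3", 160), ("C_3", "B_2", 160),
    ("B_2", "C_2", 120), ("C_2", "B_2", 120),
]
clusters = bbh_clusters(hits, ["A", "B", "C"])
print(clusters)
```

The second candidate group is rejected because one of its three pairs is not a bi-directional best hit, which shows why requiring all pairs to be BBH is stricter than requiring each member to be a best hit of the seed.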
4. Add Stats. Add statistics (see panel #4), as follows.
- PCC (Pearson Correlation Coefficient): This is only relevant if there are shared conditions,
as it is used to determine how similar the RPKM values of the conditions are.
It is run on all pairs in the database.
- The following two are only relevant for mTCW databases that are built with
only NT-sTCW databases, as they use the nucleotide coding regions.
  - Summary statistics and KaKs files: This uses all pairs from BBH clusters for pair-wise alignments. The KaKs files are written for input to
  KaKs_calculator [4], along with a script to run from the terminal.
  - Coding region statistics: Run on all pairs that exist in any cluster. The statistics
  are those that are shown in the summary statistics (see e.g. summary).
- Read KaKs files: This is only relevant if the KaKs_calculator has been
executed on the KaKs files. It reads the results into the database.
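
For reference, the Pearson Correlation Coefficient of a pair can be computed from the two sequences' RPKM values over the shared conditions. The sketch below uses only the standard formula; the function name `pearson` and the RPKM values are made up for illustration, and any filtering or transformation TCW applies before the calculation is not specified here.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one profile is constant
    return cov / math.sqrt(var_x * var_y)

# Hypothetical RPKM values for one pair over four shared conditions.
rpkm_seq1 = [12.0, 30.5, 4.1, 0.0]
rpkm_seq2 = [10.2, 28.0, 6.3, 0.5]
pcc = pearson(rpkm_seq1, rpkm_seq2)
print(round(pcc, 3))
```

A value near 1 means the two sequences rise and fall together across the shared conditions; with no shared conditions there is nothing to correlate, which is why the statistic is only relevant in that case.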
Panels:
1. Add single TCW database
2. Run Search
3. Add a cluster method
4. Add statistics (before adding statistics)
Example summary statistics:

AA Dataset:  Diff 820, Same 85   Similarity 74.3%   Overlap 79.7%   diamond
NT Dataset:  Diff 663, Same 3    Similarity 87.0%   Overlap 63.6%   blastn

Aligned pairs   CDS: 701 (941k)   5-UTR: 502 (228k)   3-UTR: 458 (263k)

Codons               Amino Acids                Nucleotides
Exact       60.2%    Exact              88.2%   CDS gap    9.4%
Synonymous  28.0%    Substitution >0     5.6%   CDS diff  22.7%
Fourfold    14.4%    Substitution <=0    6.2%   5UTR diff 62.5%
Twofold     11.0%                               3UTR diff 70.0%

CDS base substitutions: 154.8k             Content:  By Nucleotide   By Codon
              Pos1   Pos2   Pos3   Total             GC      CpG     CpG
Transition     9.0    4.4   38.5    52.0   Both      41.3%   2.2%    1.2%
Transversion  10.3    6.8   28.6    45.8   Either    54.1%   5.5%    2.9%
ts/tv         0.80   0.63   1.20    1.13   B/E       0.774   0.442   0.478

KaKs method: YN   Pairs: 679
Ka/Ks           Quartiles            P-value
0-0.5    660    Q1(Lower)   0.016    <1e-100   415
0.5-1     13    Q2(Median)  0.036    <1e-10     95
1-1.5      3    Q3(Upper)   0.089    <0.001     51
>=1.5      3                         Other     118
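
To make the Transition and Transversion rows of the summary concrete, the sketch below classifies the differences between two aligned coding sequences by codon position. It is only an illustration under stated assumptions: the alignment is taken to be in frame (column index modulo 3 gives the codon position), columns containing a gap or an ambiguous base are skipped, and the function name and sequences are hypothetical. How TCW aggregates these counts into the reported values is not described here.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def count_substitutions(cds1, cds2):
    """Count transitions (ts) and transversions (tv) per codon position.

    Assumption: the two sequences are aligned in frame and of equal length.
    """
    if len(cds1) != len(cds2):
        raise ValueError("aligned sequences must have the same length")
    counts = {"ts": [0, 0, 0], "tv": [0, 0, 0]}
    for i, (a, b) in enumerate(zip(cds1.upper(), cds2.upper())):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # identical column, gap, or ambiguous base
        # purine<->purine or pyrimidine<->pyrimidine is a transition
        same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
        counts["ts" if same_class else "tv"][i % 3] += 1
    return counts

# Hypothetical in-frame alignment of three codons.
counts = count_substitutions("ATGGCCAAA", "ATGGCTGAC")
total_ts, total_tv = sum(counts["ts"]), sum(counts["tv"])
print(counts, total_ts / total_tv if total_tv else float("nan"))
```

In this example the C/T change at the third position of the second codon and the A/G change at the first position of the third codon are transitions, while the A/C change at the third position of the third codon is a transversion.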
- [1] Supported search programs: any of the following programs can be used for the AA search, and blastn is used for the NT search.
  Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389-3402.
  Buchfink B, Xie C, Huson D (2015) Fast and sensitive protein alignment using DIAMOND. Nature Methods 12: 59-60. doi:10.1038/nmeth.3176.
  Edgar RC (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26(19): 2460-2461.
- [3] Li L, Stoeckert CJ Jr, Roos DS (2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13: 2178-2189.
- [4] Zhang Z, Li J, Xiao-Qian Z, Wang J, Wong G, Yu J (2006) KaKs_Calculator: Calculating Ka and Ks through model selection and model averaging. Geno. Prot. Bioinfo. 4(4): 259-263.