Explain mTCW overview

If the mTCWdb was created with only AA sequences, the Pairs statistics will not be present. The top line may have any of the labels "PCC Stats Multi KaKs"; these indicate that the database has the corresponding information.

Names in the "Compute" column refer to table (Cluster, Pair, Seq) Columns unless stated otherwise.
Label Description Compute
DATASETS This first section describes the data imported from the sTCWdbs.
CLUSTER SETS One or more sets of clusters will be present in the mTCW. The columns are in the Cluster table.
Sizes
  [Range] The range of cluster sizes per cluster set. These may be computed from viewing all Clusters sorted on the 'Count' column. Count
Statistics
  conLen Average consensus length of the cluster. Avg(conLen)1
  sdLen Average standard deviation of the lengths of the amino acid sequences in the cluster. Avg(sdLen)1
  Score1 The MSA scoring method is shown in the PROCESSING section. Avg(Score1)1
  SD Standard deviation of Score1. StdDev(Score1)1
  Score2 The MSA scoring method is shown in the PROCESSING section. Avg(Score2)1
  SD Standard deviation of Score2. StdDev(Score2)1
PAIRS Similar pairs were identified by comparing the sequences using a search program (BLAST or DIAMOND), where AA is from the amino acid search and NT is from the nucleotide search (NT may not exist). The columns are in the Pairs table.
Hits (from hit file) The following is the same for NT.
  Diff The number of hits from different datasets. Pair Filter: Has AA, Different sets.
  Same The number of hits from the same dataset. Pair Filter: Has AA, Same sets.
  Similarity Percent similarity of all alignments (average percent similarity of each alignment). see1b (Avg(%AAsim))1
  Coverage Percent overlap of all alignments (average percent alignment of the length of each sequence in the pair). see1b ((Avg(%AAcov1)+Avg(%AAcov2))/2)1
Aligned (aligned using dynamic programming, Filter: Has Stats)
  CDS: The number of aligned CDS bases including gaps but not overhangs. Sum(Align)1
  5UTR: Same as CDS but for 5UTR. see2
  3UTR: Same as CDS but for 3UTR. see2
Codon column
  Codons Number of aligned codons excluding gaps. Sum(Calign)1
  Exact Percent of the number of codons that are exactly the same. %Cexact; see1b
  Synonymous Percent of the number of codons that are synonymous (different codon, same amino acid). %Csyn; see1b
    Fourfold Percent of the number of codons that are fourfold (4d), which are two synonymous codons where the ith position allows any of the 4 bases. %C4d; see1b
    Twofold Percent of the number of codons that are twofold (2d), which are two synonymous codons where the ith position allows either of 2 bases. %C2d; see1b
  Nonsynonymous Percent of the number of codons that are nonsynonymous (different amino acid). %CnonSyn; see1b
Amino acid column (same number as Codons)
  Exact Percent of the AA characters that are the same. %Aexact; see1b
  Substitution>0 Percent of AA characters that are substitutions with BLOSUM62>0. %Apos; see1b
  Substitution<=0 Percent of AA characters that are substitutions with BLOSUM62<=0. %Aneg; see1b
Nucleotides columns (References to CDS, 5UTR, 3UTR are the numbers by 'Aligned:')
  CDS Diff Percent of CDS bases that are not the same, i.e. ((Gaps+SNPs)/Align)*100.0. %Cdiff; see1b
    Gaps Percent of CDS bases that are gaps, i.e. (Gaps/Align)*100.0. see1b
    SNPs Percent of CDS bases that are SNPs, i.e. (SNPs/Align)*100.0. see1b
  5UTR Diff Percent of 5UTR bases that are different. %5diff; see1
  3UTR Diff Percent of 3UTR bases that are different. %3diff; see1
Columns: Pos1 Pos2 Pos3 Total 3
  Transition (ts) Percent of SNPs that are transitions in each of the three codon positions. see3
  Transversion (tv) Percent of SNPs that are transversions in each of the three codon positions. see3
  ts/tv The total number of transitions divided by the total number of transversions. ts/tv; see3
Columns: GC CpG-Nt CpG-Cd 4
  Both Percent of CDS bases where both sequences (intersection) have a G or C base. The same applies to CpG sites, where CpG-Nt is nucleotide-based and CpG-Cd is codon-based (the CpG does not cross codon boundaries). see4
  Either Percent of CDS bases where either or both sequences (union) have a G or C base. The same applies to CpG sites, where CpG-Nt is nucleotide-based and CpG-Cd is codon-based (the CpG does not cross codon boundaries). see4
  Jaccard The total number of 'both' divided by the total number of 'either'. There is a pair column for GC and for the codon-based CpG-Cd, but not for the nucleotide-based CpG-Nt. GC-JI, CpG-JI; see4
KaKs method5
Ka/Ks Selective strength. A score <1 indicates purifying selection, >1 indicates positive selection, and significantly >1 implies that at least some of the mutations are probably advantageous. It is rare for KaKs to be exactly 1, so the following rules are used:
If KaKs >= 0.995 and KaKs < 1.006, then KaKs~1
If KaKs >= 1.006, then KaKs > 1
If KaKs <  0.995, then KaKs < 1
Set Filter according to rules on the left.
Quartiles Quartiles of the KaKs values. It uses the method of splitting the list in half; Q1 is the median of the lower half and Q3 is the median of the upper half. Q2=Median(KaKs)1
Average
  Ka Nonsynonymous substitution rate. Avg(Ka)1
  Ks Synonymous substitution rate. Avg(Ks)1
  P-value KaKs p-value. Avg(p-value)1
P-value The 'other' count includes pairs where the KaKs_calculator did not provide a p-value but did provide a KaKs value. p-value
SEQUENCES The columns are in the Sequence table; use Sequence Filters to select a dataset to view its corresponding results.
Average Lengths The ORFs were computed for the sTCWdbs and imported along with their translated sequence. 5UTR Len, 3UTR Len, CDS Len; see1b
%GC The average percent of GC for 5'UTR, CDS and 3'UTR. The only %GC column is for the entire sequence. -
CpG O/E The CpG observed/expected for the 5'UTR, CDS and 3'UTR [(#CpG/(#G*#C))*Len]. 5UTR CpG, CDS CpG, 3UTR CpG; see1b
Counts The total raw counts from each dataset. These can be verified from the singleTCWs. The TPM values are in the sequence table, but not the counts, though these can be viewed in the Sequence Detail panel.
Differential Expression The total DE from each dataset. These can be verified from the singleTCWs.
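
The CpG O/E formula given in the SEQUENCES section, [(#CpG/(#G*#C))*Len], can be illustrated with a minimal Python sketch. The function name cpg_obs_exp is hypothetical and not part of TCW, and returning 0.0 when a sequence has no G or no C is an assumption made here only to avoid division by zero; TCW may handle that case, and ambiguous bases, differently.

```python
def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected for one region: (#CpG / (#G * #C)) * Len.

    Illustrative helper only (not TCW code).
    """
    s = seq.upper()
    n_c = s.count("C")
    n_g = s.count("G")
    if n_c == 0 or n_g == 0:
        # Assumption: report 0.0 when the expected count is undefined.
        return 0.0
    n_cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return (n_cpg / (n_g * n_c)) * len(s)


if __name__ == "__main__":
    # 2 CpG sites, 2 G, 2 C, length 8 -> (2 / 4) * 8
    print(cpg_obs_exp("ACGTCGAT"))
```

The same function would be applied separately to the 5'UTR, CDS and 3'UTR of each sequence to obtain the three values listed in the table.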
1 For a Pair table, select the "Table.../Show Column Stats" which provides "Avg" and "Sum".
1b Percents for the Overview are from summing the numerator and denominator before dividing.
The "Average" of a percent column in "Table.../Show Column Stats" is instead the average of the per-pair percentages.
For example, in the mTCW demo, the Overview Exact value, computed over all aligned bases, is 58.6%, whereas the Column Stats "Average", which averages the align-to-length percentages of all pairs, is 60.286.
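
The difference between the two calculations can be seen in a small Python sketch. The pair counts below are made-up illustration values, not data from the mTCW demo, and the function names are hypothetical.

```python
def pooled_percent(pairs):
    """Overview style: sum numerators and denominators, then divide."""
    num = sum(n for n, _ in pairs)
    den = sum(d for _, d in pairs)
    return 100.0 * num / den


def mean_of_percents(pairs):
    """Column Stats style: compute a percent per pair, then average them."""
    return sum(100.0 * n / d for n, d in pairs) / len(pairs)


# (matching count, aligned length) for two hypothetical pairs
pairs = [(90, 100), (100, 1000)]

if __name__ == "__main__":
    print(round(pooled_percent(pairs), 2))    # 190 / 1100 -> 17.27
    print(round(mean_of_percents(pairs), 2))  # (90 + 10) / 2 -> 50.0
```

The pooled value weights long alignments more heavily, while the mean of percents weights every pair equally, which is why the two numbers differ.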

The only way to view the number of Exact, Synonymous, etc codons is, for a given pair:
(a) Sequence Table, (b) Pairwise..., (c) AA,CDS,NT, (d) Align CDS..., (e) Match (f) Codon Exact (and other counts) at the top.

2 The only way to view the number of aligned UTR bases is, for a given pair:
(a) Sequence Table, (b) Pairwise..., (c) 5UTR,CDS,3UTR, (d) Align 5UTR..., (e) Match (f) CROP is the number of aligned UTR bases.
3 (a) Sequence Table, (b) Pairwise..., (c) 5UTR,CDS,3UTR, (d) Align CDS..., (e) ts/tv, (f) The ts and tv numbers are at the top.
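
As a rough illustration of what the ts and tv numbers count, the following Python sketch tallies transitions and transversions per codon position for two aligned CDS strings. It is not TCW code: the function name is hypothetical, and it assumes the alignment starts in frame with gaps inserted in multiples of three, so that the column index modulo 3 gives the codon position. Columns containing a gap or a non-ACGT character are skipped.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(cds1: str, cds2: str):
    """Return (ts, tv), each a list of counts for codon positions 1-3."""
    ts = [0, 0, 0]
    tv = [0, 0, 0]
    for i, (a, b) in enumerate(zip(cds1.upper(), cds2.upper())):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # identical bases, gaps and ambiguity codes are not SNPs
        pos = i % 3  # assumption: in-frame, codon-aligned input
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            ts[pos] += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv[pos] += 1  # purine<->pyrimidine
    return ts, tv


if __name__ == "__main__":
    ts, tv = count_ts_tv("ATGGCA", "ACGGTC")
    total_ts, total_tv = sum(ts), sum(tv)
    print(ts, tv)
    if total_tv:
        print(total_ts / total_tv)  # ts/tv ratio over all positions
```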
4 (a) Sequence Table, (b) Pairwise..., (c) 5UTR,CDS,3UTR, (d) Align CDS..., (e) CpG, (f) The GC and CpG numbers are at the top, where
"All CpG" is "CpG-Nt" and "By Codon" is "CpG-Cd".
5 The KaKs_calculator (Zhang et al. 2006) will typically be used to compute the KaKs values, where the method used is shown on the "KaKs Method" line.
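
The KaKs rules and the quartile method described in the table can be sketched in Python as follows. This is an illustrative sketch, not TCW's implementation: the function names are hypothetical, and for an odd number of values the sketch excludes the middle value from both halves, a detail the description above does not specify.

```python
from statistics import median


def classify_kaks(kaks: float) -> str:
    """Apply the KaKs~1 thresholds listed in the table."""
    if kaks < 0.995:
        return "KaKs < 1"
    if kaks < 1.006:
        return "KaKs~1"
    return "KaKs > 1"


def quartiles(values):
    """Q1, Q2, Q3 by splitting the sorted list in half.

    Needs at least two values. Assumption: with an odd count, the
    middle value belongs to neither half.
    """
    v = sorted(values)
    n = len(v)
    half = n // 2
    lower = v[:half]
    upper = v[half + 1:] if n % 2 else v[half:]
    return median(lower), median(v), median(upper)


if __name__ == "__main__":
    for k in (0.5, 0.995, 1.0, 1.006, 2.3):
        print(k, classify_kaks(k))
    print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))
```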