mTCW overview

Explain mTCW overview

The top line may have any of the labels PCC Stats Multi KaKs; these indicate that the database has the corresponding information. The GOs: label indicates that GOs are in the database.
col:x notation: column x should be selected for viewing in the respective Cluster, Pair, or Sequence table.
Stats: Avg(col:x) notation: use the Table... option Show Column Stats and read the Avg value for col:x in the resulting table. The same applies to Sum and StdDev.

DATASETS

This first section describes the data imported from the sTCWdbs.

CLUSTER SETS

The database will contain one or more cluster sets. The columns are from the Cluster table.

Label	Description	Compute
Statistics
Prefix	The cluster is referred to by this prefix in the various Filters and Columns.	---
Method	The method used to compute the cluster.	See PROCESSING at end of Overview
conLen	Average consensus length of the cluster.	Stats: Avg(col:conLen)
sdLen	Average standard deviation of the sequence lengths (AA Len) in each cluster.	Stats: Avg(col:sdLen)
Score1	Average of Score1. See PROCESSING for the MSA score1 method. Default: Sum-of-pairs	Stats: Avg(col:Score1)
SD	Standard deviation of Score1	Stats: StdDev(col:Score1)
Score2	Average of Score2. See PROCESSING for the MSA score2 method. Default: Wentropy	Stats: Avg(col:Score2)
SD	Standard deviation of Score2	Stats: StdDev(col:Score2)
Sizes
[Range]	Range of cluster sizes per cluster set	Calculate by sorting on the col:Count

PAIRS

Similar pairs were identified by comparing the sequences using a search program (BLAST or DIAMOND), where AA is from the amino acid search and NT is from the nucleotide search (NT may not exist). Use Pairs Filters to produce the pairs table.

The Overview shows the overall percentage. In the table below, any Compute cell with superscript¹ computes the average of the percentages, which is usually close but not the same.
Any Compute cell with superscript² can only be calculated by viewing Pairwise... for each pair and computing the value manually.

Label	Description	Compute
Hits: (from hit file); for the NT statistics, replace AA with NT.
Diff	The number of hits from different datasets.	Filter: Hits: AA pairs; Datasets: Different sets
Same	The number of hits from the same dataset.	Filter: Hits: AA pairs; Datasets: Same sets
Similarity	Average percent similarity.	Filter: Hits: AA pairs; Stats: Avg(col:%AAsim)¹
Coverage	Average percent coverage.	Filter: Hits: AA pairs; Stats: (Avg(col:%AAcov1) + Avg(col:%AAcov2))/2 ¹

Aligned: Filter: Statistics: Has Stats (aligned using dynamic programming)
CDS:	Number of aligned CDS bases including gaps but not overhangs.	Stats: Sum(col:Align)
5UTR:	Number of aligned 5'UTR bases including gaps but not overhangs.	View values; see²
3UTR:	Number of aligned 3'UTR bases including gaps but not overhangs.	View values; see²

Codon column
Codons	Number of aligned codons excluding gaps.	Stats: Sum(col:Calign)
Exact	Percent codons that are exactly the same.	Stats: Avg(col:%Cexact)¹
Synonymous	Percent codons that are synonymous (different codon, same amino acid).	Stats: Avg(col:%Csyn)¹
Fourfold	Percent codons that are fourfold (4d) (synonymous codons where the ith position allows any of the 4 bases).	Stats: Avg(col:%C4d)¹
Twofold	Percent codons that are twofold (2d) (synonymous codons where the ith position allows two of the 4 bases).	Stats: Avg(col:%C2d)¹
Nonsynonymous	Percent codons that are nonsynonymous (different amino acid).	Stats: Avg(col:%nonSyn)¹

Amino acid column
Exact	Percent amino acids that are the same.	Stats: Avg(col:%Aexact)¹
Substitution>0	Percent amino acids that are substitutions with BLOSUM62>0.	Stats: Avg(col:%Apos)¹
Substitution<=0	Percent amino acids that are substitutions with BLOSUM62<=0.	Stats: Avg(col:%Aneg)¹

Nucleotides columns
CDS Diff	Percent CDS bases that are different, i.e. ((Gap+SNP)/Align)%.	Stats: ((sum(col:gap) + sum(col:SNP)) / sum(col:align)) x 100.0
Gaps	Percent CDS bases that are Gaps, i.e. (Gaps/Align)%.	Stats: (sum(col:gap) / sum(col:align)) x 100.0
SNPs	Percent CDS bases that are SNPs, i.e. (SNP/Align)%.	Stats: (sum(col:SNP) / sum(col:align)) x 100.0
5UTR Diff	Percent 5'UTR bases that are different.	Stats: Avg(col:%5diff)¹
3UTR Diff	Percent 3'UTR bases that are different.	Stats: Avg(col:%3diff)¹

Columns: Pos1 Pos2 Pos3 Total
Transition (ts)	Percent SNPs that are transitions in each of the 3 codon positions.	View values; see²
Transversion (tv)	Percent SNPs that are transversions in each of the 3 codon positions.	View values; see²
ts/tv	The total number of transitions divided by the total number of transversions.	View values; see²

Columns: GC CpG-Nt CpG-Cd
CpG-Nt (nucleotide) and CpG-Cd (codon, CpG does not cross codon boundaries).
Both	Percent CDS bases where both sequences have a G or C base (intersection). Same for the CpG sites.	View values; see²
Either	Percent CDS bases where either or both sequences have a G or C base (union). Same for the CpG sites.	View values; see²
Jaccard	The total number of 'both' divided by the total number of 'either'.	View values; see²

KaKs method:³

KaKs

It is rare for Ka/Ks to be exactly 1, so the following fudge factors are used:

Rule	Fudge factor	Strength
KaKs>1	>= 1.006	positive (driving change)
KaKs=1	>= 0.995 and < 1.006	neutral
KaKs<1	< 0.995	purifying (against change)

Set the Pair Filters according to the fudge factors in the table above.

For NA, Filters: uncheck KaKs and check KaKs=NA

Quartiles	Applies to the KaKs values. It uses the method of splitting the list in half; Q1 is the median of the lower half and Q3 is the median of the upper half.	Q2: Stats: Median(col:KaKs)
Average
Ka	Nonsynonymous substitution rate.	Stats: Avg(col:Ka)
Ks	Synonymous substitution rate.	Stats: Avg(col:Ks)
P-value	KaKs p-value.	Stats: Avg(col:p-value)
P-value	Counts of p-value in 4 ranges.	Sort on col:p-value. Round-off error occurs (see Display Decimal Help).
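
The KaKs rules and the quartile method above can be sketched in Python as follows. This is an illustration only, not the mTCW source code: the function names are hypothetical, the thresholds are copied from the fudge factor table, and leaving the middle value out of both halves for an odd-length list is an assumption, since the description only says the list is split in half.

```python
from statistics import median

# Thresholds copied from the fudge factor table.
NEUTRAL_LOW = 0.995
NEUTRAL_HIGH = 1.006


def classify_kaks(kaks):
    """Return the category for one Ka/Ks value; None stands for KaKs=NA."""
    if kaks is None:
        return "NA"
    if kaks >= NEUTRAL_HIGH:
        return "positive"
    if kaks >= NEUTRAL_LOW:
        return "neutral"
    return "purifying"


def quartiles(values):
    """Return (Q1, Q2, Q3) by splitting the sorted list in half.

    Assumption: for an odd number of values, the middle value is
    excluded from both the lower and the upper half.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


# Hypothetical KaKs values for five pairs.
kaks_values = [0.12, 0.30, 0.45, 0.998, 1.2]
print([classify_kaks(v) for v in kaks_values])
print(quartiles(kaks_values))
```

If the Overview quartiles differ slightly from this sketch, the handling of the middle value is the first thing to check.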

SEQUENCES

The columns are in the Sequence table; use Sequence Filters to select a dataset to view its corresponding results.

Label	Description	Compute
Average Lengths	The ORFs were computed for the sTCWdbs and imported along with their translated sequence.	Stats: Avg(5UTR Len), Avg(3UTR Len), Avg(CDS Len)
%GC	The average percent of GC for 5'UTR, CDS and 3'UTR.	--
CpG O/E	The CpG observed/expected for the 5'UTR, CDS and 3'UTR [(#CpG/(#G x #C)) x Len].	--
Counts	The total raw counts from each dataset. These can be verified from the singleTCWs.
Differential Expression	The total DE from each dataset. These can be verified from the singleTCWs.
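
For reference, the %GC and CpG O/E definitions in the table above can be written as a short Python sketch. The function names are hypothetical, and returning 0.0 when a sequence has no G or no C is an assumption; how mTCW treats that case or ambiguous bases is not stated here.

```python
def gc_percent(seq):
    """Percent of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """CpG observed/expected: (#CpG / (#G x #C)) x Len."""
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    if n_c == 0 or n_g == 0:
        # Assumption: report 0.0 when the expected value is undefined.
        return 0.0
    n_cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return n_cpg / (n_c * n_g) * len(seq)


# Hypothetical CDS fragment.
cds = "ATGCGTACGTTAGC"
print(gc_percent(cds), cpg_obs_exp(cds))
```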

¹Percents

Overview percents are computed by summing the numerator and denominator then dividing.
Stats: Avg(col:X): The Table... option Show Column Stats takes the average of the per-pair percentages.
For example, in the mTCW_ex demo (created 20-Mar-22),
- Overall %Exact: sum(exact codons)/sum(total codons) = 57.8%.
- Average of %Exact: sum(%Exact)/number pairs = 59.39%.
The actual counts are only available in the Pairwise... view, as described below².
However, the Aligned and KaKs values can be computed for any Pair Table by selecting the Table... option Show Table Stats.
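
The difference between the two calculations can be seen in a small Python sketch. The counts below are made up for illustration and are not from the mTCW_ex demo; the function names are hypothetical.

```python
# Hypothetical (exact codons, aligned codons) counts for three pairs.
pairs = [(90, 100), (10, 50), (300, 400)]


def overall_percent(pairs):
    """Overview method: sum the numerators and denominators, then divide."""
    numerator = sum(exact for exact, _ in pairs)
    denominator = sum(total for _, total in pairs)
    return 100.0 * numerator / denominator


def average_of_percents(pairs):
    """Show Column Stats method: average the per-pair percentages."""
    return sum(100.0 * exact / total for exact, total in pairs) / len(pairs)


print(round(overall_percent(pairs), 2))      # 400/550
print(round(average_of_percents(pairs), 2))  # (90 + 20 + 75)/3
```

Pairs with long alignments weigh more heavily in the overall percent, while every pair counts equally in the average of percents, which is why the two values differ.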

²Counts associated with percents

The only way to view most counts is through the Pair Table, as follows:

Select Pairwise..., select 5UTR, CDS, 3UTR.
In the alignment panel, select the button specified in the Align column below.
This will pop-up a window where you select the Option as specified below.
The resulting pop-up window will have columns of counts at the top, followed by the text alignment with indications of the requested Options information.
Relevant numbers:
- Calign: The codon length with overhang and gap codons removed.
- CROP: The nucleotide length with the overhang removed.
- SNPs: The number of nucleotide differences ignoring gaps.

Variables	Align	Option	Count
Codon Exact, Non/Synonymous	Align CDS...	Match	These counts are at the top. Divide by Calign.
Fourfold (4d), twofold (2d)	Align CDS...	Degenerate	These counts are at the top. Divide by Calign.
AA Exact, Substitution	Align CDS...	Amino acid	These counts are at the top. Divide by Calign.
ts, tv, ts/tv	Align CDS...	ts/tv	These counts are at the top. Divide by SNPs.
GC, CpG-Nt, CpG-Cd	Align CDS...	CpG	These counts are at the top. Divide by CROP. The CpG percents are 2x since they involve two nucleotides.
5'UTR Diff	Align 5UTR...	---	NT-Diff is at the top. Divide by CROP.
3'UTR Diff	Align 3UTR...	---	NT-Diff is at the top. Divide by CROP.
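
To illustrate the ts, tv and ts/tv row, the following Python sketch counts SNPs, transitions and transversions for one pair. It assumes two gapped CDS strings of equal length with the overhangs already removed; the function names are hypothetical, skipping gaps and ambiguous bases is an assumption, and it does not split the counts by codon position as the Overview does.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def count_ts_tv(aligned1, aligned2):
    """Return (snps, ts, tv) for two aligned sequences of equal length."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have the same length")
    ts = tv = 0
    for a, b in zip(aligned1.upper(), aligned2.upper()):
        # Assumption: gaps and ambiguous bases are not counted as SNPs.
        if a not in BASES or b not in BASES or a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            ts += 1  # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            tv += 1  # purine <-> pyrimidine
    return ts + tv, ts, tv


def ts_tv_ratio(ts, tv):
    """Total transitions divided by total transversions (None if tv is 0)."""
    return ts / tv if tv else None


# Hypothetical aligned CDS pair with one gap.
snps, ts, tv = count_ts_tv("ATGGCA-TTC", "ATGACATTGC")
if snps:
    print(100.0 * ts / snps, 100.0 * tv / snps)  # percents: divide by SNPs
print(ts_tv_ratio(ts, tv))
```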

³ The KaKs_calculator

The KaKs_calculator (Zhang et al. 2006) will typically be used to compute the KaKs values, where the method used is shown on the "KaKs Method" line.