Label  Description  Compute


DATASETS

This first section describes the data imported from the sTCWdbs.

CLUSTER SETS
 One or more sets of clusters will be present in the mTCW. The columns are in the Cluster table.

Sizes

[Range]
 The range of cluster sizes per cluster set. These can be verified by viewing all
clusters sorted on the 'Count' column.
 Count

Statistics

conLen  Average consensus length of the cluster  Avg(conLen)^{1}

sdLen  Average standard deviation of the lengths of
the amino acid sequences in the cluster.  Avg(sdLen)^{1}

Score1  The MSA scoring method is shown in the PROCESSING section.
 Avg(Score1)^{1}

SD  Standard deviation of Score1.
 StdDev(Score1)^{1}

Score2  The MSA scoring method is shown in the PROCESSING section.
 Avg(Score2)^{1}

SD  Standard deviation of Score2.
 StdDev(Score2)^{1}

PAIRS
 Similar pairs were identified by comparing the
sequences using a search program (BLAST or DIAMOND),
where AA is from the amino acid search and NT is from the nucleotide search (NT may not exist).
The columns are in the Pairs table.

Hits (from hit file) The following is the same for NT.

Diff
 The number of hits from different datasets.
 Pair Filter: Has AA, Different sets.

Same
 The number of hits from the same dataset.
 Pair Filter: Has AA, Same sets.

Similarity
 Percent similarity of all alignments
(Average percent similarity of each alignment).
 see^{1b}
(Avg(%AAsim))^{1}

Coverage
 Percent overlap of all alignments
(Average percent alignment of the length of each sequence in the pair).
 see^{1b}
((Avg(%AAcov1)+
Avg(%AAcov2))/2)^{1}

Aligned (aligned using dynamic programming, Filter: Has Stats)

CDS:  The number of aligned CDS bases including gaps but not overhangs.  Sum(Align)^{1}

5UTR:  Same as CDS but for 5UTR.  see^{2}

3UTR:  Same as CDS but for 3UTR.  see^{2}


Codon column

Codons
 Number of aligned codons excluding gaps.
 Sum(Calign)^{1}

Exact
 Percent of the number of codons that are exactly the same.
 %Cexact; see^{1b}

Synonymous
 Percent of the number of codons that are synonymous (different codon, same amino acid).
 %Csyn; see^{1b}

Fourfold
 Percent of the number of codons that are fourfold (4d), which are two synonymous codons
where the ith position allows any of the 4 bases.
 %C4d; see^{1b}

Twofold
 Percent of the number of codons that are twofold (2d), which are two synonymous codons
where the ith position allows any of the 2 bases.
 %C2d; see^{1b}

Nonsynonymous
 Percent of the number of codons that are nonsynonymous (different amino acid).
 %CnonSyn; see^{1b}
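
The Exact, Synonymous and Nonsynonymous categories above can be illustrated with a minimal sketch. It assumes two gap-aligned, in-frame CDS strings of equal length and the standard genetic code; the function name `classify_codons` is hypothetical and this is not the mTCW implementation. Fourfold and twofold sites are not handled here.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codons(cds1: str, cds2: str) -> dict:
    """Count exact, synonymous and nonsynonymous codon pairs (hypothetical helper).

    Codon pairs containing a gap or an unrecognized base are skipped,
    mirroring 'aligned codons excluding gaps'.
    """
    if len(cds1) != len(cds2):
        raise ValueError("aligned sequences must have equal length")
    counts = {"codons": 0, "exact": 0, "syn": 0, "nonsyn": 0}
    for i in range(0, len(cds1) - 2, 3):
        c1, c2 = cds1[i:i + 3].upper(), cds2[i:i + 3].upper()
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue  # gap or ambiguous base in this codon pair
        counts["codons"] += 1
        if c1 == c2:
            counts["exact"] += 1
        elif CODON_TABLE[c1] == CODON_TABLE[c2]:
            counts["syn"] += 1      # different codon, same amino acid
        else:
            counts["nonsyn"] += 1   # different amino acid
    return counts


def percent(part: int, total: int) -> float:
    """Percent of total, or 0.0 when there is nothing to divide by."""
    return 100.0 * part / total if total else 0.0
```

For example, `classify_codons("ATGCTTAAAGGG", "ATGCTCAGAGGG")` counts four codons: two exact, one synonymous (CTT/CTC, both leucine) and one nonsynonymous (AAA/AGA).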

Amino acid column (same number as Codons)

Exact
 Percent of the AA characters that are the same.
 %Aexact; see^{1b}

Substitution>0
 Percent of AA characters that are substitutions with BLOSUM62>0.
 %Apos; see^{1b}

Substitution<=0
 Percent of AA characters that are substitutions with BLOSUM62<=0.
 %Aneg; see^{1b}

Nucleotides columns
(References to CDS, 5UTR, 3UTR are the numbers by 'Aligned:')

CDS Diff
 Percent of CDS bases that are not the same, i.e. ((Gap+SNP)/Align)*100.0.
 %Cdiff; see^{1b}

Gaps
 Percent of CDS bases that are gaps, i.e. (Gap/Align)*100.0.
 see^{1b}

SNPs
 Percent of CDS bases that are SNPs, i.e. (SNP/Align)*100.0.
 see^{1b}
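
The three CDS percentages above follow directly from the stated formulas. The sketch below is illustrative only: it assumes two aligned CDS strings of equal length with overhangs already trimmed and '-' as the gap character, and the function name `cds_percentages` is hypothetical.

```python
def cds_percentages(seq1: str, seq2: str, gap: str = "-") -> dict:
    """Percent gaps, SNPs and total difference over the aligned columns."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    align = gaps = snps = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == gap and b == gap:
            continue  # column with no base in either sequence
        align += 1
        if a == gap or b == gap:
            gaps += 1   # one sequence has a gap at this column
        elif a != b:
            snps += 1   # both have a base, and the bases differ
    if align == 0:
        return {"align": 0, "gaps": 0.0, "snps": 0.0, "diff": 0.0}
    return {
        "align": align,
        "gaps": 100.0 * gaps / align,           # (Gap/Align)*100.0
        "snps": 100.0 * snps / align,           # (SNP/Align)*100.0
        "diff": 100.0 * (gaps + snps) / align,  # ((Gap+SNP)/Align)*100.0
    }
```

With `cds_percentages("ATGCCA-TAA", "ATGTCAGTAA")`, the ten aligned columns contain one gap and one SNP, so Gaps and SNPs are each 10% and CDS Diff is 20%.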

5UTR Diff
 Percent of 5UTR bases that are different.
 %5diff; see^{1}

3UTR Diff
 Percent of 3UTR bases that are different.
 %3diff; see^{1}

Columns: Pos1 Pos2 Pos3 Total^{4}

Transition (ts)
 Percent of SNPs that are transitions in each of the three codon positions.
 see^{2}

Transversion (tv)
 Percent of SNPs that are transversions in each of the three codon positions.
 see^{2}

ts/tv
 The total number of transitions divided by the total number of transversions.
 ts/tv; see^{2}
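
A transition swaps two purines (A, G) or two pyrimidines (C, T); any other SNP is a transversion. The following sketch counts both per codon position. It assumes gap-free, in-frame CDS strings so that index modulo 3 gives the codon position; the function names are hypothetical and not taken from mTCW.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ts_tv_by_position(cds1: str, cds2: str):
    """Return (ts, tv) count lists indexed by codon position 0, 1, 2."""
    ts = [0, 0, 0]
    tv = [0, 0, 0]
    for i, (a, b) in enumerate(zip(cds1.upper(), cds2.upper())):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # not a SNP, or not an unambiguous base
        pos = i % 3
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            ts[pos] += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv[pos] += 1  # purine<->pyrimidine
    return ts, tv


def ts_tv_ratio(ts, tv):
    """Total transitions divided by total transversions (None if no transversions)."""
    total_tv = sum(tv)
    return sum(ts) / total_tv if total_tv else None
```

For `ts_tv_by_position("ATGAAACCC", "ATGAGACCA")` there is one transition at the second codon position (A/G) and one transversion at the third (C/A), giving a ts/tv ratio of 1.0.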

Columns: GC CpGNt CpGCd ^{4}

Both
 Percent of CDS bases where both (intersection) sequences have a GC base.
Same for the CpG sites, where CpGNt (nucleotide) and CpGCd (codon, CpG does not cross codon boundaries).
 see^{3}

Either
 Percent of CDS bases where either or both (union) sequences have a GC base.
Same for the CpG sites, where CpGNt (nucleotide) and CpGCd (codon, CpG does not cross codon boundaries).
 see^{3}

Jaccard
 The total number of 'both' divided by the total number of 'either'.
There is a pair column for the GC and codon-based CpGCd, but not the nucleotide-based CpGNt.
 GCJI,
CpGJI; see^{3}
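
The Jaccard index is the size of the intersection ('both') divided by the size of the union ('either'). A minimal sketch for the GC case, assuming two aligned CDS strings compared column by column (the function name `gc_jaccard` is hypothetical):

```python
def gc_jaccard(seq1: str, seq2: str):
    """Return (both, either, jaccard) for GC bases across aligned columns."""
    both = either = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        a_gc = a in ("G", "C")
        b_gc = b in ("G", "C")
        if a_gc and b_gc:
            both += 1    # intersection: GC base in both sequences
        if a_gc or b_gc:
            either += 1  # union: GC base in at least one sequence
    jaccard = both / either if either else 0.0
    return both, either, jaccard
```

For `gc_jaccard("GCAT", "GAAC")`, one column has a GC base in both sequences and three columns have one in either, so the index is 1/3.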

KaKs method^{5}

Ka/Ks  Selective strength. A score <1 indicates purifying selection, >1 indicates positive selection,
and significantly >1 implies that at least some of the mutations are probably advantageous.
It is rare for KaKs to be exactly 1, so the following rules are used:
If KaKs >= 0.995 and KaKs < 1.006, then KaKs~1
If KaKs >= 1.006, then KaKs > 1
If KaKs < 0.995, then KaKs < 1
 Set Filter according to rules on the left.
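
The three rules above partition the KaKs values into non-overlapping bins. As an illustrative sketch (the function name `classify_kaks` and the returned labels are hypothetical):

```python
def classify_kaks(kaks: float) -> str:
    """Bin a KaKs value using the thresholds 0.995 and 1.006."""
    if kaks >= 1.006:
        return "KaKs>1"
    if kaks >= 0.995:
        return "KaKs~1"  # 0.995 <= KaKs < 1.006
    return "KaKs<1"
```

The boundary values fall as stated in the rules: 0.995 is binned as KaKs~1 and 1.006 as KaKs>1.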

Quartiles  Quartiles of the KaKs values, computed by splitting the sorted list in half;
Q1 is the median of the lower half and Q3 is the median of the upper
half.
 Q2=Median(KaKs)^{1}
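
The split-in-half method can be sketched as follows. The description does not say whether the median itself belongs to either half when the list has an odd length; this sketch assumes it is excluded from both, which is one common convention and may differ from what mTCW does. The function name `quartiles` is hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]
    # Assumption: for odd n, skip the middle element so it is in neither half.
    upper = data[half + (n % 2):]
    return median(lower), median(data), median(upper)
```

For the list 1 through 7, the halves are [1, 2, 3] and [5, 6, 7], so Q1 = 2, Q2 = 4 and Q3 = 6.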

Average

Ka  Nonsynonymous substitution rate  Avg(Ka)^{1}

Ks  Synonymous substitution rate  Avg(Ks)^{1}

Pvalue  KaKs pvalue  Avg(pvalue)^{1}

Pvalue  The 'other' count includes pairs where the KaKs_calculator did not provide
a pvalue, although it did provide a KaKs value.  pvalue

SEQUENCES

The columns are in the Sequence table; use Sequence Filters to select a dataset
to view its corresponding results.

Average Lengths
 The ORFs were computed for the sTCWdbs and imported along with their translated sequence.
 5UTR Len, 3UTR Len, CDS Len; see^{1b}

%GC
 The average percent of GC for 5'UTR, CDS and 3'UTR.
The only %GC column is for the entire sequence.
 

CpG O/E
 The CpG observed/expected for the 5'UTR, CDS and 3'UTR [(#CpG/(#G*#C))*Len].
 5UTR CpG, CDS CpG, 3UTR CpG; see^{1b}
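
The bracketed formula can be applied to a single region as in the sketch below. It is a direct transcription of (#CpG/(#G*#C))*Len for one sequence string; the function name `cpg_obs_exp` is hypothetical, and returning 0.0 when the region has no C or no G is an assumption made to avoid division by zero.

```python
def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected: (#CpG / (#G * #C)) * Len."""
    s = seq.upper()
    n_c = s.count("C")
    n_g = s.count("G")
    n_cpg = s.count("CG")  # CG cannot overlap itself, so count() finds every site
    if n_c == 0 or n_g == 0:
        return 0.0  # assumption: undefined ratio reported as 0.0
    return (n_cpg / (n_g * n_c)) * len(s)
```

For "ACGTCGGA" there are 2 CpG sites, 2 C, 3 G and a length of 8, so the ratio is (2/6)*8, about 2.67.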

Counts
 The total raw counts from each dataset. These can be verified from the singleTCWs.
The TPM values are in the sequence table, but not the counts, though these can be viewed
in the Sequence Detail panel.

Differential Expression
 The total DE from each dataset. These can be verified from the singleTCWs.
