Reproduce sTCW overview
This describes how to obtain the table of results corresponding to statistics
in the overview. The following shorthand is used:
 The "Column:x" indicates that x should be selected for viewing in the table.
 #Seqs is the number of sequences, which is listed at the top of the overview.
 "Stats" is the "Show Column Stats" on the "Table..." dropdown.
Always clear filters before setting new ones!
INPUT
Most of the input section is data supplied by the user with runSingleTCW. The following
are computed:
Counts:
 SIZE  Show All  Column:Counts for all conditions; Stats, column:Sum
 Sequences:
 AVGlen  Show All  Column:Length; Stats, column:Average
 MEDlen  Show All  Column:Length; Stats, column:Median
The median in the table may differ slightly from the one in the overview
because the two are computed differently.

ANNOTATION
Hit Statistics:
Column  Search  Obtain number


Sequences with hits  Filters: Annotated  Number of rows
 Unique hits  AnnoDB Hits: Seq:None(slow)  Hits # above table
 Total sequence hits  AnnoDB Hits: Seq:None(slow)  Pairs # above table
 Bases covered by hit  AnnoDB Hits: Seq:Best Bits  Unselect "Group by Hit ID"; column:Align; Stats, column:Sum; for NT, multiply by 3
 Total bases (residues for AA seqs)  Show All  Column:Length; Stats, column:Sum

AnnoDBs:
Column  Search  Obtain number


ONLY  Filters: Annotated, Best Bits, Enter DBtype and Taxonomy values for AnnoDB, General <=1 annoDB  Number of rows
 The following all use AnnoDB Hits panel with the correct ANNODB selected from the AnnoDBs panel.
 BITS  Seq:Best Bits  Seqs # above table
 ANNO  Seq:Best Anno  Seqs # above table
 UNIQUE  Seq:None(slow)  Hits # above table
 TOTAL  Seq:None(slow)  Pairs # above table
 AVG %SIM  Seq:None(slow)  Unselect "Group by Hit ID"; column:%Sim; Stats, column:Average
 Rank=1 is the best hit for a sequence for a given annoDB.
 HAS HIT  Seq:Rank=1  Seqs # above table; percentage of total #Seqs
 AVG %SIM  Seq:Rank=1  Unselect "Group by Hit ID"; Column:%Sim; Stats, column:Average
 Cover >=N  Seq:Rank=1,%Sim>=N,%HitCov^{*}>=N  Seqs # above table; percentage of HAS HIT

^{*}HitCov is the difference between the hit stop and start coordinates divided by the length of the protein.
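
As a rough illustration of the footnote, the following Python sketch computes a hit coverage value and applies the "Cover >=N" test. The function names are hypothetical, the result is scaled to a percentage on the assumption that %HitCov is a percent, and the coordinates are subtracted exactly as the footnote states (no +1); TCW itself may differ in these details.

```python
def hit_coverage_percent(hit_start: int, hit_stop: int, protein_length: int) -> float:
    """Percent of the protein spanned by the hit, following the footnote literally.

    Assumption: (stop - start) / protein length, scaled to a percentage.
    """
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    return 100.0 * (hit_stop - hit_start) / protein_length


def passes_cover(sim_percent: float, hit_start: int, hit_stop: int,
                 protein_length: int, n: float) -> bool:
    """True when both %Sim and %HitCov are at least n (the Cover >=N row)."""
    coverage = hit_coverage_percent(hit_start, hit_stop, protein_length)
    return sim_percent >= n and coverage >= n
```

For example, a hit spanning coordinates 10 to 210 on a 400-residue protein gives a coverage of 50 under these assumptions.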
Top 15 species from total: N
The N is the number of unique species based on the first two words of the
species name. From "AnnoDB Hits":
 Select "Species", select "Two words", enter the first two words of the species name next to "Find", select "Find", then select the entry on the
left and add it to the right.
 Select "Best Bits", "Best Anno" or "None" for the three numbers shown.
 BUILD TABLE
 Use the number listed beside "Pairs".
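
The "first two words" rule for N can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and it assumes species names are whitespace-separated and compared exactly (case-sensitive), which may not match how TCW normalizes names.

```python
def two_word_species(name: str) -> str:
    """Reduce a species name to its first two whitespace-separated words."""
    return " ".join(name.split()[:2])


def count_unique_species(names) -> int:
    """Number of distinct species after truncating each name to two words."""
    return len({two_word_species(name) for name in names if name.strip()})
```

With this rule, "Oryza sativa subsp. japonica" and "Oryza sativa subsp. indica" both count as "Oryza sativa".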
Gene Ontology Statistics:
Column  Search  Obtain number


Unique GOs  GO Annotation: no filters  Results number
 Unique hits with GOs  AnnoDB Hits: Seqs:None; GO,etc:Has GO  Hits number at top of table
 Sequences with GOs  Filters: Best Hit: Annotated, Best with GO  Number of rows
 Seq best hit has GOs  AnnoDB Hits: Seq:Best Bits; GO,etc:Has GO  Seqs number at top of table
 biological_process  GO Annotation: Level: biological_process  Number GOs at top of table
 molecular_function  GO Annotation: Level: molecular_function  Number GOs at top of table
 cellular_component  GO Annotation: Level: cellular_component  Number GOs at top of table
 is_a, part_of  GO Annotation: no filters  Export..., Each GO's parents with relations, grep (see footnote^{*})

^{*} From the terminal, run 'grep is_a GOeachParents.tsv | wc'. Repeat with
is_a replaced by part_of.
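
Where grep is not available, the same count can be approximated in Python. This is a minimal sketch, not part of TCW: the function name is hypothetical, and it simply counts lines containing the relation name, which corresponds to the line count reported by wc.

```python
def count_matching_lines(path: str, relation: str) -> int:
    """Count the lines of a text file that contain the given substring."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if relation in line)

# Typical use with the exported file:
#   count_matching_lines("GOeachParents.tsv", "is_a")
#   count_matching_lines("GOeachParents.tsv", "part_of")
```

No column layout is assumed for the exported file; any line mentioning the relation is counted once.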
EXPRESSION
The following sections may not exist if the input had no count files or the DE methods
were not executed.
TPM*
 Filter: select Condition under Exclude; set "At Most" to 1.99.
 Repeat with 4.99, 9.99, etc., each time subtracting the previous result from
the current one. The results are for the intervals >=N to <M.
 Differential Expression
 Filter: select the condition pair, then enter the number, e.g. 1E5.
 These counts are cumulative.
 GO Enrichment:
 GO Annotation; select the condition pair using the Enrich panel; enter the threshold,
e.g. 0.05.
 These counts are cumulative.

* If "RPKM" is at the top of the Overview instead of "TPM", then RPKM was computed instead of TPM.
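
The subtraction used for the TPM intervals can be sketched in Python as below; the Differential Expression and GO Enrichment counts need no such step because they are reported as running totals. The function name and the example counts are made up for illustration.

```python
def interval_counts(at_most_counts):
    """Turn successive "At Most" row counts into per-interval counts.

    at_most_counts: row counts for increasing thresholds
    (e.g. At Most 1.99, 4.99, 9.99), each including all lower values.
    """
    intervals = []
    previous = 0
    for count in at_most_counts:
        intervals.append(count - previous)  # rows new to this threshold
        previous = count
    return intervals
```

For hypothetical row counts of 120, 200 and 260 at the three thresholds, this yields 120, 80 and 60 for the intervals <2, >=2 to <5, and >=5 to <10.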
SEQUENCES
If the sequences (e.g. ESTs) have been assembled, there are statistics on buried,
matepairs, etc. Most of them are columns, so select the column, then use the
"Show column stats" to view the sum. The following will only cover the three sections
that are found in unassembled sequences.
Sequence lengths
 Filters: General: change >= to <=, enter upper limit into Length
 The counts of the intervals to the left need to be subtracted from the number of rows.
 ORF lengths
 Filter: SNPs and ORFs; enter lower limit into ORF len
 Work from right to left: the counts of the intervals to the right need to be subtracted from the number of rows.
 Quality
 Filters: General: Has Ns: Yes; Column: Ns
 The resulting rows are the #n>0; sort the rows in descending order of Ns and scroll down
to the last row with more than 10 Ns; its row number is the #n>10.
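
The right-to-left subtraction described for ORF lengths can be sketched in Python as follows. The function name, the lower limits and the row counts are hypothetical; the sketch only assumes that each filter returns the number of rows at or above its lower limit.

```python
def intervals_from_at_least(rows_by_lower_limit):
    """Convert ">= lower limit" row counts into per-interval counts.

    rows_by_lower_limit: dict mapping a lower limit to the number of rows
    returned by the filter at that limit. Works from the largest limit down,
    subtracting the rows already assigned to the intervals on the right.
    """
    result = {}
    rows_to_the_right = 0
    for limit in sorted(rows_by_lower_limit, reverse=True):
        rows = rows_by_lower_limit[limit]
        result[limit] = rows - rows_to_the_right
        rows_to_the_right = rows
    return result
```

The sequence-length intervals work the same way in mirror image: with "<=" filters, the counts of the intervals to the left are subtracted instead.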

ORF Stats:
Column  Search  Obtain number


The following use the Basic Sequence. Select "TCW" and enter
the Substring indicated.
 Is Longest ORF  !Lg  #Seqs minus Result number
 Markov Best Score  !Mk  #Seqs minus Result number
 All of the above  $  Result number
 ORF=Hit  ORF=Hit  Result number
 ORF=Hit with Ends  ORF=Hit+  Result number
 Multiframe  Multiframe  Result number
 Stops in Hit  Stop  Result number
 >=9 Ns in ORF  9N  Result number
 The following use the Filters: SNPs and ORFs.
 Both Ends  Has Ends (Start&Stop codons)  Number of rows
 Has Hit  Protein confirmation  Number of rows
 ORF>=300  Has ORF >= 300  Number of rows

GC Content:
The only number reproducible is the GC Content, which is the %GC over the entire sequence.
GC Content  Show All  Column:%GC; Stats, column:Average
Note that the number may differ slightly due to round-off error.

 The "Pos i" column is the percent of G or C in the ith position of the CDS codons.
 The %GC is the percent of G's and C's over the sequence length.
 The CpGO/E is the ratio of observed to expected CpG [(#CpG/(#G*#C))*length].
 The UTRs can be viewed in the Sequence Detail alignments, but there is no column for it.
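
The three definitions above can be illustrated with a short Python sketch. It is not TCW's code: the function names are hypothetical, the length is taken as the full sequence length (including any Ns), and the codon-position function assumes the CDS is given in frame, starting at the first base of the first codon.

```python
def gc_percent(seq: str) -> float:
    """Percent of G and C over the sequence length."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG / (#G * #C)) * length."""
    seq = seq.upper()
    g_count, c_count = seq.count("G"), seq.count("C")
    if g_count == 0 or c_count == 0:
        return 0.0  # no CpG is possible; avoids division by zero
    return seq.count("CG") / (g_count * c_count) * len(seq)


def gc_by_codon_position(cds: str):
    """Percent of G or C at codon positions 1, 2 and 3 of an in-frame CDS."""
    cds = cds.upper()
    return [gc_percent(cds[offset::3]) for offset in range(3)]
```

For the toy sequence ACGTCGGC, there are 3 Gs, 3 Cs and 2 CpGs over 8 bases, so the ratio is (2/9)*8.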
