The Sum-of-Pairs cluster score has been changed to sum-of-comparisons/#comparisons, where #comparisons = (nSeqs*(nSeqs-1)/2) * nCols.
Fixed a potential bug: if the MSA consensus sequence was very long, a MySQL error would occur.
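As a rough illustration of the normalized Sum-of-Pairs formula above, the Python sketch below sums a pairwise score over every column and every pair of sequences, then divides by (nSeqs*(nSeqs-1)/2) * nCols. The scoring function `identity_score` is a hypothetical stand-in (1 for identical non-gap residues, 0 otherwise); TCW's own column scoring is not shown here.

```python
from itertools import combinations


def identity_score(a, b):
    # Hypothetical pair score: 1 for identical non-gap residues, else 0.
    return 1.0 if a == b and a != "-" else 0.0


def sum_of_pairs(msa, pair_score=identity_score):
    """Normalized Sum-of-Pairs score for equal-length aligned rows."""
    n_seqs = len(msa)
    n_cols = len(msa[0])
    total = 0.0
    for col in range(n_cols):
        column = [row[col] for row in msa]
        # Every unordered pair of sequences is compared once per column.
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    # #comparisons = (nSeqs*(nSeqs-1)/2) * nCols
    n_comparisons = (n_seqs * (n_seqs - 1) / 2) * n_cols
    return total / n_comparisons
```

For the three rows ACGT, ACGA and ACGT, 10 of the 12 column-pair comparisons match, so the score is 10/12.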

A few improvements and small bug fixes.

Fixed a potential bug: the pre-computed MSAdb would not display if score1<0.

runSingleTCW

Description prune update: (1) Was taking the one with the most GOs, even if the bitScore was less; now only takes the one with GOs if the other has none and bitscores are close. (2) If there was a "{...}" at end of description, it was not being removed before finding unique descriptions.
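The two prune rules above can be sketched as follows. This is a minimal Python illustration, not TCW's own code: the hit fields (`bits`, `n_go`) and the `close_fraction` threshold for "bitscores are close" are hypothetical, since the note does not give the actual cutoff.

```python
import re

# A trailing "{...}" block, e.g. UniProt evidence tags such as {ECO:...}.
_TRAILING_BRACES = re.compile(r"\s*\{[^{}]*\}\s*$")


def clean_description(desc):
    """Drop a trailing "{...}" before comparing descriptions for uniqueness."""
    return _TRAILING_BRACES.sub("", desc).strip()


def choose_hit(hit_a, hit_b, close_fraction=0.05):
    """Pick between two hits with the same cleaned description.

    Each hit is a dict with 'bits' (bitscore) and 'n_go' (number of GOs).
    close_fraction is a hypothetical threshold for "bitscores are close".
    """
    best, other = (hit_a, hit_b) if hit_a["bits"] >= hit_b["bits"] else (hit_b, hit_a)
    close = (best["bits"] - other["bits"]) <= close_fraction * best["bits"]
    # Only prefer the GO-bearing hit if the higher-scoring one has no GOs.
    if best["n_go"] == 0 and other["n_go"] > 0 and close:
        return other
    return best
```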

Scripts

Added or changed a few scripts that are used for results in the next version of the bioRxiv publication.

v4.0.0 22-Mar-2022

General

Documentation and Help updates - mostly for mTCW.

runAS

The Check function will write to the terminal the UniProts added to an existing goDB. For existing goDBs, they will be updated on first Check and stored in the goDB.

viewMultiTCW

Explain on Overview: rewrote much of it for clarity.
A lot of little tidy-up things. (1) Overview: removed "(Avg %Sim)" on AA and NT pairs. Slight change to KaKs counts. (2) Pair Filter: If KaKs was set to >=0.0, pairs with no KaKs value or NA were returned. If pairs and clusters were removed, then pairs were added but no clusters, the pairs filter crashed. (3) Sequence Table: If it is a !NTonly database, the Export CDS/NT sequence options are not given. If there are no GOs, the Export GOs option is not given.
Details: Added a Copy... option to copy the displayed Detail sequence.

v3.4.2 8-Mar-2022

General

Made all button colors consistent (see Colors).
Put all Help buttons on the upper right corner.
Updated all documentation snapshots.

runSingleTCW:

ORF Finder
- A few adjustments to the heuristics for choosing largest ORF versus best Markov score when there is no good hit.
- There will be fewer ORF=Hit and more ORF>Hit; this is because if a hit does not have a Start or Stop codon, more ORFs are being extended than previously.
- Previously, if a hit ended at a Start codon but did not have a Stop, it would look upstream for the furthest 5' Start. Now it uses the Start at the 5' end of the Hit.
- A few changes to the ORF summary in anno.log.
Fixed slight round-off error for Overview ANNODB AVG %SIM.
To regenerate overview, execute ./viewSingleTCW <project name> -o.

viewSingleTCW:

Overview: Improved the Reproduce instructions.
Seq Table: If the Markov Score column was displayed, the Show Column Stats gave all 0's for its statistics.
Seq Details: (1) Added Help for the Details panel. (2) The file append was not working.
Basic AnnoDB Hits: The %HitCov value was sometimes off by 1.
Basic GO Annotation: The default minimum level to show was >= 1. Level 0 holds obsolete terms, which do occur, so the minimum level was changed to 0.

runMultiTCW:

Removed '--sensitive' from the DIAMOND default parameters.
This makes a slight difference in the orthoMCL results, so projcmp/ex/orthoMCL.OM-4 is updated.
The Remove "Clusters and Pairs" is faster.
For an AA-mTCW, the pairs AAlen1 and AAlen2 were given a length of 0; they now have the correct length.
The MSA redo had quit working in v3.1.3 and has been fixed.

viewMultiTCW:

If a sequence had no hits, an error was written to mTCW.error.log (but it worked).

v3.4.1 22-Jan-2022

viewMultiTCW

List Results: Multiple rows can be selected for removal.
Cluster Table: The MSA... option for aligning the AA sequences of the cluster includes the best hit.
Hit Table: Has a new Pairwise... option that aligns the row's HitID against all nSeq sequences, or all nSeq sequences against each other (where nSeq is the number of sequences with the hit).
Sequence Table: Has a new option MSA... that allows sequences from different clusters to be aligned together.
Sequence Details: Has a new option Pairwise... that aligns the detailed sequence to one of its sequence pairs or hit sequence.
All tabs on the left and summaries have been standardized.

v3.4.0 14-Jan-2022

Colors: the various buttons were color coded on Linux, but not on Mac; moreover, their coloring was inconsistent. Now various buttons are color-coded on Linux and Mac, and represent the following:

beige	runs an algorithm
light green	replaces the current panel
rose	help
lavender	new window or popup
light gray	request for input file
light blue	tab on left of viewSingle and viewMulti.

runMultiTCW

Loading a project with an existing mTCWdb is much faster.
Overview: The Methods table has been moved to the Processing section.
Best Hit: made the output to the terminal and Help more obvious in regard to the algorithm.

viewMultiTCW

The pairwise and MSA alignments have been moved from the sequence table: (1) the Pairs table has pairwise alignments and the (2) Cluster table has MSA alignments.
The alignments create a new tab on the left underneath the respective table.
All tables created from selecting one row from another table have Prev/Next buttons. The summary of displaying these row tables has been improved and a few little bugs fixed.
Run MSA on existing cluster: (1) Now shows the correct scores. (2) Shows the type of alignment (e.g. MAFFT CDS).

runSingleTCW

Overview: When writing to file, the filename is the database name instead of the sTCWid (the sTCWid is not necessarily unique).

Software: (1) All alignment routines have been moved from seq.align to align directory. A new file called AlignButton.java has the code for the alignment buttons. (2) All color codes have been consolidated in the methods/Static.java file.

v3.3.9 27-Dec-2021

Share: All interfaces that use Blast and Diamond have access to a Help page of parameters.

Demo: The demo has been updated to Dec 2021, and further updated as described in the third runAS item below.

runAS

For the Full SwissProt and TrEMBL, there is now the option of creating the different subsets of the downloaded taxonomic databases and the complete SwissProt and/or TrEMBL. See Full subsets for an explanation.
Adding UniProts in both runAS Build GOs and runSingleTCW Annotate has been updated to better handle duplicate SwissProt hits (which can happen if the full SwissProt is used along with taxonomic databases).
The demo sp_fullSubset is renamed to sp_full, which contains the files uniprot_sprot.dat, uniprot_sprot_xBFxIxPxxx.fasta that has the bacteria, fungus, invertebrate and plant SwissProt entries removed, and uniprot_sprot_xxxxIxxxxx.fasta that has the invertebrate only removed.

v3.3.8 12-Dec-2021

runSingleTCW

Annotate:
1. Loading FASTA files:
  - Speedup - cut the time in half on Linux.
  - On Mac, was running out of memory loading TrEMBL plants, which has been fixed.
  - UniProt FASTA headers: Changed the rules for parsing FASTA headers to better fit UniProt 2018 descriptions.
2. Uninformative descriptions: The rules for computing uninformative descriptions have slightly changed. Though the rules should strictly adhere to the "International Protein Nomenclature Guidelines" bullet item "where no domain or motif is observed", many other uninformative descriptions seemed to be entered. The rules can be changed in the file "util/methods/BestAnno.java".
3. Unique Description (displayed in Seq Detail): Slight changes.

viewSingleTCW

Basic AnnoDB Hits: The Add has been significantly sped up.
Basic AnnoDB Hits and Sequences: The Delete selected and Keep selected were not displaying the table correctly (though it corrected itself on any sort, etc.).

TCW package

The "/doc" directory has been made into a tar file with instructions in the README on how to use it.

v3.3.7 30-Nov-2021

viewSingleTCW - minor changes

Improved some of the text of the Help popups.
Basic Filters: buttons are made inactive when they cannot be used - this was not done consistently, but now is.
Basic GO: The "ADD" function would add duplicate GOs.
Seq Details: (1) Changed to only highlight the top drop-down buttons if an item is selected. (2) Sped up Seq hits - inherited.

v3.3.6 22-Nov-2021

This release has improvements to the Basic Filters (Seq, Hit, and GO).

viewSingleTCW

All Basic
- Help has been consolidated into a single Help... drop-down.
- New They all have a new feature that highlights the selected rows in the table.
- The information on the status line above the tables is consistent and more informative.
Basic GO:
- New Add Select related in table, Select ancestors in table, Select descendants in table to the Show button.
- New Add Select terminal terms for the Table... button; this selects all rows that do not have descendants in the table.
- Select query was implemented, which executes the query and selects the intersecting rows.
- Add Row number column.
Basic Sequences:
- The SELECT ROWS feature has been moved to the bottom set of buttons and called Select Query.
Minor fixes: (1) Help now uses "bit-score" consistently; previously the spelling varied.
(2) Basic GO: Was not disabling "#Seq" on Enrich line when Search was selected. Was not checking for a valid p-value. Was showing a level 0 (obsolete) when the range started at 1. Exports did not work right if GOid was not displayed.

The main change from this release is a feature to allow log2FC analysis besides FC.

v3.3.5 4-Nov-2021

runSingleTCW

Changes to ORF finder: A few small changes to the output to anno.log to make it clearer.

Sequence Table: Filter and Column:
- The N-fold column and filters are given the option of displaying and filtering on the log2FC or just the FC.
- Bug fix: Differential Expression: The "Any" for "Up" would only show up-regulated for all selected DE columns that had at least one <p-value; that restriction has been removed. Same for "Down".
Basic Hits: Seq Table, Delete Selected, Keep Selected for "Group by Hit ID":
- If there were many seq-hit pairs, these functions could take a very long time; they are faster now, but can still take a long time (a popup now advises the user).
- Tiny rare bugs: (1) The Clear All did not work if there were no counts for the sequences. (2) The hit 'best align' value on rare occasions was wrong.
Seq Detail: - minor change
- For Align Best Hits, it now aligns all best hits even if they are not displayed in the hits table (e.g. if there is a different Best Bits and Best Anno, and Distinct Regions is displayed, one of the bests may not be in the table, so it would not be aligned).
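For reference, the relationship between the FC and log2FC displays of the N-fold column can be sketched as below. The pseudocount and the signed-FC convention (a 2-fold decrease shown as -2) are assumptions for illustration; the note does not state how TCW computes or signs its N-fold values.

```python
import math


def log2_fold_change(count_a, count_b, pseudocount=1.0):
    """log2((a + p) / (b + p)); the hypothetical pseudocount avoids division by zero."""
    return math.log2((count_a + pseudocount) / (count_b + pseudocount))


def signed_fold_change(lfc):
    """Convert log2FC to a signed FC (assumed convention: -2 means 2-fold down)."""
    fc = 2.0 ** lfc
    return fc if fc >= 1.0 else -1.0 / fc
```

Under this convention, filtering on |log2FC| >= 1 selects the same values as filtering on |FC| >= 2.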

This release is improvements for the ORF finder.

v3.3.4 18-Oct-2021

runSingleTCW

Changes to ORF finder:
- Algorithm: (the changes are fine-tuning results, no major changes)
  - For Markov training: it was using all sequences for training. Now it uses the longest N sequences (default 2000), after removing similar sequences. N can be changed by executing execAnno with the -t option.
  - Sort: (1) Instead of directly comparing the Markov scores, it tests whether Abs(log(score1)-log(score2)) > 0.3, where the log of a negative Markov score is taken as -log(-score); (2) If the length and Markov scores are similar, a 4th rule checks for ends (Start & Stop codon).
  - With Hit: If the hit ends at start/stop codons, always use those coordinates. Otherwise, it finds all possible ORFs (including Stop to Stop), and sorts for the best ORF.
  - Stop codons: If there were stop codons in the hit, it was not always finding the best coordinates; now it does.
  - No hit: ORFs that are Stop to Stop may now be considered if the Stop is far enough from the last Start.
  - N's in sequence: It used to try to avoid N's; now it does not. However, it does remove them from the length before taking the log for comparing lengths.
  - Minimal sequences for Markov training: The default was 50, which is way too low. It is now 500.
- The selected ORFs that lack a Start and/or Stop codon will have a remark.
- The output files are now sorted by SeqID.
- Previously, allGoodORFs.pep.fa + bestORFs.pep.fa provided all candidate ORFs; now all candidate ORFs are in allGoodORFs.pep.fa.
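The Sort rule's Markov score comparison can be sketched like this, reading the note literally: a positive score contributes log(score) and a negative score contributes -log(-score). The natural log and the handling of a score of exactly 0 are assumptions; the note specifies neither.

```python
import math


def signed_log(score):
    """log(score) for positive scores, -log(-score) for negative Markov scores."""
    if score > 0:
        return math.log(score)
    if score < 0:
        return -math.log(-score)
    return 0.0  # assumption: a score of exactly 0 maps to 0


def markov_scores_differ(score1, score2, threshold=0.3):
    """True if the two scores are far enough apart to decide the sort on their own."""
    return abs(signed_log(score1) - signed_log(score2)) > threshold
```

A False result means the two scores are treated as similar, which is when the other sort rules, such as the Start & Stop ends check, come into play.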

viewSingleTCW - tiny fixes

Load File for all 3 Basic filters: only the first word per line will be read as a SeqID, OrigID, HitID or GOID. This allows files to be used that have other information on each line.
- Basic GO annotations: Add "#" before column headings so an exported file can later be read in (The other Exports already do this).
New Sequence Table has new Export option to output the columns of the selected row.
Basic Sequence:
- New Select one sequence from the table followed by Seq Detail to see the Sequence Detail panel. This is in contrast to Seq Table, which results in the sequences being shown in the Sequence Table.
- New The result of a search will SELECT ROWS from the existing table.
Results: This panel now shows all Sequence Detail labels from the left panel so that all results can easily be removed.
Changed a few labels, e.g. the "View Seqs" label to "Seq Table", and "View Selected Sequence" to "Seq Detail"

Bug fixes:

ORF finder: Sequence of length 0 crashed the ORF Finder.
Basic GO annotations: the #Seqs quit showing the number of DE seqs correctly (bug from v3.3.3).
Filter ORF Frame: Only worked for positive frames.

Other

demoTra: add N's to a few of the sequences.

v3.3.3 25-Sept-2021

Exports now write the correct filename of output to the terminal.
Sequence Detail - Frame: The Y-axis coordinate has been changed so that it can be added to the X-axis coordinate to get the last base of the respective codon.
Sequence Detail - Align:
- The highlight UTR would incorrectly extend over a hit overhang.
- The highlight HIT included one extra AA.
- Trim showed an extra AA on the 3' end.
- When the "Hit" was highlighted for a negative frame, the coordinates were often off by 1.
Main Table "Export GOs from Table" did not work if there were no GOseq values.
Basic GO: The "Show" was recently broken, and has been fixed.

v3.3.2 15-Sept-2021

runSingleTCW

Various features use "Rank=1", which was not being updated when pruning was applied.
"Remove Annotation" was not clearing the GOseq values totally.

Pair Alignment: if multiple pairs are shown, the alignment will start in the same place across all alignments.
Verified all "Reproduce" information from "Overview" and improved the description.
AnnoDB Hits bug fix from v3.3.0: Filtering on "%HitCov" stopped working.

The singleTCW database has a small schema update that will be applied the first time an existing database is viewed.

v3.3.1 1-Sept-2021

runSingleTCW -- Annotate

sTCWdb version sdb6.0: The percent similarity (identity) is stored as a real number instead of an integer.
NEW Prune Hits - there are many hits with the exact same coordinates and/or descriptions. A new function removes all but the best based on same alignment values or same description. This function can be set to run in the AnnoDBs Options or from the command line.
Tiny changes:
- DIAMOND: removed '--max-hsps 1' from the TCW defaults because it is a DIAMOND default. Removed '--top 20' as this misses some good descriptions.
- TCW was only loading 25 hits per annoDB. This restriction has been removed since the user can set the limit in the search program's parameters.
- This really isn't a problem, but I fixed it anyway as it could be confusing: If there were multiple hits with the same bit_score and E-value, it was using the one with the highest rank for the "Best Bit" assignment, which was changed to the lowest.
- Tiny bug fix in Multi-Frame: If a sequence had an NT hit, it was typically incorrectly marked as 'Multi-frame'.

viewSingleTCW - tiny adjustments

Sequence Detail:
- A new Show button shows all columns for a selected hit (the Hit Table only provides a subset of all columns).
- The Hit Table now sorts on bit-score, e-value, %sim in that order (the addition of %sim is new).
Basic Hit:
- The Bit-score has been added to the columns.
- DE values of "-" will sort to the bottom of the table.
A reading frame was being assigned to an NT hit -- it now has a value of "-".
Tiny bug fix in alignments: In rare instances, aligning an NT-NT followed by an NT-AA could create a bogus NT-AA alignment.

Sequence Table: DE values of "-" will sort to the bottom of the table and the sort will ignore the minus sign before a DE value (as viewSingleTCW does).

Software details

The Best Bits, etc assignment was moved from DoUniProt.java to DoUniAssign.java

Update existing multiTCW databases for this release: There is a small mTCWdb schema change, which will be updated the first time you access the database. However, the pairs table NTbest column will not have values unless you reload the AA/NT hit files.

v3.3.0 15-August-2021

Both DIAMOND and BLAST have been updated to their latest release (v2.0.11.149 and 2.12.0+, respectively). A slight change to TCW was necessary for the latest DIAMOND and there is better error checking for the Find Hits feature.
The TCW DIAMOND defaults have changed to the following:
- Sequences against annoDB: --max-hsps 1 --masking 0 --top 20 (changed again in v3.3.1 1-Sep-21)
- Self-search: --max-hsps 1 --masking 0 --sensitive --query-cover 25 --subject-cover 25
All display of numbers use the Display Decimal settings except for the Overviews.

The DE value of 3.0 meant it was not computed; now this will display as a "-" instead of 3.0.
The Basic GO columns are in a more logical order.

Database change:
- mdb6.5: Added NTbest pairs columns. There has been an AAbest column, where a '2' indicates the pairs are the bi-directional best hits; only these are considered for the BBH-AA computation. Having this column means that if the pair fails the BBH parameters, a second best pair cannot be used. This was not being done for the BBH-NT computation, and now is. The AAbest was also used in the AA Cluster algorithm, and now the NTbest is used for the NT Cluster algorithm.
- If the user chooses not to update the database, it now continues, though it will probably fail if any action other than checking the overview is performed on it.
Computing pair alignments uses less memory and is therefore faster.
Assign Majority Hit to Cluster: The algorithm has a slight change where it will now only assign a 'Best Anno' from the sequences in the cluster (it was including Best Bit and Best GO). The documentation on this has been improved.
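The bi-directional best hit (BBH) test behind the AAbest/NTbest flags described above can be sketched as follows: a pair (a, b) qualifies when b is a's top-scoring hit and a is b's top-scoring hit. This Python sketch works on in-memory (query, target, bitscore) tuples and is only an illustration; it is not how runMultiTCW stores or computes the flags, and it breaks ties by keeping the first hit seen.

```python
def best_hits(hits):
    """hits: iterable of (query, target, bitscore). Return {query: best target}."""
    best = {}
    for query, target, bits in hits:
        # Keep the highest bitscore per query; ties keep the first hit seen.
        if query not in best or bits > best[query][1]:
            best[query] = (target, bits)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where each is the other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```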

For all filters on <=N and >M, if both N and M are 0, then the search is "=0".
The recently added KaKs=NA is now a checkbox to include or not include NA (it was yes/no/either).
When sorting a text column, blank values were always sorted to the bottom; now they are sorted like any other value.
Added pair table 'NTbest' column.

Minor

runMultiTCW (1) Always record whether the BBH or Closure clustering is AA or NT. (2) Tiny bug fix: When removing clusters only, it was not resetting the flag that told viewMultiTCW that the pre-computed MSAs did not exist (anymore).
viewSingleTCW Overview includes the runDE filtering options under PROCESSING INFORMATION.

This release has improved Decimal Display and multiple tiny improvements.

v3.2.7 20-July-2021

Decimal Display

This existed for viewSingleTCW and has been added to viewMultiTCW, though the latter does not have the color coding.
A third parameter has been added for the leading digits for E-notation, i.e. 1.2E-02 versus 1E-02.
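The effect of the number of digits in E-notation can be seen with Python's standard E format; treating TCW's new parameter as the number of digits after the decimal point is an assumption based on the 1.2E-02 versus 1E-02 example.

```python
def format_e(value, digits):
    """Format value in E-notation with `digits` digits after the decimal point."""
    return f"{value:.{digits}E}"
```

For example, format_e(0.0123, 1) gives 1.2E-02 and format_e(0.0123, 0) gives 1E-02.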

runSingleTCW - all minor

Overview: slight changes to the AnnoDB and ORF section.
Compute ORFs:
- The Default button resets to "Train with Best Hits". If "Train with CDS file" is selected but no file entered, an error message will be written.

Basic Sequence and AnnoDB Hits: The search rules have changed so that it always searches on "contained" unless the user adds a "%".
Sequence Detail (Bug Fix): The "Best Bits" hit should be listed first, but in some situations it was not, so the wrong hit was shown on the "Frame".
Very minor stuff:
- Filter: only show User Remark option if there are user remarks.
- Basic Sequence: The "Orig ID" has been moved so it follows the "Seq ID".
- GO Basic GO ID: Failed if the substring started with spaces.
- Moved "Results" under "General".

mTCW db v64: The OrigID (original ID) has been added to the sequence records.
There is better error checking on loading a User Defined file.
Stats: Percentages with a numerator of 0 were being saved as "-"; they are now saved as 0.
The Overview .html file has centered contents.

There is a small schema update (dbVer 5.9) for singleTCW; the first time a sTCWdb is accessed, it will be updated.

KaKs now uses NA instead of "--" to better distinguish it from "-": NA means the value was computed but KaKs_calculator returned NA, whereas "-" means it was not computed because the alignment was too short.
View OrigID for sequence table and details.
List of cluster methods for filters and columns will be in order of creation, i.e. the same as shown on the overview.

Software Details

mTCW db v64: the OrigID was added to the sequence table and hasOrig to the info table.

v3.2.6 6-July-21

Dynamic Programming - used by both singleTCW and multiTCW

The Gap Open parameter has been changed from 4 to 7, which reduces gaps but increases mismatches.
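To see why raising Gap Open reduces gaps, consider an affine gap cost of open + extend * (length - 1), one common convention. The extension penalty of 1 used below is a hypothetical value; the note only states the Gap Open change, and TCW's exact gap model is not shown here.

```python
def affine_gap_cost(length, gap_open, gap_extend):
    """Penalty for a single gap of `length` positions under an affine model."""
    if length <= 0:
        return 0
    return gap_open + gap_extend * (length - 1)


GAP_EXTEND = 1  # hypothetical extension penalty, for illustration only
for gap_open in (4, 7):
    one_long_gap = affine_gap_cost(3, gap_open, GAP_EXTEND)
    three_short_gaps = 3 * affine_gap_cost(1, gap_open, GAP_EXTEND)
    print(gap_open, one_long_gap, three_short_gaps)
```

Under this model every new gap costs 3 more with Gap Open 7 than with 4, so the aligner more often accepts a few mismatches instead of opening a gap.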

goSeq: runDE has the option to run this script on data written to R by runDE. Sometimes it's not clear why there are more or fewer enrichment p-values < 0.05, so a new summary of statistics is written to help give some indication.

Multi Align: Use built-in DP to create Multi-alignment if the cluster is size 2; this greatly speeds up this step.
New Remove...: Add Remove clusters from database.
Bug fix - Remove cluster: If a cluster set was removed after Multi Stats was run, then new clusters added, it did not work.

runDE

New Remove: The GO enrichment columns can be removed.

New The output to the terminal is appended to the file projects/RunDE/<dbName>.log.

DE for All Pairs: This option will not overwrite existing columns of the same name; the other three DE options ask the user if they want to over-write an existing column.

GO Enrichment: If executed on "All DE p-values", it will not overwrite an existing column. If executed on a selected DE p-value, it will overwrite an existing column.

The DE and GO p-value columns will be displayed in viewSingleTCW in the order that the DE p-value columns are created (only guaranteed if created with v3.2.6 or later).

The online DE User Manual was significantly updated.

runSingleTCW - Overview

AnnoDB: Stores annoDB stats so that it does not keep recomputing every time there is a change.
ORF Finder: Slight changes in Overview statistics, the anno.log stats removed, and the only Markov remark assigned to a sequence is if it's not the best over all 6-frame ORFs.
GO: Save the number of sequences that have a hit with the GO directly assigned.

viewSingleTCW

Basic GO:
- New Export option to output p-value columns as -log10(p-value).
- New Column #Assign, i.e. number of sequences with hits that directly have the GO assigned. This is useless for any kind of analysis, but is good for providing intuition and understanding of the complex Hit-Seq-GO assigned-inherited relations.
Pair alignment: The bit score was added to the header.
Pair text alignment: Local and Affine Gap - changed from Blosum50 to Blosum62, which is what the semi-global (default) algorithm uses.
BUG FIX - Basic Sequence: If the database was built using the original sequence names, the Search did not work.


v3.2.5 4-June-21

This release makes fixes and small changes to runMultiTCW.

Search Settings
- Added Defaults button, which sets everything back to default.
- Tiny bugs:
  - This got the wrong blast file in the following scenario: (1) Load an existing project. (2) Create a new project. (3) Build Database. (4) Search - used the initial existing project files.
  - If an existing project was loaded and then a new project created, the new project used the existing project's search parameters.
  - If search parameters were set to blank, they showed as defaults on the Settings panel on a new loading of the mTCW.cfg file.
Methods
- OrthoMCL
  - This did not work with MariaDB v10.4.12.
    This has been fixed by editing the orthoLoadBlast script to set the MySQL local_infile from within the script.
  - TCW recently changed to using ".fa" suffixes for FASTA files, where OrthoMCL requires ".fasta" suffixes.
    This has been fixed by editing the orthoBlastParser to accept ".fa" suffixes.
- Tiny bug: Non-default parameters were not used if the following order was followed: A method was added for a project with non-default parameters, exit runMultiTCW, restart runMultiTCW, then Add New Clusters.
Added a projcmp/ex directory with the results of running OrthoMCL within TCW on the three 'ex' demos; this can be added as a "User Defined" method.

v3.2.4 16-May-21

This release has improvements to viewSingleTCW for exploring GOs.

Basic GO, Basic Hit, Sequence Detail GO Panel
- These three panels now have separate Show and Export buttons. There are various new Show.. and Export... options.
- Many of the Export options allow the output to be All info or IDs only. The IDs only is very useful for then exploring the IDs in their respective Basic panel using the Load File.
- Terminology, file names and displays have been made systematic.
- The Basic GO option Hits with inherited GO listed had duplicate hits.
Display Decimal
- All parameters are saved between sessions.
- A Set Defaults button returns all parameters to their defaults.

The approach for GO evidence codes has been changed.

v3.2.3 9-May-21

For existing sTCWdbs, there is a small schema update and it's necessary to redo the runSingleTCW GO only options.
viewSingleTCW Basic GO
- The columns and filters for evidence code now use the six GO-defined evidence categories instead of the 27 individual evidence codes.
- The evidence category columns can be shown in a long or short format, where the long shows all evidence codes that were found in the category. The short format just shows a 'Yes' that it has EvCs in the category.

v3.2.2 4-May-21

Terminology changes:
- GO: Domain => Ontology, the corresponding column heading is GO
- GO: Ontology abbreviations bio=>BP, cel=>CC, mol=>MF
- GO: DE => Enrich
- Evidence code: EC => EvC
runDE
- The GOseq results are multiple hypothesis testing corrected (see goSeqBH.R).
- The results are in 'oResults' instead of 'results'.
viewSingleTCW
- Decimal Display
  - There is a new option to highlight p-values that are less than a given amount. See this panel and its Help for more information.
Bugs
- On the Main table, if the Row # column is moved in the table, the Copy Table and Export table of columns did not work right.
- On Basic Hit table, if there were no GOs, the DE P-values could not be viewed in the Seq-Hit mode.
- Recent tiny bug on Basic GO Annotation: For the DE #Seqs, it was supposed to be reading the original DE cutoff from the database, which was not working.
Software Details
- All terminology changes are in Globalx and the GO abbreviation mapping is Static.goTermMap(). BasicGOTablePanel and MainTable were querying for the p-value columns; they now get them from Metadata.
- All Exports files have been rechecked - no problems found.
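The runDE note above says the GOseq results are now multiple hypothesis testing corrected via goSeqBH.R. The file name suggests the Benjamini-Hochberg (BH) procedure, as in R's p.adjust(method = "BH"); assuming that, the adjustment can be sketched in Python as follows. This is an illustration of standard BH, not a copy of the script.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original input order."""
    n = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```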

Update existing sTCW databases for this release: There is a small sTCWdb schema change, which will be updated the first time you access the database.

v3.2.1 19-Apr-21

runDE
- The built-in GOseq has been changed to an R-script. This allows the user to make changes to the script or provide their own method.
- NOTE: The goSeq results are exactly the same as before, which are not multiple hypothesis tested. This has been made clear in the documentation, and now the user can alter the goSeq.R script to add the test.
- Some changes to the interface to make it more obvious.
Software Details
- The GO metadata has been moved from assem_msg.goDE column to two new columns in table libraryDE. This required changes in Schema, MetaData, Overview, runDE.QRprocess, annotator.DoGO

v3.2.0 29-Mar-21

Existing GOdb and sTCWdb: If you use GO evidence codes or EC (enzyme code), you will want to recreate the GOdb (i.e. runAS) and re-run GO Only from runSingleTCW.

runAS

Parsing go-basic.obo:
- As a sanity check, all UniProt GO are checked for existence in the go-basic.obo file.
- Was not saving the last GO.
Parsing UniProt:
- Only the last EC (enzyme code) was being saved; now all ECs under "RecName" are saved.
  Also, the text after the EC code is removed, e.g. "2.4.1.- {ECO:0000256|RuleBase:RU362057}" is "2.4.1.-".
- The GO evidence code "IC" was being stored as "UNK".
- A count of the number of obsolete GOs in a UniProt file is printed to the terminal.
  The obsolete GO is still in the GOdb and the obsolete GO in sTCWdb will have a prefix of "obsolete" and no neighborhood.
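The EC clean-up described above can be sketched as follows. This Python sketch is a simplified reading of UniProt flat-file DE lines, not runAS's parser: it collects every EC= value under a RecName block and drops the trailing evidence tag, so "2.4.1.- {ECO:0000256|RuleBase:RU362057}" becomes "2.4.1.-".

```python
# DE sub-blocks that end the current RecName section (simplified).
_OTHER_BLOCKS = ("AltName:", "SubName:", "Flags:", "Contains:", "Includes:")


def parse_recname_ecs(de_lines):
    """Collect all EC numbers listed under RecName in UniProt DE lines."""
    ecs = []
    in_recname = False
    for line in de_lines:
        # Flat-file lines start with a 2-letter code followed by 3 spaces.
        body = line[5:].strip() if line.startswith("DE") else line.strip()
        if body.startswith("RecName:"):
            in_recname = True
        elif body.startswith(_OTHER_BLOCKS):
            in_recname = False
        elif in_recname and body.startswith("EC="):
            value = body[len("EC="):].rstrip(";")
            # Remove the evidence tag, e.g. "{ECO:0000256|RuleBase:RU362057}".
            ecs.append(value.split("{")[0].strip())
    return ecs
```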

runSingleTCW

Evidence Codes: Bug - The evidence codes were wrong (I don't know what release broke this).
This has been fixed and only the evidence codes from the UniProt hits with the GO assigned will be shown.

viewSingleTCW - Basic GO

The interface has been updated to indicate that only assigned Evidence codes are used.
Slight changes to Show... to make the 'alt_id' (replacements) more obvious.
Changed some terminology to be compatible with AmiGO, e.g. "GO term" changed to "GO ID".

Updated the runAS documentation to describe parsing the OBO file.

TCW Version 3.1

v3.1.9 25-Mar-21

Overview

Rearranged Overview.
Added some GO statistics.

Basic GO:

Added an item under Table... called Each GO's parents with relation, which produces a popup or file with output like:

--------->      GO:0000019      bio     regulation of mitotic recombination
is_a            GO:0000018      bio     regulation of DNA recombination
--------->      GO:0000027      bio     ribosomal large subunit assembly
is_a            GO:0022618      bio     ribonucleoprotein complex assembly
part_of         GO:0042255      bio     ribosome assembly
part_of         GO:0042273      bio     ribosomal large subunit biogenesis

Show... Neighborhood list with relations has the added relations of replaced_by and replaces.

The GO Help has been updated.

Sequence Detail: A few small problems with GO, e.g. the number of "Unique GOs" was always 0.

On-line TCW Tour:

A new GO Help page, see GO:0000794

runAS - building the GOdb mysql database of GOs and UniProts.

v3.1.8 19-Mar-21

The GOs were obtained from the downloaded go_<date>-termdb-tables.tar.gz, which has been discontinued. Now runAS downloads the go-basic.obo file and builds the GOdb from scratch.
The GO Trim function is disabled (probably permanently).
Previous GOdbs should still work, except for the GO_slims, which will be ignored.

Documentation

All Web documentation uses the same styles.
The Java Help pages for singleTCW all use the same style (the multiTCW Help pages will be updated later).

This release concentrates on improving the multiTCW MSA scores and viewing alignments.

Basic GO - Show... - The Neighbor List
- The list is sorted by description to match the AmiGO display.
- The "replaced_by" relation is now shown in this list.

Software Details
- The Web documentation uses HTML5 commands that pass BBedit tests, though BBedit does not process "<!--#include virtual=" statements. It also passes the HTML validator, except that the validator complains about formatted words in columns or bullet lists (e.g. a formatted word such as BBedit within a bullet list). This is used extensively and seems to be allowed by all major web browsers.
- The Java HTML renderer seems to only partially respect <style> commands, so it is a mix of HTML5 and HTML4. It has been tested on various Mac Java installations and Linux Java 8.1.

v3.1.7 21-Feb-21

Update existing sTCW databases for this release: it is a good idea to reload the hits to make the database consistent with the interface (e.g. Best Bitscore has replaced Best Eval).

runSingleTCW

Best and Rank:

Best Bits	The Best Eval (best e-value) has been replaced with Best Bits. This is much better than using the e-value, which depends on the database size.
Best Anno	(1) It used to be the case that the e-value of the candidate best anno hit had to be reasonable close to the Best Eval e-value. It no longer checks the e-value or bitscore. (2) The rules for determining an un-informative hit have been slightly changed, e.g. "unnamed protein product" has been added to the un-informative list, which is found in nr.gz.
Best with GO	Is now the highest bitscore hit with GOs.
Rank	Is assigned by sorting the hit list for a given sequence and annoDB on the bitscore, then on the e-value when bitscores are tied. Previously, TCW used the input order, in which the e-value takes precedence over the bitscore.
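The Rank rule above amounts to a two-key sort: bitscore descending, then e-value ascending. This is a minimal Python illustration, not TCW's own (Java) code; the Hit class and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    hit_id: str      # hypothetical field names, for illustration only
    bitscore: float
    evalue: float

def assign_ranks(hits):
    """Rank the hits of one sequence/annoDB: highest bitscore first,
    ties broken by the lower (better) e-value."""
    ordered = sorted(hits, key=lambda h: (-h.bitscore, h.evalue))
    return {h.hit_id: rank for rank, h in enumerate(ordered, start=1)}

hits = [
    Hit("A", bitscore=250.0, evalue=1e-60),
    Hit("B", bitscore=310.0, evalue=1e-80),
    Hit("C", bitscore=250.0, evalue=1e-65),
]
print(assign_ranks(hits))  # {'B': 1, 'C': 2, 'A': 3}
```

A and C tie on bitscore, so C ranks ahead of A only because its e-value is lower.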

hitWarnings.log: Only problems with hits found in the tab file are recorded in this file (it used to record problems for all entries). The demoTra example sequence file TwoOsSeqs.fa has hits to NR entries that are too long and have been truncated; hence, these are written to the hitWarnings.log file.

viewSingleTCW
- Sequence Filter: (1) A filter has been added for the bitscore. (2) Filters have been added for Best Eval != Best Anno and GO!=Bits&GO!=Anno.
- Sequence Detail: (1) In the hits table, the bitscore has been moved before the e-value to make it obvious that the sort is on the bitscore before the e-value. (2) A Copy... Selected Hit Description option has been added.
Demo files
- UniProt_demo has been updated to Jan-2021 along with the associated GO file.
- Other_demo has been added, specifically for demoTra. It has examples of NCBI protein nr.gz, PlantTFDB-all_TF_pep.fas, and NCBI RNA Sorghum bicolor sequences.
- demoTra - A README explains the various input files, as this project has examples of most input files.
/scripts
- A new Python script, formatNCBIrna.py, formats the header lines of an NCBI RNA file for use with TCW.

v3.1.6 11-Feb-21

File Chooser
- Only shows files with the expected file extension. However, the File Format: drop-down can be changed to "All Files" in order to select any file.
- Initially opens in the default directory. If the user changes to a different directory, the File Chooser will use that directory the next time it is opened during the session.
- When any file is read, its basic file structure is verified.
Little fixes
- runMulti: (1) Removed obsolete parameters. (2) The 'blastn' defaults had an extra incorrect parameter (introduced in v3.1.6). (3) The Search "Setting" button now stays active after the search is done, so the parameters can still be viewed.
- viewMulti: Seq Detail: if the GOs were displayed and then Next was used, the GOs from the previous page were still displayed.
Software
- All methods (both for single and multi TCW) now use the File Chooser in the "util.file" class, with the exception of the runAS and sTCW "Count file generator".

v3.1.5 28-Jan-21

Compressed files allowed (i.e. suffix .gz)
- runAS - reading .fasta files and .dat files
- runSingleTCW - reading .fasta files, .qual files and expression files
- All File Chooser popups that restrict selected files to FASTA files now allow the ".gz" suffix. The accepted suffixes are "fa", "fasta", "fna", "ffn", "faa", "frn", "fa.gz", "fasta.gz", "fna.gz", "ffn.gz", "faa.gz", "frn.gz".
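A minimal sketch of the suffix check and of opening a possibly compressed file. It assumes a plain case-sensitive match on the suffixes listed above (whether TCW ignores case is not stated); the function names are hypothetical.

```python
import gzip

# Suffixes accepted by the FASTA File Choosers, per the list above.
FASTA_SUFFIXES = ("fa", "fasta", "fna", "ffn", "faa", "frn")

def is_fasta_file(filename):
    """True if filename ends in an accepted FASTA suffix, optionally followed by .gz.
    Assumption: the match is case-sensitive."""
    name = filename[:-3] if filename.endswith(".gz") else filename
    return any(name.endswith("." + suffix) for suffix in FASTA_SUFFIXES)

def open_text(path):
    """Open a plain or gzip-compressed text file for line-by-line reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")  # "rt" decompresses and decodes to str
    return open(path, "r")

print(is_fasta_file("demo.faa.gz"), is_fasta_file("notes.txt.gz"))  # True False
```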
runAS
1. The AnnoDB.cfg button (previously TCW.anno) now creates a file with the suffix ".cfg".
2. File sizes are included on the trace output.
runSingleTCW:
1. The File Chooser for "Import AnnoDBs" restricts files to those with the ".cfg" suffix, which works for the AnnoDB_UniProt_<date>.cfg and sTCW.cfg files.

v3.1.4 20-Jan-21

This release deals mainly with the singleTCW "Similar Pairs" option and updating with the latest DIAMOND.

Changes that will affect existing sTCWdbs:

Schema changes for sTCWdb, which will be updated the first time the sTCWdb is used.
The "Similarity Pairs" .tab file names have changed; old ones will not be recognized.

Additions and modifications:

For both singleTCW and multiTCW
- DIAMOND
  - DIAMOND has been updated from v0.9.22.123 to h4.0.6.144.
  - The parameters I had optimized for the first DIAMOND release are now counter-productive. The DIAMOND defaults are now used except for "--max-hsps 1", since TCW only uses the first HSP.
- View
  - The Export methods all behave the same now.
  - The ruler for the pairwise alignment "Line" view has more precision for N:1 displays.
  - Find Hits: The File Chooser specifies file type extensions ".fa" and ".fasta".
singleTCW database
- There is a schema change for the sTCWdb (release v5.4). All changes are for "Similar Pair" processing.
runSingleTCW
- Similar Pairs
  - NEW option to compare translated ORFs for NT-sTCW and AA sequences for AA-sTCW, where either DIAMOND or BLAST can be used.
  - The interface no longer allows specifying an existing blast tab file, but there are instructions on how to provide one.
  - The sTCW.cfg keywords have been changed along with the file names.
- Reduced the terminal output from ORF finding and the Overview computation. The output for "Annotation" is clearer.
- Changing search parameters did not always work as expected; this has been fixed.
- AnnoDB Panel has a new "Reset to Default".
- The File Chooser specifies file type extensions ".tab", ".fa" and ".fasta".
viewSingleTCW
- Align
  - NEW The "Trim" function has been added to show only the aligned regions.
  - On the "Line" view, the arrow at the end of the line reflects the orientation.
  - For NT alignments, "R" is after the sequence name if it is reverse complemented.
- Similar Pairs
  - NEW There is a new "Hit Type" column indicating whether the hit was NT (blastn), AA (tblastx) or ORF (translated ORFs). A "Pair Filter" option searches on this column. This is only relevant for NT-sTCW.
  - There are new "Copy..." options for the sequence or reverse complement.
- SeqDetail Frame: The ORFs are listed in order by frame (3,2,1,-1,-2,-3).
- Tiny bug
  - For assembled sequences: if a gap was added to the consensus, the frame for a protein hit could, in certain circumstances, be incorrect.
  - Export would sometimes hang, which has been fixed.
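The SeqDetail frame ordering (3,2,1,-1,-2,-3) amounts to sorting on a fixed frame rank. This hypothetical sketch assumes each ORF is a dict with a "frame" key; Python's stable sort keeps ORFs within the same frame in their original order.

```python
# Display rank of each reading frame: 3, 2, 1, -1, -2, -3.
FRAME_RANK = {3: 0, 2: 1, 1: 2, -1: 3, -2: 4, -3: 5}

def order_by_frame(orfs):
    """Return ORFs sorted by frame in SeqDetail order (stable within a frame)."""
    return sorted(orfs, key=lambda orf: FRAME_RANK[orf["frame"]])

orfs = [{"frame": -2}, {"frame": 1}, {"frame": 3}, {"frame": -1}]
print([orf["frame"] for orf in order_by_frame(orfs)])  # [3, 1, -1, -2]
```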
Software
- All Export methods are in sng.util.ExportFile.
- The AlignBasePanel was split into PairBasePanel and ContigBasePanel

v3.1.3 30-Dec-20

This release continues to improve on the multiTCW MSA scores and viewing alignments.

runMultiTCW Align
- The score1 and score2 column values are stored in the database so they can be displayed for an MSA.
- Min-Max normalization is applied to the Sum-of-pairs scores so that they are within 0-1.
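Min-Max normalization maps each score s to (s - min) / (max - min). This minimal sketch assumes the minimum and maximum are taken over the set of Sum-of-pairs scores being normalized; returning 0.0 when all scores are equal is an arbitrary choice made here, not documented TCW behavior.

```python
def min_max_normalize(scores):
    """Linearly rescale scores so the smallest becomes 0 and the largest 1."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: no spread to rescale (arbitrary choice: 0.0).
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

print(min_max_normalize([-120.0, 30.0, 80.0]))  # [0.0, 0.75, 1.0]
```

Because the transform is linear, it preserves the ordering of clusters, so large negative Sum-of-pairs values simply map toward 0.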
viewMultiTCW
- Align
  - For blosum=0, a different color is used (it was the same color as <0).
  - All align panels have a "Trim" option, which removes hanging sequence.
  - MSA
    - Header: The description and global score are shown above the alignment.
    - In the Sequence view, clicking on an AA character will show the scores and composition of the column.
    - If score methods other than the defaults were used to build the mTCW database, they will also be used when running the external MSA program (i.e. MSA...). The scores are written to a file in /ResultsAlign.
    - Tiny bug: the stop was not removed on perfect alignments.
- Other
  - Hit Table: Add "Copy hit Sequence"
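One way to read "removes hanging sequence" is to keep only the columns between the latest sequence start and the earliest sequence end. This hypothetical sketch works under that assumption, with '-' as the gap character; TCW's actual Trim rule may differ.

```python
def trim_overhangs(rows, gap="-"):
    """Keep only the columns between the latest sequence start and the
    earliest sequence end, i.e. drop leading/trailing overhangs."""
    starts = [len(r) - len(r.lstrip(gap)) for r in rows]  # first residue column
    ends = [len(r.rstrip(gap)) for r in rows]             # one past last residue
    left, right = max(starts), min(ends)
    if left >= right:  # the sequences do not overlap at all
        return ["" for _ in rows]
    return [r[left:right] for r in rows]

print(trim_overhangs(["MKVLLAGT--", "--VLLAGTRS"]))  # ['VLLAGT', 'VLLAGT']
```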
runSingleTCW
- If the sTCWdb was annotated with NT and AA hits, the Best Annotated Hit will be an AA hit (these are used for ORF finding). Note that the NT hit can still be the Best Eval.
viewSingleTCW
- Align
  - For hit and pair alignments, there is now an N:1 (increase) zoom along with the original 1:N (decrease) zoom.
  - The "View" line has been modified so that all options show the current state (like viewMulti does).
  - Tiny interface cleanups:
    - For AA-sTCWdb (i.e. the database was created with protein sequences), the alignment process produced incorrect warnings (the alignment was fine). The Find Hits did not work for the AA-Seqs.
    - For an NT hit aligned to an NT sequence, the "Align..." pop-up produced incorrect warnings.
    - For the "Seq" view: (1) the Ruler extended to the length of the NT sequence for an AA display. (2) Only 15 bases of the overhang were supposed to be shown, but that did not always work.
    - When selecting a sequence for "Align...", the bottom sequence had to be selected; now either can be.
- Other
  - Basic AnnoDB Hit: Copy Hit Sequence: the name is copied with the sequence in FASTA format.
For both viewSingleTCW and viewMultiTCW
- The information popup windows have selectable text.
- Find Hits - AA-ORFs (sTCW) and AA-Seq (mTCW) are the default Subject with Diamond as the default search program. For "Delete search files", not all files were being deleted. A few confusing aspects of the interface were fixed, mainly dealing with parameters.
Software
- Moved single align classes to sng.viewer.panels.align (only AlignPairOrig is shared).
- Split MainToolPanel into PairViewPanel and ContigViewPanel.
- Renamed sng classes to correspond to cmp classes of similar type.
- Removed any references to 'sng' in /util or /cmp
- Cleaned up a lot of dead code from graphics routines


v3.1.2 4-Dec-20

runMultiTCW
- Schema (mTCW db6.3): small change to add storage for the two score methods.
- The MSA score2 was the external MstatX Trident method. Score2 has been changed to a built-in Wentropy, which is exactly the same as the MstatX Wentropy except that the score is (1 - score), so that a value of '1' is the most conserved.
- The user can request that Score1 and/or Score2 be an MstatX method, using a command line option (see ./runMultiTCW -h). Scores can be updated by setting a command line flag and then running "Run Stats" with the "Compute MSA and score" option.
- Sum-of-Pairs can produce large negative numbers if the cluster is large and has many gaps; hence, TCW will not compute this score in that case (this is a temporary fix; the case is rare, but it can hang the machine).
- The Hit Cluster method has a new parameter that, when set, requires all sequences in a cluster to have a hit with all other sequences.
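The new Hit Cluster requirement is equivalent to checking that the cluster forms a complete graph of hits. This minimal sketch assumes hits are stored as unordered pairs of sequence IDs; the data layout, sequence IDs and function name are hypothetical.

```python
from itertools import combinations

def all_pairs_have_hits(cluster, hit_pairs):
    """True if every pair of sequences in the cluster has a hit.
    hit_pairs is a set of frozensets, so (a, b) and (b, a) are the same pair."""
    return all(frozenset(pair) in hit_pairs for pair in combinations(cluster, 2))

hits = {frozenset({"Os_001", "Os_002"}), frozenset({"Os_002", "Os_003"})}
print(all_pairs_have_hits(["Os_001", "Os_002", "Os_003"], hits))  # False: Os_001/Os_003 missing
hits.add(frozenset({"Os_001", "Os_003"}))
print(all_pairs_have_hits(["Os_001", "Os_002", "Os_003"], hits))  # True
```

A cluster of n sequences needs n*(n-1)/2 hits under this rule, which is why it is optional: it makes clusters much stricter.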
viewMultiTCW
- Since the MSA scores can be different from the defaults, they are now referred to as Score1 and Score2:
  - At the bottom of the Overview, it is stated what methods were used.
  - In the Cluster table, mousing over the column Score1 and Score2 in the column panel shows the method in the lower left hand corner.
- The alignment views have been improved:
  - The three views for a pair now scroll, allowing the sequence to be viewed.
  - The default view is to show non-synonymous amino acids in one color and the synonymous ones in another. It still does that, but the colors have been altered to make them easier to detect.
  - An option has been added to view the "Zappo" physicochemical coloring.
  - For the "View Seq" option:
    - The characters that are different from the consensus are in bigger bold face to make them more obvious.
    - The "Dot" option allows all matches to be shown as a '.', which makes it easier to view the differences.
  - For the "View Line" graphical option:
    - The Zoom now has both 1:N and N:1 options, where 1:N decreases the graphic size, and N:1 increases the size. The N:1 provides more space between the vertical hashes (differences), which makes them easier to distinguish.
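The "Dot" display can be illustrated by comparing each column to a consensus. This sketch assumes a simple majority consensus (gaps ignored, ties broken by first occurrence), which is not necessarily how TCW computes its consensus; the function names are hypothetical.

```python
from collections import Counter

def consensus(rows, gap="-"):
    """Majority residue per column; gaps are ignored, ties go to the residue seen first."""
    cons = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != gap)
        cons.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(cons)

def dot_view(rows, gap="-"):
    """Replace residues that match the consensus with '.', keeping differences and gaps."""
    cons = consensus(rows, gap)
    return ["".join("." if c == k and c != gap else c for c, k in zip(row, cons))
            for row in rows]

for line in dot_view(["MKVLA", "MKILA", "MRVLA"]):
    print(line)  # ".....", then "..I..", then ".R..."
```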
Updates to the Help and on-line documentation.

For developers: The alignment classes are now in a new seq.align package. The SumStats method has been renamed to PairSumStats. Running ./runMultiTCW -x will remove clusters before re-scoring them.

v3.1.1 10-Nov-20

It is often easier to use the TCW generated sequence names (e.g. Os_00001) than the sequence names supplied in the file (e.g. NM_001048268.2), but it is important to be able to access the original names, hence, the following changes.

runSingleTCW
- Bug fix: When Use Sequence Names from File was selected, it forced Skip Assembly to be unselected, whereas it was supposed to force it to be selected.
- Annotate: It will no longer create the HitWarnings.log file unless a warning is written to it. This file contains warnings such as the "Description" being longer than allowed.
viewSingleTCW

Name	Use Sequence Names from File	Skip Assembly
Orig ID	No	Yes
Longest	No	No
Orig ID or none	Yes	Yes
- Basic Query Sequence: Allow query on Orig ID or Longest unless the Seq ID != Orig ID. Make the column heading Orig ID or Longest according to the table above.
- Sequence Detail: Show Orig ID or Longest in the top text area according to the table above, and add it to the Copy... drop-down.
- Sequence Table: Make the column heading Orig ID or Longest according to the table above.
Updated all of the Tour.

v3.1.0 5-Nov-20

mTCW schema update to version 62 for the new Hit Table.
- Current mTCW databases will be updated the first time they are viewed.
viewMultiTCW:
- Made all tabs on the left and their tables summaries consistent for Sequences, Pairs and Clusters. Also made their column naming and layout more consistent.
- Overview:
  - The "Explain" button, which explains what the numbers in the Overview mean, has been improved (albeit it is still a bit confusing). There are now two AAsim percents: the first is the percent of identical AA characters out of all aligned AA characters, and the second is the average of the %similarity of each pair alignment. The same two numbers exist for NTsim, AAcov, and NTcov.
- New Hit Table and Filter:
  - This is a new table and filter. It only links to the Sequence Table.
- Sequence Details:
  - Add "Copy...", which copies information for the selected seqID in the Pairs table or hitID in the Hit Table to the clipboard.
- Sequence Table:
  - Add "Hit Sequence" to the "Copy..." options.
  - Hit Align: (1) Allow only one sequence to be selected, (2) If Best Hit is NT, translate to AA before aligning to AA sequence.
- Pairs Table:
  - Add "Cluster" button to show all clusters from the selected set of pairs
- Cluster Table:
  - Add "Next/Prev" when viewing Pairs or Sequence selection.
- Pair and Sequence Filters:
  - If a minimum of "0" was entered, "All" was returned; now it correctly remains "0".
- Updated the online Help.
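The difference between the two Overview AAsim percents can be seen with two pair alignments of different lengths. This minimal sketch assumes each pair is summarized as (identical columns, aligned columns); the function name and input layout are hypothetical.

```python
def pooled_and_mean_identity(pairs):
    """pairs: list of (identical_columns, aligned_columns), one per pair alignment.
    Returns (pooled percent over all columns, mean of the per-pair percents)."""
    total_same = sum(same for same, _ in pairs)
    total_aligned = sum(aligned for _, aligned in pairs)
    pooled = 100.0 * total_same / total_aligned
    mean = sum(100.0 * same / aligned for same, aligned in pairs) / len(pairs)
    return pooled, mean

pooled, mean = pooled_and_mean_identity([(90, 100), (10, 50)])
print(round(pooled, 2), mean)  # 66.67 55.0
```

The pooled percent weights long alignments more heavily, whereas the average gives every pair equal weight, so the two numbers diverge when alignment lengths vary.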
runMultiTCW:
- Add 'Last Update' to the Overview.
- Reduced the memory a little on 'Pair Stats'.
runSingleTCW:
- ORF finder: If all Best Hits are nucleotide, the Best Hit and Markov options do not work; this information has been added to the Help.
Updated the Tour for runMulti.

2020 releases

v3.1.2	04-Dec-20	runMulti	Improve MSA scores
v3.1.0	05-Nov-20	viewMulti	Add Hit Filter and Table
v3.0.4	04-Sep-20	runSingle	TPM for normalization instead of RPKM
v3.0.3	16-Jun-20	Package	Moved external and external_osx to Ext/linux and Ext/mac
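For reference, the standard TPM and RPKM definitions can be sketched as below; this is the textbook formula, not TCW's implementation, and using the sum of the counts as the total read count is an assumption. TPM divides counts by transcript length first and then scales each sample to sum to one million, so TPM totals are the same across samples, whereas RPKM totals vary.

```python
def tpm(counts, lengths_bp):
    """Transcripts Per Million: reads per kilobase, rescaled so the sample sums to 1e6."""
    rates = [c / (l / 1000.0) for c, l in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

def rpkm(counts, lengths_bp):
    """Reads Per Kilobase per Million reads (assumes total reads = sum of counts)."""
    total_millions = sum(counts) / 1e6
    return [c / (l / 1000.0) / total_millions for c, l in zip(counts, lengths_bp)]

counts, lengths = [100, 300], [1000, 2000]
print([round(x) for x in tpm(counts, lengths)])   # [400000, 600000]
print([round(x) for x in rpkm(counts, lengths)])  # [250000, 375000]
```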