runSingleTCW can use
diamond for annotation.
- Using different search programs in TCW
- Timings and results
- Search program differences
Using different search programs in TCW
- Download (blast is required, the other two are optional):
- Download diamond. It only provides an executable for Linux, though the source code is also available.
- Download usearch. It has executables for Linux and MacOSX.
- Download blast+. It has executables for Linux and MacOS.
- Note: TCW should still support legacy blast, but this has not been tested for this release.
- Put the search path in the HOSTS.cfg file (see HOSTS.cfg for more detail).
The following results use the TCW defaults and 16 CPUs.
When you add an annoDB (i.e. a database to search against, such as UniProt), you
have the option of selecting the search program to use. Only the programs specified in
the HOSTS.cfg file are listed. When there is more than one search program,
they are listed along with TCW Select.
TCW Select automatically selects the program to use according to
the following rules:
1. SwissProt: uses blast+ because we want every possible hit.
2. TrEMBL: these databases have gotten so big that they can take weeks with blast+; hence,
a fast program is used according to rule 4.
3. For any other annoDB type: blast+ is used if the database is <1Gb, else the fast program is used.
4. Fast program: if diamond is available, it takes precedence over
usearch since it performs 6-frame translation whereas usearch uses ORFs.
To be clear, you can override the TCW automatic selection by manually selecting the program from the
runSingleTCW interface, and it can be selected separately for each annoDB, as shown in the
AnnoDB table.
Figure: AnnoDB add/edit interface
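The four TCW Select rules above can be sketched in a few lines of Python. This is only an illustration of the decision order, not TCW's own code: the function names, the annoDB type strings, the reading of "1Gb" as 10^9 bytes, and the fallback to blast+ when no fast program is configured are all assumptions.

```python
from typing import Iterable

GB = 10**9  # assumption: the "1Gb" threshold is read as 10^9 bytes


def fast_program(available: Iterable[str]) -> str:
    """Rule 4: prefer diamond over usearch when both are configured."""
    programs = set(available)
    if "diamond" in programs:
        return "diamond"
    if "usearch" in programs:
        return "usearch"
    # Assumption: with no fast program configured, fall back to blast+.
    return "blast+"


def tcw_select(annodb_type: str, db_size_bytes: int, available: Iterable[str]) -> str:
    """Pick a search program for one annoDB following the four rules."""
    programs = list(available)
    kind = annodb_type.lower()  # hypothetical type labels: "swissprot", "trembl", other
    if kind == "swissprot":     # rule 1: every possible hit is wanted
        return "blast+"
    if kind == "trembl":        # rule 2: too large for blast+
        return fast_program(programs)
    if db_size_bytes < GB:      # rule 3: small databases use blast+
        return "blast+"
    return fast_program(programs)
```

For example, `tcw_select("TrEMBL", 1_700_000_000, ["blast+", "usearch", "diamond"])` picks diamond, while the same call for a SwissProt annoDB picks blast+ regardless of size.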
|2921 proteins against the 19M SwissProt plants
|26,685 transcripts against the 19M SwissProt plants
1The times do not include formatting.
For the SwissProt plants, it takes less than 3 seconds to format the database with any of these programs.
For the 1.7Gb TrEMBL plants, diamond takes 1m:30s and blast+ takes 3m:34s, while usearch runs out of memory due to the 32-bit limit.
2For blast+, -max_target_seqs 25 was used to limit the output; an equivalent option was
not used for usearch.
Search program differences as of 24May15
- The 32-bit usearch is free; however, it exceeds the 32-bit memory limit on the TrEMBL databases, e.g. the
1.7Gb Plant TrEMBL. A 64-bit version is available for a cost.
- Diamond can take zipped databases as input; the other two cannot.
- Usearch does not need to have a separate step for formatting the database, though formatting is
a good idea on large databases. The other two must format the database.
- It is not necessary to specify the database type for usearch; it is necessary for the other two.
- Blast and usearch find hits in the gray zone, i.e. similarity <40%; diamond does not find most of these.
- Blast provides tblastx (6-frame translated nucleotide against 6-frame translated nucleotide)
and tblastn (protein against 6-frame translated nucleotide database), whereas the other two programs do not.
- Using Nt=nucleotide, Pr=protein, Tr=translated nucleotide:
|Search||blast||diamond||usearch
|Pr to Pr (blastp)||Yes||Yes||Yes
|Tr to Pr (blastx)||Yes||Yes||Yes1
|Nt to Nt (blastn)||Yes||No||No2
1Usearch computes all ORFs and translates them for alignment.
2The strand must be specified, so this is not offered in TCW at this time.
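The table above can also be read as a small lookup. This is a minimal sketch: the column order blast, diamond, usearch is an assumption inferred from the footnotes, and the dictionary and function names are hypothetical, not part of TCW.

```python
# Search types as offered in TCW, per the table above.
# Assumption: the table columns are blast, diamond, usearch, in that order.
SUPPORT = {
    "Pr to Pr": {"blast": True, "diamond": True, "usearch": True},
    "Tr to Pr": {"blast": True, "diamond": True, "usearch": True},   # usearch translates ORFs
    "Nt to Nt": {"blast": True, "diamond": False, "usearch": False}, # usearch needs a strand
}


def programs_for(search_type: str) -> list:
    """Return the programs offered in TCW for a search type, sorted by name."""
    return sorted(name for name, ok in SUPPORT[search_type].items() if ok)
```

Calling `programs_for("Nt to Nt")` returns only blast, which matches the last table row.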