The University of Arizona
Tour: runSingleTCW
runSingleTCW

Main window

1. Load Data

  1. The input may be transcripts with optional read counts, proteins with optional spectral counts, or sequences to be assembled such as Sanger ESTs, 454 reads, and/or transcript libraries.
  2. The Add window allows you to define a dataset along with its conditions (e.g. tissue, treatment).
    It also lets you specify the count file(s) for transcript or protein libraries; this information is then listed under Associated Counts on the main window.
    • If there are replicates, you may define them with the Define Replicates window.

2. Instantiate

  1. Skip Assembly - the sequences can be instantiated with no assembly.
  2. Assembly - the sequences can be assembled. If read sequences (i.e. ESTs) are mixed with transcripts that have read counts, the count for an EST dataset in a contig is the number of its ESTs in that contig, while the transcripts retain their counts from the input.
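
The counting rule for mixed assemblies can be illustrated with a minimal sketch. It is not TCW's code: the function name, the tuple layout, and the dataset names are hypothetical, and it assumes each transcript carries a single input count.

```python
from collections import Counter

def contig_counts(members, transcript_counts):
    """Return per-dataset counts for one contig (illustrative only).

    members: list of (sequence_name, dataset_name, is_est) tuples.
    transcript_counts: dict mapping transcript name -> count from the input.
    """
    counts = Counter()
    for name, dataset, is_est in members:
        if is_est:
            # Each EST in the contig adds one to its dataset's count.
            counts[dataset] += 1
        else:
            # A transcript keeps the count that was loaded with it.
            counts[dataset] += transcript_counts.get(name, 0)
    return dict(counts)

# Hypothetical contig: three ESTs from one dataset plus one counted transcript.
members = [
    ("est_001", "RootEST", True),
    ("est_002", "RootEST", True),
    ("est_003", "RootEST", True),
    ("tr_17", "LeafRNA", False),
]
example = contig_counts(members, {"tr_17": 42})
print(example)  # {'RootEST': 3, 'LeafRNA': 42}
```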

3. Annotate Sequences

  1. Add one or more databases to search against, which can be protein or nucleotide.
  2. In the Options window, define the GO database (see Annotation Setup).

4. Add Remarks and Locations

Remarks and Locations (i.e. chromosome, start, end, strand) can be added to sequences and queried in viewSingleTCW.


Additional windows

Load Data - Add/Edit: Selecting Add shows this window. Selecting Edit shows a similar window, but once the database has been created only the Attribute values can be changed. The counts can be in one file or generated with the Build combined count file option. On Save, the condition names are written to the Associated Counts table on the main window.

Build count file: Generate File creates a file called Combined_read_count.csv, in which the columns are the conditions with their respective counts.
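
As a rough illustration of a combined count file, the sketch below merges per-condition counts into one table with a column per condition. It is not the code behind Generate File: the header name `sequence`, the comma delimiter, the zero fill for missing sequences, and all sample names are assumptions.

```python
import csv
import io

def combine_counts(per_condition):
    """Merge {condition: {sequence: count}} into CSV text (illustrative only)."""
    conditions = list(per_condition)
    # Collect every sequence name seen in any condition.
    seq_names = sorted({s for counts in per_condition.values() for s in counts})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sequence"] + conditions)
    for seq in seq_names:
        # A sequence absent from a condition gets a count of 0 (assumption).
        writer.writerow([seq] + [per_condition[c].get(seq, 0) for c in conditions])
    return out.getvalue()

# Hypothetical per-condition counts.
combined = combine_counts({
    "root": {"seqA": 10, "seqB": 0},
    "leaf": {"seqA": 3, "seqC": 7},
})
print(combined)
```

Each row is one sequence and each column after the first is one condition, which matches the layout described above.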

Define Replicates (Main window): If the Associated Counts table has replicates, they can be defined in this window, which updates the table as shown above in the Main window.

 
Annotate Sequences - Add/Edit: Databases to search against are referred to as "annoDBs". Any FASTA file can be used as an annoDB, though TCW gives special support to UniProt taxonomic databases; besides allowing taxonomy-specific querying in viewSingleTCW, the UniProt .dat file is used to extract GO, KEGG, EC and Pfam information. (A script is provided to download and format the UniProt databases.)

As the UniProt TrEMBL databases increase in size (tr_bacteria.dat is 76GB as of 5/17/15), it has become impractical to search them with BLAST [1]. Fortunately, the diamond [2] and usearch [3] programs are much faster (see fast searching for timing and results).
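
For readers unfamiliar with diamond, the sketch below builds the two command lines of a typical stand-alone run: format a protein FASTA file, then search it. TCW runs the search programs itself, so this is not the command TCW issues; the helper name and all file names are hypothetical, and only the commonly documented diamond options (`makedb --in --db`, and `blastx`/`blastp` with `--db --query --out --outfmt 6`) are used.

```python
import shlex

def diamond_commands(fasta, db, query, out, mode="blastx"):
    """Return (makedb, search) argument lists for diamond (illustrative only)."""
    if mode not in ("blastx", "blastp"):
        raise ValueError("mode must be 'blastx' or 'blastp'")
    # Step 1: build a diamond database from a protein FASTA file.
    makedb = ["diamond", "makedb", "--in", fasta, "--db", db]
    # Step 2: search the query sequences and write tabular (format 6) hits.
    search = ["diamond", mode, "--db", db, "--query", query,
              "--out", out, "--outfmt", "6"]
    return makedb, search

# Hypothetical file names.
makedb_cmd, search_cmd = diamond_commands(
    "uniprot_sprot.fasta", "sprot", "transcripts.fasta", "hits.tsv"
)
print(shlex.join(makedb_cmd))
print(shlex.join(search_cmd))
```

The argument lists can be passed to `subprocess.run` on a machine where diamond is installed; they are only printed here so the sketch runs without it.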

References


  1. BLAST is used for assembly, annotation and interactive searching in viewSingleTCW.
    Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389-3402.
  2. Diamond can be used for blastx and blastp annotation searches.
    Buchfink B, Xie C, Huson D (2015) Fast and sensitive protein alignment using DIAMOND. Nature Methods 12: 59-60. doi:10.1038/nmeth.3176.
  3. Usearch can be used for blastp and modified blastx annotation searches.
    Edgar RC (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26(19): 2460-2461.
  4. CAP3 is used for assembly.
    Huang X, Madan A (1999) CAP3: A DNA sequence assembly program. Genome Res 9: 868-877.
  5. UniProt is recommended for protein annotation as the GO and other information can be extracted by the TCW and added to the database.
    Dimmer EC, Huntley RP, Alam-Faruque Y, Sawford T, O'Donovan C, et al. (2012) The UniProt-GO Annotation database in 2011. Nucleic Acids Res 40: D565-570.
  6. The Gene Ontology MySQL database is used for levels and descriptions.
    GO Consortium (2012) The Gene Ontology: enhancements for 2011. Nucleic Acids Res 40: D559-564.
Email Comments To: tcw@agcol.arizona.edu