TCW overview for tra

Project:  tra   #Seqs: 211   #Hits: 12,259   #GOs: 6,641    TPM  Seq-DE  GO-Enrich   Pairs
 
01-Oct-21 Build Database  sequences loaded from external source
15-Oct-21 Last Annotation with sTCW v3.3.4

INPUT
   Counts: 
      SEQID  ID          SIZE  #REPS  
      tra    Root     955,097      5  
      tra    Stem   1,865,201      5  
      tra    Oleaf    474,551      5  

   Sequences: 
      SEQID  SIZE  TITLE                                 AVG-len  MED-len  
      tra     211  TCW demo with transcripts and counts    1,079      861  

ANNOTATIONS
   Hit Statistics:
      Sequences with hits     210  (99.5%)     Bases covered by hit  180,582  (79.3%)  
      Unique hits          12,259              Total bases           227,683           
      Total sequence hits  17,968                                                      

   annoDBs (Annotation databases): 7   (see Legend below)
      ANNODB            ONLY  BITS  ANNO  UNIQUE  TOTAL   AVG  Rank  HAS (%Seqs)   AVG  COVER  COVER  
                                                         %SIM   =1   HIT          %SIM   >=50   >=90  
      SP-plants            0     0     0   1,322  2,942  53.3    |   193 (91.5%)  69.6  61.7%   3.1%  
      SP-invertebrates     0     0     0     610  1,249  42.0    |   126 (59.7%)  47.0  30.2%   0.8%  
      SP-fungi             0     0     0     846  1,452  42.9    |   118 (55.9%)  46.4  24.6%   0.8%  
      SP-bacteria          0     0     0     782  1,121  43.7    |    63 (29.9%)  45.4  25.4%     0%  
      SP-fullSubset        0     0     0   1,259  2,270  44.3    |   137 (64.9%)  48.1  32.1%   0.7%  
      TR-plants            9   207   207   4,535  5,217  75.5    |   210 (99.5%)  81.4  69.5%   8.6%  
      TR-invertebrates     0     3     3   2,905  3,717  49.5    |   179 (84.8%)  54.0  39.1%   2.2%  

   Top 15 species from total: 1,508
      SPECIES (25 char)         BITS   ANNO  TOTAL     SPECIES                 BITS   ANNO   TOTAL  
      Musa acuminata              75     54    325     Anthurium amnicola         2      4     101  
      Musa balbisiana             59     42    313     Vitis vinifera             2      3      89  
      Ensete ventricosum          19     14    214     Apostasia shenzhenica      2      3      67  
      Elaeis guineensis           10     24    347     Dendrobium catenatum       2      2      73  
      Phoenix dactylifera          9     19    318     Macleaya cordata           2      2      41  
      Zingiber officinale          7      7     27     Nelumbo nucifera           1      3      95  
      Ananas comosus               4     11    458     Ricinus communis           1      2      43  
      Meloidogyne enterolobii      3      0     10     Other                     12     20  15,447  

   Gene Ontology Statistics:
      Unique GOs          6,641              Unique hits with GOs  10,662  (87.0%)  
      Sequences with GOs    208  (98.6%)     Seq best hit has GOs     194  (91.9%)  
      Has goslim_plant       95                                                     
                                                                                    
      biological_process  4,870  (73.3%)     is_a                  10,764           
      molecular_function  1,026  (15.4%)     part_of                1,175           
      cellular_component    745  (11.2%)                                            

EXPRESSION
   TPM: (% of 211)
               <2.0     2-5    5-10    10-50  50-100   100-1k     1k-5k     >=5k  
      Root   1(<1%)  0 (0%)  1(<1%)   1(<1%)  0 (0%)  60(28%)  101(48%)  47(22%)  
      Stem   3 (1%)  2 (1%)  1(<1%)   6 (3%)  7 (3%)  45(21%)  100(47%)  47(22%)  
      Oleaf  6 (3%)  1(<1%)  1(<1%)  11 (5%)  6 (3%)  40(19%)   96(45%)  50(24%)  
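
The TPM table above bins each sequence by its transcripts-per-million value per condition. The following is a minimal sketch of the standard TPM calculation and of a binning step that uses the same column labels. It is not taken from the TCW source: the function names, the treatment of bin boundaries (lower edge inclusive), and the choice to work on one set of counts at a time are all assumptions, and how TCW combines the 5 replicates per condition is not stated in this overview.

```python
import bisect

# Upper edges of the TPM bins shown in the overview table (labels copied from it).
BIN_EDGES = [2.0, 5.0, 10.0, 50.0, 100.0, 1000.0, 5000.0]
BIN_LABELS = ["<2.0", "2-5", "5-10", "10-50", "50-100", "100-1k", "1k-5k", ">=5k"]


def tpm(counts, lengths):
    """Standard TPM: length-normalised counts scaled so they sum to one million."""
    rates = [c / l for c, l in zip(counts, lengths)]  # reads per base
    total = sum(rates)
    if total == 0:
        return [0.0] * len(counts)
    return [r / total * 1_000_000 for r in rates]


def bin_tpm(values):
    """Tally TPM values into the table's bins (assumed: lower edge inclusive)."""
    tally = dict.fromkeys(BIN_LABELS, 0)
    for v in values:
        idx = bisect.bisect_right(BIN_EDGES, v)
        tally[BIN_LABELS[idx]] += 1
    return tally
```

Under these assumptions, calling bin_tpm on the per-sequence TPM values of one condition gives one row of the table; dividing each tally by #Seqs (211) gives the percentages in parentheses.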

   Differential expression:  (% of 211)
             <1E-5    <1E-4   <0.001    <0.01     <0.05  
      RoSt  6 (3%)  17 (8%)  32(15%)  65(31%)   97(46%)  
      RoLe  1(<1%)  16 (8%)  37(18%)  74(35%)  110(52%)  
      StLe  0 (0%)   2 (1%)  12 (6%)  65(31%)  101(48%)  

   Gene ontology enrichment: (% of 6,641)
             <1E-5   <1E-4  <0.001    <0.01     <0.05  
      RoSt  0 (0%)  0 (0%)  1(<1%)  53 (1%)  312 (5%)  
      RoLe  0 (0%)  0 (0%)  2(<1%)  21(<1%)  165 (2%)  
      StLe  0 (0%)  0 (0%)  0 (0%)   1(<1%)   49 (1%)  

SEQUENCES
   Sequence lengths:
      <=100  101-500  501-1000  1001-2000  2001-3000  3001-4000  4001-5000  >5000  
      0(0%)  37(18%)   84(40%)    72(34%)      8(4%)      8(4%)      2(1%)  0(0%)  

   Quality:
      Sequences with #n>0:   3  ( 1.4%)  
      Sequences with #n>10:  1  ( 0.5%)  

   ORF lengths:
      <=100  101-500  501-1000  1001-2000  2001-3000  3001-4000  4001-5000  >5000  
      0(0%)  62(29%)   92(44%)    47(22%)      4(2%)      6(3%)      0(0%)  0(0%)  

   ORF Stats:   Average length 863
      Has Hit            210  (99.5%)    Both Ends    61  (28.9%)                               
      Is Longest ORF     190  (90.0%)    ORF>=300    190  (90.0%)    MultiFrame     3   (1.4%)  
      Markov Best Score  211 (100.0%)    ORF=Hit     119  (56.4%)    Stops in Hit   6   (2.8%)  
      All of the above   190  (90.0%)     with Ends   31  (14.7%)    >=9 Ns in ORF  1    (<1%)  
                        
   GC Content: 48.65%
      Pos1  18.3%              5UTR    CDS   3UTR              5UTR     CDS    3UTR  
      Pos2  13.7%    %GC      43.68  48.79  36.25    Length     21k    182k     25k  
      Pos3  16.6%    CpG-O/E   0.88   0.71   0.63    AvgLen    98.9   863.1   117.1  
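
For readers unfamiliar with the %GC and CpG-O/E rows, the sketch below shows the commonly used definitions: %GC as the share of G and C among called bases, and CpG observed/expected as (#CG * length) / (#C * #G). It is an illustration under those textbook definitions; the function names are hypothetical and the exact formula used by TCW may differ (for example in how Ns are handled).

```python
def gc_percent(seq):
    """Percent of G+C among the A/C/G/T bases of a sequence (Ns are ignored)."""
    seq = seq.upper()
    called = sum(seq.count(b) for b in "ACGT")
    if called == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def cpg_obs_exp(seq):
    """CpG observed/expected ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c = seq.count("C")
    g = seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)
```

A ratio below 1, as reported for the CDS and UTR regions above, means CG dinucleotides occur less often than the C and G frequencies alone would predict.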
                                    
   Similar pairs: 100
      Nucleotide              17  
      Translated nucleotide  100  
      Translated ORFs         95  

LOCATIONS
   Sequences with location:          12  unique locations: 12
   Sequences on positive strand:      7  negative strand:  5
   Sequences per group:
      1  2  3-4  5-7  8-10  11-20  21-30  >30  
      0  0    3    0     0      0      0    0  


-------------------------------------------------------------------
PROCESSING INFORMATION:
   AnnoDB Files:
      Type  Taxo           FILE                                DB DATE    ADD DATE   EXECUTE               
      sp    plants         uniprot_sprot_plants.fasta          27-Jan-21  15-Oct-21  diamond  --masking 0  
      sp    invertebrates  uniprot_sprot_invertebrates.fasta   27-Jan-21  15-Oct-21  diamond  --masking 0  
      sp    fungi          uniprot_sprot_fungi.fasta           27-Jan-21  15-Oct-21  diamond  --masking 0  
      sp    bacteria       uniprot_sprot_bacteria.fasta        27-Jan-21  15-Oct-21  diamond  --masking 0  
      sp    fullSubset     uniprot_sprot_fullSubset.fasta      27-Jan-21  15-Oct-21  diamond  --masking 0  
      tr    plants         uniprot_trembl_plants.fasta         27-Jan-21  15-Oct-21  diamond  --masking 0  
      tr    invertebrates  uniprot_trembl_invertebrates.fasta  27-Jan-21  15-Oct-21  diamond  --masking 0  

   Prune: none

   Gene Ontology: go-basic.obo-Feb2021  GOdb: go_demo  [GOs added with sTCW v3.3.4]
   GO Slim: goslim_plant

   ORF finder:
      Use ATG only for start site
      Rule 1: Use Good hit: E-value <=1E-10 or Sim >= 20%
      Rule 2: Use longest ORF if Log Ratio > 0.5
      Rule 3: Use best Markov score if Log Ratio > 0.4
              Train using best hits (204 seqs, 174.4k bases)

   Differential Expression computation: 
      Column       Method                         Conditions
      RoSt         edgeRglm.R CPM>1>=2            Root : Stem 
      RoLe         edgeRglm.R CPM>1>=2            Root : Oleaf 
      StLe         edgeRglm.R CPM>1>=2            Stem : Oleaf 
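
The "CPM>1>=2" part of the method name is read here as a pre-filter: keep a sequence only if its counts per million exceed 1 in at least 2 samples. That reading is an assumption, not something this overview states, and the real filtering happens inside the R script. The Python sketch below only illustrates the idea; the function names and parameters are hypothetical.

```python
def cpm(counts, lib_sizes):
    """Counts per million for one sequence across samples."""
    return [c / n * 1_000_000 for c, n in zip(counts, lib_sizes)]


def passes_cpm_filter(counts, lib_sizes, min_cpm=1.0, min_samples=2):
    """True if CPM is strictly above min_cpm in at least min_samples samples."""
    above = sum(1 for v in cpm(counts, lib_sizes) if v > min_cpm)
    return above >= min_samples
```

For a comparison such as Root : Stem, counts and lib_sizes would hold one entry per replicate of both conditions (10 values with 5 replicates each), where lib_sizes are the total counts per replicate.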

   GO enrichment computation: 
      Column       Method                         Cutoff
      RoSt         goSeqNoFDR.R                   5.0e-02
      RoLe         goSeqNoFDR.R                   5.0e-02
      StLe         goSeqNoFDR.R                   5.0e-02

-------------------------------------------------------------------
LEGEND:
   annoDB:
      ANNODB    DBTYPE-TAXO: the database type (SP = UniProt SwissProt, TR = UniProt TrEMBL) and taxonomy
      ONLY      #Seqs that hit the annoDB and no others
      BITS      #Seqs with the overall best bitscore from the annoDB
      ANNO      #Seqs with the overall best annotation from the annoDB 
      UNIQUE    #Unique hits to the annoDB
      TOTAL     #Total seq-hit pairs for the annoDB
      AVG %SIM  Average percent similarity of the total seq-hit pairs
      HAS HIT   #Seqs (and percent of #Seqs) that have at least one hit from the annoDB
      Rank=1    The following columns refer to the best hit:
         AVG %SIM  Average percent similarity of the best hit seq-hit pairs
         COVER>=N  Percent of HAS HIT sequences where the best hit has similarity>=N% and hit coverage>=N%
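
As a worked check of the "HAS HIT (%Seqs)" column in the annoDB table, the percentages can be reproduced from the counts and #Seqs (211) listed at the top of the overview. The short sketch below does that arithmetic; the counts are copied from the table, while the dictionary and function names are only illustrative.

```python
TOTAL_SEQS = 211  # #Seqs from the top of the overview

# Sequences with at least one hit per annoDB, copied from the annoDB table.
HAS_HIT = {
    "SP-plants": 193,
    "SP-invertebrates": 126,
    "SP-fungi": 118,
    "SP-bacteria": 63,
    "SP-fullSubset": 137,
    "TR-plants": 210,
    "TR-invertebrates": 179,
}


def percent_of_seqs(n, total=TOTAL_SEQS):
    """Percent of all sequences, rounded to one decimal as in the report."""
    return round(100.0 * n / total, 1)


for name, n in HAS_HIT.items():
    print(f"{name:<18}{n:>4} ({percent_of_seqs(n)}%)")
```

The COVER columns use a different denominator according to the legend: the number of sequences with a hit in that annoDB rather than #Seqs.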

   #Seqs is listed at top of overview
   Best Annotation:
      Descriptions of best-annotation hits do not contain phrases such as 'uncharacterized protein'