The University of Arizona
AW Guide v1.0: runAW


Run Demo

  1. Edit HOSTS.cfg to enter the MySQL user ID (user=) and password (password=). Changing 'host=localhost' has no effect, because AW only works with localhost.
  2. Run "./runAW".
  3. Press "Select a project", which will show you "demo"; select it.
  4. Go to the bottom of the window and press "Build". You will be prompted in the terminal window to answer one or more questions.
  5. The expected output is shown under "Demo output" at the bottom of this page.
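
Before starting runAW, it can help to confirm that HOSTS.cfg really has a user and password filled in. The sketch below assumes the file consists of simple key=value lines (host=, user=, password=), as the wording of step 1 suggests. The function names, the skipping of '#' comment lines, and the sample user "awuser" are illustrative assumptions, not part of AW.

```python
def read_hosts_cfg(text):
    """Parse key=value lines (assumed format) into a dict; skip blanks and '#' lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_credentials(settings):
    """Return the required keys that are absent or empty."""
    return [key for key in ("user", "password") if not settings.get(key)]


# Hypothetical file contents: the password has not been filled in yet.
example = "host=localhost\nuser=awuser\npassword=\n"
print(missing_credentials(read_hosts_cfg(example)))  # prints ['password']
```

To check a real file, pass open("HOSTS.cfg").read() to read_hosts_cfg.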

To create the AW database

  1. Edit HOSTS.cfg to enter the MySQL user ID and password. Changing 'host=localhost' has no effect, because AW only works with localhost.
  2. Run "./runAW".
  3. Press "Create"; you will be prompted for a project name. This creates a directory
    under "projects", where the AW.cfg configuration file is written.
  4. Fill out all sections. Every entry is required except those marked (Optional).
  5. Either press "Build" at the bottom of the window,
    or press "Save", exit, and run "./buildAW <project name>".
    Either way, you will need to answer one or more questions at the terminal prompt.
This builds the database and creates a directory "AW_output" under "projects/<project name>",
which contains ntRef.fa and ntAlt.fa; these can be used as input to the script.

To update the AW database

  1. Make any changes to the input files using runAW, then save and exit.
  2. Run "./buildAW <project name> <action>".
    To view the available actions, run "./buildAW" with no parameters, or see below.

Information on input

Details of the file formats are given in the Files documentation.

Important: The names of the input files must agree with the abbreviations
for Condition #1 and the optional Condition #2. This is explained in the Files documentation.
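
As a rough illustration of this naming rule, the sketch below checks a file name against the condition abbreviations. It assumes the pattern seen in the demo output at the bottom of this page (for example NYfBr1.bed, i.e. Condition #1 abbreviation, then Condition #2 abbreviation, then a replicate number, then the extension). That pattern is inferred from the demo only, and the function name is hypothetical; the Files documentation remains the authority.

```python
def library_of(filename, cond1_abbrs, cond2_abbrs=()):
    """Return (library, replicate) if the name fits <abbr1>[<abbr2>]<rep>.<ext>, else None."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension, e.g. ".bed"
    for first in cond1_abbrs:
        for second in (list(cond2_abbrs) or [""]):  # Condition #2 is optional
            library = first + second
            replicate = stem[len(library):]
            if stem.startswith(library) and replicate.isdigit():
                return library, int(replicate)
    return None


# Abbreviations taken from the demo project: Strain NYf, Tissues Br and Mus.
print(library_of("NYfBr1.bed", ["NYf"], ["Br", "Mus"]))   # prints ('NYfBr', 1)
print(library_of("OldBr1.bed", ["NYf"], ["Br", "Mus"]))   # prints None
```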

Variant Files

  1. The variant file(s) and variant annotation file(s) may be the same.
  2. The variant count files should be created with the AW pipeline, because the naming of the files is important.
  3. The variant effect file may be made with snpEff (very easy to use)
    or the Ensembl Variant Effect Predictor website. If no effect input is supplied,
    buildAW will compute the effects missense, synonymous, UTR5 and UTR3.

Reference Genome

  1. This is used to create transcripts, which are loaded into the database and written to file.
  2. The sequenced genome must be split into chromosome files, prefixed by "chrN",
    where N is the chromosome number or X or Y. (The prefix does not have to be "chr",
    as long as the same prefix is used consistently.)


buildAW <project> [optional action]
   Read projects/<project>/AW.cfg for the parameters.
   The database is AW_<project>

Load all (no action)
   Load GTF genes and trans (GTF)
   Load Variants (VCF)
   Compute transcripts and proteins (genome sequence)
   Load Variant coverage (BED)
Plus the following actions 5-8, which can also be run separately.
To run one separately, add the necessary file information in runAW, save, then run buildAW with that action.

Action  Description
5   Load Variant effects (Ensembl Variant Effect Predictor or snpEff)
6   Load Gene NCBI descriptions (GenBank)
7   Compute AI (allele imbalance)

8   Load Transcript counts (.xprs)
    This is typically run separately since it needs the transcripts,
    which are output during the database build,
    but the trans counts will load during build if available.

N   (where N>20) Compute transcript coverage with read size=N.
    By default, a read size of 100 is used when variant coverage is added.
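
For scripted rebuilds, the usage line and action table above can be mirrored in a small helper that assembles the buildAW argument list. This is only a sketch under the rules stated on this page (no action loads everything, actions 5-8 run one step, a number above 20 sets the read size). The helper name is hypothetical, and values the table does not mention are rejected rather than guessed.

```python
# Descriptions copied from the action table above.
NAMED_ACTIONS = {
    5: "Load Variant effects",
    6: "Load Gene NCBI descriptions",
    7: "Compute AI",
    8: "Load Transcript counts",
}


def buildaw_command(project, action=None):
    """Return the argument list for ./buildAW; action=None means 'Load all'."""
    command = ["./buildAW", project]
    if action is None:
        return command
    if action in NAMED_ACTIONS or action > 20:  # N>20 is treated as a read size
        return command + [str(action)]
    raise ValueError(f"action {action} is not listed in the action table")


print(buildaw_command("demo"))      # prints ['./buildAW', 'demo']
print(buildaw_command("demo", 7))   # prints ['./buildAW', 'demo', '7']
```

The resulting list can be passed to subprocess.run(..., check=True) from the AW directory. buildAW may still ask questions at the terminal prompt, so the call should be run interactively.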

Demo output

Read projects/demo/AW.cfg         13-Oct-19 07:03:28
> Condition1 Strain
      Strain: Young_hybrid NYf        yes       
> Condition2 Tissue
      Tissue: Brain      Br        
      Tissue: Muscle     Mus       
> Files
      Variants     demo/variants
      VariantCov   demo/SNPCOV/Results
      VariantAnno  demo/snpEFF.chr19.vcf
      GTF          demo/annotation/chr19.gtf
      Genome       demo/genome
      NCBI         demo/
      TransCnt     demo/EXPRESS/Results
Successful load of AW.cfg
   Loading schema database
   Validate abbreviations
      Conditions: NYf:Br   NYf:Mus   
   Checking files and directories
      Variant call directory demo/variants
      Variant coverage directory demo/SNPCOV/Results
         Number of reps per library:
            NYfBr:2      NYfMus:2     
         Good bed files 4
      One Variant effect file 
         Variant file demo/snpEFF.chr19.vcf
            SnpEFF file (SnpEffVersion)
      Genome annotation file demo/annotation/chr19.gtf
         Seqname prefix is 'chr'
      Genome sequence directory demo/genome
         Files found 1
      Transcript count directory demo/EXPRESS/Results
         Number of reps per library:
            NYfBr:2      NYfMus:2     
   Load library and metadata into database

+++Start building entire database+++         13-Oct-19 07:03:35

Add genome annotation (genes/trans)         13-Oct-19 07:03:35
   File demo/annotation/chr19.gtf
   GTF file is probably from Ensembl
      Read: 32642   Genes: 719   Trans: 1351                 
      Dup gene: 1   Pos strand: 363   Neg strand: 356
Finish loading gene and transcript coordinates          0m:3s
--Finish Step 1

Add variant calls to database         13-Oct-19 07:03:39
   File#1 /Users/cari_ann/Workspace/dev/AW_1_1/demo/variants/chr19.exon.snps.vcf
      Read: 2445   Variants: 2445  In Exon: 3878   New SNP: 234   New Indel: 28     
      Genes with variants: 337   Gene-variant pairs: 2449
      Trans with variants: 544   Tran-variant pairs: 3878
      chr: 19:2445 
   Update counts for Exons, Trans and Genes
      Update Genes: 337  Trans: 544  Exons: 1441
Finish adding variants                            0m:8s
--Finish Step 2

Add snpEFF variant effects         13-Oct-19 07:03:48
   File #1 /Users/cari_ann/Workspace/dev/AW_1_1/demo/snpEFF.chr19.vcf
      Update SNP-trans: 3933                                  
      Skipped SNPs: 112420   Skipped Trans: 4008
   Update mySQL Variant tables                        
      Added descriptions: 2442                     
Finish adding variant annotation                  0m:4s
--Finish Step 3

Add genome sequence (create transcripts files)         13-Oct-19 07:03:53
   Read 1 files from demo/genome
   Create output directory projects/demo/AW_output
   Write files to output directory projects/demo/AW_output
   Compute cDNApos (effects loaded from file)
      File #1 chr19.fa                        
      Sequences added: 3246   Update trans: 1351
      No start_codon: 82   No end_codon: 173
      Add exon remarks: 5353   Write aaRef: 1351
Finish generate sequences                    0m:21s  (264M)
--Finish Step 4

Add variant coverage          13-Oct-19 07:04:14
   Load 4 files from demo/SNPCOV/Results
   Add heterozygous SNP counts per library
      File #1 NYfBr1.bed
         Read:1383    Add:1383                     
      File #2 NYfBr3.bed
         Read:1291    Add:1291                     
      File #3 NYfMus1.bed
         Read:1015    Add:1015                     
      File #4 NYfMus2.bed
         Read:1022    Add:1022                     
      Add total variants: 4711  (Max Reps: 2)           0m:2s
   Sum ref/alt SNP coverage from replicates for 2 libraries and 2445 SNPs
      Add to SNP Lib: 2754                              0m:3s
   Sum ref/alt for gene coverage
      Read Genes: 719    With variants: 337
      Add to Gene Lib: 1482                             0m:2s
   Mark SNP clusters to count reads once using radius=50

Update mySQL tables         13-Oct-19 07:04:27
   Add distances between variants
      Trans with >0 variants: 544
      SNPs distance< 50: 1306(39%)
Finish computations                               0m:1s
      Excluded SNP/trans pairs for summing of counts: 1046(26%)
   Sum ref/alt to transLib
      Read Trans: 1351   With variants: 544         
      Add to Trans Lib: 2517                            0m:2s
Finish variant postprocess                        0m:18s
--Finish Step 5

Add NCBI functional annotation          13-Oct-19 07:04:32
   File demo/
      Found: 2016  Entered: 2016   Not found: 0
Finish loading NCBI annotation                    0m:2s
--Finish Step 6

Add transcript counts         13-Oct-19 07:04:35
   Load 4 files from demo/EXPRESS/Results
   Loading information from database
      Libs: 2  Trans: 1351  TransLib: 875
      File #1 NYfBr1.xprs
      File #2 NYfBr3.xprs
      File #3 NYfMus1.xprs
      File #4 NYfMus2.xprs
      Total loaded: 5404  Added: 1827
   Counts added, now updating gene information 
      Add gene lib: 998
      Update gene lib: 2402
Finish loading transcript counts                  0m:8s
--Finish Step 6b

Computing pvalues         13-Oct-19 07:04:43
   Computing SNP ASE...
      SNP/libs computed:850   ASE:201   Opposite Direction Reps:0
   Computing Trans ASE...
       Trans/libs computed:   674     SNP Coverage ASE:   112  Opposite Direction Reps:     3
                                         Read Count ASE:  277  Opposite Direction Reps:   47
   Computing Gene ASE with minimum coverage 20
      Gene/libs computed: 445   SNP coverage ASE:69   Opposite Direction reps:2
      Read Count ASE: 230   Opposite Direction reps: 38
      Updating database with dynamic SNP columns
      Updating database with trans dynamic columns
      Updating database with gene dynamic columns
      Updating database with SNP replicate 'Opposite Direction' remarks
         Updated SNPs: 184
      Updating database with trans 'Opposite Direction' remarks
         odSNP: 21    odRep: 141      
      Updating database with gene with SNP AI count
         Genes with at least 1 AI SNP: 82                 
Finish pvalues                                    0m:4s
--Finish Step 7

Update mySQL tables         13-Oct-19 07:04:48
   Add ref/alt information to main tables
      SNPs counts to SNP  : 1672
      SNPs counts to gene : 286
      SNPs counts to trans: 478
      Read counts to gene : 295
      Read counts to trans: 456
   Compute best trans per gene
      Rank=1: 299                                       0m:0s
   Add counts for missense and damaged/high SNP
      Updated SNPs: 343                   
   Add missense et al counts to transcripts
      Trans Missense SNPs: 559   Damaging: 0
   Add missense et al counts to genes
      Gene with missense SNPs: 145
   Add distances between variants
      Trans with >0 variants: 544
      SNPs distance< 50: 1306(39%)
   Add library sizes                              
Finish computations                               0m:5s
--Finish Step 8

Creating overview and writing to projects/demo/log/overview         13-Oct-19 07:04:53
   Make Totals
      Genes:    719  With SNPs:    336  With Indel:     30
      Trans:   1351  With SNPs:    541  With Indel:     44  AI: 97   [AI=Allele Imbalance (p<0.05)]
       SNPs:   2414     Coding:    983   Cov(>=20):    661  AI: 184  Lib Cov: 849  Lib AI: 201
      InDel:     31     Coding:     13
   Make Pvalue tables
   Make SNP coverage table
   Add files
   Total size of rep libraries will only be written to projects/demo/log/overview
      Make Replicate count table
Complete overview                                 0m:1s

++++++ Complete build AW_demo         13-Oct-19 07:04:55 Elapse time  1m:26s
