Question: goseq pwf length bias plot: interpreting plot
gravatar for Darya Vanichkina
3.3 years ago by
Australia/Centenary Institute University of Sydney
Darya Vanichkina110 wrote:

Following on the question goseq pwf length bias plot: help interpreting plot, but with a similar and yet slightly different problem:

I am also using goseq with a manually compiled annotation, and am getting a strange plot similar to the one described by the author above (but I'm not prefiltering more than I should be, *I think*):

What does this plot mean? Should I be using as my background *all* gene lengths, including those not tested by limma, for example because of insufficient read counts /isexpr <- rowSums(cpm(y)>0.7) >= 3 where y is my DGElist/?

What I'm doing:

1. Using the command-line version of featureCounts to count reads to collapsed transcripts (i.e. genes), importing the resulting table into R and getting a clean data frame with counts (I'm using limma voom for differential expression), and a named vector with gene lengths.

2. I am then running the following code (as a first step before testing GO enrichment of genes upregulated in contrast 1 of interest):

goBiasLength <- counttableCompleteLen.vector[order(match(names(counttableCompleteLen.vector),names(results[,"Contrast1"])),na.last=NA)] #na,last=NA removes those that have no matches
Contrast1goUP <- as.numeric((results[,"Contrast1"] > 0))
names(Contrast14goUP) <- names(results[,"Contrast1"])
pwf.Contrast1 <- nullp(Contrast14goUP,,


3. To give you an idea what these are:

head(counttableCompleteLen.vector, n=3L)
Gene1 Gene2 Gene3 
              1070                110               6094
[1] "numeric"
head(results[,"Contrast1"], n=3L)
Gene100 Gene101 Gene102 
                 0                  0                  0 
[1] "TestResults"
[1] "limma"


Gene identifiers are unique. Thanks in advance!


R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.1 LTS

 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C               LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8     LC_MONETARY=en_AU.UTF-8   
 [6] LC_MESSAGES=en_AU.UTF-8    LC_PAPER=en_AU.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            

attached base packages:
[1] graphics  grDevices utils     datasets  stats     methods   base     

other attached packages:
 [1] goseq_1.24.0          geneLenDataBase_1.8.0 BiasedUrn_1.07        edgeR_3.14.0          rJava_0.9-8           limma_3.28.14        
 [7] biomaRt_2.28.0        reshape2_1.4.1        RColorBrewer_1.1-2    data.table_1.9.6      ggplot2_1.0.1        

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.1                locfit_1.5-9.1             lattice_0.20-29            GO.db_3.3.0                Rsamtools_1.24.0          
 [6] Biostrings_2.40.2          digest_0.6.8               GenomeInfoDb_1.8.1         plyr_1.8.3                 chron_2.3-47              
[11] acepack_1.3-3.3            stats4_3.3.0               RSQLite_1.0.0              zlibbioc_1.18.0            GenomicFeatures_1.24.3    
[16] annotate_1.50.0            S4Vectors_0.10.1           Matrix_1.2-6               rpart_4.1-8                proto_0.3-10              
[21] splines_3.3.0              BiocParallel_1.6.2         geneplotter_1.50.0         stringr_1.0.0              foreign_0.8-61            
[26] RCurl_1.95-4.8             munsell_0.4.2              rtracklayer_1.32.1         BiocGenerics_0.18.0        mgcv_1.8-12               
[31] nnet_7.3-8                 SummarizedExperiment_1.2.3 gridExtra_2.2.1            Hmisc_3.17-0               IRanges_2.6.1             
[36] XML_3.98-1.4               GenomicAlignments_1.8.3    MASS_7.3-34                bitops_1.0-6               grid_3.3.0                
[41] nlme_3.1-117               xtable_1.8-2               gtable_0.1.2               DBI_0.4-1                  magrittr_1.5              
[46] scales_0.3.0               stringi_1.0-1              XVector_0.12.0             genefilter_1.54.2          latticeExtra_0.6-28       
[51] Formula_1.2-1              tools_3.3.0                Biobase_2.32.0             DESeq2_1.12.3              parallel_3.3.0            
[56] survival_2.38-1            AnnotationDbi_1.34.3       colorspace_1.2-6           cluster_1.15.3             GenomicRanges_1.24.2      
[61] knitr_1.13 


goseq • 814 views
ADD COMMENTlink modified 3.3 years ago by Gordon Smyth39k • written 3.3 years ago by Darya Vanichkina110
Answer: goseq pwf length bias plot: interpreting plot
gravatar for Gordon Smyth
3.3 years ago by
Gordon Smyth39k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:

I don't entirely follow your code, but this may help:

  1. No, you should not be using the genes that were filtered as not expressed.
  2. However, you should use all non-filtered genes together to estimate the length bias trend. Do not separate up and down genes for this purpose. You do not need separate trends for up and down genes.
  3. If the trend still seems to be decreasing, then just do an ordinary GO analysis without trend correction.
ADD COMMENTlink modified 3.3 years ago • written 3.3 years ago by Gordon Smyth39k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 506 users visited in the last hour