goseq pwf length bias plot: interpreting plot
1
1
Entering edit mode
@darya-vanichkina-6050
Last seen 7.1 years ago
Australia/Centenary Institute Universit…

Following on the question goseq pwf length bias plot: help interpreting plot, but with a similar and yet slightly different problem:

I am also using goseq with a manually compiled annotation, and am getting a strange plot similar to the one described by the author above (but I'm not prefiltering more than I should be, *I think*):

What does this plot mean? Should I be using as my background *all* gene lengths, including those not tested by limma, for example because of insufficient read counts /isexpr <- rowSums(cpm(y)>0.7) >= 3 where y is my DGElist/?

What I'm doing:

1. Using the command-line version of featureCounts to count reads to collapsed transcripts (i.e. genes), importing the resulting table into R and getting a clean data frame with counts (I'm using limma voom for differential expression), and a named vector with gene lengths.

2. I am then running the following code (as a first step before testing GO enrichment of genes upregulated in contrast 1 of interest):

goBiasLength <- counttableCompleteLen.vector[order(match(names(counttableCompleteLen.vector),names(results[,"Contrast1"])),na.last=NA)] #na,last=NA removes those that have no matches
Contrast1goUP <- as.numeric((results[,"Contrast1"] > 0))
names(Contrast14goUP) <- names(results[,"Contrast1"])
pwf.Contrast1 <- nullp(Contrast14goUP, bias.data=goBiasLength,plot.fit=TRUE)

 

3. To give you an idea what these are:

head(counttableCompleteLen.vector, n=3L)
Gene1 Gene2 Gene3 
              1070                110               6094
class(counttableCompleteLen.vector)
[1] "numeric"
head(results[,"Contrast1"], n=3L)
Gene100 Gene101 Gene102 
                 0                  0                  0 
class(results)
[1] "TestResults"
attr(,"package")
[1] "limma"

 

Gene identifiers are unique. Thanks in advance!

 

sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.1 LTS

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C               LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8     LC_MONETARY=en_AU.UTF-8   
 [6] LC_MESSAGES=en_AU.UTF-8    LC_PAPER=en_AU.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] graphics  grDevices utils     datasets  stats     methods   base     

other attached packages:
 [1] goseq_1.24.0          geneLenDataBase_1.8.0 BiasedUrn_1.07        edgeR_3.14.0          rJava_0.9-8           limma_3.28.14        
 [7] biomaRt_2.28.0        reshape2_1.4.1        RColorBrewer_1.1-2    data.table_1.9.6      ggplot2_1.0.1        

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.1                locfit_1.5-9.1             lattice_0.20-29            GO.db_3.3.0                Rsamtools_1.24.0          
 [6] Biostrings_2.40.2          digest_0.6.8               GenomeInfoDb_1.8.1         plyr_1.8.3                 chron_2.3-47              
[11] acepack_1.3-3.3            stats4_3.3.0               RSQLite_1.0.0              zlibbioc_1.18.0            GenomicFeatures_1.24.3    
[16] annotate_1.50.0            S4Vectors_0.10.1           Matrix_1.2-6               rpart_4.1-8                proto_0.3-10              
[21] splines_3.3.0              BiocParallel_1.6.2         geneplotter_1.50.0         stringr_1.0.0              foreign_0.8-61            
[26] RCurl_1.95-4.8             munsell_0.4.2              rtracklayer_1.32.1         BiocGenerics_0.18.0        mgcv_1.8-12               
[31] nnet_7.3-8                 SummarizedExperiment_1.2.3 gridExtra_2.2.1            Hmisc_3.17-0               IRanges_2.6.1             
[36] XML_3.98-1.4               GenomicAlignments_1.8.3    MASS_7.3-34                bitops_1.0-6               grid_3.3.0                
[41] nlme_3.1-117               xtable_1.8-2               gtable_0.1.2               DBI_0.4-1                  magrittr_1.5              
[46] scales_0.3.0               stringi_1.0-1              XVector_0.12.0             genefilter_1.54.2          latticeExtra_0.6-28       
[51] Formula_1.2-1              tools_3.3.0                Biobase_2.32.0             DESeq2_1.12.3              parallel_3.3.0            
[56] survival_2.38-1            AnnotationDbi_1.34.3       colorspace_1.2-6           cluster_1.15.3             GenomicRanges_1.24.2      
[61] knitr_1.13 

 

goseq • 1.7k views
ADD COMMENT
2
Entering edit mode
@gordon-smyth
Last seen 13 minutes ago
WEHI, Melbourne, Australia

I don't entirely follow your code, but this may help:

  1. No, you should not be using the genes that were filtered as not expressed.
  2. However, you should use all non-filtered genes together to estimate the length bias trend. Do not separate up and down genes for this purpose. You do not need separate trends for up and down genes.
  3. If the trend still seems to be decreasing, then just do an ordinary GO analysis without trend correction.
ADD COMMENT

Login before adding your answer.

Traffic: 696 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6