dmrseq: cannot adjust for a covariate
1
0
Entering edit mode
l.kremer • 0
@lkremer-14928
Last seen 6.2 years ago

Hello everyone,

I'm trying to use the tool dmrseq to find differentially methylated regions (DMRs) from bisulfite-seq data. My experimental design looks like this:

##             celltype treatment
##             <factor>  <factor>
## neuron_c1     neuron   control
## neuron_c2     neuron   control
## neuron_t1     neuron   treated
## neuron_t2     neuron   treated
## stemcell_c1 stemcell   control
## stemcell_c2 stemcell   control
## stemcell_t1 stemcell   treated
## stemcell_t2 stemcell   treated

Now I want to find e.g. DMRs between stem cells and neurons while adjusting for the effect of treatment. According to the help site of "dmrseq::dmrseq", this should be possible by specifying the "adjustCovariate" argument.

However, specifying "adjustCovariate" always results in an error (that occurs very quickly before any calculations are done). See below for a minimal working (crashing) example that's reproducible on multiple machines.

Of course I also tried the code below using a proper dataset, but to no avail. I also tried specifying pData as character instead of factor, but it doesn't seem to make a difference.

Thank you in advance, my experience with dmrseq has been great so far!

Detecting differentially methylated regions with dmrseq

require(bsseq)
require(BiocGenerics)
library(dmrseq)

Constructing a toy dataset from the bsseq example file.

infile <- system.file("extdata/test_data.fastq_bismark.bismark.cov.gz",
                       package = 'bsseq')

cpg <- BiocGenerics::combine(
  bsseq::read.bismark(files = infile, sampleNames = "neuron_c1", strandCollapse = F),
  bsseq::read.bismark(files = infile, sampleNames = "neuron_c2", strandCollapse = F),
  bsseq::read.bismark(files = infile, sampleNames = "neuron_t1", strandCollapse = F),
  bsseq::read.bismark(files = infile, sampleNames = "neuron_t2", strandCollapse = F),
  bsseq::read.bismark(files = infile, sampleNames = "stemcell_c1", strandCollapse = F),
  bsseq::read.bismark(files = infile, sampleNames = "stemcell_c2", strandCollapse = F),
  bsseq::read.bismark(files = infile, sampleNames = "stemcell_t1", strandCollapse = F),
  bsseq::read.bismark(files = infile, sampleNames = "stemcell_t2", strandCollapse = F)
)
## [read.bismark] Reading file '/home/lukas/R/x86_64-pc-linux-gnu-library/3.4/bsseq/extdata/test_data.fastq_bismark.bismark.cov.gz' ... done in 0.8 secs
## [read.bismark] Joining samples ... done in 0.2 secs
## [read.bismark] Reading file '/home/lukas/R/x86_64-pc-linux-gnu-library/3.4/bsseq/extdata/test_data.fastq_bismark.bismark.cov.gz' ... done in 0.5 secs
## [read.bismark] Joining samples ... done in 0.1 secs
## [read.bismark] Reading file '/home/lukas/R/x86_64-pc-linux-gnu-library/3.4/bsseq/extdata/test_data.fastq_bismark.bismark.cov.gz' ... done in 0.1 secs
## [read.bismark] Joining samples ... done in 0.1 secs
## [read.bismark] Reading file '/home/lukas/R/x86_64-pc-linux-gnu-library/3.4/bsseq/extdata/test_data.fastq_bismark.bismark.cov.gz' ... done in 0.1 secs
## [read.bismark] Joining samples ... done in 0.1 secs
## [read.bismark] Reading file '/home/lukas/R/x86_64-pc-linux-gnu-library/3.4/bsseq/extdata/test_data.fastq_bismark.bismark.cov.gz' ... done in 0.1 secs
## [read.bismark] Joining samples ... done in 0.1 secs
## [read.bismark] Reading file '/home/lukas/R/x86_64-pc-linux-gnu-library/3.4/bsseq/extdata/test_data.fastq_bismark.bismark.cov.gz' ... done in 0.1 secs
## [read.bismark] Joining samples ... done in 0.1 secs
## [read.bismark] Reading file '/home/lukas/R/x86_64-pc-linux-gnu-library/3.4/bsseq/extdata/test_data.fastq_bismark.bismark.cov.gz' ... done in 0.1 secs
## [read.bismark] Joining samples ... done in 0.1 secs
## [read.bismark] Reading file '/home/lukas/R/x86_64-pc-linux-gnu-library/3.4/bsseq/extdata/test_data.fastq_bismark.bismark.cov.gz' ... done in 0.1 secs
## [read.bismark] Joining samples ... done in 0.1 secs
cpg
## An object of type 'BSseq' with
##   2013 methylation loci
##   8 samples
## has not been smoothed
## All assays are in-memory

Labeling the toy samples according to treatment & cell type:

pData(cpg)$celltype  <- as.factor(c("neuron", "neuron", "neuron", "neuron",
                                    "stemcell", "stemcell", "stemcell", "stemcell"))
pData(cpg)$treatment <- as.factor(c("control", "control", "treated", "treated",
                                    "control", "control", "treated", "treated"))

Double-checking that this table makes sense:

pData(cpg)
## DataFrame with 8 rows and 2 columns
##             celltype treatment
##             <factor>  <factor>
## neuron_c1     neuron   control
## neuron_c2     neuron   control
## neuron_t1     neuron   treated
## neuron_t2     neuron   treated
## stemcell_c1 stemcell   control
## stemcell_c2 stemcell   control
## stemcell_t1 stemcell   treated
## stemcell_t2 stemcell   treated

Only keep loci that have at least 1 coverage in all samples

cpg <- filterLoci(cpg)
## Filtering out loci with coverage less than 1 read in at least one sample
## Removed 0 out of 2013 loci

Trying to detect differentially methylated regions (DMRs) with dmrseq while adjusting for "treatment":

regions <- dmrseq(bs = cpg,
                  testCovariate = "celltype",
                  adjustCovariate = "treatment")
## Error in colnames(design)[, (max(coeff) + 1):ncol(design)] <- colnames(pData(bs))[adjustCovariate]: incorrect number of subscripts on matrix

Traceback is not very helpful.

traceback()
## 1: dmrseq(bs = cpg, testCovariate = "celltype", adjustCovariate = "treatment")

That didn't work, let's try again, this time I'm specifying the covariate by column number:

regions <- dmrseq(bs = cpg,
                  testCovariate = "celltype",
                  adjustCovariate = 2)
## Error in colnames(design)[, (max(coeff) + 1):ncol(design)] <- colnames(pData(bs))[adjustCovariate]: incorrect number of subscripts on matrix
traceback()
## 1: dmrseq(bs = cpg, testCovariate = "celltype", adjustCovariate = 2)

Still didn't work. But it works if I don't specify a covariate to adjust for:

regions <- dmrseq(bs = cpg,
                  testCovariate = "celltype")
## Condition stemcell vs neuron
## Using a single core
## Detecting candidate regions with coefficient larger than 0.1 in magnitude.
## ... (rest of the results omitted) ...

(Okay, it doesn't actually work here because I'm using a toy dataset, but it works on my own data)

BiocInstaller::biocValid()
## [1] TRUE
sessionInfo()
## R version 3.4.3 (2017-11-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.3 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] dmrseq_0.99.1              bsseq_1.14.0              
##  [3] SummarizedExperiment_1.8.1 DelayedArray_0.4.1        
##  [5] matrixStats_0.53.0         Biobase_2.38.0            
##  [7] GenomicRanges_1.30.1       GenomeInfoDb_1.14.0       
##  [9] IRanges_2.12.0             S4Vectors_0.16.0          
## [11] BiocGenerics_0.24.0        knitr_1.19                
## 
## loaded via a namespace (and not attached):
##   [1] nlme_3.1-131                             
##   [2] bitops_1.0-6                             
##   [3] TxDb.Rnorvegicus.UCSC.rn6.refGene_3.4.1  
##   [4] bit64_0.9-7                              
##   [5] RColorBrewer_1.1-2                       
##   [6] progress_1.1.2                           
##   [7] httr_1.3.1                               
##   [8] TxDb.Hsapiens.UCSC.hg18.knownGene_3.2.2  
##   [9] doRNG_1.6.6                              
##  [10] tools_3.4.3                              
##  [11] R6_2.2.2                                 
##  [12] DBI_0.7                                  
##  [13] lazyeval_0.2.1                           
##  [14] colorspace_1.3-2                         
##  [15] permute_0.9-4                            
##  [16] prettyunits_1.0.2                        
##  [17] RMySQL_0.10.13                           
##  [18] bit_1.1-12                               
##  [19] compiler_3.4.3                           
##  [20] pkgmaker_0.22                            
##  [21] TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.0 
##  [22] rtracklayer_1.38.3                       
##  [23] scales_0.5.0                             
##  [24] readr_1.1.1                              
##  [25] stringr_1.2.0                            
##  [26] digest_0.6.15                            
##  [27] Rsamtools_1.30.0                         
##  [28] R.utils_2.6.0                            
##  [29] XVector_0.18.0                           
##  [30] pkgconfig_2.0.1                          
##  [31] htmltools_0.3.6                          
##  [32] BSgenome_1.46.0                          
##  [33] regioneR_1.10.0                          
##  [34] limma_3.34.6                             
##  [35] TxDb.Dmelanogaster.UCSC.dm6.ensGene_3.4.1
##  [36] rlang_0.1.6                              
##  [37] RSQLite_2.0                              
##  [38] BiocInstaller_1.28.0                     
##  [39] shiny_1.0.5                              
##  [40] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2  
##  [41] bindr_0.1                                
##  [42] BiocParallel_1.12.0                      
##  [43] gtools_3.5.0                             
##  [44] dplyr_0.7.4                              
##  [45] R.oo_1.21.0                              
##  [46] RCurl_1.95-4.10                          
##  [47] magrittr_1.5                             
##  [48] GenomeInfoDbData_1.0.0                   
##  [49] Matrix_1.2-11                            
##  [50] Rcpp_0.12.15                             
##  [51] munsell_0.4.3                            
##  [52] R.methodsS3_1.7.1                        
##  [53] stringi_1.1.6                            
##  [54] yaml_2.1.16                              
##  [55] zlibbioc_1.24.0                          
##  [56] bumphunter_1.20.0                        
##  [57] org.Hs.eg.db_3.5.0                       
##  [58] plyr_1.8.4                               
##  [59] AnnotationHub_2.10.1                     
##  [60] grid_3.4.3                               
##  [61] blob_1.1.0                               
##  [62] lattice_0.20-35                          
##  [63] Biostrings_2.46.0                        
##  [64] TxDb.Rnorvegicus.UCSC.rn5.refGene_3.4.2  
##  [65] GenomicFeatures_1.30.2                   
##  [66] hms_0.4.1                                
##  [67] locfit_1.5-9.1                           
##  [68] pillar_1.1.0                             
##  [69] org.Dm.eg.db_3.5.0                       
##  [70] rngtools_1.2.4                           
##  [71] codetools_0.2-15                         
##  [72] reshape2_1.4.3                           
##  [73] biomaRt_2.34.2                           
##  [74] XML_3.98-1.9                             
##  [75] glue_1.2.0                               
##  [76] evaluate_0.10.1                          
##  [77] outliers_0.14                            
##  [78] annotatr_1.4.1                           
##  [79] data.table_1.10.4-3                      
##  [80] foreach_1.4.4                            
##  [81] httpuv_1.3.5                             
##  [82] org.Mm.eg.db_3.5.0                       
##  [83] gtable_0.2.0                             
##  [84] assertthat_0.2.0                         
##  [85] org.Rn.eg.db_3.5.0                       
##  [86] ggplot2_2.2.1                            
##  [87] TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2  
##  [88] mime_0.5                                 
##  [89] TxDb.Rnorvegicus.UCSC.rn4.ensGene_3.2.2  
##  [90] xtable_1.8-2                             
##  [91] tibble_1.4.2                             
##  [92] TxDb.Hsapiens.UCSC.hg38.knownGene_3.4.0  
##  [93] iterators_1.0.9                          
##  [94] registry_0.5                             
##  [95] GenomicAlignments_1.14.1                 
##  [96] AnnotationDbi_1.40.0                     
##  [97] memoise_1.1.0                            
##  [98] bindrcpp_0.2                             
##  [99] interactiveDisplayBase_1.16.0            
## [100] TxDb.Dmelanogaster.UCSC.dm3.ensGene_3.2.2
dmrseq dmr dmrs bisulfite methylation • 1.4k views
ADD COMMENT
2
Entering edit mode
keegan ▴ 60
@keegan-11987
Last seen 7 months ago
Vancouver, BC, Canada

Hi l.kremer,

Sorry for the delay in responding to this inquiry. I hadn't yet been checking the support site for questions since dmrseq was just recently accepted into Bioconductor.

Thanks for your thorough and reproducible bug report. I was able to track down the root cause and provide a patch. Now providing an adjustCovariate should work smoothly. You can get the latest version on github (https://github.com/kdkorthauer/dmrseq), or it will be in Bioc-devel shortly (within a matter of days).

Best,
Keegan

ADD COMMENT
0
Entering edit mode

Hi Keegan,

I updated to the latest version and now it works flawlessly. Thank you very much for the quick fix!

Lukas

ADD REPLY

Login before adding your answer.

Traffic: 881 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6