Search
Question: Internal bug of DMR-Seq?
0
9 weeks ago by
kuckunniwid0 wrote:

Working with DMR-seq I get the following error message from time to time:

Detecting candidate regions with coefficient larger than 0.05 in magnitude. ...Chromosome chr1: Smoothed (0.89 min). 335 regions scored (1.64 min).  ...Chromosome chr10: Smoothed (0.74 min). 117 regions scored (0.38 min).  ...Chromosome chr11: Smoothed (0.65 min). 119 regions scored (0.34 min).  ...Chromosome chr11_KI270721v1_random: Smoothed (0 min). No candidates found.  ...Chromosome chr12: Smoothed (0.66 min). 248 regions scored (0.44 min).  ...Chromosome chr13: Smoothed (0.54 min). 138 regions scored (0.48 min).  ...Chromosome chr14: Smoothed (0.54 min). Error in rbind(deparse.level, ...) :    numbers of columns of arguments do not match

The moment when this error appears is dependent on input parameters and sometimes does not appear at all. Does smb faced this problem? How can I fix it?

modified 5 weeks ago by keegan50 • written 9 weeks ago by kuckunniwid0

Hi,

Thanks for the report. So that I can reproduce the error, would you be able to give me a bit more information? Specifically,

• What is the output of BiocManager::valid() on your system?

• What is the code you are using that produces this error? Likewise, what settings do not produce the error?

• If possible, could you send me an .rds object containing a portion of the data that throws the error (specifically, chr14 here)? You can send via email to keegan@jimmy.harvard.edu.

Hopefully this will let me track down and fix the issue. Thanks!

Best,

Keegan

Hi keegan,

thanks for the fast reply! I have troubles even with the first line - somehow my DmrSeq worked without the package BiocManager. Installed it, the output is

> BiocManager::valid()

* sessionInfo()

R version 3.5.1 (2018-07-02) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 18.04.1 LTS

Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C                [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8      [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                   [9] LC_ADDRESS=C               LC_TELEPHONE=C             [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages: [1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages: [1] BiocManager_1.30.2

loaded via a namespace (and not attached): [1] compiler_3.5.1 tools_3.5.1   

Bioconductor version '3.7'

  * 15 packages out-of-date   * 0 packages too new

create a valid installation with

  BiocManager::install(c(     "edgeR", "evaluate", "fansi", "GenomicFeatures", "graph", "mime", "plotrix",     "R6", "rstudioapi"   ), update = TRUE, ask = FALSE)

more details: BiocManager::valid()$too_new, BiocManager::valid()$out_of_date

Warning message: 15 packages out-of-date; 0 packages too new 

The code that leads to error is:

library(data.table) library(gplots) files <- list.files(pattern = "\\.cov.gz$") readBismarkFile <- function(fileName) { bismarkCoverage = fread(paste('zcat', fileName) ,stringsAsFactors = F) bismarkCoverage = bismarkCoverage[,-4] colnames(bismarkCoverage) = c("chr", "start", "end", paste(fileName, "meth"), paste(fileName, "unmeth")) return(bismarkCoverage) } bismarkCoverageFinal = NA for (file in files) { if is.na(bismarkCoverageFinal)){ bismarkCoverageFinal = readBismarkFile(file) print(colnames(bismarkCoverageFinal)) } else { if (file != "S_12.bismark.cov.gz") bismarkCoverageFinal = merge(bismarkCoverageFinal, readBismarkFile(file), by=c("chr","start","end")) print(colnames(bismarkCoverageFinal)) } } bismarkCoverageFinal = as.data.frame(bismarkCoverageFinal) ### BASIC STATS nrow(bismarkCoverageFinal) vec <- c(seq(from=4, to=ncol(bismarkCoverageFinal), by=2)) sampleMeth <- as.matrix(bismarkCoverageFinal[ , vec]) vec = vec + 1 sampleCov <- as.matrix(bismarkCoverageFinal[ , vec]) ### DMR SEQ library(dmrseq) M <- sampleMeth Cov <- sampleCov + sampleMeth chr <- as.character(bismarkCoverageFinal[,1]) pos <- bismarkCoverageFinal[,2] celltype <- pData(BS.chr21)$CellType sampleNames <- paste("S", 1:11, sep="_")

bs <- BSseq(chr = chr, pos = pos,             M = M, Cov = Cov,              sampleNames = sampleNames)

colours = rep("red", ncol(forDistance)) colours[c(1,9,11)] = "darkblue" colours [c(3,5,7)] = "darkred" pData(bs)$CellType <- colours testCovariate <- "CellType" blocks <- dmrseq(bs=bs[1:1000000,], cutoff = 0.05, testCovariate=testCovariate, block = TRUE, minInSpan = 50, bpSpan = 1e3, maxGapSmooth = 1e3, maxGap = 5e3) ADD REPLYlink written 9 weeks ago by kuckunniwid0 I think that the error is random and happens due to some incorrect permutations. So there is no "safe" settings - sometimes I lucky to not to get the error, sometimes I am not. And of course it takes a lot of time, each computation, so it is really nervous - waiting for the end of computations while this error may appear close to the end of the process and ruin everything =) I will send you the data if my department will allow it...unfortunately these are unpublished data... ADD REPLYlink written 9 weeks ago by kuckunniwid0 Thanks for the additional information. BiocManager is a new package that helps with installation and keeping packages up-to-date - the valid() function checks for out of date packages and outputs instructions to update packages, and the install() function will soon replace biocLite() for installing Bioconductor packages. It's not required, but the output tells me that while you have some out-of-date packages, it appears that dmrseq and its dependencies are not among them. It would be great if you could send me a small portion of the data, perhaps a single chromosome (e.g.14 as I mentioned before, since it was the culprit in the original post), that reproduces the error. From your original post, the error is not being thrown during the permutations, but rather in the step of scoring observed candidate regions. A note about your code - the line celltype <- pData(BS.chr21)$CellType fetches the condition labels from the example data included in dmrseq. This shouldn't be used in your analysis. Instead, the testCovariate should specify the column of pData(bs) that contains the conditions you wish to compare. It looks like later on, you define the "CellType" column to be a vector of three colors (red = samples 2,4,6,8,10 darkblue = samples 1,9,11, and darkred = samples 3,5,7). So here dmrseq will compare these three sample groups. Is this what you intend?

Best, Keegan

Dear Keegan,

thanks a lot for the answer. Once I'll receive the permission I will send it to you. I faced another error:

...Chromosome chrUn_KI270333v1: Error in (function (classes, fdef, mtable)  :

  unable to find an inherited method for function ‘rowSums2’ for signature ‘"DelayedArray"’

Other weird chromosomes (such as ...Chromosome chrUn_KI270330v1: Smoothed (0 min). No candidates found.) worked OK.

Yes, I am trying to compare 3 categories (normal samples vs 2 different types of pre-cancer). However the FDR seems to be too strict and I struggle trying to find significant variants...Will the increase of number of permutations potentially improve the power of detection?

1

Increasing the number of permutations could potentially improve the power of detection, especially if you are currently using very few. You might also check the individual comparisons between normal and each type of pre-cancer separately. In addition, from the code you sent it looks like you are searching for large blocks. You could also try looking for smaller local changes by altering the settings as described in the vignette.

2
9 weeks ago by
keegan50
keegan50 wrote:

I think I found the most likely cause of the error being thrown. Essentially, I don't think the error handling for region-level model fitting was correct when the covariate of interest is a multi-level factor. When a region fit failed to converge, it would output a row to the results table that assumed the covariate of interest was a two-group factor or continuous covariate which led to the rbind error.

I just pushed a patch for this issue, which you can obtain by reinstalling dmrseq from GitHub right away (install_github("kdkorthauer/dmrseq"), or from Bioconductor devel tomorrow from around 12:00PM EST (BiocManager::install("dmrseq", version = "devel")). Please let me know if that fixes the issue!

Thanks a lot! Will reinstall and check!

Hi, I am facing a similar issue with dmrseq.

I am running it on test samples (on some regions of the genome). Here is the error:

Assuming the test covariate group is a factor. Condition: 2 vs 1

Parallelizing using 16 workers/cores (backend: BiocParallel:MulticoreParam).

Computing on 1 chromosome(s) at a time. Detecting candidate regions with coefficient larger than 0.1 in magnitude.

...Chromosome JH7928281: Smoothed (0.71 min). 2 regions scored (0.15 min). ...Chromosome KQ0304911: Smoothed (0.12 min).

No candidates found. ...Chromosome chr9: Smoothed (0.45 min). No candidates found. ...Chromosome KQ0304881: Smoothed (0.18 min). No candidates found. ...Chromosome chr16: Smoothed (0.23 min). 3 regions scored (0.21 min). ...Chromosome GL4563591:

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘rowSums2’ for signature ‘"DelayedArray"’

Also,

I have tried to install and run analysis in two ways:

- installing dmrseq from GitHub

- installing dmrseq from Bioconductor

Can you inspect the output of BiocManager::valid() ? If it indicates that any packages are out of date, please try updating them and trying again.

I did that as well. Nothing is outdated.

I am tying by removing the "not normal" (extra chromosomes) like  JH792828.1, etc. But, I assume, including these extra chromosomes should not create any error

Thanks for checking on that.

I'm not sure where this error is coming from. Would it be possible to send me a small test .rds file that contains the BSseq input object - could just contain the offending chromosome GL4563591 - along with the code used to call dmrseq? If I can reproduce the problem, it will help me track down the error.

You can send it via email to keegan@jimmy.harvard.edu

Thanks!

Yes, I will send that. When I removed those chromosomes, it worked fine and I got results after 10 permutations.

0
5 weeks ago by
keegan50
keegan50 wrote:

Hi Deepak,

Thanks for sending the files. I was able to track down the issue and make a fix. The problem was related to improper handling of very small chromosomes that end up with zero coverage after filtering loci.

After applying the fix, the latest version of dmrseq runs your code without errors for me. You can obtain the latest version from Github immediately, or from BiocManager::install() tomorrow.

Please let me know if you come across any further issues.

Best,

Keegan

Content
Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.