Search
Question: Internal bug of DMR-Seq?
0
gravatar for kuckunniwid
9 days ago by
kuckunniwid0 wrote:

Working with DMR-seq I get the following error message from time to time:

Detecting candidate regions with coefficient larger than 0.05 in magnitude.
...Chromosome chr1: Smoothed (0.89 min). 335 regions scored (1.64 min). 
...Chromosome chr10: Smoothed (0.74 min). 117 regions scored (0.38 min). 
...Chromosome chr11: Smoothed (0.65 min). 119 regions scored (0.34 min). 
...Chromosome chr11_KI270721v1_random: Smoothed (0 min). No candidates found. 
...Chromosome chr12: Smoothed (0.66 min). 248 regions scored (0.44 min). 
...Chromosome chr13: Smoothed (0.54 min). 138 regions scored (0.48 min). 
...Chromosome chr14: Smoothed (0.54 min). Error in rbind(deparse.level, ...) : 
  numbers of columns of arguments do not match

The moment when this error appears is dependent on input parameters and sometimes does not appear at all. Does smb faced this problem? How can I fix it?

 

 

 

ADD COMMENTlink modified 7 days ago by keegan50 • written 9 days ago by kuckunniwid0

Hi,

Thanks for the report. So that I can reproduce the error, would you be able to give me a bit more information? Specifically,

  • What is the output of BiocManager::valid() on your system?

  • What is the code you are using that produces this error? Likewise, what settings do not produce the error?

  • If possible, could you send me an .rds object containing a portion of the data that throws the error (specifically, chr14 here)? You can send via email to keegan@jimmy.harvard.edu.

Hopefully this will let me track down and fix the issue. Thanks!

Best,

Keegan

ADD REPLYlink modified 9 days ago • written 9 days ago by keegan50

Hi keegan,

thanks for the fast reply! I have troubles even with the first line - somehow my DmrSeq worked without the package BiocManager. Installed it, the output is 

> BiocManager::valid()

* sessionInfo()

R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] BiocManager_1.30.2

loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1   

Bioconductor version '3.7'

  * 15 packages out-of-date
  * 0 packages too new

create a valid installation with

  BiocManager::install(c(
    "edgeR", "evaluate", "fansi", "GenomicFeatures", "graph", "mime", "plotrix",
    "R6", "rstudioapi"
  ), update = TRUE, ask = FALSE)

more details: BiocManager::valid()$too_new, BiocManager::valid()$out_of_date

Warning message:
15 packages out-of-date; 0 packages too new 

 

ADD REPLYlink written 8 days ago by kuckunniwid0

The code that leads to error is:

library(data.table)
library(gplots)
files <- list.files(pattern = "\\.cov.gz$")

readBismarkFile <- function(fileName) {
  bismarkCoverage = fread(paste('zcat', fileName) ,stringsAsFactors = F)
  bismarkCoverage = bismarkCoverage[,-4]
  colnames(bismarkCoverage) = c("chr", "start", "end", paste(fileName, "meth"), paste(fileName, "unmeth"))
  return(bismarkCoverage)
}

bismarkCoverageFinal = NA
for (file in files) {
  if is.na(bismarkCoverageFinal)){
    bismarkCoverageFinal = readBismarkFile(file)
    print(colnames(bismarkCoverageFinal))
  } else {
    if (file != "S_12.bismark.cov.gz")
      bismarkCoverageFinal = merge(bismarkCoverageFinal, readBismarkFile(file), by=c("chr","start","end"))
    print(colnames(bismarkCoverageFinal))
  }
}

bismarkCoverageFinal = as.data.frame(bismarkCoverageFinal)
### BASIC STATS
nrow(bismarkCoverageFinal)
vec <-  c(seq(from=4, to=ncol(bismarkCoverageFinal), by=2))
sampleMeth <- as.matrix(bismarkCoverageFinal[ , vec])
vec = vec + 1
sampleCov <- as.matrix(bismarkCoverageFinal[ , vec])

### DMR SEQ
library(dmrseq)
M <- sampleMeth
Cov <- sampleCov + sampleMeth
chr <- as.character(bismarkCoverageFinal[,1])
pos <- bismarkCoverageFinal[,2]
celltype <- pData(BS.chr21)$CellType
sampleNames <- paste("S", 1:11, sep="_")

bs <- BSseq(chr = chr, pos = pos,
            M = M, Cov = Cov, 
            sampleNames = sampleNames)

colours = rep("red", ncol(forDistance))
colours[c(1,9,11)] = "darkblue"
colours [c(3,5,7)] = "darkred"
pData(bs)$CellType <- colours


testCovariate <- "CellType"

blocks <- dmrseq(bs=bs[1:1000000,],
                 cutoff = 0.05,
                 testCovariate=testCovariate,
                 block = TRUE,
                 minInSpan = 50,
                 bpSpan = 1e3,
                 maxGapSmooth = 1e3,
                 maxGap = 5e3)

ADD REPLYlink written 8 days ago by kuckunniwid0

I think that the error is random and happens due to some incorrect permutations. So there is no "safe" settings - sometimes I lucky to not to get the error, sometimes I am not. And of course it takes a lot of time, each computation, so it is really nervous - waiting for the end of computations while this error may appear close to the end of the process and ruin everything =) I will send you the data if my department will allow it...unfortunately these are unpublished data...

ADD REPLYlink written 8 days ago by kuckunniwid0

Thanks for the additional information. BiocManager is a new package that helps with installation and keeping packages up-to-date - the valid() function checks for out of date packages and outputs instructions to update packages, and the install() function will soon replace biocLite() for installing Bioconductor packages. It's not required, but the output tells me that while you have some out-of-date packages, it appears that dmrseq and its dependencies are not among them.

It would be great if you could send me a small portion of the data, perhaps a single chromosome (e.g.14 as I mentioned before, since it was the culprit in the original post), that reproduces the error. From your original post, the error is not being thrown during the permutations, but rather in the step of scoring observed candidate regions.

A note about your code - the line  celltype <- pData(BS.chr21)$CellType fetches the condition labels from the example data included in dmrseq. This shouldn't be used in your analysis. Instead, the testCovariate should specify the column of pData(bs) that contains the conditions you wish to compare. It looks like later on, you define the "CellType" column to be a vector of three colors (red = samples 2,4,6,8,10 darkblue = samples 1,9,11, and darkred = samples 3,5,7). So here dmrseq will compare these three sample groups. Is this what you intend?

Best, Keegan

ADD REPLYlink modified 8 days ago • written 8 days ago by keegan50

Dear Keegan,

thanks a lot for the answer. Once I'll receive the permission I will send it to you. I faced another error:

...Chromosome chrUn_KI270333v1: Error in (function (classes, fdef, mtable)  :

  unable to find an inherited method for function ‘rowSums2’ for signature ‘"DelayedArray"’

Other weird chromosomes (such as ...Chromosome chrUn_KI270330v1: Smoothed (0 min). No candidates found.) worked OK.

Yes, I am trying to compare 3 categories (normal samples vs 2 different types of pre-cancer). However the FDR seems to be too strict and I struggle trying to find significant variants...Will the increase of number of permutations potentially improve the power of detection?

ADD REPLYlink written 7 days ago by kuckunniwid0
1

Increasing the number of permutations could potentially improve the power of detection, especially if you are currently using very few. You might also check the individual comparisons between normal and each type of pre-cancer separately. In addition, from the code you sent it looks like you are searching for large blocks. You could also try looking for smaller local changes by altering the settings as described in the vignette.

ADD REPLYlink modified 7 days ago • written 7 days ago by keegan50
2
gravatar for keegan
7 days ago by
keegan50
keegan50 wrote:

I think I found the most likely cause of the error being thrown. Essentially, I don't think the error handling for region-level model fitting was correct when the covariate of interest is a multi-level factor. When a region fit failed to converge, it would output a row to the results table that assumed the covariate of interest was a two-group factor or continuous covariate which led to the rbind error.

I just pushed a patch for this issue, which you can obtain by reinstalling dmrseq from GitHub right away (install_github("kdkorthauer/dmrseq"), or from Bioconductor devel tomorrow from around 12:00PM EST (BiocManager::install("dmrseq", version = "devel")). Please let me know if that fixes the issue!

ADD COMMENTlink modified 7 days ago • written 7 days ago by keegan50

Thanks a lot! Will reinstall and check!

ADD REPLYlink written 6 days ago by kuckunniwid0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 351 users visited in the last hour