Error using BSmooth.tstat due to NAs


Guest User ★ 13k

@guest-user-4897

Last seen 11.3 years ago

Dear all,

I am trying to use bsseq to analyze WGBS data and identify DMRs following drug treatment. I have a BSseq object consisting of 2 samples (treated and ctrl) that has been smoothed:

> smooth
An object of type 'BSseq' with
  38250590 methylation loci
  2 samples
has been smoothed with
  BSmooth (ns = 50, h = 500, maxGap = 100000000)

When trying to run BSmooth.tstat, I am encountering the following error due to NAs:

> smooth=BSmooth.tstat(smooth, group1="Ctrl", group2="Treated", estimate.var="paired", verbose=TRUE, local.correct=TRUE)
preprocessing ... done in 76.1 sec
computing stats within groups ... done in 11.9 sec
computing stats across groups ... Error in approxfun(xx, yy) :
  need at least two non-NA values to interpolate
Timing stopped at: 7.994 1.649 9.64

However, when I checked my methylation and coverage matrices, I didn't see any NAs in my data, so I am not sure why I am getting this error.

> summary(getMeth(smooth))
      Ctrl           Treated
 Min.   :0.0000   Min.   :0.0000
 1st Qu.:0.6064   1st Qu.:0.3391
 Median :0.8402   Median :0.4816
 Mean   :0.7131   Mean   :0.4365
 3rd Qu.:0.9006   3rd Qu.:0.5600
 Max.   :1.0000   Max.   :1.0000

I would appreciate any suggestions or advice.

Thank you very much,
Fides

-- output of sessionInfo():

R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] bsseqData_0.1.3 bsseq_0.8.0 matrixStats_0.8.14
[4] GenomicRanges_1.12.5 IRanges_1.18.4 BiocGenerics_0.6.0
[7] plyr_1.8

loaded via a namespace (and not attached):
[1] Biobase_2.20.1 R.methodsS3_1.6.1 RColorBrewer_1.0-5 colorspace_1.2-4
[5] dichromat_2.0-0 grid_3.0.1 labeling_0.2 lattice_0.20-24
[9] locfit_1.5-9.1 munsell_0.4.2 scales_0.2.3 stats4_3.0.1
[13] stringr_0.6.2 tools_3.0.1 zlibbioc_1.6.0

--
Sent via the guest posting facility at bioconductor.org.

Coverage bsseq • 3.3k views

ADD COMMENT • link updated 2.1 years ago by Adam • 0 • written 11.2 years ago by Guest User ★ 13k


Kasper Daniel Hansen ★ 6.5k

@kasper-daniel-hansen-2979

Last seen 2.5 years ago

United States

This is hard to know for sure without knowing much more about the data.

My guess is that you have some contigs (chromosomes) which are super small and they cause problems. You can also try setting verbose to say 2 or 3 and see if that helps narrow down where in the function it happens.

ADD COMMENT • link 11.2 years ago Kasper Daniel Hansen ★ 6.5k


As Tim Triche pointed out off-list: what you're doing does not make sense when you only have 1 sample in each group. I was clearly reading the report too fast.
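The point about group sizes can be turned into a small guard that runs before BSmooth.tstat is called. This is only a sketch: `check_group_sizes` is a hypothetical helper, not part of bsseq, and the minimum of 2 samples per group is an assumption taken from the comment above rather than from the package documentation.

```r
# Hypothetical helper (not part of bsseq): stop early when a group has too
# few samples for a within-group variance estimate.
# min_per_group = 2 is an assumption based on the discussion in this thread.
check_group_sizes <- function(group1, group2, min_per_group = 2L) {
  sizes <- c(group1 = length(unique(group1)),
             group2 = length(unique(group2)))
  too_small <- names(sizes)[sizes < min_per_group]
  if (length(too_small) > 0L) {
    stop("need at least ", min_per_group, " samples in: ",
         paste(too_small, collapse = ", "))
  }
  invisible(sizes)
}

# The design from the question: one sample per group, so the guard fails.
res <- tryCatch(
  check_group_sizes("Ctrl", "Treated"),
  error = function(e) conditionMessage(e)
)
print(res)
```

With replicates, the same sample-name vectors that pass this check would be the ones handed to `group1` and `group2`.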

ADD REPLY • link 11.2 years ago Kasper Daniel Hansen ★ 6.5k


I had the same problem and struggled with it for days. This lovely error:

Error in approxfun(xx, yy) :
  need at least two non-NA values to interpolate

Moreover, sometimes it ran, but every row in the column with the adjusted stat contained this error message, so DMR analysis was impossible.

As the author said, some chromosomes are too small for approxfun, namely there are not enough CpGs per chromosome (as I understand it). So how to fix it? First take a look at your chromosomes:

# Run-length view of the chromosome names: one row per run of loci
chr_info <- your_bsseq_obj@rowRanges@seqnames
test <- data.frame(chr_info@values, chr_info@lengths)

In this data frame you'll see the length for each chromosome, i.e. how many loci it has in the object (assuming the loci are sorted by chromosome). If a length is 1-3 you'll receive this approxfun error. You have to remove such chromosomes from your .bedGraph files (from Bismark or MethylDackel) before reading the data, before this step:

your_bsseq_obj = bsseq::read.bismark(
  files = files,
  colData = data.frame(row.names = names),
  rmZeroCov = FALSE,
  strandCollapse = FALSE
)

You can do this manually, but I recommend writing a simple script in Python or bash. Once those small chromosomes are removed, read the data again and repeat the analysis. BSmooth.tstat should then work fine.
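As an alternative to editing the input files, the same filter can probably be applied in R before smoothing. The sketch below works on a plain character vector of chromosome names so it runs without bsseq. `keep_loci_on_large_chr` is a hypothetical helper, and the cutoff of 4 loci is an assumption derived from the "1-3" observation above.

```r
# Hypothetical helper (not part of bsseq): TRUE for loci on chromosomes that
# carry at least `min_loci` loci, FALSE otherwise.
# min_loci = 4 is an assumption based on the "1-3" lengths mentioned above.
keep_loci_on_large_chr <- function(chr, min_loci = 4L) {
  chr <- as.character(chr)
  n_per_chr <- table(chr)               # loci per chromosome
  as.vector(n_per_chr[chr] >= min_loci) # look the count up for each locus
}

# Toy input: two regular chromosomes and two tiny contigs.
chr <- c(rep("chr1", 6), rep("chrUn_1", 2), rep("chr2", 5), "chrUn_2")
keep <- keep_loci_on_large_chr(chr)
print(table(chr[keep]))

# With a real object this would look roughly like the following
# (assumes bsseq is attached; not run here):
#   chr <- as.character(seqnames(your_bsseq_obj))
#   your_bsseq_obj <- your_bsseq_obj[keep_loci_on_large_chr(chr), ]
```

Filtering before BSmooth, so that the smoothing and the t-statistics see the same set of loci, seems the safer order.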

ADD REPLY • link 2.1 years ago Adam • 0


parker • 0

@parker-7456

Last seen 8.4 years ago

Switzerland

I am also having the same problem - but when I get the information on the smoothed data:

summary(getMeth(bsseq.data.smoothed))

I see that there are quite a few NAs, about 700 for most of my samples. I don't know whether this is because I used a targeted approach and many of the CpGs were not targeted. Is there some way I can remove the CpGs that were not covered by the targeted approach?

Many thanks in advance for your help!

Hannah
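One way to approach this is to filter loci on the coverage matrix before smoothing. The sketch below uses a toy matrix so it runs without bsseq. `keep_covered_loci` is a hypothetical helper, and the defaults (at least 1 read in every sample) are assumptions to adjust for the design at hand.

```r
# Hypothetical helper (not part of bsseq): TRUE for loci covered by at least
# `min_cov` reads in at least `min_samples` samples.
keep_covered_loci <- function(cov, min_cov = 1L, min_samples = ncol(cov)) {
  rowSums(cov >= min_cov) >= min_samples
}

# Toy coverage matrix: 4 loci (rows) x 2 samples (columns).
cov <- matrix(c(10, 0, 7, 3,   # sample s1
                 8, 0, 0, 5),  # sample s2
              ncol = 2,
              dimnames = list(NULL, c("s1", "s2")))
keep <- keep_covered_loci(cov)
print(keep)

# With a real object this would look roughly like the following
# (assumes bsseq is attached; not run here):
#   cov <- getCoverage(bsseq.data, type = "Cov")
#   bsseq.data <- bsseq.data[keep_covered_loci(cov), ]
```

Loci dropped this way never reach BSmooth, so they cannot show up as NAs in the smoothed values. Whether that removes all of the roughly 700 NAs depends on where they come from.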

ADD COMMENT • link 10.7 years ago parker • 0
