Error using BSmooth.tstat due to NAs
Guest User ★ 13k
@guest-user-4897
Last seen 10.3 years ago
Dear all,

I am trying to use bsseq to analyze WGBS data and identify DMRs following drug treatment. I have a BSseq object consisting of 2 samples (treated and ctrl) that has been smoothed:

> smooth
An object of type 'BSseq' with
  38250590 methylation loci
  2 samples
has been smoothed with
  BSmooth (ns = 50, h = 500, maxGap = 100000000)

When trying to run BSmooth.tstat, I am encountering the following error due to NAs:

> smooth=BSmooth.tstat(smooth, group1="Ctrl", group2="Treated", estimate.var="paired", verbose=TRUE, local.correct=TRUE)
preprocessing ... done in 76.1 sec
computing stats within groups ... done in 11.9 sec
computing stats across groups ... Error in approxfun(xx, yy) :
  need at least two non-NA values to interpolate
Timing stopped at: 7.994 1.649 9.64

However, when I checked my methylation and coverage matrices, I didn't see any NAs in my data, so I am not sure why I am getting this error.

> summary(getMeth(smooth))
      Ctrl           Treated
 Min.   :0.0000   Min.   :0.0000
 1st Qu.:0.6064   1st Qu.:0.3391
 Median :0.8402   Median :0.4816
 Mean   :0.7131   Mean   :0.4365
 3rd Qu.:0.9006   3rd Qu.:0.5600
 Max.   :1.0000   Max.   :1.0000

I would appreciate any suggestions or advice.

Thank you very much,
Fides

-- output of sessionInfo():

R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] bsseqData_0.1.3 bsseq_0.8.0 matrixStats_0.8.14
[4] GenomicRanges_1.12.5 IRanges_1.18.4 BiocGenerics_0.6.0
[7] plyr_1.8

loaded via a namespace (and not attached):
[1] Biobase_2.20.1 R.methodsS3_1.6.1 RColorBrewer_1.0-5 colorspace_1.2-4
[5] dichromat_2.0-0 grid_3.0.1 labeling_0.2 lattice_0.20-24
[9] locfit_1.5-9.1 munsell_0.4.2 scales_0.2.3 stats4_3.0.1
[13] stringr_0.6.2 tools_3.0.1 zlibbioc_1.6.0

--
Sent via the guest posting facility at bioconductor.org.
Coverage bsseq • 3.0k views
@kasper-daniel-hansen-2979
Last seen 18 months ago
United States
This is hard to know for sure without knowing much more about the data.

My guess is that you have some contigs (chromosomes) which are super small and they cause problems. You can also try setting verbose to, say, 2 or 3 and see if that helps narrow down where in the function it happens.

On Wed, Sep 10, 2014 at 12:18 PM, Fides Lay [guest] <guest at bioconductor.org> wrote:
> When trying to run BSmooth.tstat, I am encountering the following error
> due to NAs:
> computing stats across groups ... Error in approxfun(xx, yy) :
> need at least two non-NA values to interpolate
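
On the small-contig guess above: the message itself comes from base R's approxfun, so it can be reproduced without bsseq. The sketch below is only an analogy, with made-up positions and values, and try_interp() is a hypothetical helper rather than anything from the package. It shows that linear interpolation over a "contig" with a single usable locus fails with exactly this message, while two loci are enough.

```r
# Base R only; positions and methylation values are invented for illustration.
# try_interp() is a hypothetical helper, not a bsseq function.
try_interp <- function(pos, val) {
  tryCatch({
    f <- approxfun(pos, val)  # linear interpolation (the default method)
    f(mean(pos))              # evaluate at the midpoint of the positions
  }, error = function(e) conditionMessage(e))
}

# A "contig" with one locus: interpolation is impossible.
one_locus <- try_interp(100, 0.8)

# A "contig" with two loci: interpolation works.
two_loci <- try_interp(c(100, 200), c(0.8, 0.6))

print(one_locus)
print(two_loci)
```

If the same message appears only for particular chromosomes when verbose is raised, that would point in this direction.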
As Tim Triche pointed out off-list: what you're doing does not make sense when you only have 1 sample in each group. I was clearly reading the report too fast.

On Sun, Sep 14, 2014 at 1:07 PM, Kasper Daniel Hansen <khansen at jhsph.edu> wrote:
> My guess is that you have some contigs (chromosomes) which are super small
> and they cause problems. You can also try setting verbose to say 2 or 3
> and see if that helps narrow down where in the function it happens.
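
On the one-sample-per-group point above: a rough way to see why it matters, using base R only. A standard deviation computed from a single observation is NA, so any per-locus variability estimate within a group of size one is NA everywhere, and interpolating over an all-NA vector gives the same approxfun message. I have not traced which internal step of BSmooth.tstat actually makes this call, so treat the sketch as an analogy with invented numbers, not as a description of the package's code.

```r
# One column = the single sample in the group; values are made up.
ctrl <- matrix(c(0.82, 0.79, 0.85, 0.40, 0.38), ncol = 1,
               dimnames = list(NULL, "Ctrl"))
pos <- c(100, 150, 220, 900, 960)  # made-up genomic positions

# Per-locus standard deviation within the group: undefined when n = 1.
sd_within <- apply(ctrl, 1, sd)

# Interpolating these values over position has nothing to work with,
# even though the methylation matrix itself contains no NAs.
msg <- tryCatch(approxfun(pos, sd_within),
                error = function(e) conditionMessage(e))

print(sd_within)
print(msg)
```

This would also be consistent with summary(getMeth(...)) showing no NAs: the missing values arise in the derived statistic, not in the input.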

I had the same problem and struggled with it for days. This lovely error:

Error in approxfun(xx, yy) :
  need at least two non-NA values to interpolate

Sometimes the call did complete, but then the column with the adjusted statistic carried this error in every row, so DMR analysis was impossible.

As the author said, some chromosomes are too small for approxfun to work, namely there are not enough CpGs per chromosome (as I understand it). So how to fix it? First, take a look at your chromosomes:

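# Run-length encoding of the chromosome name of every locus in the object
chr_info <- your_bsseq_obj@rowRanges@seqnames
# One row per run: chromosome name and number of consecutive loci on it
test <- data.frame(chr_info@values, chr_info@lengths)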

In this data frame you'll see, for each chromosome, the run length, that is, how many loci (CpGs) it contributes. If that number is 1-3, you'll receive this approxfun error. You have to remove such chromosomes from your .bedGraph files (from Bismark or MethylDackel) before reading the data, that is, before this step:

your_bsseq_obj = bsseq::read.bismark(
  files = files,
  colData = data.frame(row.names = names),
  rmZeroCov = FALSE,
  strandCollapse = FALSE
)

You can do this manually, but I recommend writing a simple script in Python or Bash. Once the too-small chromosomes are removed, read the data again and repeat the analysis. BSmooth.tstat should then work fine.
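
If you would rather stay in R than switch to Python or Bash, the filtering step could look roughly like this. It is a base-R sketch on a toy table: the column names, the chromosome names, the drop_small_chromosomes() helper, and the cutoff of 4 loci (taken from the 1-3 range mentioned above) are all assumptions for illustration, and a larger cutoff may well be more sensible for real data.

```r
# Toy stand-in for one coverage file; column names are illustrative only.
cov_table <- data.frame(
  chr = c(rep("chr1", 4), "chrUn_1", rep("chr2", 4), rep("chrUn_2", 2)),
  pos = c(100, 150, 220, 300, 10, 50, 90, 140, 200, 5, 30),
  meth = c(8, 7, 9, 4, 1, 3, 5, 6, 2, 0, 1),
  unmeth = c(2, 3, 1, 6, 0, 7, 5, 4, 8, 1, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only chromosomes with at least `min_loci` rows.
drop_small_chromosomes <- function(df, min_loci = 4) {
  loci_per_chr <- table(df$chr)                       # loci per chromosome
  keep <- names(loci_per_chr)[loci_per_chr >= min_loci]
  df[df$chr %in% keep, , drop = FALSE]
}

filtered <- drop_small_chromosomes(cov_table)
print(table(filtered$chr))
```

For real files you would read each one with read.table(), filter it, and write it back with write.table() before calling read.bismark. The same set of chromosomes should be dropped from every sample.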

parker • 0
@parker-7456
Last seen 7.5 years ago
Switzerland

I am also having the same problem, but when I look at a summary of the smoothed data:

summary(getMeth(bsseq.data.smoothed))

I see that there are quite a few NAs, about 700 for most of my samples. I don't know whether this is because I used a targeted approach and many of the CpGs were not targeted. Is there some way I can remove those CpGs which have not been covered by the targeted approach?

Many thanks in advance for your help!

Hannah
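
One possible way to approach this kind of filtering, sketched in base R on toy matrices: build a logical index of loci that have no NA in the smoothed values and at least one read in every sample, then keep only those rows. The matrices, sample names, and the coverage threshold of 1 are invented for illustration. With a real object I would expect the matrices to come from getMeth() and getCoverage() and the object to be subset by row index, but that part is an assumption I have not run here.

```r
# Toy smoothed methylation values (rows = CpGs, columns = samples).
meth <- matrix(c(0.80, NA,   0.40, 0.90,
                 0.70, 0.50, NA,   0.95),
               ncol = 2, dimnames = list(NULL, c("s1", "s2")))

# Toy read coverage for the same CpGs and samples.
cov <- matrix(c(12, 0, 5, 20,
                 9, 3, 0, 15),
              ncol = 2, dimnames = list(NULL, c("s1", "s2")))

# Keep loci with no NA in any sample and coverage >= 1 in every sample.
keep <- rowSums(is.na(meth)) == 0 & rowSums(cov >= 1) == ncol(cov)

meth_kept <- meth[keep, , drop = FALSE]
print(which(keep))
print(meth_kept)
```

Whether to filter before or after smoothing depends on the design. For targeted data it may make more sense to drop the untargeted CpGs before smoothing, so they do not influence the fit.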
