Question: Varying results from DSS when specifying different BSobj
0
gravatar for varunorama
27 days ago by
varunorama0
varunorama0 wrote:

Hello!

I am using the DSS package to identify differentially methylated loci (DMLs) and regions (DMRs) between treatment and control samples of a time-series data set. However, I am getting varying results based on how I design my BSobj dataset. If anyone has any insight as to why I am getting varying results, I would greatly appreciate it!

The dataset I am using is a time-series dataset composed of 9-time points (6, 12, 24, 48, 72, 96, 120, 144, 168); with one treatment and control per timepoint. There is only one replicate per sample. The contrasts that I am running are between the treatment and the control of each time point independently, and I am trying to identify differentially methylated loci (DMLs) and regions (DMRs)

The first way I designed my BSobj was:

library(DSS)

BSobj.allhr = makeBSseqData( list(C_6hr_cpg[,c(1:4)], T_6hr_cpg[,c(1:4)],
                                  C_12hr_cpg[,c(1:4)], T_12hr_cpg[,c(1:4)],
                                  C_24hr_cpg[,c(1:4)], T_24hr_cpg[,c(1:4)],
                                  C_48hr_cpg[,c(1:4)], T_48hr_cpg[,c(1:4)],
                                  C_72hr_cpg[,c(1:4)], T_72hr_cpg[,c(1:4)],
                                  C_96hr_cpg[,c(1:4)], T_96hr_cpg[,c(1:4)],
                                  C_120hr_cpg[,c(1:4)], T_120hr_cpg[,c(1:4)],
                                  C_144hr_cpg[,c(1:4)], T_144hr_cpg[,c(1:4)],
                                  C_168hr_cpg[,c(1:4)], T_168hr_cpg[,c(1:4)]),
                           c("C6", "H6","C12", "H12","C24","H24","C48", "H48",
                             "C72", "H72","C96", "H96","C120", "H120","C144", "H144","C168", "H168") )

where I incorporated all of my timepoints into a single object called BSobj.allhr. From here I am comparing each timepoint treatment to it's control. As an example, for my 6hr samples, I would run:

dmltest.6hr.sm500 = DMLtest(BSobj.allhr,
                            group1 = c("H6"),group2 = c("C6"),
                            smoothing = TRUE, smoothing.span = 500)
dmltest.6hr.sm500.dml = callDML(dmltest.6hr.sm500, delta = 0.1, p.threshold=0.01)
dmltest.6hr.sm500.dmrs = callDMR(dmltest.6hr.sm500, p.threshold=0.05, delta = 0.1, dis.merge = 250, minCG = 3, pct.sig = 0.5, minlen = 50)

The resulting DML and DMR file produced approx. ~800 DML's and 185 DMRs.

However, if I create a reduced BSobj for only the timepoint comparison I am running (e.g. 6hr) and run the DML and DMR analysis as such:

BSobj.6hr = makeBSseqData( list(C_6hr_cpg[,c(1:4)], T_6hr_cpg[,c(1:4)]),
                             c("C6", "H6") )
dmltest.6hr.sm500.dml.reduced = callDML(dmltest.6hr.sm500.reduced, delta = 0.1, p.threshold=0.01)
dmltest.6hr.sm500.dmrs.reduced = callDMR(dmltest.6hr.sm500.reduced, p.threshold=0.05, delta = 0.1, dis.merge = 250, minCG = 3, pct.sig = 0.5, minlen = 50)

I identify ~1360 DML's and ~380 DMR's, a significant increase from the previous analysis.

Several questions arise: By incorporating all of the timepoints, is that affecting the smoothing approach somehow? Of the two approaches, would you recommend the second? Any advice would be greatly appreciated.

Thank you.

dss bsseq dss-single • 28 views
ADD COMMENTlink modified 27 days ago by James W. MacDonald51k • written 27 days ago by varunorama0
Answer: Varying results from DSS when specifying different BSobj
0
gravatar for James W. MacDonald
27 days ago by
United States
James W. MacDonald51k wrote:

When you fit a linear model and then compute contrasts, the denominator of your contrast is based on the within-group variability, averaged across all your groups. In general, given the fact that sample variance estimates are not very efficient, it's better to include more data rather than less. In other words, in your first example, the denominator for your contrast includes data from all the other groups, even though they don't directly play a part in the computation of the numerator of that contrast!

That said, if you have some groups that are far more variable than others, then they will affect the overall power to detect differences, because they will (possibly incorrectly?) increase the overall variance estimate. In a situation like yours, there may not be a biological reason for increased variability of measurements at any time point, in which case it is probably best to include everything. But you are the analyst, and it's your call to make (and defend).

ADD COMMENTlink written 27 days ago by James W. MacDonald51k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 427 users visited in the last hour