Bumphunter output "regions" of 1 probe (L = 1)
@isbrorson-13349
Last seen 2.3 years ago

Hi all,

I have two related questions regarding bumphunter

1) To get a reasonable number of candidate bumps, I need to use pickCutoffQ = 0.999, and even then the output contains ~18,000 candidate bumps. If I set the quantile to 0.95 or even 0.99, I get at least 80,000 bumps (I did a quick check and ran it with B = 1). The minfi tutorial (https://www.bioconductor.org/help/course-materials/2015/BioC2015/methylation450k.html) recommends no more than 30,000 candidate bumps. Is there any reason to be concerned about my data? Code:

 dmrs <- bumphunter(GRSet, design = designMatrix, pickCutoff = TRUE, pickCutoffQ = 0.999, B = 1000, type = "M", nullMethod = "bootstrap")


My design matrix includes SVs (surrogate variables) as covariates and looks like this:

  (Intercept) pheno.treat$CohortFollowup  sv.treat$V1  sv.treat$V2  sv.treat$V3   sv.treat$V4  sv.treat$V5  sv.treat$V6
1           1                          0 -0.061862297 -0.043804370  0.155861131 -0.1339645911  0.122396963 -0.071573199
2           1                          0  0.126700203 -0.081453396  0.150190568 -0.1488971458  0.019325485  0.088171420
3           1                          0 -0.042331161 -0.086767917  0.109915301  0.0101167599  0.060133643  0.127854026
4           1                          0  0.266824697 -0.295717842 -0.006385234 -0.0003092789 -0.168059079  0.013780865
5           1                          0 -0.058048360 -0.060320928  0.028131679 -0.1240023344  0.064810238  0.106945146
6           1                          0  0.093184655 -0.255128273  0.003195866 -0.0219380222 -0.011526742  0.007678827

Truncated for simplicity

2) Using the code shown in question 1, the resulting list of candidate bumps includes a large number of regions consisting of only 1 probe, as shown below.

 head(dmrs.b1000.cutoff.999.sva.treat$table, 20)
chr     start       end     value     area cluster indexStart indexEnd L clusterL      p.value  fwer  p.valueArea fwerArea
12297 chr12 133000178 133000178 -4.353422 4.353422   98843     527238   527238 1        8 0.000000e+00 0.000 0.0025740044    0.770
1126  chr10  15210264  15210264  4.345061 4.345061   40704     409605   409605 1       14 0.000000e+00 0.000 0.0025935786    0.773
10539  chr1  78444904  78444904 -3.589283 3.589283   16759      34556    34556 1       16 2.796311e-06 0.002 0.0043790232    0.889
13678 chr17  45266772  45266772 -3.314790 3.314790  161285     656273   656273 1       17 2.796311e-06 0.002 0.0053689173    0.930
14037 chr19    797342    797342 -3.311399 3.311399  178278     689503   689503 1       12 2.796311e-06 0.002 0.0053856952    0.931
15910  chr3 156392701 156392703 -1.995375 3.990751  268542     169109   169110 2       29 1.537971e-05 0.011 0.0032772766    0.824
11050 chr10  17659399  17659399 -2.681196 2.681196   40985     410113   410113 1        8 4.474098e-05 0.032 0.0090111125    0.976
14236 chr19  19779476  19779476 -2.676268 2.676268  184718     705014   705014 1       12 4.613913e-05 0.033 0.0090572517    0.976
16173  chr4  48485301  48485301 -2.666049 2.666049  280261     191412   191412 1       12 4.753729e-05 0.034 0.0091481318    0.976
13545 chr17  18965556  18965556 -2.451483 2.451483  156147     645050   645050 1        2 1.090561e-04 0.072 0.0111209293    0.985
14464  chr2  10588646  10588646 -2.437339 2.437339  194619      79031    79031 1       19 1.202414e-04 0.078 0.0112481614    0.985
11657 chr11  64684723  64684723 -2.244582 2.244582   67553     463915   463915 1       13 2.348901e-04 0.140 0.0135495255    0.993
6431  chr20  62367632  62367893  0.989776 6.928432  235817     744686   744692 7       27 2.334920e-04 0.153 0.0007857634    0.413
14307 chr19  41119278  41119278 -2.222419 2.222419  187608     711367   711367 1       10 2.642514e-04 0.154 0.0138473326    0.994


My previous experience with bumphunter has never resulted in this many 1-probe regions. Does anyone know why this happens? And how can I change it so that my list consists of regions with multiple CpGs? The data set contains ~790,000 probes after normalisation and QC.

Any help is highly appreciated!

Thanks, Ina

DMR bumphunter EPIC chip • 281 views
@james-w-macdonald-5106
Last seen 4 hours ago
United States

Using the results of a statistical test to infer the incoming quality of the data is probably suboptimal. Ideally you would have done some exploratory data analysis (EDA) beforehand to ensure that the data all look good, or at least to identify potential problems.
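As a rough illustration of that kind of EDA, here is a minimal base-R sketch of an MDS plot on sample-level M-values. Everything in it is an assumption for illustration: the matrix `M` is simulated (not the poster's data), the group labels are made up, and classical MDS via `cmdscale()` is just one of several ways to draw such a plot.

```r
# Simulated M-values: 1000 probes x 12 samples (hypothetical data).
set.seed(1)
n_probes <- 1000
group <- rep(c("baseline", "followup"), each = 6)
M <- matrix(rnorm(n_probes * 12), nrow = n_probes,
            dimnames = list(NULL, paste0("s", 1:12)))

# Shift 50 probes in the second group to mimic a real group difference.
M[1:50, group == "followup"] <- M[1:50, group == "followup"] + 3

# Classical MDS on Euclidean distances between samples (columns of M).
d <- dist(t(M))
mds <- cmdscale(d, k = 2)

# Colour by group; outlying samples or unexpected clusters show up here.
plot(mds, col = ifelse(group == "baseline", "black", "red"), pch = 19,
     xlab = "MDS dimension 1", ylab = "MDS dimension 2")
```

With real data, `M` would be the matrix of M-values (or beta values) for the samples that go into the model, and samples that sit far from their group are the ones worth a closer look.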

As to why you have lots of 1-probe regions, if you have big differences between your two groups at a single CpG, and those differences are bigger than what you see when you bootstrap, you will get significant results. But again, that's not something you should be trying to figure out now, but something you might have expected, given the results of EDA (say an MDS plot for example).
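To make the mechanism concrete, here is a simplified base-R sketch of how runs of probes beyond a cutoff turn into candidate regions of length L. It is not bumphunter's actual implementation: the `beta` values and the `cutoff` are made up, and clustering by genomic distance, smoothing and the bootstrap are all left out.

```r
# Hypothetical per-probe effect estimates for 12 adjacent probes in one cluster.
beta <- c(0.1, -0.2, 4.3, 0.3, 0.0, 1.2, 1.4, 1.1, 0.2, -0.1, -3.6, 0.1)
cutoff <- 1.0  # assumed value; not derived from any null distribution here

# Label each probe: above cutoff (1), below -cutoff (-1), or in between (0).
state <- ifelse(beta > cutoff, 1L, ifelse(beta < -cutoff, -1L, 0L))

# Consecutive probes with the same nonzero label form one candidate region.
runs  <- rle(state)
end   <- cumsum(runs$lengths)
start <- end - runs$lengths + 1L
keep  <- runs$values != 0L

regions <- data.frame(
  indexStart = start[keep],
  indexEnd   = end[keep],
  L          = runs$lengths[keep],
  value      = mapply(function(s, e) mean(beta[s:e]), start[keep], end[keep])
)
regions$area <- abs(regions$value) * regions$L
print(regions)

# Post hoc filter: keep only regions spanning at least two probes.
multi <- regions[regions$L >= 2, ]
print(multi)
```

A single probe with a large difference (4.3 or -3.6 here) passes the cutoff on its own and becomes an L = 1 region, while a run of moderate differences becomes a longer one. The same kind of subsetting on the `L` column of the bumphunter output table would drop the single-probe regions, but note that the p-values and FWER in that table were computed with all regions included, so this only filters the list and does not recompute anything.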


Thank you so much for answering. My data has been well explored and checked beforehand, and we do expect quite big differences between the groups. Reading your answer now, I understand how that would result in lots of 1-probe regions; I just wasn't able to see that before.

Thanks for making it clear!

Ina