How to remove unwanted probes before normalization in 450k data
AST ▴ 50
@ast-8648
INDIA

Can someone please suggest a way to remove unwanted probes (XY probes, SNP-associated probes, etc.) from my 450k dataset prior to normalization? I don't want them to screw up the downstream data analysis.

I tried removing these probes from the rgSet object of minfi, but it didn't help. Moreover, after this I was not able to convert it to a grset object. The code and the error message follow:

> RGsetEx <- read.450k.exp(targets = targets, extended = TRUE)
> dim(RGsetEx)
Features  Samples
622399       10
> detP <- detectionP(RGsetEx)
> keep <- rowSums(detP < 0.01) == ncol(RGsetEx)
> RGsetEx <- RGsetEx[keep,]
> dim(RGsetEx)
Features  Samples
619508       10
> grset <- preprocessFunnorm(RGsetEx, nPCs=8, sex = NULL, bgCorr = TRUE, dyeCorr = TRUE, verbose = TRUE)
[preprocessFunnorm] Background and dye bias correction with noob
Error in getGreen(object)[IRed$AddressA, ] : subscript out of bounds

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252    LC_MONETARY=English_India.1252
[4] LC_NUMERIC=C                   LC_TIME=English_India.1252

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] IlluminaHumanMethylation450kmanifest_0.4.0 missMethyl_1.4.0
[3] RSQLite_1.0.0                              DBI_0.3.1
[5] ENmix_1.4.1                                doParallel_1.0.10
[7] minfi_1.16.1                               bumphunter_1.10.0
[9] locfit_1.5-9.1                             iterators_1.0.8
[11] foreach_1.4.3                              Biostrings_2.38.4
[13] XVector_0.10.0                             SummarizedExperiment_1.0.2
[15] GenomicRanges_1.22.4                       GenomeInfoDb_1.6.3
[17] IRanges_2.4.8                              S4Vectors_0.8.11
[19] lattice_0.20-33                            Biobase_2.30.0
[21] BiocGenerics_0.16.1

loaded via a namespace (and not attached):
[1] nor1mix_1.2-1
[2] splines_3.2.3
[3] ellipse_0.3-8
[4] statmod_1.4.24
[5] doRNG_1.6
[6] Rsamtools_1.22.0
[7] methylumi_2.16.0
[8] impute_1.44.0
[9] limma_3.26.8
[11] digest_0.6.9
[12] RColorBrewer_1.1-2
[13] colorspace_1.2-6
[14] preprocessCore_1.32.0
[15] Matrix_1.2-4
[16] plyr_1.8.3
[17] GEOquery_2.36.0
[18] siggenes_1.44.0
[19] XML_3.98-1.4
[20] mixOmics_5.2.0
[21] biomaRt_2.26.1
[22] genefilter_1.52.1
[23] zlibbioc_1.16.0
[24] xtable_1.8-2
[25] corpcor_1.6.8
[26] scales_0.4.0
[27] BiocParallel_1.4.3
[28] annotate_1.48.0
[29] beanplot_1.2
[30] pkgmaker_0.22
[31] mgcv_1.8-12
[32] ggplot2_2.1.0
[33] GenomicFeatures_1.22.13
[34] survival_2.38-3
[35] magrittr_1.5
[36] mclust_5.1
[37] nlme_3.1-125
[38] MASS_7.3-45
[39] tools_3.2.3
[40] registry_0.3
[41] org.Hs.eg.db_3.2.3
[42] matrixStats_0.50.1
[43] stringr_1.0.0
[44] munsell_0.4.3
[45] rngtools_1.2.4
[46] AnnotationDbi_1.32.3
[47] lambda.r_1.1.7
[48] base64_1.1
[49] futile.logger_1.4.1
[50] grid_3.2.3
[51] RCurl_1.95-4.8
[52] igraph_1.0.1
[53] bitops_1.0-6
[54] gtable_0.2.0
[55] codetools_0.2-14
[56] multtest_2.26.0
[57] reshape_0.8.5
[58] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.2.1
[59] ruv_0.9.6
[60] illuminaio_0.12.0
[61] GenomicAlignments_1.6.3
[62] rtracklayer_1.30.2
[63] wateRmelon_1.10.0
[64] futile.options_1.0.0
[65] stringi_1.0-1
[66] sva_3.18.0
[67] Rcpp_0.12.3
[68] geneplotter_1.48.0
[69] rgl_0.95.1441

Can someone please suggest how to remove unwanted probes before normalization?

minfi 450k champ rnbeads

If you want help, you need to be very explicit. Nobody but you knows what you mean by:

I tried removing these probes from rgSet object of minfi but it didn't help. Moreover, after this I was not able to convert it to mset object.

Instead of saying what you did, it's better if you show a very limited amount of code that isn't doing what you expect. In addition, you should indicate what type of object you are dealing with (using the class function) and also show what versions of R/BioC you are using (by showing the results of running sessionInfo() after you have run all your code).


Hi James,

I have included the script and the error code here.

@james-w-macdonald-5106
United States

The problem here is that you are subsetting your RGChannelSet first, which isn't something you should do. I suppose it is hypothetically possible to also subset your manifest object so you don't have this problem, or you could try to convince the minfi developers (Tim Triche Jr, in particular) to make preprocessNoob work somewhat differently, but the problem arises from the fact that a couple of internal steps in the normalization procedure rely on subsetting a matrix using the row.names of that matrix. And if you ask for a row name that doesn't exist in that matrix, you get the error you see. As a simple example:

> mat <- matrix(rnorm(100), 10)
> row.names(mat) <- letters[1:10]
> mat
        [,1]       [,2]        [,3]       [,4]        [,5]       [,6]
a  0.3709970 -2.0116615 -1.21149415  0.6638382  0.03659271  2.2569702
b -0.9473655  1.0290758 -0.30754218 -0.5595065  2.51112745  1.1659491
c  0.5560287  0.4430431 -0.50840200 -0.4671531  0.18405680  0.2757360
d  0.4652777 -1.0155842 -0.82632379  0.4651436  0.45080591 -0.8361706
e -1.4373481 -1.7211055 -0.93050895  1.9487600  1.50039226 -1.6016487
f  0.3804068 -0.1015975 -1.40620418 -0.9956680  0.64625803  1.5518482
g -1.4694913 -0.7282363  0.33781047 -1.2208803 -1.44387787  0.6753268
h -0.4476593 -0.6621178  2.08757391  0.7633143  0.21890015 -0.4753443
i -0.8321351 -0.9099048  0.08701877  0.5804936  1.97661858  0.1411349
j -0.4407734 -0.4347822 -2.63394467 -0.4855034 -0.84696107 -0.5706390
         [,7]       [,8]        [,9]       [,10]
a -0.06117854 -0.2852286  0.64977763 -0.53529725
b -0.04865041 -1.9257401  0.01339627 -1.19639716
c  0.43383909 -0.4085163 -1.06670161 -0.19863183
d  1.72501337 -1.8235541  0.80291538 -0.76599607
e -1.06246580 -0.9887508 -0.39689052 -0.22341377
f  1.17843445  0.1303126  0.60399966 -0.45423505
g  2.20500158 -0.8566114 -0.13084707 -0.79465650
h -1.81985530  0.2065925 -1.71127201 -0.66237321
i -2.11721982 -0.4987227 -0.54174290  2.42489161
j  0.19824620  0.6290796  1.38432869  0.01123403
> mat[c("a","b","d","z"),]
Error in mat[c("a","b","d","z"), ] : subscript out of bounds
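
To see which requested row names trigger the error in an example like the one above, the requested names can be compared against the matrix's row names before subsetting. This is only a base R sketch on a toy matrix; the variable names `wanted`, `absent` and `sub` are made up for illustration and nothing here is minfi's internal code.

```r
# Toy matrix with named rows, mirroring the example above
set.seed(1)
mat <- matrix(rnorm(100), 10)
row.names(mat) <- letters[1:10]

# Row names we would like to extract; "z" does not exist in mat
wanted <- c("a", "b", "d", "z")

# Names that were requested but are not present in the matrix
absent <- setdiff(wanted, row.names(mat))
absent

# Subset using only the names that actually exist, which avoids the error
sub <- mat[intersect(wanted, row.names(mat)), , drop = FALSE]
dim(sub)
```

Checking `absent` first makes the cause of a "subscript out of bounds" error visible instead of leaving it to be inferred from the error message.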

But the issue you want to avoid isn't a problem at the step you are trying to avoid it. In other words, you appear to be worried that the methylation data based on probes with SNPs in their sequence or from chromosomes with varying numbers of copies will not be reliable (a valid worry, IMO). But the normalization step is really orthogonal to that worry - at that step all you are trying to do is adjust the distribution of probe intensities from different arrays so they are on comparable scales. Whether or not a given probe is accurately measuring something isn't relevant at that step - you have probes of varying intensity and you want to make those intensities 'similar' across arrays, for some definition of similar.

So really what you should be doing is processing your data up to the point that you have a MethylSet or GenomicMethylSet and then subsetting to remove data from probes that you don't trust.
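
The "normalize first, then subset" order can be sketched in base R on toy data. Everything below is an assumption made for illustration: the helper `filter_loci`, the annotation columns `chr` and `has_snp`, and the CpG IDs are hypothetical stand-ins for a normalized methylation matrix and its annotation, not minfi objects.

```r
# Hypothetical helper: drop untrusted loci from an already-normalized matrix.
#   beta : numeric matrix, rows = CpG IDs, columns = samples
#   anno : data.frame with row names = CpG IDs and columns `chr` and `has_snp`
#          (both column names are assumptions made for this sketch)
filter_loci <- function(beta, anno, drop_chr = c("chrX", "chrY")) {
  # Align the annotation to the matrix by name, not by position
  anno <- anno[rownames(beta), , drop = FALSE]
  keep <- !(anno$chr %in% drop_chr) & !anno$has_snp
  beta[keep, , drop = FALSE]
}

# Toy data standing in for normalized beta values
ids <- c("cg01", "cg02", "cg03", "cg04")
beta <- matrix(seq(0.1, 0.8, length.out = 8), nrow = 4,
               dimnames = list(ids, c("s1", "s2")))
anno <- data.frame(chr = c("chr1", "chrX", "chr2", "chr3"),
                   has_snp = c(FALSE, FALSE, TRUE, FALSE),
                   row.names = ids,
                   stringsAsFactors = FALSE)

# cg02 (chrX) and cg03 (SNP-flagged) are dropped after "normalization"
filtered <- filter_loci(beta, anno)
rownames(filtered)
```

With real data, the inputs would presumably come from the normalized object via minfi accessors such as getBeta() and getAnnotation(), and minfi's dropLociWithSnps() may cover the SNP step directly; check the documentation for your minfi version before relying on either.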

I agree with Jim on probe removal before and after normalization. I also think the detectionP value procedure, which many people seem to like, is poorly justified. However, one thing that complicates this is that the detectionP results are at the probe level and a MethylSet is at the CpG level. These two things differ because either 1 or 2 probes might be used to measure the CpG. So doing this is not as straightforward as it should be. We have this on our list of issues for minfi.

Best,
Kasper
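
The probe-level versus CpG-level mismatch described here can be illustrated with a toy mapping in base R. This is a hypothetical sketch, not how minfi handles it: the manifest, the probe IDs, the 0.01 cutoff (borrowed from the question) and the rule "a CpG passes only if every probe measuring it passes in every sample" are all assumptions made for illustration.

```r
# Hypothetical manifest: each CpG is measured by one probe (AddressA only)
# or by two probes (AddressA and AddressB). All IDs here are made up.
manifest <- data.frame(cpg = c("cg01", "cg02", "cg03"),
                       AddressA = c("p1", "p3", "p4"),
                       AddressB = c("p2", NA, "p5"),
                       stringsAsFactors = FALSE)

# Toy probe-level detection p-values: rows = probes, columns = samples
probe_p <- matrix(c(0.001, 0.200, 0.001, 0.001, 0.001,
                    0.001, 0.001, 0.001, 0.001, 0.001),
                  nrow = 5,
                  dimnames = list(paste0("p", 1:5), c("s1", "s2")))

# Probe-level pass/fail: p-value below the cutoff in every sample
probe_ok <- rowSums(probe_p < 0.01) == ncol(probe_p)

# Assumed rule: a CpG passes only if all probes measuring it pass
cpg_ok <- vapply(seq_len(nrow(manifest)), function(i) {
  probes <- c(manifest$AddressA[i], manifest$AddressB[i])
  probes <- probes[!is.na(probes)]   # single-probe CpGs have no AddressB
  all(probe_ok[probes])
}, logical(1))
names(cpg_ok) <- manifest$cpg
cpg_ok
```

The point of the sketch is only that a probe-level logical vector cannot be applied row-for-row to a CpG-level object; some explicit mapping between the two is needed, and the choice of collapsing rule is a judgment call.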