Question: How to remove unwanted probes before normalization in 450k data
3.7 years ago by
AST50
INDIA
AST50 wrote:

Can someone please suggest a way to remove unwanted probes (XY probes, SNP-associated probes, etc.) from my 450k dataset prior to normalization? I don't want them to screw up the downstream data analysis.

I tried removing these probes from the rgSet object of minfi, but it didn't help. Moreover, after this I was not able to convert it to a grset object. The code and error message follow:

> RGsetEx <- read.450k.exp(targets = targets, extended = TRUE)
> dim(RGsetEx)
Features  Samples
622399       10
> detP <- detectionP(RGsetEx)
> keep <- rowSums(detP < 0.01) == ncol(RGsetEx)
> RGsetEx <- RGsetEx[keep,]
> dim(RGsetEx)
Features  Samples
619508       10
> grset <- preprocessFunnorm(RGsetEx, nPCs=8, sex = NULL, bgCorr = TRUE, dyeCorr = TRUE, verbose = TRUE)
[preprocessFunnorm] Background and dye bias correction with noob
Error in getGreen(object)[IRed$AddressA, ] : subscript out of bounds

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252    LC_MONETARY=English_India.1252
[4] LC_NUMERIC=C                   LC_TIME=English_India.1252

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] IlluminaHumanMethylation450kmanifest_0.4.0 missMethyl_1.4.0
[3] RSQLite_1.0.0                              DBI_0.3.1
[5] ENmix_1.4.1                                doParallel_1.0.10
[7] minfi_1.16.1                               bumphunter_1.10.0
[9] locfit_1.5-9.1                             iterators_1.0.8
[11] foreach_1.4.3                              Biostrings_2.38.4
[13] XVector_0.10.0                             SummarizedExperiment_1.0.2
[15] GenomicRanges_1.22.4                       GenomeInfoDb_1.6.3
[17] IRanges_2.4.8                              S4Vectors_0.8.11
[19] lattice_0.20-33                            Biobase_2.30.0
[21] BiocGenerics_0.16.1

loaded via a namespace (and not attached):
[1] nor1mix_1.2-1
[2] splines_3.2.3
[3] ellipse_0.3-8
[4] statmod_1.4.24
[5] doRNG_1.6
[6] Rsamtools_1.22.0
[7] methylumi_2.16.0
[8] impute_1.44.0
[9] limma_3.26.8
[11] digest_0.6.9
[12] RColorBrewer_1.1-2
[13] colorspace_1.2-6
[14] preprocessCore_1.32.0
[15] Matrix_1.2-4
[16] plyr_1.8.3
[17] GEOquery_2.36.0
[18] siggenes_1.44.0
[19] XML_3.98-1.4
[20] mixOmics_5.2.0
[21] biomaRt_2.26.1
[22] genefilter_1.52.1
[23] zlibbioc_1.16.0
[24] xtable_1.8-2
[25] corpcor_1.6.8
[26] scales_0.4.0
[27] BiocParallel_1.4.3
[28] annotate_1.48.0
[29] beanplot_1.2
[30] pkgmaker_0.22
[31] mgcv_1.8-12
[32] ggplot2_2.1.0
[33] GenomicFeatures_1.22.13
[34] survival_2.38-3
[35] magrittr_1.5
[36] mclust_5.1
[37] nlme_3.1-125
[38] MASS_7.3-45
[39] tools_3.2.3
[40] registry_0.3
[41] org.Hs.eg.db_3.2.3
[42] matrixStats_0.50.1
[43] stringr_1.0.0
[44] munsell_0.4.3
[45] rngtools_1.2.4
[46] AnnotationDbi_1.32.3
[47] lambda.r_1.1.7
[48] base64_1.1
[49] futile.logger_1.4.1
[50] grid_3.2.3
[51] RCurl_1.95-4.8
[52] igraph_1.0.1
[53] bitops_1.0-6
[54] gtable_0.2.0
[55] codetools_0.2-14
[56] multtest_2.26.0
[57] reshape_0.8.5
[58] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.2.1
[59] ruv_0.9.6
[60] illuminaio_0.12.0
[61] GenomicAlignments_1.6.3
[62] rtracklayer_1.30.2
[63] wateRmelon_1.10.0
[64] futile.options_1.0.0
[65] stringi_1.0-1
[66] sva_3.18.0
[67] Rcpp_0.12.3
[68] geneplotter_1.48.0
[69] rgl_0.95.1441

Can someone please suggest how to remove unwanted probes before normalization?

minfi champ 450k rnbeads • 1.2k views
modified 3.6 years ago by James W. MacDonald51k • written 3.7 years ago by AST50

If you want help, you need to be very explicit. Nobody but you knows what you mean by

I tried removing these probes from rgSet object of minfi but it didn't help. Moreover, after this I was not able to convert it to mset object.

Instead of saying what you did, it's better to show a very limited amount of code that isn't doing what you expect. In addition, you should indicate what type of object you are dealing with (using the class function) and also show what versions of R/BioC you are using (by showing the results of running sessionInfo() after you have run all your code).

Hi James,

I have included the script and the error code here.

Answer: How to remove unwanted probes before normalization in 450k data
3.6 years ago by
United States
James W. MacDonald51k wrote:

The problem here is that you are subsetting your RGChannelSet first, which isn't something you should do. I suppose it is hypothetically possible to also subset your manifest object so you don't have this problem, or you could try to convince the minfi developers (Tim Triche Jr, in particular) to make preprocessNoob work somewhat differently. But the problem arises because a couple of internal steps in the normalization procedure subset a matrix by row name, and if you ask for a row name that doesn't exist in that matrix, you get the error you see. As a simple example:

> mat <- matrix(rnorm(100), 10)
> row.names(mat) <- letters[1:10]
> mat
        [,1]       [,2]        [,3]       [,4]        [,5]       [,6]
a  0.3709970 -2.0116615 -1.21149415  0.6638382  0.03659271  2.2569702
b -0.9473655  1.0290758 -0.30754218 -0.5595065  2.51112745  1.1659491
c  0.5560287  0.4430431 -0.50840200 -0.4671531  0.18405680  0.2757360
d  0.4652777 -1.0155842 -0.82632379  0.4651436  0.45080591 -0.8361706
e -1.4373481 -1.7211055 -0.93050895  1.9487600  1.50039226 -1.6016487
f  0.3804068 -0.1015975 -1.40620418 -0.9956680  0.64625803  1.5518482
g -1.4694913 -0.7282363  0.33781047 -1.2208803 -1.44387787  0.6753268
h -0.4476593 -0.6621178  2.08757391  0.7633143  0.21890015 -0.4753443
i -0.8321351 -0.9099048  0.08701877  0.5804936  1.97661858  0.1411349
j -0.4407734 -0.4347822 -2.63394467 -0.4855034 -0.84696107 -0.5706390
         [,7]       [,8]        [,9]       [,10]
a -0.06117854 -0.2852286  0.64977763 -0.53529725
b -0.04865041 -1.9257401  0.01339627 -1.19639716
c  0.43383909 -0.4085163 -1.06670161 -0.19863183
d  1.72501337 -1.8235541  0.80291538 -0.76599607
e -1.06246580 -0.9887508 -0.39689052 -0.22341377
f  1.17843445  0.1303126  0.60399966 -0.45423505
g  2.20500158 -0.8566114 -0.13084707 -0.79465650
h -1.81985530  0.2065925 -1.71127201 -0.66237321
i -2.11721982 -0.4987227 -0.54174290  2.42489161
j  0.19824620  0.6290796  1.38432869  0.01123403
> mat[c("a","b","d","z"),]
Error in mat[c("a","b","d","z"), ] : subscript out of bounds
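
To make the failure mode concrete, here is a minimal base R sketch (not minfi's internal code) that checks the requested row names against the matrix before subsetting. The variable names `wanted`, `missing_rows`, and `present` are made up for illustration.

```r
# Toy matrix with row names a..j, as in the example above.
set.seed(1)
mat <- matrix(rnorm(100), 10)
rownames(mat) <- letters[1:10]

# Row names we would like to pull out; "z" does not exist in mat.
wanted <- c("a", "b", "d", "z")

# Find out which requested names are absent before subsetting.
missing_rows <- setdiff(wanted, rownames(mat))

# Subset only on names that are actually present, so no error is raised.
present <- intersect(wanted, rownames(mat))
sub <- mat[present, , drop = FALSE]
```

This is essentially what goes wrong in your case: the manifest still lists addresses that you have removed from the RGChannelSet, so the lookup by name fails.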

But the issue you are worried about isn't a problem at the step where you are trying to address it. In other words, you appear to be worried that the methylation data based on probes with SNPs in their sequence, or from chromosomes with varying numbers of copies, will not be reliable (a valid worry, IMO). But the normalization step is really orthogonal to that worry - at that step all you are trying to do is adjust the distribution of probe intensities from different arrays so they are on comparable scales. Whether or not a given probe is accurately measuring something isn't relevant at that step - you have probes of varying intensity and you want to make those intensities 'similar' across arrays, for some definition of similar.

So really what you should be doing is processing your data up to the point that you have a MethylSet or GenomicMethylSet and then subsetting to remove data from probes that you don't trust.
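
A rough sketch of that order of operations follows, assuming an unfiltered 450k RGChannelSet such as the RGsetEx from the question and that minfi and its 450k annotation package are installed. The helper names `select_probes` and `normalize_then_filter` are hypothetical. The minfi calls (`detectionP`, `preprocessFunnorm`, `getAnnotation`, `dropLociWithSnps`) are real, but check their arguments against your installed version. Note that `preprocessFunnorm` returns a GenomicRatioSet, and that the detection p-value matrix has one row per CpG probe rather than one per RGChannelSet address, so it is matched to the normalized object by probe name.

```r
# Base R helper (hypothetical name): which normalized probes should be kept?
#   detP          : detection p-value matrix, probes in rows, samples in columns
#   probe_ids     : probe names of the normalized object, in its row order
#   sex_probe_ids : probe names annotated to chrX or chrY
select_probes <- function(detP, probe_ids, sex_probe_ids, p_cutoff = 0.01) {
  # Reorder detP to the normalized object's row order, matching by name.
  detP <- detP[match(probe_ids, rownames(detP)), , drop = FALSE]
  # A probe passes only if it is detected in every sample.
  detected <- rowSums(detP < p_cutoff) == ncol(detP)
  detected[is.na(detected)] <- FALSE  # probes absent from detP are dropped
  unname(detected & !(probe_ids %in% sex_probe_ids))
}

# Sketch of the full order of operations (hypothetical wrapper; needs minfi).
normalize_then_filter <- function(rgset, p_cutoff = 0.01) {
  detP  <- minfi::detectionP(rgset)         # computed on the unfiltered object
  grset <- minfi::preprocessFunnorm(rgset)  # normalize with all probes present
  anno  <- minfi::getAnnotation(grset)
  sex_probes <- anno$Name[anno$chr %in% c("chrX", "chrY")]
  keep  <- select_probes(detP, rownames(grset), sex_probes, p_cutoff)
  grset <- grset[keep, ]
  # Drop probes with a SNP at the CpG site or the single-base extension site.
  minfi::dropLociWithSnps(grset, snps = c("CpG", "SBE"), maf = 0)
}
```

Usage would be something like `grset_flt <- normalize_then_filter(RGsetEx)`, run on the RGsetEx as read in, before any subsetting.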