Question: Removing sex-chromosome probes from methyLumiSet
0
3.6 years ago by
United States
martens.christopher0 wrote:

I have DNA methylation data that I have collected from male and female subjects using the Illumina 450K Array platform.  I have read the data into R as a "methyLumiSet" from .IDAT files using the "methyLumi" package and I am planning on pre-processing the data using some of the normalization functions in the "wateRmelon" package:

suppressPackageStartupMessages(require(methylumi))
suppressPackageStartupMessages(require(wateRmelon))
suppressPackageStartupMessages(require(IlluminaHumanMethylation450kanno.ilmn12.hg19))

# phenoData is a data.frame with the row names = SentrixPosition_barcode
# and columns containing sample group, age and gender.
# Barcodes are pulled from a column in phenoData called 'barcodes'.

barcodes <- subset(phenoData, select=barcodes)

# Import .IDAT files as methyLumiSet
methyLumiSet <- methylumIDAT(barcodes = barcodes, pdat = phenoData,
idatPath = "/Users/Martens/Desktop/08272014/IDATs")

As confirmation that all of the features were imported, I checked the number of rows in methyLumiSet:

nrow(methyLumiSet)
Features
485577

The methyLumiSet is based off of the eSet class in Biobase.  I would like to remove all features spanning X and Y chromosomes, as is common practice in DNA methylation analysis.

As an initial attempt, I tried to determine which probes fall on the Y chromosome using the following code with the idea that I would then remove those probes from the methyLumiSet.

methyLumiSet.ChrY <- methyLumiSet[fData(methyLumiSet)$CHROMOSOME=="Y", ] however, when I check the number of probes, the result is 0 features: nrow(methyLumiSet.ChrY) Features 0 I cannot figure out why I am unable to subset features of my methyLumiSet. However, a potential issue might be with the annotation. When I try to run the methyLumi function 'featureFilter' to remove the X chromosome, I get the following error messages: methyLumiSet.Xfilt <- featureFilter(methyLumiSet, exclude.ChrX = TRUE) Warning message: In .featureFilter(eset, require.entrez = require.entrez, require.GOBP = require.GOBP, : HumanMethylation450k probes annotate to multiple accessions(!) Error in mget(featureNames(eset), envir = annotate::getAnnMap("CHR", annChip), : error in evaluating the argument 'envir' in selecting a method for function 'mget': Error: getAnnMap: package IlluminaHumanMethylation450k not available When I try to install IlluminaHumanMethylation450k, I get the following: source("http://bioconductor.org/biocLite.R") biocLite("IlluminaHumanMethylation450k") BioC_mirror: http://bioconductor.org Using Bioconductor version 3.1 (BiocInstaller 1.18.3), R version 3.2.0. Installing package(s) ‘IlluminaHumanMethylation450k’ Old packages: 'stringi', 'VariantAnnotation' Update all/some/none? [a/s/n]:  I type 'a' to update all and after updating I get the following error message. # Warning message: package ‘IlluminaHumanMethylation450k’ is not available (for R version 3.2.0)  My only guess is that my issue with subsetting probes by chromosome has to do with not being able to load the annotation information, but I am stuck on trying to figure out how to fix it. According to the reference manual for methyLumi, the package should be compatible with R version 3.2.0 and depends on IlluminaHumanMethylation450kanno.ilmn12.hg19 but even when I require this package I don't understand how to link the annotation data to the methyLumiSet. any advice on properly annotating a methyLumiSet and/or removing X,Y chromosomes from a methyLumiSet would be great. I am pretty new to R. I would prefer to do this without coercing to another structure (e.g., SummarizedExperiment/minfi type of object) if possible. Thanks! -Chris ADD COMMENTlink modified 3.6 years ago by James W. MacDonald49k • written 3.6 years ago by martens.christopher0 1) why would you want to toss out the X chromosome? There are an awful lot of sites on chrX that matter 2) why not add a term for gender in the regression fit / DMRcate / whatever instead? 3) why not check that the sex of samples (from X copy number and/or X inactivation) matches the putative sex of the subjects? I should provide more information in ?methylumIDAT, to be sure, and I'll patch that as soon as I have a moment (to point to the somewhat extensive vignette, and perhaps add a few functions for the above, which have been languishing in another package). However, I would like to caution that simply throwing out sex chromosomes is often not the best idea. If you do want to do so, a good time to perform this task would be after normalization and mapping to the genome, which is explored to a degree in http://www.bioconductor.org/packages/release/bioc/vignettes/methylumi/inst/doc/methylumi450k.pdf . Best, --t ADD REPLYlink written 3.6 years ago by Tim Triche4.2k Thanks Tim, I hear you on keeping the chrX probes and using gender as a covariate instead and will keep that in mind. For my current analysis, I am trying to validate targets from a previously published dataset in my own independent set of samples and thus, want to follow the exact analysis pipeline used in the original paper. Because they removed the sex chromosomes, there is no point in me keeping them. One thing I'm still stuck on. Any idea why the featureFilter function will not work for me? > methyLumiSet.Xfilt <- featureFilter(methyLumiSet, exclude.ChrX = TRUE) The error message I get is: Warning message: In .featureFilter(eset, require.entrez = require.entrez, require.GOBP = require.GOBP, : HumanMethylation450k probes annotate to multiple accessions(!) Error in mget(featureNames(eset), envir = annotate::getAnnMap("CHR", annChip), : error in evaluating the argument 'envir' in selecting a method for function 'mget': Error: getAnnMap: package IlluminaHumanMethylation450k not available Thanks again, -Chris ADD REPLYlink written 3.6 years ago by martens.christopher0 Hmmm, that's a featureFilter issue, but something I could fix... let me see if I can patch that and the methylumIDAT docs at the same time. Looks like I'll be writing a lot of patches today (i.e. the ones I wanted to write for ozymandias and DMRcate, and these) ADD REPLYlink modified 3.6 years ago by Dan Tenenbaum8.2k • written 3.6 years ago by Tim Triche4.2k Answer: Removing sex-chromosome probes from methyLumiSet 0 3.6 years ago by United States James W. MacDonald49k wrote: Using the example for methylumiR() (since there isn't a runnable example for methylumIDAT() and I don't feel like rummaging around for some IDAT files), I get this: > example(methylumiR) mthylR> ## Read in sample information mthylR> samps <- read.table(system.file("extdata/samples.txt", mthylR+ package = "methylumi"),sep="\t",header=TRUE) mthylR> ## Perform the actual data reading mthylR> ## This is an example of reading data from an mthylR> ## Sentrix Array format file (actually two files, mthylR> ## one for data and one for QC probes) mthylR> mldat <- methylumiR(system.file('extdata/exampledata.samples.txt',package='methylumi'), mthylR+ qcfile=system.file('extdata/exampledata.controls.txt',package="methylumi"), mthylR+ sampleDescriptions=samps) mthylR> mldat Object Information: MethyLumiSet (storageMode: lockedEnvironment) assayData: 1536 features, 10 samples element names: Avg_NBEADS, BEAD_STDERR, betas, methylated, pvals, unmethylated protocolData: none phenoData sampleNames: M_1 M_2 ... F_10 (10 total) varLabels: sampleID SampleLabel Sample Gender varMetadata: labelDescription featureData featureNames: AATK_E63_R AATK_P519_R ... ZP3_P220_F (1536 total) fvarLabels: TargetID ProbeID ... PRODUCT (17 total) fvarMetadata: labelDescription experimentData: use 'experimentData(object)' Annotation: Major Operation History: submitted finished 1 2015-06-22 14:13:20 2015-06-22 14:13:20 command 1 methylumiR(filename = system.file("extdata/exampledata.samples.txt", package = "methylumi"), qcfile = system.file("extdata/exampledata.controls.txt", package = "methylumi"), sampleDescriptions = samps) And we can see what chromosomes are being measured: > table(fData(mldat)$CHROM)

1  10  11  12  13  14  15  16  17  18  19   2  20  21  22   3   4   5   6   7
122  46 113  77  26  37  62  41  76  36  75  89  36  23  25  89  66  70  98 122
8   9   X
58  62  87

Which may be why you are having problems filtering on ChrY.

Thank you for providing an answer.  Unfortunately, I get the same output when I try to filter on ChrX.

It looks like my fvarLabels: in featureData only contains Probe_ID, DESIGN, and COLOR_CHANNEL whereas your Object Information lists "17 total"

> methyLumiSet

Object Information:
MethyLumiSet (storageMode: lockedEnvironment)
assayData: 485577 features, 27 samples
element names: betas, methylated, methylated.OOB, pvals, unmethylated, unmethylated.OOB
protocolData: none
phenoData
sampleNames: 9976861136_R01C01 9976861136_R02C01 ... 9482801028_R03C01 (27 total)
varLabels: Sample_Name Sample_Well ... barcodes (10 total)
featureData
featureNames: cg00000029 cg00000108 ... rs9839873 (485577 total)
fvarLabels: Probe_ID DESIGN COLOR_CHANNEL
experimentData: use 'experimentData(object)'
Annotation: IlluminaHumanMethylation450k
Major Operation History:
submitted            finished
1 2015-06-22 16:03:08 2015-06-22 16:04:11
2 2015-06-22 16:04:15 2015-06-22 16:04:21
command
1 methylumIDAT(barcodes = barcodes.YO, pdat = phenoData, idatPath = "/Users/Martens/Desktop/08272014/IDATs")
2  Subset of 485577 features.

When I try to duplicate your table above I get the following:

> table(fData(methyLumiSet)$CHROM) < table of extent 0 > ADD REPLYlink modified 3.6 years ago • written 3.6 years ago by martens.christopher0 1) that's a GoldenGate example, there are no chrY probes on that platform 2) here is a vignette-length 450k example for methylumIDAT (which I should point to, now that you mention it, in the man page for methylumIDAT): http://www.bioconductor.org/packages/release/bioc/vignettes/methylumi/inst/doc/methylumi450k.pdf i.e., R> suppressPackageStartupMessages(require('methylumi')) R> suppressPackageStartupMessages(require('TCGAMethylation450k')) R> suppressPackageStartupMessages(require('FDb.InfiniumMethylation.hg19')) R> idatPath <- system.file('extdata/idat',package='TCGAMethylation450k') R> mset450k <- methylumIDAT(getBarcodes(path=idatPath), idatPath=idatPath) ## 0 HumanMethylation27 samples found ## 10 HumanMethylation450 samples found ## Attempting to extract protocolData() from list... ## Determining chip type from IDAT protocolData... R> sampleNames(mset450k) <- paste0('TCGA', 1:ncol(mset450k)) 3) a heroically annoying way to verify that the chrY probes are there (per above): R> keepSeqlevels(mapToGenome(mset450k), 'chrY') class: GenomicMethylSet dim: 416 10 metadata(0): assays(2): Meth Unmeth rownames(416): cg13654344 cg04169747 ... cg08265308 cg14273923 rowRanges metadata column names(0): colnames(10): TCGA1 TCGA2 ... TCGA9 TCGA10 colData names(1): barcode Annotation array: IlluminaHumanMethylation450k annotation: ilmn12.hg19 Preprocessing Method: methylumi minfi version: 1.15.9 Manifest version: 0.4 Once upon a time we actually kept chromosomes on the methyLumiSet objects, but frankly it's (IMHO) more sensible to hustle them into a SummarizedExperiment-like object as fast as possible. YMMV... Statistics is the grammar of science. Karl Pearson <http: en.wikipedia.org="" wiki="" the_grammar_of_science=""> On Mon, Jun 22, 2015 at 2:15 PM, James W. MacDonald [bioc] < noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User James W. MacDonald <https: support.bioconductor.org="" u="" 5106=""/> wrote Answer: > Removing sex-chromosome probes from methyLumiSet > <https: support.bioconductor.org="" p="" 69035="" #69040="">: > > Using the example for methylumiR() (since there isn't a runnable example > for methylumIDAT() and I don't feel like rummaging around for some IDAT > files), I get this: > > > example(methylumiR) > > mthylR> ## Read in sample information > mthylR> samps <- read.table(system.file("extdata/samples.txt", > mthylR+ package = "methylumi"),sep="\t",header=TRUE) > > mthylR> ## Perform the actual data reading > mthylR> ## This is an example of reading data from an > mthylR> ## Sentrix Array format file (actually two files, > mthylR> ## one for data and one for QC probes) > mthylR> mldat <- methylumiR(system.file('extdata/exampledata.samples.txt',package='methylumi'), > mthylR+ qcfile=system.file('extdata/exampledata.controls.txt',package="methylumi"), > mthylR+ sampleDescriptions=samps) > > mthylR> mldat > > Object Information: > MethyLumiSet (storageMode: lockedEnvironment) > assayData: 1536 features, 10 samples > element names: Avg_NBEADS, BEAD_STDERR, betas, methylated, pvals, unmethylated > protocolData: none > phenoData > sampleNames: M_1 M_2 ... F_10 (10 total) > varLabels: sampleID SampleLabel Sample Gender > varMetadata: labelDescription > featureData > featureNames: AATK_E63_R AATK_P519_R ... ZP3_P220_F (1536 total) > fvarLabels: TargetID ProbeID ... PRODUCT (17 total) > fvarMetadata: labelDescription > experimentData: use 'experimentData(object)' > Annotation: > Major Operation History: > submitted finished > 1 2015-06-22 14:13:20 2015-06-22 14:13:20 > command > 1 methylumiR(filename = system.file("extdata/exampledata.samples.txt", package = "methylumi"), qcfile = system.file("extdata/exampledata.controls.txt", package = "methylumi"), sampleDescriptions = samps) > > > And we can see what chromosomes are being measured: > > > table(fData(mldat)$CHROM) > > 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 3 4 5 6 7 > 122 46 113 77 26 37 62 41 76 36 75 89 36 23 25 89 66 70 98 122 > 8 9 X > 58 62 87 > > Which may be why you are having problems filtering on ChrY. > > > > ------------------------------ > > You may reply via email or visit > A: Removing sex-chromosome probes from methyLumiSet >