Search
Question: Removing sex-chromosome probes from methyLumiSet
0
gravatar for martens.christopher
3.2 years ago by
United States
martens.christopher0 wrote:

I have DNA methylation data that I have collected from male and female subjects using the Illumina 450K Array platform.  I have read the data into R as a "methyLumiSet" from .IDAT files using the "methyLumi" package and I am planning on pre-processing the data using some of the normalization functions in the "wateRmelon" package:

suppressPackageStartupMessages(require(methylumi))
suppressPackageStartupMessages(require(wateRmelon))
suppressPackageStartupMessages(require(IlluminaHumanMethylation450kanno.ilmn12.hg19))

# Read sample data. 
# phenoData is a data.frame with the row names = SentrixPosition_barcode 
# and columns containing sample group, age and gender. 
# Barcodes are pulled from a column in phenoData called 'barcodes'.

phenoData <- read.csv("/Users/Martens/Desktop/08272014/IDATs/Sample_Sheet.csv",header=TRUE) 
barcodes <- subset(phenoData, select=barcodes)

# Import .IDAT files as methyLumiSet
methyLumiSet <- methylumIDAT(barcodes = barcodes, pdat = phenoData,
idatPath = "/Users/Martens/Desktop/08272014/IDATs")

As confirmation that all of the features were imported, I checked the number of rows in methyLumiSet:

nrow(methyLumiSet)
Features
    485577

The methyLumiSet is based off of the eSet class in Biobase.  I would like to remove all features spanning X and Y chromosomes, as is common practice in DNA methylation analysis.  

As an initial attempt, I tried to determine which probes fall on the Y chromosome using the following code with the idea that I would then remove those probes from the methyLumiSet.

methyLumiSet.ChrY <- methyLumiSet[fData(methyLumiSet)$CHROMOSOME=="Y", ]

however, when I check the number of probes, the result is 0 features:

nrow(methyLumiSet.ChrY)
Features
    0

I cannot figure out why I am unable to subset features of my methyLumiSet.  However, a potential issue might be with the annotation.  When I try to run the methyLumi function 'featureFilter' to remove the X chromosome, I get the following error messages:

methyLumiSet.Xfilt <- featureFilter(methyLumiSet, exclude.ChrX = TRUE)

Warning message:
In .featureFilter(eset, require.entrez = require.entrez, require.GOBP =            
require.GOBP,  : HumanMethylation450k probes annotate to multiple accessions(!)
 Error in mget(featureNames(eset), envir = annotate::getAnnMap("CHR", annChip),  : 
 error in evaluating the argument 'envir' in selecting a method for     function 'mget': 
 Error: getAnnMap: package IlluminaHumanMethylation450k not available


When I try to install IlluminaHumanMethylation450k, I get the following:

source("http://bioconductor.org/biocLite.R")
biocLite("IlluminaHumanMethylation450k")

BioC_mirror: http://bioconductor.org
Using Bioconductor version 3.1 (BiocInstaller 1.18.3), R version 3.2.0.
Installing package(s) ‘IlluminaHumanMethylation450k’
Old packages: 'stringi', 'VariantAnnotation'
Update all/some/none? [a/s/n]: 

I type 'a' to update all and after updating I get the following error message.

# Warning message: package ‘IlluminaHumanMethylation450k’ is not available (for R version 3.2.0) 


My only guess is that my issue with subsetting probes by chromosome has to do with not being able to load the annotation information, but I am stuck on trying to figure out how to fix it.  According to the reference manual for methyLumi, the package should be compatible with R version 3.2.0 and depends on IlluminaHumanMethylation450kanno.ilmn12.hg19 but even when I require this package I don't understand how to link the annotation data to the methyLumiSet.

any advice on properly annotating a methyLumiSet and/or removing X,Y chromosomes from a methyLumiSet would be great.  I am pretty new to R.

I would prefer to do this without coercing to another structure (e.g., SummarizedExperiment/minfi type of object) if possible.  Thanks!

-Chris

ADD COMMENTlink modified 3.2 years ago by James W. MacDonald47k • written 3.2 years ago by martens.christopher0

1) why would you want to toss out the X chromosome?  There are an awful lot of sites on chrX that matter

2) why not add a term for gender in the regression fit / DMRcate / whatever instead?

3) why not check that the sex of samples (from X copy number and/or X inactivation) matches the putative sex of the subjects?

 

I should provide more information in ?methylumIDAT, to be sure, and I'll patch that as soon as I have a moment (to point to the somewhat extensive vignette, and perhaps add a few functions for the above, which have been languishing in another package).  However, I would like to caution that simply throwing out sex chromosomes is often not the best idea.  If you do want to do so, a good time to perform this task would be after normalization and mapping to the genome, which is explored to a degree in http://www.bioconductor.org/packages/release/bioc/vignettes/methylumi/inst/doc/methylumi450k.pdf .

 

Best,

 

--t

 

ADD REPLYlink written 3.2 years ago by Tim Triche4.2k

Thanks Tim,

I hear you on keeping the chrX probes and using gender as a covariate instead and will keep that in mind.  For my current analysis, I am trying to validate targets from a previously published dataset in my own independent set of samples and thus, want to follow the exact analysis pipeline used in the original paper.  Because they removed the sex chromosomes, there is no point in me keeping them. 

One thing I'm still stuck on.  Any idea why the featureFilter function will not work for me?

> methyLumiSet.Xfilt <- featureFilter(methyLumiSet, exclude.ChrX = TRUE)

The error message I get is: 

Warning message:
In .featureFilter(eset, require.entrez = require.entrez, require.GOBP = require.GOBP,  :
  HumanMethylation450k probes annotate to multiple accessions(!)
Error in mget(featureNames(eset), envir = annotate::getAnnMap("CHR", annChip),  : 
  error in evaluating the argument 'envir' in selecting a method for function 'mget': Error: getAnnMap: package IlluminaHumanMethylation450k not available

 

Thanks again,

-Chris

 

ADD REPLYlink written 3.2 years ago by martens.christopher0
Hmmm, that's a featureFilter issue, but something I could fix... let me see if I can patch that and the methylumIDAT docs at the same time. Looks like I'll be writing a lot of patches today (i.e. the ones I wanted to write for ozymandias and DMRcate, and these)
ADD REPLYlink modified 3.2 years ago by Dan Tenenbaum ♦♦ 8.2k • written 3.2 years ago by Tim Triche4.2k
0
gravatar for James W. MacDonald
3.2 years ago by
United States
James W. MacDonald47k wrote:

Using the example for methylumiR() (since there isn't a runnable example for methylumIDAT() and I don't feel like rummaging around for some IDAT files), I get this:

> example(methylumiR)

mthylR> ## Read in sample information
mthylR> samps <- read.table(system.file("extdata/samples.txt",
mthylR+                                 package = "methylumi"),sep="\t",header=TRUE)

mthylR> ## Perform the actual data reading
mthylR> ## This is an example of reading data from an
mthylR> ## Sentrix Array format file (actually two files,
mthylR> ## one for data and one for QC probes)
mthylR> mldat <- methylumiR(system.file('extdata/exampledata.samples.txt',package='methylumi'),
mthylR+                     qcfile=system.file('extdata/exampledata.controls.txt',package="methylumi"),
mthylR+                     sampleDescriptions=samps)

mthylR> mldat

Object Information:
MethyLumiSet (storageMode: lockedEnvironment)
assayData: 1536 features, 10 samples
  element names: Avg_NBEADS, BEAD_STDERR, betas, methylated, pvals, unmethylated
protocolData: none
phenoData
  sampleNames: M_1 M_2 ... F_10 (10 total)
  varLabels: sampleID SampleLabel Sample Gender
  varMetadata: labelDescription
featureData
  featureNames: AATK_E63_R AATK_P519_R ... ZP3_P220_F (1536 total)
  fvarLabels: TargetID ProbeID ... PRODUCT (17 total)
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:  
Major Operation History:
            submitted            finished
1 2015-06-22 14:13:20 2015-06-22 14:13:20
                                                                                                                                                                                                            command
1 methylumiR(filename = system.file("extdata/exampledata.samples.txt",     package = "methylumi"), qcfile = system.file("extdata/exampledata.controls.txt",     package = "methylumi"), sampleDescriptions = samps)


And we can see what chromosomes are being measured:

> table(fData(mldat)$CHROM)

  1  10  11  12  13  14  15  16  17  18  19   2  20  21  22   3   4   5   6   7
122  46 113  77  26  37  62  41  76  36  75  89  36  23  25  89  66  70  98 122
  8   9   X
 58  62  87

Which may be why you are having problems filtering on ChrY.

 

ADD COMMENTlink written 3.2 years ago by James W. MacDonald47k

Thank you for providing an answer.  Unfortunately, I get the same output when I try to filter on ChrX.  

It looks like my fvarLabels: in featureData only contains Probe_ID, DESIGN, and COLOR_CHANNEL whereas your Object Information lists "17 total" 

> methyLumiSet

Object Information:
MethyLumiSet (storageMode: lockedEnvironment)
assayData: 485577 features, 27 samples 
  element names: betas, methylated, methylated.OOB, pvals, unmethylated, unmethylated.OOB 
protocolData: none
phenoData
  sampleNames: 9976861136_R01C01 9976861136_R02C01 ... 9482801028_R03C01 (27 total)
  varLabels: Sample_Name Sample_Well ... barcodes (10 total)
  varMetadata: labelDescription
featureData
  featureNames: cg00000029 cg00000108 ... rs9839873 (485577 total)
  fvarLabels: Probe_ID DESIGN COLOR_CHANNEL
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation: IlluminaHumanMethylation450k 
Major Operation History:
            submitted            finished
1 2015-06-22 16:03:08 2015-06-22 16:04:11
2 2015-06-22 16:04:15 2015-06-22 16:04:21
                                                                                                     command
1 methylumIDAT(barcodes = barcodes.YO, pdat = phenoData, idatPath = "/Users/Martens/Desktop/08272014/IDATs")
2  Subset of 485577 features.

 

When I try to duplicate your table above I get the following:

> table(fData(methyLumiSet)$CHROM)

< table of extent 0 >

 

 

 

 

 

 

ADD REPLYlink modified 3.2 years ago • written 3.2 years ago by martens.christopher0
1) that's a GoldenGate example, there are no chrY probes on that platform 2) here is a vignette-length 450k example for methylumIDAT (which I should point to, now that you mention it, in the man page for methylumIDAT): http://www.bioconductor.org/packages/release/bioc/vignettes/methylumi/inst/doc/methylumi450k.pdf i.e., R> suppressPackageStartupMessages(require('methylumi')) R> suppressPackageStartupMessages(require('TCGAMethylation450k')) R> suppressPackageStartupMessages(require('FDb.InfiniumMethylation.hg19')) R> idatPath <- system.file('extdata/idat',package='TCGAMethylation450k') R> mset450k <- methylumIDAT(getBarcodes(path=idatPath), idatPath=idatPath) ## 0 HumanMethylation27 samples found ## 10 HumanMethylation450 samples found ## Attempting to extract protocolData() from list... ## Determining chip type from IDAT protocolData... R> sampleNames(mset450k) <- paste0('TCGA', 1:ncol(mset450k)) 3) a heroically annoying way to verify that the chrY probes are there (per above): R> keepSeqlevels(mapToGenome(mset450k), 'chrY') class: GenomicMethylSet dim: 416 10 metadata(0): assays(2): Meth Unmeth rownames(416): cg13654344 cg04169747 ... cg08265308 cg14273923 rowRanges metadata column names(0): colnames(10): TCGA1 TCGA2 ... TCGA9 TCGA10 colData names(1): barcode Annotation array: IlluminaHumanMethylation450k annotation: ilmn12.hg19 Preprocessing Method: methylumi minfi version: 1.15.9 Manifest version: 0.4 Once upon a time we actually kept chromosomes on the methyLumiSet objects, but frankly it's (IMHO) more sensible to hustle them into a SummarizedExperiment-like object as fast as possible. YMMV... Statistics is the grammar of science. Karl Pearson <http: en.wikipedia.org="" wiki="" the_grammar_of_science=""> On Mon, Jun 22, 2015 at 2:15 PM, James W. MacDonald [bioc] < noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User James W. MacDonald <https: support.bioconductor.org="" u="" 5106=""/> wrote Answer: > Removing sex-chromosome probes from methyLumiSet > <https: support.bioconductor.org="" p="" 69035="" #69040="">: > > Using the example for methylumiR() (since there isn't a runnable example > for methylumIDAT() and I don't feel like rummaging around for some IDAT > files), I get this: > > > example(methylumiR) > > mthylR> ## Read in sample information > mthylR> samps <- read.table(system.file("extdata/samples.txt", > mthylR+ package = "methylumi"),sep="\t",header=TRUE) > > mthylR> ## Perform the actual data reading > mthylR> ## This is an example of reading data from an > mthylR> ## Sentrix Array format file (actually two files, > mthylR> ## one for data and one for QC probes) > mthylR> mldat <- methylumiR(system.file('extdata/exampledata.samples.txt',package='methylumi'), > mthylR+ qcfile=system.file('extdata/exampledata.controls.txt',package="methylumi"), > mthylR+ sampleDescriptions=samps) > > mthylR> mldat > > Object Information: > MethyLumiSet (storageMode: lockedEnvironment) > assayData: 1536 features, 10 samples > element names: Avg_NBEADS, BEAD_STDERR, betas, methylated, pvals, unmethylated > protocolData: none > phenoData > sampleNames: M_1 M_2 ... F_10 (10 total) > varLabels: sampleID SampleLabel Sample Gender > varMetadata: labelDescription > featureData > featureNames: AATK_E63_R AATK_P519_R ... ZP3_P220_F (1536 total) > fvarLabels: TargetID ProbeID ... PRODUCT (17 total) > fvarMetadata: labelDescription > experimentData: use 'experimentData(object)' > Annotation: > Major Operation History: > submitted finished > 1 2015-06-22 14:13:20 2015-06-22 14:13:20 > command > 1 methylumiR(filename = system.file("extdata/exampledata.samples.txt", package = "methylumi"), qcfile = system.file("extdata/exampledata.controls.txt", package = "methylumi"), sampleDescriptions = samps) > > > And we can see what chromosomes are being measured: > > > table(fData(mldat)$CHROM) > > 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 3 4 5 6 7 > 122 46 113 77 26 37 62 41 76 36 75 89 36 23 25 89 66 70 98 122 > 8 9 X > 58 62 87 > > Which may be why you are having problems filtering on ChrY. > > > > ------------------------------ > > You may reply via email or visit > A: Removing sex-chromosome probes from methyLumiSet >
ADD REPLYlink written 3.2 years ago by Tim Triche4.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 127 users visited in the last hour