minfi - how to subset a GenomicMethylSet by chromosome/seqname
1
0
Entering edit mode
Toby Hughes ▴ 30
@toby-hughes-5699
Last seen 9.6 years ago
Hi all, I am relatively new to both R and the Bioconductor packages, so apologies if this is a naïve question. I am using minfi to preprocess a small set (60 samples) of Illumina450k microarray data. I have got to the point where I have a GenomicMethylSet, following the use of mapToGenome on a MethylSet to provide annotation, having excluded those probes without a defined position in the process. I now wish to exclude probes from the sex chromosomes before using dmpFinder, and so I am seeking to subset the GenomicMethylSet to the autosomes only. Despite days of searching and a number of references saying this is relatively straightforward, I have not yet had any success (have been exploring GenomicRanges). Could someone please point me in the right direction with a code snippet or a reference? Kind regards, Toby. [[alternative HTML version deleted]]
Microarray Annotation PROcess minfi Microarray Annotation PROcess minfi • 2.4k views
ADD COMMENT
0
Entering edit mode
Tim Triche ★ 4.2k
@tim-triche-3561
Last seen 3.6 years ago
United States
I do this all the time with SEs, so here's a couple of options. ## keepSeqlevels generic setMethod("keepSeqlevels", signature(x="SummarizedExperiment", value="ANY"), function(x, value) { # {{{ y <- which(rownames(x) %in% names(keepSeqlevels(rowData(x),value))) return(x[y, ]) }) # }}} gmset.noXY <- keepSeqlevels(gmset, paste0('chr', 1:22)) ## or use split; this is slower and balkier than keepSeqlevels gmset.noXY <- split(gmset, as.vector(seqnames(gmset)))[ paste0('chr', 1:22) ] ## bonus: masking, cf. MM load("maskedRegions.rda") gmset <- gmset[ countOverlaps(gmset, maskedRegions) == 0, ] ## or perhaps gmset[ !gmset %within% maskedRegions, ] ... ? there is a small pile of functions like this (combine(), combine(, withNAs=TRUE), keepSeqlevels(), etc.) I'd like to see as generics for all SummarizedExperiments in GenomicRanges. Let the appropriate people know if you agree with this notion and any pitfalls you foresee --t On Thu, Jan 10, 2013 at 5:01 AM, Toby Hughes <toby.hughes@adelaide.edu.au>wrote: > Hi all, > > I am relatively new to both R and the Bioconductor packages, so apologies > if this is a naïve question. I am using minfi to preprocess a small set > (60 samples) of Illumina450k microarray data. I have got to the point > where I have a GenomicMethylSet, following the use of mapToGenome on a > MethylSet to provide annotation, having excluded those probes without a > defined position in the process. I now wish to exclude probes from the sex > chromosomes before using dmpFinder, and so I am seeking to subset the > GenomicMethylSet to the autosomes only. Despite days of searching and a > number of references saying this is relatively straightforward, I have not > yet had any success (have been exploring GenomicRanges). Could someone > please point me in the right direction with a code snippet or a reference? > > Kind regards, > > Toby. > > [[alternative HTML version deleted]] > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- *A model is a lie that helps you see the truth.* * * Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
On 01/10/2013 10:15 AM, Tim Triche, Jr. wrote: > I do this all the time with SEs, so here's a couple of options. > > ## keepSeqlevels generic > setMethod("keepSeqlevels", signature(x="SummarizedExperiment", value="ANY"), > function(x, value) { # {{{ > y <- which(rownames(x) %in% > names(keepSeqlevels(rowData(x),value))) > return(x[y, ]) > }) # }}} > gmset.noXY <- keepSeqlevels(gmset, paste0('chr', 1:22)) ?seqlevels<- has comparable behavior (in-place restriction to a subset of levels); after example(SummarizedExperiment) > seqlevels(sset) [1] "chr1" "chr2" > seqlevels(sset, force = TRUE) <- "chr2" > seqlevels(sset) [1] "chr2" > > ## or use split; this is slower and balkier than keepSeqlevels > gmset.noXY <- split(gmset, as.vector(seqnames(gmset)))[ paste0('chr', 1:22) > ] > > ## bonus: masking, cf. MM > load("maskedRegions.rda") > gmset <- gmset[ countOverlaps(gmset, maskedRegions) == 0, ] > > ## or perhaps gmset[ !gmset %within% maskedRegions, ] ... ? > > there is a small pile of functions like this (combine(), combine(, > withNAs=TRUE), keepSeqlevels(), etc.) I'd like to see as generics for all > SummarizedExperiments in GenomicRanges. Let the appropriate people know if > you agree with this notion and any pitfalls you foresee > > > --t > > > > > > On Thu, Jan 10, 2013 at 5:01 AM, Toby Hughes <toby.hughes at="" adelaide.edu.au="">wrote: > >> Hi all, >> >> I am relatively new to both R and the Bioconductor packages, so apologies >> if this is a na?ve question. I am using minfi to preprocess a small set >> (60 samples) of Illumina450k microarray data. I have got to the point >> where I have a GenomicMethylSet, following the use of mapToGenome on a >> MethylSet to provide annotation, having excluded those probes without a >> defined position in the process. I now wish to exclude probes from the sex >> chromosomes before using dmpFinder, and so I am seeking to subset the >> GenomicMethylSet to the autosomes only. Despite days of searching and a >> number of references saying this is relatively straightforward, I have not >> yet had any success (have been exploring GenomicRanges). Could someone >> please point me in the right direction with a code snippet or a reference? >> >> Kind regards, >> >> Toby. >> >> [[alternative HTML version deleted]] >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD REPLY
0
Entering edit mode
Thanks Martin. Will try this one as well. Kind regards, Toby. -----Original Message----- From: Martin Morgan [mailto:mtmorgan@fhcrc.org] Sent: Friday, 11 January 2013 5:02 AM To: ttriche at usc.edu Cc: Tim Triche, Jr.; Toby Hughes; bioconductor at r-project.org Subject: Re: [BioC] minfi - how to subset a GenomicMethylSet by chromosome/seqname On 01/10/2013 10:15 AM, Tim Triche, Jr. wrote: > I do this all the time with SEs, so here's a couple of options. > > ## keepSeqlevels generic > setMethod("keepSeqlevels", signature(x="SummarizedExperiment", value="ANY"), > function(x, value) { # {{{ > y <- which(rownames(x) %in% > names(keepSeqlevels(rowData(x),value))) > return(x[y, ]) > }) # }}} > gmset.noXY <- keepSeqlevels(gmset, paste0('chr', 1:22)) ?seqlevels<- has comparable behavior (in-place restriction to a subset of levels); after example(SummarizedExperiment) > seqlevels(sset) [1] "chr1" "chr2" > seqlevels(sset, force = TRUE) <- "chr2" > seqlevels(sset) [1] "chr2" > > ## or use split; this is slower and balkier than keepSeqlevels > gmset.noXY <- split(gmset, as.vector(seqnames(gmset)))[ paste0('chr', > 1:22) ] > > ## bonus: masking, cf. MM > load("maskedRegions.rda") > gmset <- gmset[ countOverlaps(gmset, maskedRegions) == 0, ] > > ## or perhaps gmset[ !gmset %within% maskedRegions, ] ... ? > > there is a small pile of functions like this (combine(), combine(, > withNAs=TRUE), keepSeqlevels(), etc.) I'd like to see as generics for > all SummarizedExperiments in GenomicRanges. Let the appropriate > people know if you agree with this notion and any pitfalls you foresee > > > --t > > > > > > On Thu, Jan 10, 2013 at 5:01 AM, Toby Hughes <toby.hughes at="" adelaide.edu.au="">wrote: > >> Hi all, >> >> I am relatively new to both R and the Bioconductor packages, so >> apologies if this is a na?ve question. I am using minfi to >> preprocess a small set >> (60 samples) of Illumina450k microarray data. I have got to the >> point where I have a GenomicMethylSet, following the use of >> mapToGenome on a MethylSet to provide annotation, having excluded >> those probes without a defined position in the process. I now wish >> to exclude probes from the sex chromosomes before using dmpFinder, >> and so I am seeking to subset the GenomicMethylSet to the autosomes >> only. Despite days of searching and a number of references saying >> this is relatively straightforward, I have not yet had any success >> (have been exploring GenomicRanges). Could someone please point me in the right direction with a code snippet or a reference? >> >> Kind regards, >> >> Toby. >> >> [[alternative HTML version deleted]] >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD REPLY
0
Entering edit mode
Thanks Tim. I went with option one, and it worked perfectly. I will spend some time working with the other options shortly. Kind regards, Toby. From: Tim Triche, Jr. [mailto:tim.triche@gmail.com] Sent: Friday, 11 January 2013 4:46 AM To: Toby Hughes Cc: bioconductor@r-project.org Subject: Re: [BioC] minfi - how to subset a GenomicMethylSet by chromosome/seqname I do this all the time with SEs, so here's a couple of options. ## keepSeqlevels generic setMethod("keepSeqlevels", signature(x="SummarizedExperiment", value="ANY"), function(x, value) { # {{{ y <- which(rownames(x) %in% names(keepSeqlevels(rowData(x),value))) return(x[y, ]) }) # }}} gmset.noXY <- keepSeqlevels(gmset, paste0('chr', 1:22)) ## or use split; this is slower and balkier than keepSeqlevels gmset.noXY <- split(gmset, as.vector(seqnames(gmset)))[ paste0('chr', 1:22) ] ## bonus: masking, cf. MM load("maskedRegions.rda") gmset <- gmset[ countOverlaps(gmset, maskedRegions) == 0, ] ## or perhaps gmset[ !gmset %within% maskedRegions, ] ... ? there is a small pile of functions like this (combine(), combine(, withNAs=TRUE), keepSeqlevels(), etc.) I'd like to see as generics for all SummarizedExperiments in GenomicRanges. Let the appropriate people know if you agree with this notion and any pitfalls you foresee --t On Thu, Jan 10, 2013 at 5:01 AM, Toby Hughes <toby.hughes@adelaide.edu.au<mailto:toby.hughes@adelaide.edu.au>> wrote: Hi all, I am relatively new to both R and the Bioconductor packages, so apologies if this is a naïve question. I am using minfi to preprocess a small set (60 samples) of Illumina450k microarray data. I have got to the point where I have a GenomicMethylSet, following the use of mapToGenome on a MethylSet to provide annotation, having excluded those probes without a defined position in the process. I now wish to exclude probes from the sex chromosomes before using dmpFinder, and so I am seeking to subset the GenomicMethylSet to the autosomes only. Despite days of searching and a number of references saying this is relatively straightforward, I have not yet had any success (have been exploring GenomicRanges). Could someone please point me in the right direction with a code snippet or a reference? Kind regards, Toby. [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org<mailto:bioconductor@r-project.org> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- A model is a lie that helps you see the truth. Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 840 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6