minfi release 1.8

0

Entering edit mode

Tim Triche ★ 4.2k

@tim-triche-3561

Last seen 3.6 years ago

United States

There are at least 3 different ways you could go about this; Tiffany Morris' probe variant annotation package accompanying ChAMP is probably the most comprehensive, while (I claim) the UCSC common variant track is the most trivial and the SNPlocs packages are the most intensive. Let's assume your data is in a GenomicRatioSet named grSet for all examples. Writing and testing out the code for this took a little while so I do hope you'll work through each example. All things considered, I'd recommend option #1, but it would be even better if the maintainers tweaked a few settings so that bsseq::data.frame2GRanges( probe.450K.VCs.af) was more trivial. (Also: consider merging $diVC.com1pop.F and $diVC.com1pop.R, making everything that can reasonably be so boolean, etc... GRanges are so much easier to work with than data.frames when dealing with genomic coordinates) You'll need to supply your own grSet, but if you didn't have one, I don't imagine you'd have asked :-) I have provided the dimensions (rows x columns) of the filtered grSets for comparison in each case. You will have to decide what degree of conservatism is most appropriate for your experiment. Read the docs! 1) Use the ProbeVariants package alliuded to above. This is comprehensive, but a bit of a PITA. require(minfi) grSet ## class: GenomicRatioSet ## dim: 485512 34 require(Illumina450ProbeVariants.db) ?probe.450K.VCs.af ## read the docs! dataprobe.450K.VCs.af) commonSnpInProbe <- rownamesprobe.450K.VCs.af)[ probe.450K.VCs.af$probe50VC.com1pop > 0 ] grSet.noCommonSNPsInProbes <- grSet[ setdiff(rownames(grSet), commonSnpInProbe), ] grSet.noCommonSNPsInProbes ## class: GenomicRatioSet ## dim: 308901 34 commonVariantsEitherStrand <- !is.naprobe.450K.VCs.af$diVC.com1pop.F)|! is.naprobe.450K.VCs.af$diVC.com1pop.R) CpGsWithCommonVariants <- rownamesprobe.450K.VCs.af)[ commonVariantsEitherStrand ] grSet.noCommonVariantsAtCpGs <- grSet[ setdiff(rownames(grSet),CpGsWithCommonVariants), ] grSet.noCommonVariantsAtCpGs ## class: GenomicRatioSet ## dim: 445502 34 Consult the documentation for finer control over the process (e.g. specific populations, etc.). ?probe.450K.VCs.af ## read the docs again It would not break my heart if the maintainers turned the data.frame here into a GRanges; a few small changes followed by data.frame2GRanges() would do the trick. But still, it's a very handy compilation. See below for the reason I prefer GRanges (in short, because I'm impatient and don't like debugging). 2) Use the FDb package (> 1% MAF across all populations, dbSNP build 135). This is a one-liner. require(minfi) grSet ## class: GenomicRatioSet ## dim: 485512 34 require(FDb.UCSC.snp135common.hg19) commonSNPs <- features(FDb.UCSC.snp135common.hg19) ## With a GRanges object, the previous rigamarole becomes a one-liner: grSet.noCommonSNPsAtCpGs <- grSet[ grSet %outside% commonSNPs, ] grSet.noCommonSNPsAtCpGs ## class: GenomicRatioSet ## dim: 478385 34 This was my workaround from 2-3 years ago to permit masking of TCGA Level 3 methylation data. It still works and it still works well, but it's been superseded (IMHO) by more flexible approaches like #1. 3) use SNPlocs (all SNPs in dbSNP; mildly annoying complication with 'ch' vs. 'chr' seqlevels) require(minfi) grSet ## class: GenomicRatioSet ## dim: 485512 34 ## work around the annoyance: chroms <- seqlevels(grSet) names(chroms) <- chroms chroms <- gsub('chr', 'ch', chroms) require(SNPlocs.Hsapiens.dbSNP.20120608) SNPs.byChr <- GRangesList(lapply(chroms, getSNPlocs, as.GRanges=TRUE)) ## time passes... seqlevels(SNPs.byChr) <- gsub('ch', 'chr', seqlevels(SNPs.byChr)) ## back to normal genome(SNPs.byChr) <- 'hg19' ## GRCh37.p5 coordinates are identical to hg19 save for chrMT ## once the above hoops have been jumped through, it's back to one- liners: grSet.noSNPsAtCpGs <- grSet[ grSet %outside% SNPs.byChr, ] grSet.noSNPsAtCpGs ## class: GenomicRatioSet ## dim: 444722 34 This used to be documented in the minfi code/manual somewhere, though I don't know if it still is. Statistics is the grammar of science. Karl Pearson <http: en.wikipedia.org="" wiki="" the_grammar_of_science=""> On Fri, Jan 31, 2014 at 1:06 PM, C T <offsubs@gmail.com> wrote: > Any tutorial on how to remove probes that contains SNPs? > > On Tuesday, November 12, 2013 7:12:46 PM UTC-5, Kasper Hansen wrote: > > > > As part of Bioconductor 2.13, we have released minfi 1.8.x. Due to a > > number of last minute errors, the recommended version is 1.8.3 (or > bigger). > > > > Users may find that their old objects cannot be linked to annotation. > > Please run > > OBJECT = updateObject(OBJECT) > > to fix this. > > > > Highlights include > > * preprocessingQuantile(): an independent implementation of the same > ideas > > as in Tost et al. > > * bumphunter() for finding DMRs > > * blockFinder() for finding large hypo-methylated blocks on the 450k > array. > > * estimateCellCounts() for estimating cell type composition for whole > > blood samples. The function can be extended to work on other types of > > cells, provided suitable flow sorted data is available. > > * the annotation now includes SNP annotation for dbSNP v132, 135 and 137, > > independently annotated at JHU. > > * getSex(): you can now get sex repeatedly, irrespective of relationship > > status. > > * minfiQC: find and remove outlier samples based on a sample QC criteria > > we have found effective. > > > > Unfortunately, none of these handy changes are yet detailed in the > > vignette; we are working on this. > > > > A manuscript is in review detailing most of these functions. > > > > Full NEWS below > > > > Best, > > Kasper D Hansen > > > > o Added getMethSignal(), a convenience function for programming. > > > > o Changed the argument name of "type" to "what" for > getMethSignal(). > > > > o Added the class "RatioSet", like "GenomicRatioSet" but without > the > > genome information. > > > > o Bugfixes to the "GenomicRatioSet()" constructor. > > > > o Added the method ratioConvert(), for converting a "MethylSet" to > a > > "RatioSet" or a "GenomicMethylSet" to a "GenomicRatioSet". > > > > o Fixed an issue with GenomicMethylSet() and GenomicRatioSet() > caused > > by a recent change to a non-exported function in the > GenomicRanges > > package (Reported by Gustavo Fernandez Bayon <gba...@gmail.com> <javascript:> > > >). > > > > o Added fixMethOutliers for thresholding extreme observations in > the > > [un]methylation channels. > > > > o Added getSex, addSex, plotSex for estimating sex of the samples. > > > > o Added getQC, addQC, plotQC for a very simple quality control > > measure. > > > > o Added minfiQC for a one-stop function for quality control > measures. > > > > o Changed some verbose=TRUE output in various functions. > > > > o Added preprocessQuantile. > > > > o Added bumphunter method for "GenomicRatioSet". > > > > o Handling signed zero in minfi:::.digestMatrix which caused unit > > tests to fail on Windows. > > > > o addSex and addQC lead to sampleNames() being dropped because of a > > likely bug in cbind(DataFrame, DataFrame). Work-around has been > > implemented. > > > > o Re-ran the test data generator. > > > > o Fixed some Depends and Imports issues revealed by new features > of R > > CMD check. > > > > o Added blockFinder and cpgCollapse. > > > > o (internal) added convenience functions for argument checking. > > > > o Exposed and re-wrote getAnnotation(). > > > > o Changed getLocations() from being a method to a simple function. > > Arguments have been removed (for example, now the function always > > drops non-mapping loci). > > > > o Implemented getIslandStatus(), getProbeType(), getSnpInfo() and > > addSnpInfo(). The two later functions retrieve pre- computed SNP > > overlaps, and the new annotation object includes SNPs based on > > dbSNP 137, 135 and 132. > > > > o Changed the IlluminaMethylatioAnnotation class to now include > > genomeBuild information as well as defaults. > > > > o Added estimateCellCounts for deconvolution of cell types in whole > > blood. Thanks to Andrew Jaffe and Andres Houseman. > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > [[alternative HTML version deleted]]

SNP Annotation GO probe SNPlocs PROcess minfi bumphunter ChAMP SNP Annotation GO probe • 1.9k views

ADD COMMENT • link updated 10.2 years ago by Smeeta Shrestha ▴ 20 • written 10.2 years ago by Tim Triche ★ 4.2k

0

Entering edit mode

Tim Triche ★ 4.2k

@tim-triche-3561

Last seen 3.6 years ago

United States

nb. To clarify, > This was my workaround from 2-3 years ago to permit masking of TCGA Level 3 methylation data. We ended up masking probes with a common SNP <= 15bp from the interrogated CpG, or the entire 3' >= 15bp of the probe falling into known repeat regions from RepeatMasker, or on sex chromosomes, for the TCGA level 3 data. But if you want to use TCGA data I suggest that you start from IDATs and determine what degree of masking, what type of preprocessing, etc. is appropriate for your own experiment. That's why I redefined "level 1" data as raw IDATs sometime back in 2011 (with help from Anna Chu @ NIH). Raw should mean raw, no more and no less, and it's always better to start with the raw data (IMHO). When IDATs are available for other datasets (e.g. GEO) my suggestion is the same: use them. JMHO, --t Statistics is the grammar of science. Karl Pearson <http: en.wikipedia.org="" wiki="" the_grammar_of_science=""> On Sat, Feb 1, 2014 at 11:06 AM, Tim Triche, Jr. <tim.triche@gmail.com>wrote: > There are at least 3 different ways you could go about this; Tiffany > Morris' probe variant annotation package accompanying ChAMP is probably the > most comprehensive, while (I claim) the UCSC common variant track is the > most trivial and the SNPlocs packages are the most intensive. Let's assume > your data is in a GenomicRatioSet named grSet for all examples. > > Writing and testing out the code for this took a little while so I do hope > you'll work through each example. > > All things considered, I'd recommend option #1, but it would be even > better if the maintainers tweaked a few settings so that > bsseq::data.frame2GRangesprobe.450K.VCs.af) was more trivial. (Also: > consider merging $diVC.com1pop.F and $diVC.com1pop.R, making everything > that can reasonably be so boolean, etc... GRanges are so much easier to > work with than data.frames when dealing with genomic coordinates) > > You'll need to supply your own grSet, but if you didn't have one, I don't > imagine you'd have asked :-) > I have provided the dimensions (rows x columns) of the filtered grSets for > comparison in each case. You will have to decide what degree of > conservatism is most appropriate for your experiment. Read the docs! > > > 1) Use the ProbeVariants package alliuded to above. This is > comprehensive, but a bit of a PITA. > > require(minfi) > grSet > ## class: GenomicRatioSet > ## dim: 485512 34 > > require(Illumina450ProbeVariants.db) > ?probe.450K.VCs.af ## read the docs! > dataprobe.450K.VCs.af) > > commonSnpInProbe <- rownamesprobe.450K.VCs.af)[ probe.450K.VCs.af$probe50VC.com1pop > > 0 ] > grSet.noCommonSNPsInProbes <- grSet[ > setdiff(rownames(grSet), commonSnpInProbe), ] > grSet.noCommonSNPsInProbes > ## class: GenomicRatioSet > ## dim: 308901 34 > > commonVariantsEitherStrand <- !is.naprobe.450K.VCs.af$diVC.com1pop.F)|! > is.naprobe.450K.VCs.af$diVC.com1pop.R) > CpGsWithCommonVariants <- rownamesprobe.450K.VCs.af)[ commonVariantsEitherStrand > ] > grSet.noCommonVariantsAtCpGs <- grSet[ > setdiff(rownames(grSet),CpGsWithCommonVariants), ] > grSet.noCommonVariantsAtCpGs > ## class: GenomicRatioSet > ## dim: 445502 34 > > Consult the documentation for finer control over the process (e.g. > specific populations, etc.). > > ?probe.450K.VCs.af ## read the docs again > > It would not break my heart if the maintainers turned the data.frame here > into a GRanges; a few small changes followed by data.frame2GRanges() would > do the trick. But still, it's a very handy compilation. > See below for the reason I prefer GRanges (in short, because I'm impatient > and don't like debugging). > > > 2) Use the FDb package (> 1% MAF across all populations, dbSNP build 135). > This is a one-liner. > > require(minfi) > grSet > ## class: GenomicRatioSet > ## dim: 485512 34 > > require(FDb.UCSC.snp135common.hg19) > commonSNPs <- features(FDb.UCSC.snp135common.hg19) > > ## With a GRanges object, the previous rigamarole becomes a one- liner: > grSet.noCommonSNPsAtCpGs <- grSet[ grSet %outside% commonSNPs, ] > > grSet.noCommonSNPsAtCpGs > ## class: GenomicRatioSet > ## dim: 478385 34 > > This was my workaround from 2-3 years ago to permit masking of TCGA Level > 3 methylation data. It still works and it still works well, but it's been > superseded (IMHO) by more flexible approaches like #1. > > > 3) use SNPlocs (all SNPs in dbSNP; mildly annoying complication with 'ch' > vs. 'chr' seqlevels) > > require(minfi) > grSet > ## class: GenomicRatioSet > ## dim: 485512 34 > > ## work around the annoyance: > chroms <- seqlevels(grSet) > names(chroms) <- chroms > chroms <- gsub('chr', 'ch', chroms) > require(SNPlocs.Hsapiens.dbSNP.20120608) > SNPs.byChr <- GRangesList(lapply(chroms, getSNPlocs, as.GRanges=TRUE)) > ## time passes... > seqlevels(SNPs.byChr) <- gsub('ch', 'chr', seqlevels(SNPs.byChr)) ## back > to normal > genome(SNPs.byChr) <- 'hg19' ## GRCh37.p5 coordinates are identical to > hg19 save for chrMT > > ## once the above hoops have been jumped through, it's back to one- liners: > grSet.noSNPsAtCpGs <- grSet[ grSet %outside% SNPs.byChr, ] > grSet.noSNPsAtCpGs > ## class: GenomicRatioSet > ## dim: 444722 34 > > This used to be documented in the minfi code/manual somewhere, though I > don't know if it still is. > > > Statistics is the grammar of science. > Karl Pearson <http: en.wikipedia.org="" wiki="" the_grammar_of_science=""> > > > On Fri, Jan 31, 2014 at 1:06 PM, C T <offsubs@gmail.com> wrote: > >> Any tutorial on how to remove probes that contains SNPs? >> >> On Tuesday, November 12, 2013 7:12:46 PM UTC-5, Kasper Hansen wrote: >> > >> > As part of Bioconductor 2.13, we have released minfi 1.8.x. Due to a >> > number of last minute errors, the recommended version is 1.8.3 (or >> bigger). >> > >> > Users may find that their old objects cannot be linked to annotation. >> > Please run >> > OBJECT = updateObject(OBJECT) >> > to fix this. >> > >> > Highlights include >> > * preprocessingQuantile(): an independent implementation of the same >> ideas >> > as in Tost et al. >> > * bumphunter() for finding DMRs >> > * blockFinder() for finding large hypo-methylated blocks on the 450k >> array. >> > * estimateCellCounts() for estimating cell type composition for whole >> > blood samples. The function can be extended to work on other types of >> > cells, provided suitable flow sorted data is available. >> > * the annotation now includes SNP annotation for dbSNP v132, 135 and >> 137, >> > independently annotated at JHU. >> > * getSex(): you can now get sex repeatedly, irrespective of relationship >> > status. >> > * minfiQC: find and remove outlier samples based on a sample QC criteria >> > we have found effective. >> > >> > Unfortunately, none of these handy changes are yet detailed in the >> > vignette; we are working on this. >> > >> > A manuscript is in review detailing most of these functions. >> > >> > Full NEWS below >> > >> > Best, >> > Kasper D Hansen >> > >> > o Added getMethSignal(), a convenience function for programming. >> > >> > o Changed the argument name of "type" to "what" for >> getMethSignal(). >> > >> > o Added the class "RatioSet", like "GenomicRatioSet" but without >> the >> > genome information. >> > >> > o Bugfixes to the "GenomicRatioSet()" constructor. >> > >> > o Added the method ratioConvert(), for converting a "MethylSet" >> to a >> > "RatioSet" or a "GenomicMethylSet" to a "GenomicRatioSet". >> > >> > o Fixed an issue with GenomicMethylSet() and GenomicRatioSet() >> caused >> > by a recent change to a non-exported function in the >> GenomicRanges >> > package (Reported by Gustavo Fernandez Bayon <gba...@gmail.com>> <javascript:> >> > >). >> > >> > o Added fixMethOutliers for thresholding extreme observations in >> the >> > [un]methylation channels. >> > >> > o Added getSex, addSex, plotSex for estimating sex of the samples. >> > >> > o Added getQC, addQC, plotQC for a very simple quality control >> > measure. >> > >> > o Added minfiQC for a one-stop function for quality control >> measures. >> > >> > o Changed some verbose=TRUE output in various functions. >> > >> > o Added preprocessQuantile. >> > >> > o Added bumphunter method for "GenomicRatioSet". >> > >> > o Handling signed zero in minfi:::.digestMatrix which caused unit >> > tests to fail on Windows. >> > >> > o addSex and addQC lead to sampleNames() being dropped because of >> a >> > likely bug in cbind(DataFrame, DataFrame). Work-around has been >> > implemented. >> > >> > o Re-ran the test data generator. >> > >> > o Fixed some Depends and Imports issues revealed by new features >> of R >> > CMD check. >> > >> > o Added blockFinder and cpgCollapse. >> > >> > o (internal) added convenience functions for argument checking. >> > >> > o Exposed and re-wrote getAnnotation(). >> > >> > o Changed getLocations() from being a method to a simple function. >> > Arguments have been removed (for example, now the function >> always >> > drops non-mapping loci). >> > >> > o Implemented getIslandStatus(), getProbeType(), getSnpInfo() and >> > addSnpInfo(). The two later functions retrieve pre- computed SNP >> > overlaps, and the new annotation object includes SNPs based on >> > dbSNP 137, 135 and 132. >> > >> > o Changed the IlluminaMethylatioAnnotation class to now include >> > genomeBuild information as well as defaults. >> > >> > o Added estimateCellCounts for deconvolution of cell types in >> whole >> > blood. Thanks to Andrew Jaffe and Andres Houseman. >> > >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > [[alternative HTML version deleted]]

ADD COMMENT • link 10.2 years ago Tim Triche ★ 4.2k

0

Entering edit mode

Thank you, very much! This is very valuable. Thank you for taking the time sharing this. I will work through your examples and start from there. Cen On Sat, Feb 1, 2014 at 2:13 PM, Tim Triche, Jr. <tim.triche@gmail.com>wrote: > nb. To clarify, > > > This was my workaround from 2-3 years ago to permit masking of TCGA > Level 3 methylation data. > > We ended up masking probes with a common SNP <= 15bp from the interrogated > CpG, or the entire 3' >= 15bp of the probe falling into known repeat > regions from RepeatMasker, or on sex chromosomes, for the TCGA level 3 > data. But if you want to use TCGA data I suggest that you start from IDATs > and determine what degree of masking, what type of preprocessing, etc. is > appropriate for your own experiment. That's why I redefined "level 1" data > as raw IDATs sometime back in 2011 (with help from Anna Chu @ NIH). Raw > should mean raw, no more and no less, and it's always better to start with > the raw data (IMHO). > > When IDATs are available for other datasets (e.g. GEO) my suggestion is > the same: use them. > > JMHO, > > --t > > Statistics is the grammar of science. > Karl Pearson <http: en.wikipedia.org="" wiki="" the_grammar_of_science=""> > > > On Sat, Feb 1, 2014 at 11:06 AM, Tim Triche, Jr. <tim.triche@gmail.com>wrote: > >> There are at least 3 different ways you could go about this; Tiffany >> Morris' probe variant annotation package accompanying ChAMP is probably the >> most comprehensive, while (I claim) the UCSC common variant track is the >> most trivial and the SNPlocs packages are the most intensive. Let's assume >> your data is in a GenomicRatioSet named grSet for all examples. >> >> Writing and testing out the code for this took a little while so I do >> hope you'll work through each example. >> >> All things considered, I'd recommend option #1, but it would be even >> better if the maintainers tweaked a few settings so that >> bsseq::data.frame2GRangesprobe.450K.VCs.af) was more trivial. (Also: >> consider merging $diVC.com1pop.F and $diVC.com1pop.R, making everything >> that can reasonably be so boolean, etc... GRanges are so much easier to >> work with than data.frames when dealing with genomic coordinates) >> >> You'll need to supply your own grSet, but if you didn't have one, I don't >> imagine you'd have asked :-) >> I have provided the dimensions (rows x columns) of the filtered grSets >> for comparison in each case. You will have to decide what degree of >> conservatism is most appropriate for your experiment. Read the docs! >> >> >> 1) Use the ProbeVariants package alliuded to above. This is >> comprehensive, but a bit of a PITA. >> >> require(minfi) >> grSet >> ## class: GenomicRatioSet >> ## dim: 485512 34 >> >> require(Illumina450ProbeVariants.db) >> ?probe.450K.VCs.af ## read the docs! >> dataprobe.450K.VCs.af) >> >> commonSnpInProbe <- rownamesprobe.450K.VCs.af)[ probe.450K.VCs.af$probe50VC.com1pop >> > 0 ] >> grSet.noCommonSNPsInProbes <- grSet[ >> setdiff(rownames(grSet), commonSnpInProbe), ] >> grSet.noCommonSNPsInProbes >> ## class: GenomicRatioSet >> ## dim: 308901 34 >> >> commonVariantsEitherStrand <- !is.naprobe.450K.VCs.af$diVC.com1pop.F)|! >> is.naprobe.450K.VCs.af$diVC.com1pop.R) >> CpGsWithCommonVariants <- rownamesprobe.450K.VCs.af)[ commonVariantsEitherStrand >> ] >> grSet.noCommonVariantsAtCpGs <- grSet[ >> setdiff(rownames(grSet),CpGsWithCommonVariants), ] >> grSet.noCommonVariantsAtCpGs >> ## class: GenomicRatioSet >> ## dim: 445502 34 >> >> Consult the documentation for finer control over the process (e.g. >> specific populations, etc.). >> >> ?probe.450K.VCs.af ## read the docs again >> >> It would not break my heart if the maintainers turned the data.frame here >> into a GRanges; a few small changes followed by data.frame2GRanges() would >> do the trick. But still, it's a very handy compilation. >> See below for the reason I prefer GRanges (in short, because I'm >> impatient and don't like debugging). >> >> >> 2) Use the FDb package (> 1% MAF across all populations, dbSNP build >> 135). This is a one-liner. >> >> require(minfi) >> grSet >> ## class: GenomicRatioSet >> ## dim: 485512 34 >> >> require(FDb.UCSC.snp135common.hg19) >> commonSNPs <- features(FDb.UCSC.snp135common.hg19) >> >> ## With a GRanges object, the previous rigamarole becomes a one- liner: >> grSet.noCommonSNPsAtCpGs <- grSet[ grSet %outside% commonSNPs, ] >> >> grSet.noCommonSNPsAtCpGs >> ## class: GenomicRatioSet >> ## dim: 478385 34 >> >> This was my workaround from 2-3 years ago to permit masking of TCGA Level >> 3 methylation data. It still works and it still works well, but it's been >> superseded (IMHO) by more flexible approaches like #1. >> >> >> 3) use SNPlocs (all SNPs in dbSNP; mildly annoying complication with 'ch' >> vs. 'chr' seqlevels) >> >> require(minfi) >> grSet >> ## class: GenomicRatioSet >> ## dim: 485512 34 >> >> ## work around the annoyance: >> chroms <- seqlevels(grSet) >> names(chroms) <- chroms >> chroms <- gsub('chr', 'ch', chroms) >> require(SNPlocs.Hsapiens.dbSNP.20120608) >> SNPs.byChr <- GRangesList(lapply(chroms, getSNPlocs, as.GRanges=TRUE)) >> ## time passes... >> seqlevels(SNPs.byChr) <- gsub('ch', 'chr', seqlevels(SNPs.byChr)) ## back >> to normal >> genome(SNPs.byChr) <- 'hg19' ## GRCh37.p5 coordinates are identical to >> hg19 save for chrMT >> >> ## once the above hoops have been jumped through, it's back to one- liners: >> grSet.noSNPsAtCpGs <- grSet[ grSet %outside% SNPs.byChr, ] >> grSet.noSNPsAtCpGs >> ## class: GenomicRatioSet >> ## dim: 444722 34 >> >> This used to be documented in the minfi code/manual somewhere, though I >> don't know if it still is. >> >> >> Statistics is the grammar of science. >> Karl Pearson <http: en.wikipedia.org="" wiki="" the_grammar_of_science=""> >> >> >> On Fri, Jan 31, 2014 at 1:06 PM, C T <offsubs@gmail.com> wrote: >> >>> Any tutorial on how to remove probes that contains SNPs? >>> >>> On Tuesday, November 12, 2013 7:12:46 PM UTC-5, Kasper Hansen wrote: >>> > >>> > As part of Bioconductor 2.13, we have released minfi 1.8.x. Due to a >>> > number of last minute errors, the recommended version is 1.8.3 (or >>> bigger). >>> > >>> > Users may find that their old objects cannot be linked to annotation. >>> > Please run >>> > OBJECT = updateObject(OBJECT) >>> > to fix this. >>> > >>> > Highlights include >>> > * preprocessingQuantile(): an independent implementation of the same >>> ideas >>> > as in Tost et al. >>> > * bumphunter() for finding DMRs >>> > * blockFinder() for finding large hypo-methylated blocks on the 450k >>> array. >>> > * estimateCellCounts() for estimating cell type composition for whole >>> > blood samples. The function can be extended to work on other types of >>> > cells, provided suitable flow sorted data is available. >>> > * the annotation now includes SNP annotation for dbSNP v132, 135 and >>> 137, >>> > independently annotated at JHU. >>> > * getSex(): you can now get sex repeatedly, irrespective of >>> relationship >>> > status. >>> > * minfiQC: find and remove outlier samples based on a sample QC >>> criteria >>> > we have found effective. >>> > >>> > Unfortunately, none of these handy changes are yet detailed in the >>> > vignette; we are working on this. >>> > >>> > A manuscript is in review detailing most of these functions. >>> > >>> > Full NEWS below >>> > >>> > Best, >>> > Kasper D Hansen >>> > >>> > o Added getMethSignal(), a convenience function for programming. >>> > >>> > o Changed the argument name of "type" to "what" for >>> getMethSignal(). >>> > >>> > o Added the class "RatioSet", like "GenomicRatioSet" but without >>> the >>> > genome information. >>> > >>> > o Bugfixes to the "GenomicRatioSet()" constructor. >>> > >>> > o Added the method ratioConvert(), for converting a "MethylSet" >>> to a >>> > "RatioSet" or a "GenomicMethylSet" to a "GenomicRatioSet". >>> > >>> > o Fixed an issue with GenomicMethylSet() and GenomicRatioSet() >>> caused >>> > by a recent change to a non-exported function in the >>> GenomicRanges >>> > package (Reported by Gustavo Fernandez Bayon <gba...@gmail.com>>> <javascript:> >>> > >). >>> > >>> > o Added fixMethOutliers for thresholding extreme observations in >>> the >>> > [un]methylation channels. >>> > >>> > o Added getSex, addSex, plotSex for estimating sex of the >>> samples. >>> > >>> > o Added getQC, addQC, plotQC for a very simple quality control >>> > measure. >>> > >>> > o Added minfiQC for a one-stop function for quality control >>> measures. >>> > >>> > o Changed some verbose=TRUE output in various functions. >>> > >>> > o Added preprocessQuantile. >>> > >>> > o Added bumphunter method for "GenomicRatioSet". >>> > >>> > o Handling signed zero in minfi:::.digestMatrix which caused unit >>> > tests to fail on Windows. >>> > >>> > o addSex and addQC lead to sampleNames() being dropped because >>> of a >>> > likely bug in cbind(DataFrame, DataFrame). Work-around has >>> been >>> > implemented. >>> > >>> > o Re-ran the test data generator. >>> > >>> > o Fixed some Depends and Imports issues revealed by new features >>> of R >>> > CMD check. >>> > >>> > o Added blockFinder and cpgCollapse. >>> > >>> > o (internal) added convenience functions for argument checking. >>> > >>> > o Exposed and re-wrote getAnnotation(). >>> > >>> > o Changed getLocations() from being a method to a simple >>> function. >>> > Arguments have been removed (for example, now the function >>> always >>> > drops non-mapping loci). >>> > >>> > o Implemented getIslandStatus(), getProbeType(), getSnpInfo() and >>> > addSnpInfo(). The two later functions retrieve pre- computed >>> SNP >>> > overlaps, and the new annotation object includes SNPs based on >>> > dbSNP 137, 135 and 132. >>> > >>> > o Changed the IlluminaMethylatioAnnotation class to now include >>> > genomeBuild information as well as defaults. >>> > >>> > o Added estimateCellCounts for deconvolution of cell types in >>> whole >>> > blood. Thanks to Andrew Jaffe and Andres Houseman. >>> > >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> >> > [[alternative HTML version deleted]]

ADD REPLY • link 10.2 years ago C T ▴ 20

0

Entering edit mode

Smeeta Shrestha ▴ 20

@smeeta-shrestha-6393

Last seen 8.2 years ago

Hi Kasper, Could you please let me know how I can export the bumphunter "dmrs" output from the minfi . I managed to use write.table to export the "dmp". Since the file format is different for dmrs, i am not able to use that. Also can I use B=1(default) in the bumphunter function. I would like to export that to xcel. Thank you smeeta On Wednesday, November 13, 2013 5:42:46 AM UTC+5:30, Kasper Hansen wrote: > > As part of Bioconductor 2.13, we have released minfi 1.8.x. Due to a > number of last minute errors, the recommended version is 1.8.3 (or bigger). > > Users may find that their old objects cannot be linked to annotation. > Please run > OBJECT = updateObject(OBJECT) > to fix this. > > Highlights include > * preprocessingQuantile(): an independent implementation of the same ideas > as in Tost et al. > * bumphunter() for finding DMRs > * blockFinder() for finding large hypo-methylated blocks on the 450k array. > * estimateCellCounts() for estimating cell type composition for whole > blood samples. The function can be extended to work on other types of > cells, provided suitable flow sorted data is available. > * the annotation now includes SNP annotation for dbSNP v132, 135 and 137, > independently annotated at JHU. > * getSex(): you can now get sex repeatedly, irrespective of relationship > status. > * minfiQC: find and remove outlier samples based on a sample QC criteria > we have found effective. > > Unfortunately, none of these handy changes are yet detailed in the > vignette; we are working on this. > > A manuscript is in review detailing most of these functions. > > Full NEWS below > > Best, > Kasper D Hansen > > o Added getMethSignal(), a convenience function for programming. > > o Changed the argument name of "type" to "what" for getMethSignal(). > > o Added the class "RatioSet", like "GenomicRatioSet" but without the > genome information. > > o Bugfixes to the "GenomicRatioSet()" constructor. > > o Added the method ratioConvert(), for converting a "MethylSet" to a > "RatioSet" or a "GenomicMethylSet" to a "GenomicRatioSet". > > o Fixed an issue with GenomicMethylSet() and GenomicRatioSet() caused > by a recent change to a non-exported function in the GenomicRanges > package (Reported by Gustavo Fernandez Bayon <gba... at="" gmail.com<javascript:=""> > >). > > o Added fixMethOutliers for thresholding extreme observations in the > [un]methylation channels. > > o Added getSex, addSex, plotSex for estimating sex of the samples. > > o Added getQC, addQC, plotQC for a very simple quality control > measure. > > o Added minfiQC for a one-stop function for quality control measures. > > o Changed some verbose=TRUE output in various functions. > > o Added preprocessQuantile. > > o Added bumphunter method for "GenomicRatioSet". > > o Handling signed zero in minfi:::.digestMatrix which caused unit > tests to fail on Windows. > > o addSex and addQC lead to sampleNames() being dropped because of a > likely bug in cbind(DataFrame, DataFrame). Work-around has been > implemented. > > o Re-ran the test data generator. > > o Fixed some Depends and Imports issues revealed by new features of R > CMD check. > > o Added blockFinder and cpgCollapse. > > o (internal) added convenience functions for argument checking. > > o Exposed and re-wrote getAnnotation(). > > o Changed getLocations() from being a method to a simple function. > Arguments have been removed (for example, now the function always > drops non-mapping loci). > > o Implemented getIslandStatus(), getProbeType(), getSnpInfo() and > addSnpInfo(). The two later functions retrieve pre-computed SNP > overlaps, and the new annotation object includes SNPs based on > dbSNP 137, 135 and 132. > > o Changed the IlluminaMethylatioAnnotation class to now include > genomeBuild information as well as defaults. > > o Added estimateCellCounts for deconvolution of cell types in whole > blood. Thanks to Andrew Jaffe and Andres Houseman. >

ADD COMMENT • link 10.2 years ago Smeeta Shrestha ▴ 20

Login before adding your answer.