Entering edit mode
There are at least 3 different ways you could go about this; Tiffany
Morris' probe variant annotation package accompanying ChAMP is
probably the
most comprehensive, while (I claim) the UCSC common variant track is
the
most trivial and the SNPlocs packages are the most intensive. Let's
assume
your data is in a GenomicRatioSet named grSet for all examples.
Writing and testing out the code for this took a little while so I do
hope
you'll work through each example.
All things considered, I'd recommend option #1, but it would be even
better
if the maintainers tweaked a few settings so that
bsseq::data.frame2GRanges(
probe.450K.VCs.af) was more trivial. (Also: consider merging
$diVC.com1pop.F and $diVC.com1pop.R, making everything that can
reasonably
be so boolean, etc... GRanges are so much easier to work with than
data.frames when dealing with genomic coordinates)
You'll need to supply your own grSet, but if you didn't have one, I
don't
imagine you'd have asked :-)
I have provided the dimensions (rows x columns) of the filtered grSets
for
comparison in each case. You will have to decide what degree of
conservatism is most appropriate for your experiment. Read the docs!
1) Use the ProbeVariants package alliuded to above. This is
comprehensive,
but a bit of a PITA.
require(minfi)
grSet
## class: GenomicRatioSet
## dim: 485512 34
require(Illumina450ProbeVariants.db)
?probe.450K.VCs.af ## read the docs!
dataprobe.450K.VCs.af)
commonSnpInProbe <- rownamesprobe.450K.VCs.af)[
probe.450K.VCs.af$probe50VC.com1pop
> 0 ]
grSet.noCommonSNPsInProbes <- grSet[
setdiff(rownames(grSet), commonSnpInProbe), ]
grSet.noCommonSNPsInProbes
## class: GenomicRatioSet
## dim: 308901 34
commonVariantsEitherStrand <-
!is.naprobe.450K.VCs.af$diVC.com1pop.F)|!
is.naprobe.450K.VCs.af$diVC.com1pop.R)
CpGsWithCommonVariants <- rownamesprobe.450K.VCs.af)[
commonVariantsEitherStrand
]
grSet.noCommonVariantsAtCpGs <- grSet[
setdiff(rownames(grSet),CpGsWithCommonVariants), ]
grSet.noCommonVariantsAtCpGs
## class: GenomicRatioSet
## dim: 445502 34
Consult the documentation for finer control over the process (e.g.
specific
populations, etc.).
?probe.450K.VCs.af ## read the docs again
It would not break my heart if the maintainers turned the data.frame
here
into a GRanges; a few small changes followed by data.frame2GRanges()
would
do the trick. But still, it's a very handy compilation.
See below for the reason I prefer GRanges (in short, because I'm
impatient
and don't like debugging).
2) Use the FDb package (> 1% MAF across all populations, dbSNP build
135).
This is a one-liner.
require(minfi)
grSet
## class: GenomicRatioSet
## dim: 485512 34
require(FDb.UCSC.snp135common.hg19)
commonSNPs <- features(FDb.UCSC.snp135common.hg19)
## With a GRanges object, the previous rigamarole becomes a one-liner:
grSet.noCommonSNPsAtCpGs <- grSet[ grSet %outside% commonSNPs, ]
grSet.noCommonSNPsAtCpGs
## class: GenomicRatioSet
## dim: 478385 34
This was my workaround from 2-3 years ago to permit masking of TCGA
Level 3
methylation data. It still works and it still works well, but it's
been
superseded (IMHO) by more flexible approaches like #1.
3) use SNPlocs (all SNPs in dbSNP; mildly annoying complication with
'ch'
vs. 'chr' seqlevels)
require(minfi)
grSet
## class: GenomicRatioSet
## dim: 485512 34
## work around the annoyance:
chroms <- seqlevels(grSet)
names(chroms) <- chroms
chroms <- gsub('chr', 'ch', chroms)
require(SNPlocs.Hsapiens.dbSNP.20120608)
SNPs.byChr <- GRangesList(lapply(chroms, getSNPlocs, as.GRanges=TRUE))
## time passes...
seqlevels(SNPs.byChr) <- gsub('ch', 'chr', seqlevels(SNPs.byChr)) ##
back
to normal
genome(SNPs.byChr) <- 'hg19' ## GRCh37.p5 coordinates are identical
to
hg19 save for chrMT
## once the above hoops have been jumped through, it's back to one-
liners:
grSet.noSNPsAtCpGs <- grSet[ grSet %outside% SNPs.byChr, ]
grSet.noSNPsAtCpGs
## class: GenomicRatioSet
## dim: 444722 34
This used to be documented in the minfi code/manual somewhere, though
I
don't know if it still is.
Statistics is the grammar of science.
Karl Pearson <http: en.wikipedia.org="" wiki="" the_grammar_of_science="">
On Fri, Jan 31, 2014 at 1:06 PM, C T <offsubs@gmail.com> wrote:
> Any tutorial on how to remove probes that contains SNPs?
>
> On Tuesday, November 12, 2013 7:12:46 PM UTC-5, Kasper Hansen wrote:
> >
> > As part of Bioconductor 2.13, we have released minfi 1.8.x. Due
to a
> > number of last minute errors, the recommended version is 1.8.3 (or
> bigger).
> >
> > Users may find that their old objects cannot be linked to
annotation.
> > Please run
> > OBJECT = updateObject(OBJECT)
> > to fix this.
> >
> > Highlights include
> > * preprocessingQuantile(): an independent implementation of the
same
> ideas
> > as in Tost et al.
> > * bumphunter() for finding DMRs
> > * blockFinder() for finding large hypo-methylated blocks on the
450k
> array.
> > * estimateCellCounts() for estimating cell type composition for
whole
> > blood samples. The function can be extended to work on other types
of
> > cells, provided suitable flow sorted data is available.
> > * the annotation now includes SNP annotation for dbSNP v132, 135
and 137,
> > independently annotated at JHU.
> > * getSex(): you can now get sex repeatedly, irrespective of
relationship
> > status.
> > * minfiQC: find and remove outlier samples based on a sample QC
criteria
> > we have found effective.
> >
> > Unfortunately, none of these handy changes are yet detailed in the
> > vignette; we are working on this.
> >
> > A manuscript is in review detailing most of these functions.
> >
> > Full NEWS below
> >
> > Best,
> > Kasper D Hansen
> >
> > o Added getMethSignal(), a convenience function for
programming.
> >
> > o Changed the argument name of "type" to "what" for
> getMethSignal().
> >
> > o Added the class "RatioSet", like "GenomicRatioSet" but
without
> the
> > genome information.
> >
> > o Bugfixes to the "GenomicRatioSet()" constructor.
> >
> > o Added the method ratioConvert(), for converting a
"MethylSet" to
> a
> > "RatioSet" or a "GenomicMethylSet" to a "GenomicRatioSet".
> >
> > o Fixed an issue with GenomicMethylSet() and
GenomicRatioSet()
> caused
> > by a recent change to a non-exported function in the
> GenomicRanges
> > package (Reported by Gustavo Fernandez Bayon
<gba...@gmail.com> <javascript:>
> > >).
> >
> > o Added fixMethOutliers for thresholding extreme
observations in
> the
> > [un]methylation channels.
> >
> > o Added getSex, addSex, plotSex for estimating sex of the
samples.
> >
> > o Added getQC, addQC, plotQC for a very simple quality
control
> > measure.
> >
> > o Added minfiQC for a one-stop function for quality control
> measures.
> >
> > o Changed some verbose=TRUE output in various functions.
> >
> > o Added preprocessQuantile.
> >
> > o Added bumphunter method for "GenomicRatioSet".
> >
> > o Handling signed zero in minfi:::.digestMatrix which caused
unit
> > tests to fail on Windows.
> >
> > o addSex and addQC lead to sampleNames() being dropped
because of a
> > likely bug in cbind(DataFrame, DataFrame). Work-around
has been
> > implemented.
> >
> > o Re-ran the test data generator.
> >
> > o Fixed some Depends and Imports issues revealed by new
features
> of R
> > CMD check.
> >
> > o Added blockFinder and cpgCollapse.
> >
> > o (internal) added convenience functions for argument
checking.
> >
> > o Exposed and re-wrote getAnnotation().
> >
> > o Changed getLocations() from being a method to a simple
function.
> > Arguments have been removed (for example, now the function
always
> > drops non-mapping loci).
> >
> > o Implemented getIslandStatus(), getProbeType(),
getSnpInfo() and
> > addSnpInfo(). The two later functions retrieve pre-
computed SNP
> > overlaps, and the new annotation object includes SNPs
based on
> > dbSNP 137, 135 and 132.
> >
> > o Changed the IlluminaMethylatioAnnotation class to now
include
> > genomeBuild information as well as defaults.
> >
> > o Added estimateCellCounts for deconvolution of cell types
in whole
> > blood. Thanks to Andrew Jaffe and Andres Houseman.
> >
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
[[alternative HTML version deleted]]