Removing probes with low variance across samples (Infinium 450k)
@khadeeja-ismail-4711
Hi,

I have a question regarding the Illumina Human Methylation 450k array and the genefilter package. I used the 'nsFilter' function in genefilter to remove probes that have low variance across samples. When I checked the documentation for nsFilter, I found that applying the function removes 50% of the probes by default.

I computed the variance of each probe separately for the retained probes and for the removed probes. When I plot the density of each set of variances, the two overlap completely: both sets have most of their probes with variance close to zero and a few with high variance. This leaves me wondering how nsFilter actually filters probes, as the plot does not suggest that the probes with the lowest variances were removed.

What would be the best way to filter out low-variance probes in 450k data? If the default in nsFilter is set to 50% on the assumption that 40% of genes in a cell are not expressed, what percentage cutoff can be used for methylation data?

It would be great if anyone could explain this.

Thanks,
Khadeeja
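The check described in the question can be sketched as follows. This is a minimal, language-agnostic illustration in plain Python, not nsFilter's implementation: `split_by_variance`, `keep_fraction`, and the toy data are all made up for the example. It ranks probes by per-probe sample variance, keeps the most variable fraction, and confirms that no removed probe is more variable than a kept one. With an explicit variance ranking like this, the two variance distributions can touch at the cutoff but cannot overlap.

```python
import random
import statistics


def split_by_variance(rows, keep_fraction=0.5):
    """Split probes into (kept, removed) by per-probe sample variance.

    rows: dict mapping probe id -> list of values across samples.
    keep_fraction: share of probes to keep (the most variable ones).
    """
    variances = {probe: statistics.variance(vals) for probe, vals in rows.items()}
    # Rank probes from most to least variable.
    ranked = sorted(variances, key=variances.get, reverse=True)
    n_keep = round(len(ranked) * keep_fraction)
    return ranked[:n_keep], ranked[n_keep:], variances


# Toy data (hypothetical): 200 probes x 12 samples of beta-like values in [0, 1].
rng = random.Random(0)
rows = {}
for i in range(200):
    centre = rng.uniform(0.1, 0.9)
    spread = rng.choice([0.01, 0.15])  # mostly-flat vs. variable probes
    rows[f"probe{i}"] = [
        min(1.0, max(0.0, rng.gauss(centre, spread))) for _ in range(12)
    ]

kept, removed, variances = split_by_variance(rows, keep_fraction=0.5)
lowest_kept = min(variances[p] for p in kept)
highest_removed = max(variances[p] for p in removed)

print(len(kept), len(removed))
print(highest_removed <= lowest_kept)
```

If the same comparison on real output shows the kept and removed sets overlapping, the filter was presumably ranking on something other than the variance being plotted, which is worth checking in the function's documentation.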
Tim Triche
@tim-triche-3561
Genomic DNA (what you're assaying on these arrays, or at least what they're designed for) need not be expressed. It's just there, chopped up after extraction, bisulfite conversion, and whole-genome amplification, waiting to hybridize. Thus nsFilter's fundamental assumption, that some large fraction of the probes on the array are in fact pure noise, is violated.

It may be that there is (almost always) local correlation between probes within +/- 1 kb of each other, but if the protocols for these arrays are followed carefully, you can expect better than 99% of the probes to hybridize. That is NOT the case with expression arrays, and you would not expect 99% of the genome to align in an RNA-seq experiment either.

So the decision of how many probes to retain comes down to your judgment. Biological annotation (e.g. from ChIP-seq peak calls for histone marks, transcription factors, or physical interactions) can become very useful in making sense of these data. If you lack normal samples (or don't know which ones are "normal"), it is possible to see low variability in regions which are consistently aberrant, so filtering on variance may not always be the best approach.

I find the GenomicRanges, GenomicFeatures, and rtracklayer packages useful for this type of annotation, FWIW.

Hope this helps,

--t

On Thu, Feb 9, 2012 at 2:17 PM, khadeeja ismail wrote:
> [...]

--
A model is a lie that helps you see the truth.
Howard Skipper <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
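The annotation-based selection suggested in the answer above can be sketched roughly as below. This is only an illustration in plain Python under stated assumptions: `probes_in_regions`, the probe ids, and the coordinates are hypothetical, regions are treated as closed intervals, and regions on the same chromosome are assumed not to overlap each other. In R, the GenomicRanges package mentioned in the answer offers overlap queries that would normally be used instead.

```python
from bisect import bisect_right


def probes_in_regions(probes, regions):
    """Return ids of probes whose position lies inside any region.

    probes: list of (probe_id, chrom, pos).
    regions: list of (chrom, start, end), closed intervals, assumed
             non-overlapping within each chromosome (e.g. merged peak calls).
    """
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))

    # Sort once per chromosome and keep the start coordinates for bisection.
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        index[chrom] = ([s for s, _ in intervals], intervals)

    hits = []
    for probe_id, chrom, pos in probes:
        if chrom not in index:
            continue
        starts, intervals = index[chrom]
        # Last interval starting at or before pos, if there is one.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and intervals[i][1] >= pos:
            hits.append(probe_id)
    return hits


# Hypothetical annotation (e.g. peak calls) and probe positions.
regions = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 50, 80)]
probes = [
    ("probe1", "chr1", 150),
    ("probe2", "chr1", 300),
    ("probe3", "chr1", 650),
    ("probe4", "chr2", 10),
    ("probe5", "chr3", 5),
]

selected = probes_in_regions(probes, regions)
print(selected)
```

Selecting probes this way depends on the annotation rather than on sample variance, so consistently aberrant regions with low variability are not discarded automatically.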