filtering Illumina data
Lana Schaffer
Hi,
I have filtered Illumina data from 46,633 probes down to 6,537 probes using the detection p-value. I used a cutoff of 0.05 to call detection across all the arrays. Can someone tell me whether this is reasonable? What is a better way of filtering?

Lana Schaffer
Biostatistics/Informatics
The Scripps Research Institute
DNA Array Core Facility
La Jolla, CA 92037
(858) 784-2263
(858) 784-2994
schaffer at scripps.edu
Sean Davis
On Wed, Aug 20, 2008 at 4:09 PM, Lana Schaffer <schaffer at scripps.edu> wrote:
> I used a cutoff of .05 to call detection across all the arrays.

I would definitely not require ALL the arrays to pass your cutoff. Requiring a probe to be detected in perhaps 10-20% of samples is more appropriate. If you force all arrays to meet the detection cutoff, you exclude potentially interesting probes that are "on" in one subset of samples but "off" in another.

An alternative is to filter by variation (the coefficient of variation, for example).

Sean
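Sean's two suggestions can be sketched in Python as an illustration (this is not Bioconductor code). The probe names, p-values, intensities, and the `min_fraction` and `min_cv` values are hypothetical, and the CV filter assumes positive intensities on the raw (non-log) scale, since SD / mean depends on where zero sits on the scale.

```python
from statistics import mean, stdev

def detected_fraction(pvals, cutoff=0.05):
    """Fraction of arrays on which one probe is called detected (p < cutoff)."""
    return sum(p < cutoff for p in pvals) / len(pvals)

def filter_by_detection(pvals_by_probe, cutoff=0.05, min_fraction=0.1):
    """Keep probes detected in at least `min_fraction` of the arrays."""
    return [probe for probe, pvals in pvals_by_probe.items()
            if detected_fraction(pvals, cutoff) >= min_fraction]

def filter_by_cv(expr_by_probe, min_cv=0.1):
    """Keep probes whose coefficient of variation (SD / mean) across arrays
    is at least `min_cv`. Assumes positive, non-log-scale intensities."""
    return [probe for probe, values in expr_by_probe.items()
            if stdev(values) / mean(values) >= min_cv]

# Hypothetical detection p-values for three probes on ten arrays.
pvals = {
    "probeA": [0.001] * 2 + [0.5] * 8,  # "on" in a subset (2 of 10 arrays)
    "probeB": [0.6] * 10,               # never detected
    "probeC": [0.001] * 10,             # detected on every array
}
print(filter_by_detection(pvals, min_fraction=0.1))  # ['probeA', 'probeC']
print(filter_by_detection(pvals, min_fraction=1.0))  # ['probeC'] (the "all arrays" rule)

# Hypothetical raw intensities for the variation filter.
expr = {"flat": [100.0] * 4, "variable": [50.0, 150.0, 50.0, 150.0]}
print(filter_by_cv(expr, min_cv=0.1))  # ['variable']
```

Setting `min_fraction=1.0` reproduces the "all arrays" rule Sean argues against: the subset-specific `probeA` is lost.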
Sean,
I don't think you understood my filtering procedure. I filter out only the probes for which every array has an undetected call. My real question is how reliable Illumina's detection p-value is, and whether my chosen cutoff of 0.05 is reasonable.

Lana
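To make the difference between the two readings concrete, here is a small Python sketch with hypothetical p-values on three arrays. It compares Lana's rule (drop a probe only when it is undetected on every array) with the stricter rule (keep a probe only when it is detected on every array) at cutoffs of 0.05 and 0.01. Under the any-array rule, lowering the cutoff can only keep the same number of probes or fewer.

```python
def keep_if_detected_anywhere(pvals, cutoff):
    """Lana's rule: drop a probe only if it is undetected (p >= cutoff) on every array."""
    return any(p < cutoff for p in pvals)

def keep_if_detected_everywhere(pvals, cutoff):
    """Stricter reading: keep a probe only if it is detected on every array."""
    return all(p < cutoff for p in pvals)

def count_kept(pvals_by_probe, rule, cutoff):
    """Number of probes that survive a given rule at a given cutoff."""
    return sum(rule(pvals, cutoff) for pvals in pvals_by_probe.values())

# Hypothetical detection p-values for four probes on three arrays.
pvals = {
    "p1": [0.001, 0.002, 0.004],  # clearly detected everywhere
    "p2": [0.030, 0.200, 0.400],  # weakly detected on one array only
    "p3": [0.008, 0.600, 0.700],  # strongly detected on one array only
    "p4": [0.300, 0.500, 0.900],  # never detected
}
for cutoff in (0.05, 0.01):
    print(cutoff,
          count_kept(pvals, keep_if_detected_anywhere, cutoff),
          count_kept(pvals, keep_if_detected_everywhere, cutoff))
# 0.05 3 1
# 0.01 2 1
```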
Wei Shi
Hi Lana,
An alternative is to remove the probes that do not change across samples once your data have been normalized. You can draw a mean vs SD plot (the mean and standard deviation of each probe across all samples) to choose a reasonable SD threshold for the filtering.

Hope this helps.
Cheers,
Wei
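A minimal Python sketch of Wei's suggestion, assuming the intensities are already normalized and on the log2 scale: compute each probe's mean and SD across samples (the pairs you would plot), then keep probes whose SD reaches a threshold. The data and the 0.5 threshold are hypothetical; in practice the threshold would be read off the plot.

```python
from statistics import mean, stdev

def mean_sd_table(expr_by_probe):
    """Per-probe (mean, SD) across samples: the points of a mean-vs-SD plot."""
    return {probe: (mean(v), stdev(v)) for probe, v in expr_by_probe.items()}

def filter_by_sd(expr_by_probe, sd_threshold):
    """Keep probes whose across-sample SD is at least `sd_threshold`."""
    table = mean_sd_table(expr_by_probe)
    return [probe for probe, (_, sd) in table.items() if sd >= sd_threshold]

# Hypothetical normalized log2 intensities for three probes on four samples.
expr = {
    "flat_low":  [6.0, 6.1, 5.9, 6.0],
    "flat_high": [12.0, 12.1, 11.9, 12.0],
    "changing":  [7.0, 9.0, 7.0, 9.0],
}
for probe, (m, sd) in mean_sd_table(expr).items():
    print(f"{probe}: mean={m:.2f} sd={sd:.3f}")
print(filter_by_sd(expr, sd_threshold=0.5))  # ['changing']
```

Note that the filter removes both flat probes regardless of their mean, so a consistently high but unchanging probe is dropped along with a consistently low one.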
I typically use the lumi package, where the default detection threshold is p < 0.01 and only probes that are undetected (p ≥ 0.01) in all samples are removed. This typically leaves between 14K and 16K genes, so 7,000 genes is perhaps a little too aggressive. I *think* Illumina also recommends a threshold of 0.01 (though be careful: when you export the probe profiles, the values are sometimes reported as 1 - p rather than p).

Cheers,
Paul
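The following Python sketch illustrates the rule Paul describes and the export pitfall he warns about. It is not lumi's implementation. The `values_are_one_minus_p` flag and the exported values are hypothetical; you would set the flag after checking which convention your export actually uses. If 1 - p values are mistakenly read as p-values, strongly detected probes look undetected and the any-sample rule removes them.

```python
def to_pvalues(detection_values, values_are_one_minus_p):
    """Return detection p-values. Some exports report 1 - p instead of p;
    the (hypothetical) flag records which convention the file uses."""
    if values_are_one_minus_p:
        return [1.0 - v for v in detection_values]
    return list(detection_values)

def keep_detected_in_any(pvals_by_probe, threshold=0.01):
    """Drop only probes that are undetected (p >= threshold) in every sample."""
    return [probe for probe, p in pvals_by_probe.items()
            if any(x < threshold for x in p)]

# Hypothetical exported "detection" column, already in 1 - p form.
exported = {
    "probeX": [0.999, 0.995, 0.40],  # p = 0.001, 0.005, 0.60
    "probeY": [0.90, 0.85, 0.95],    # p = 0.10, 0.15, 0.05: never below 0.01
}
pvals = {probe: to_pvalues(v, values_are_one_minus_p=True)
         for probe, v in exported.items()}
print(keep_detected_in_any(pvals))     # ['probeX']
print(keep_detected_in_any(exported))  # [] (1 - p values misread as p-values)
```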