Hi,
I have filtered Illumina data from 46,633 probes to 6,537 probes
using the Detection Pval. I used a cutoff of 0.05 to call
detection across all the arrays.
Can someone tell me if this is reasonable?
What is a better way of filtering?
Lana Schaffer
Biostatistics/Informatics
The Scripps Research Institute
DNA Array Core Facility
La Jolla, CA 92037
(858) 784-2263
(858) 784-2994
schaffer at scripps.edu
On Wed, Aug 20, 2008 at 4:09 PM, Lana Schaffer <schaffer at scripps.edu> wrote:
> Hi,
> I have filtered Illumina data from 46,633 probes to 6,537 probes
> using the Detection Pval. I used a cutoff of 0.05 to call
> detection across all the arrays.
> Can someone tell me if this is reasonable?
> What is a better way of filtering?
I would definitely not use ALL the arrays in your cutoff. Perhaps
having 10-20% of samples detected for a given probe is more
appropriate. If you force all arrays to meet detection cutoffs, you
are excluding potentially interesting probes that are "on" in some
subset, but "off" in another. An alternative is to filter by
variation (the coefficient of variation, CV, for example).
Sean
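The rule suggested above (keep a probe if it is detected in some minimum fraction of the samples) can be sketched in a few lines. This is only an illustration in plain Python, not code from any Illumina or Bioconductor package: the function names, the probe IDs, and the p-values are all made up, and the 0.05 cutoff and 20% fraction are simply the numbers mentioned in this thread.

```python
def detected_fraction(pvals, cutoff=0.05):
    """Fraction of arrays in which one probe's detection p-value is below cutoff."""
    return sum(p < cutoff for p in pvals) / len(pvals)


def filter_by_detection(detection, cutoff=0.05, min_fraction=0.2):
    """Keep probes detected in at least min_fraction of the arrays.

    detection maps a probe ID to a list of detection p-values, one per array.
    """
    return [probe for probe, pvals in detection.items()
            if detected_fraction(pvals, cutoff) >= min_fraction]


# Hypothetical detection p-values for three probes on five arrays.
detection = {
    "probe_on_everywhere": [0.001, 0.002, 0.000, 0.010, 0.003],
    "probe_on_in_subset": [0.001, 0.600, 0.700, 0.550, 0.800],
    "probe_never_on": [0.400, 0.600, 0.700, 0.550, 0.800],
}

# Detected on at least 20% of the arrays.
kept = filter_by_detection(detection, cutoff=0.05, min_fraction=0.2)

# Requiring detection on every array drops the subset-specific probe.
kept_all = filter_by_detection(detection, cutoff=0.05, min_fraction=1.0)

print(kept)
print(kept_all)
```

Setting min_fraction to 1.0 reproduces the "all arrays must be detected" rule, which is the case that loses probes that are "on" in only one group.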
Sean,
I don't think that you understood my filtering procedure.
I filter out only those probes that have an undetected call on
all the arrays.
My question is really how reliable the detection p-value from
Illumina is, and whether my chosen cutoff of 0.05 is reasonable.
Lana
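One way to get a feel for the cutoff under a "detected on at least one array" rule is to ask how often a probe with no real signal would still pass. The sketch below assumes that null detection p-values are uniform and independent across arrays, which may not hold for real detection p-values; the function name and the array counts are hypothetical, chosen only for illustration.

```python
def prob_any_false_call(cutoff, n_arrays):
    """Chance that an unexpressed probe is called detected on at least one array.

    Assumes independent, uniformly distributed null p-values (an idealisation).
    """
    return 1.0 - (1.0 - cutoff) ** n_arrays


# Compare the two cutoffs mentioned in this thread over a few
# hypothetical experiment sizes.
for cutoff in (0.05, 0.01):
    for n_arrays in (4, 12, 24):
        p = prob_any_false_call(cutoff, n_arrays)
        print(f"cutoff={cutoff:.2f} arrays={n_arrays:2d} P(any false call)={p:.3f}")
```

Under these assumptions the chance of a spurious pass grows with the number of arrays, so a stricter cutoff, or a requirement of detection on more than one array, matters more in larger experiments.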
Hi Lana:
An alternative way is to remove those probes which are not changing
across all samples after your data is normalized. You may draw a mean
vs sd plot (mean and standard deviation for each probe across all
samples) to determine a reasonable sd threshold for the filtering.
Hope this helps.
Cheers,
Wei
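The mean vs sd approach can be sketched as follows, using only the Python standard library. The probe names, the expression values (imagined as normalized log2 intensities), and the sd threshold of 0.5 are all hypothetical; in practice the threshold would be read off the plot rather than fixed in advance.

```python
from statistics import mean, stdev


def probe_mean_sd(expression):
    """Return {probe: (mean, sd)} across samples; these are the points to plot."""
    return {probe: (mean(values), stdev(values))
            for probe, values in expression.items()}


def filter_by_sd(expression, sd_threshold):
    """Keep probes whose standard deviation across samples reaches the threshold."""
    stats = probe_mean_sd(expression)
    return [probe for probe, (_, sd) in stats.items() if sd >= sd_threshold]


# Hypothetical normalized values for two probes on four samples.
expression = {
    "flat_probe": [7.0, 7.1, 6.9, 7.0],
    "changing_probe": [7.0, 9.5, 7.2, 10.1],
}

stats = probe_mean_sd(expression)
kept = filter_by_sd(expression, sd_threshold=0.5)

for probe, (m, sd) in stats.items():
    print(f"{probe}: mean={m:.2f} sd={sd:.2f}")
print(kept)
```

Plotting sd against mean for every probe shows where the bulk of unchanging probes sits, which is what makes a data-driven threshold possible.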
I typically use the lumi package, where the default is a detection
p-value < 0.01 and only probes that fail this detection threshold in
all samples are deleted. This typically leaves between 14K and 16K
genes. 7000 genes is perhaps a little too aggressive. Illumina *I think*
recommends a Th of 0.01 too (though be careful: when you export the
probe profiles, it sometimes converts these to 1 - Th).
Cheers
Paul
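The export caveat is easy to trip over, so here is a small sketch of the rule described above with an explicit switch for it. This is plain Python written for illustration, not the lumi implementation; the function names, the exported_as_one_minus_p flag, and the data are assumptions. Whether a given export holds p or 1 - p has to be checked by hand.

```python
def to_pvalues(values, exported_as_one_minus_p=False):
    """Return detection p-values, undoing a 1 - p export if the flag is set."""
    if exported_as_one_minus_p:
        return [1.0 - v for v in values]
    return list(values)


def present_count(pvals, th=0.01):
    """Number of samples in which the probe is detected (p < th)."""
    return sum(p < th for p in pvals)


def drop_undetected(detection, th=0.01, exported_as_one_minus_p=False):
    """Delete only probes that are detected in none of the samples."""
    kept = []
    for probe, values in detection.items():
        pvals = to_pvalues(values, exported_as_one_minus_p)
        if present_count(pvals, th) > 0:
            kept.append(probe)
    return kept


# Hypothetical export in which the detection column holds 1 - p.
exported = {
    "probe_a": [0.999, 0.40, 0.35],
    "probe_b": [0.50, 0.40, 0.35],
}

kept = drop_undetected(exported, th=0.01, exported_as_one_minus_p=True)

# Treating the same column as plain p-values silently removes everything.
naive = drop_undetected(exported, th=0.01, exported_as_one_minus_p=False)

print(kept)
print(naive)
```

A quick sanity check on real data is to look at the values for probes known to be highly expressed: if they sit near 1 rather than near 0, the column is probably 1 - p.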
_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor