edgeR dataset filtering using pnas_expression.txt
Dave Tang ▴ 210
@dave-tang-4661
Australia/Perth/UWA
Hi list,

Just a question regarding edgeR and dataset processing/filtering prior to calling differential expression.

Case Study 12 (RNA-seq of Hormone-Treated LNCaP Cells) from the edgeR manual mentions that:

"We filter out lowly expressed tags and those which are only expressed in a small number of samples. We keep only those tags that have at least one count per million in at least three samples."

Then section 6 of the manual mentions that:

"The edgeR methodology needs to work with the original digital expression counts, so these should not be transformed in any way by users prior to analysis. edgeR automatically takes into account the total size (total read number) of each library in all calculations of fold-changes, concentration and statistical significance."

My question is whether filtering counts as "transforming" the data. Since filtering would change the total size of each library and thus affect all downstream calculations, is it OK to use such filters? And what should one be cautious about when applying such filters (e.g. at least n tags in n samples) prior to performing the edgeR analysis?

Many thanks,

-- Dave
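To make the quoted rule concrete, here is a minimal sketch of a "at least one count per million in at least three samples" filter, written in plain Python rather than with edgeR itself. The count matrix, the tag names, and the helper names `library_sizes` and `cpm_filter` are all made up for illustration; the point is only to show the arithmetic and how much the library sizes move when low-count tags are dropped. Note that the surviving counts are left untouched: rows are removed, but no count is rescaled or otherwise transformed.

```python
# Toy count matrix: rows are tags, columns are four samples.
# All numbers are invented for illustration only.
counts = {
    "tagA": [500_000, 480_000, 510_000, 495_000],
    "tagB": [499_990, 519_985, 489_995, 504_980],
    "tagC": [10, 12, 5, 20],
    "tagD": [0, 3, 0, 0],
}


def library_sizes(counts):
    """Total read count per sample (column sums of the matrix)."""
    return [sum(column) for column in zip(*counts.values())]


def cpm_filter(counts, min_cpm=1.0, min_samples=3):
    """Keep tags reaching min_cpm counts per million in >= min_samples samples.

    The raw counts of the kept tags are returned unchanged.
    """
    sizes = library_sizes(counts)
    kept = {}
    for tag, row in counts.items():
        n_pass = sum(1 for c, n in zip(row, sizes) if c / n * 1e6 >= min_cpm)
        if n_pass >= min_samples:
            kept[tag] = row
    return kept


kept = cpm_filter(counts)
print("tags kept:", sorted(kept))
print("library sizes before:", library_sizes(counts))
print("library sizes after: ", library_sizes(kept))
```

In this toy example only `tagD` is dropped, and the only library size that changes is the second one, which loses 3 reads out of 1,000,000. Whether the change is this small in real data is something to check rather than assume.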
@wolfgang-huber-3550
EMBL European Molecular Biology Laborat…
Hi Dave,

Dave Tang scripsit 01/04/2012 03:04 PM:
> [...]
> My question is whether filtering counts as "transforming" the data.
> Since this would affect the total size of each library and thus
> affecting all downstream calculations, is it OK to use such filters?

Typically, filtering of the kind suggested in the edgeR manual passage cited above has negligible impact on the size factor and dispersion estimates. At the same time, by doing away with lots of gene-by-gene tests that have no chance of rejecting the null hypothesis anyway, it will improve your statistical power experiment-wide.

If your data were peculiar enough that the filtering did affect size factor or dispersion estimation, then you would have a problem. To address that, you would need to look more closely at data QA/QC and your overall analytical strategy.

Some more on filtering is here:
- Bourgon et al., PNAS 2010: http://www.pnas.org/content/107/21/9546.long
- Section 5 "Independent filtering" in the vignette of a recent DESeq package (e.g. version >= 1.7.3)

Best wishes
Wolfgang

Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
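The power argument in the reply above can be illustrated with a small sketch. It assumes Benjamini-Hochberg adjustment as the multiple-testing correction (the reply does not name one) and uses invented p-values: six tags with enough counts to be testable, plus fourteen very low-count tags whose tests return p = 1. The function `bh_adjust` and all numbers are hypothetical. It also assumes the filter was chosen without looking at the group labels, which is the "independent" part of independent filtering.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented p-values, for illustration only.
testable = [0.001, 0.004, 0.012, 0.03, 0.2, 0.6]  # tags passing the filter
low_count = [1.0] * 14                            # tags the filter would drop

alpha = 0.05
hits_unfiltered = sum(p <= alpha for p in bh_adjust(testable + low_count))
hits_filtered = sum(p <= alpha for p in bh_adjust(testable))

print("rejections without filtering:", hits_unfiltered)
print("rejections with filtering:   ", hits_filtered)
```

With these made-up numbers the same six raw p-values yield 2 rejections at the 0.05 level when all twenty tests are adjusted together and 4 when the fourteen uninformative tests are removed first. The size of the gain in a real experiment depends on how many such tests there are.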