edgeR - filtering criteria
@catarina-almeida-6053
Last seen 9.6 years ago
Hello everyone!

I'm using edgeR to detect DE genes in my data. I have 2 control samples and 4 mutated samples. I understand why I should filter the genes, and I know the command to use (the tutorials by Drs. Mark Robinson, Davis McCarthy, Yunshun Chen and Gordon K. Smyth are pretty self-explanatory). What I don't understand, however, is the filtering criterion.

I named my DGE object "d", so the command I'm using is:

    d <- d[rowSums(1e+06 * d$counts/expandAsMatrix(d$samples$lib.size, dim(d)) > 1) >= ?, ]

Meaning that I'm filtering out genes that don't have at least one count per million in "?" samples. What value should I use for "?", given that I have 2 control and 4 mutated samples?

Thank you in advance for your help!

C
edgeR • 619 views
@steve-lianoglou-2771
Last seen 13 months ago
United States
Hi Catarina,

Comments in line:

On Mon, Jul 22, 2013 at 8:07 AM, Catarina Almeida <catarina.fa at gmail.com> wrote:
> Hello everyone!
>
> I'm using edgeR to detect DE genes on my data. I have 2 control samples and
> 4 mutated samples.
> I understand why I should filter them and I know the command to use (the
> tutorials Drs. Mark Robinson, Davis McCarthy, Yunshun Chen and Gordon
> K. Smyth made are pretty self explanatory in everything). What I don't
> understand however is the filtering criteria.
>
> I named my DGE object as "d", so the command I'm using is:
> d <- d[rowSums(1e+06 * d$counts/expandAsMatrix(d$samples$lib.size, dim(d)) > 1) >= ?, ]

Perhaps you'd like to simplify that to the more intuitive:

    d <- d[rowSums(cpm(d) >= 1) >= ?, ]

> Meaning that I'm filtering out genes that don't have at least one count per
> million on "?" samples. What value should I use for "?" given that I have 2
> control and 4 mutated samples.

I believe the rule of thumb (if there is one) with this strategy is to use the size of the smallest group of replicates among your conditions. Since you have one condition with 2 replicates and another with 4, you'd pick 2.

HTH,
-steve

--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech
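To make the rule of thumb concrete, here is a minimal sketch in base R that applies it to a toy count matrix. The gene names, counts, library sizes and the variable names (`counts`, `group`, `lib.size`, `min.samples`) are all invented for illustration and are not from the thread. CPM is computed by hand so the snippet runs without edgeR installed. On a real DGEList `d`, the same filter could presumably be written as `d[rowSums(cpm(d) >= 1) >= min(table(d$samples$group)), ]`, assuming the groups were supplied when the object was created.

```r
# Toy count matrix: 5 genes x 6 samples (2 control, 4 mutant).
# All numbers are invented for illustration only.
counts <- matrix(
  c( 0,  0,  0,  0,  0,  0,   # geneA: never detected
    10, 12,  0,  0,  0,  0,   # geneB: detected only in the 2 controls
     0,  0,  9, 11,  8, 10,   # geneC: detected only in the 4 mutants
     5,  0,  0,  0,  0,  0,   # geneD: detected in a single sample
    20, 25, 30, 22, 27, 24),  # geneE: detected in every sample
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:5]),
                  c("ctl1", "ctl2", "mut1", "mut2", "mut3", "mut4"))
)
group <- factor(c("ctl", "ctl", "mut", "mut", "mut", "mut"))

# Hypothetical library sizes of exactly one million reads per sample,
# chosen so that CPM equals the raw count in this toy example.
lib.size <- rep(1e6, ncol(counts))

# Counts per million: scale each sample (column) by its library size.
cpm.mat <- t(t(counts) / lib.size) * 1e6

# Rule of thumb: require CPM >= 1 in at least as many samples
# as the smallest group has replicates (here, 2).
min.samples <- min(table(group))
keep <- rowSums(cpm.mat >= 1) >= min.samples
filtered <- counts[keep, , drop = FALSE]

print(rownames(filtered))
```

With a threshold of 2, a gene detected only in the two controls (geneB) survives the filter, whereas a threshold of 4 would discard it even though it could be a genuine DE gene. That is the motivation for using the smallest group size.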