library size and fold changes
Bogdan ▴ 670
@bogdan-2367
Palo Alto, CA, USA
Dear Mark, Gordon,

Probably a naive statistical question on edgeR. Consider 3 samples with 3 library sizes: a) 10 million reads, b) 30 million reads, and c) 12 million reads.

After applying edgeR, I obtain 1) > 2000 genes differentially expressed between a) and b) (FDR < 0.01, FC > 2), and 2) only ~200 genes differentially expressed between a) and c) (FDR < 0.01, FC > 2).

My question is: given that the number of differentially expressed genes depends on the library size, would it be valid to compare and contrast set 1) of 2000 differentially expressed genes (FDR < 0.01, FC > 2) with an expanded set 2) of 200 + 800 differentially expressed genes (FDR < 0.01, but FC > 1.2)?

Thanks a lot,
Bogdan
edgeR edgeR • 1.0k views
@gordon-smyth
WEHI, Melbourne, Australia
Dear Bogdan,

Given that library size may affect FDR but will not affect FC (or might even increase it slightly), it would seem more natural to me to relax the FDR cutoff rather than the FC cutoff. I would use the same FC cutoff regardless of library size.

This is especially so because, once counts get to a certain size, the p-value under the negative binomial model depends only on the fold change, with further increases in count size making little or no difference. This is because the sequencing variability becomes negligible for large counts, after which biological inter-library variability is the only source of variation.

What is a sensible analysis for your current data might of course depend on many things that we don't know from your email.

Best wishes,
Gordon

Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. http://www.wehi.edu.au http://www.statsci.org/smyth

(In reply to Bogdan Tanasa's message of Tue, 6 Dec 2011.)
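The point about large counts can be checked with a little arithmetic. Under a negative binomial model with mean mu and dispersion phi, the variance is mu + phi * mu^2, so the squared coefficient of variation is 1/mu + phi: a technical (sequencing) part that shrinks as counts grow, plus a biological part that does not. The sketch below is only an illustration in plain Python, not edgeR code; the function name `nb_cv_components` and the dispersion value of 0.04 are assumptions made for the example and are not taken from the thread.

```python
import math


def nb_cv_components(mean_count, dispersion):
    """Split the squared CV of a negative binomial count into two parts.

    With Var(y) = mu + dispersion * mu**2, dividing by mu**2 gives
    CV^2 = 1/mu + dispersion: a technical (Poisson) term plus a
    biological term that does not depend on the count size.
    """
    technical_cv2 = 1.0 / mean_count
    biological_cv2 = dispersion
    return technical_cv2, biological_cv2


# Hypothetical dispersion (biological CV of 0.2); an assumed value,
# not an estimate from the data discussed in this thread.
DISPERSION = 0.04

for mu in (10, 100, 1000, 10000):
    tech, bio = nb_cv_components(mu, DISPERSION)
    total_cv = math.sqrt(tech + bio)
    technical_share = tech / (tech + bio)
    print(f"mean={mu:>6}  total CV={total_cv:.3f}  "
          f"technical share={technical_share:.1%}")
```

With this assumed dispersion, the technical term accounts for roughly 71% of the squared CV at a mean count of 10 but well under 1% at a mean count of 10,000, which is why extra sequencing depth stops helping for genes that already have large counts.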

Hi Gordon,

I also have a similar question. I have two RNA-Seq experiments, sequenced at different times with exactly the same protocol, except that the library sizes differ.

In the first experiment, I have 4 samples with three replicates with average library sizes of 11 million reads:

untreated A and B cells.

Drug1 treated A and B cells.

In the second experiment, I have 4 samples with two replicates with average library sizes of 23 million reads:

untreated A and B cells.

Drug2 treated A and B cells.

I have done analysis for Experiment1 and Experiment2 separately.

Now, when I want to do comparison between Experiment1 and Experiment2 I have the following problem.

In untreated cells, A (Experiment1) vs A (Experiment2), I have 3633 genes differentially expressed [abs(logFC) >= 1.0 and FDR < 0.1]. A similar result (4550 genes) holds for the comparison between B cells. I am expecting some differences, but these numbers are really high. I think this is because of the library sizes? Do you have any suggestions for normalization?

best,

ilyas. 

 
