Dear Mark, Gordon,
This is probably a naive statistical question about edgeR. Consider 3 samples with 3 different library sizes: a) 10 million reads, b) 30 million reads, and c) 12 million reads.
After applying edgeR, I obtain 1) > 2000 genes differentially expressed between a) and b) (FDR < 0.01, FC > 2), and 2) only ~200 genes differentially expressed between a) and c) (FDR < 0.01, FC > 2).
My question is: given that the number of differentially expressed genes depends on the library size, would it be valid to compare and contrast set 1) of 2000 differentially expressed genes (FDR < 0.01, FC > 2) with an expanded set 2) of 200 + 800 differentially expressed genes (FDR < 0.01, BUT FC > 1.2)?
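A rough illustration of why the number of calls depends on depth: for the same true fold change, a deeper library gives larger expected counts, hence a smaller standard error and a larger test statistic. The sketch below assumes negative binomial counts with mean CPM × depth and dispersion phi, and uses the delta-method approximation var(log y) ≈ 1/mu + phi. It is a simplified Wald statistic, not edgeR's exact test or GLM, and the gene's CPM values are made up for illustration.

```python
import math

def approx_wald_z(cpm_a, cpm_b, depth_a, depth_b, phi=0.0):
    """Rough Wald z-statistic for the log fold change of one gene between two samples.

    Assumes negative binomial counts with mean mu = CPM * depth (depth in millions
    of reads) and dispersion phi, so var(log y) is approximately 1/mu + phi
    (delta method). phi = 0 gives the Poisson case. Illustration only: this is
    not edgeR's exact test or GLM.
    """
    mu_a = cpm_a * depth_a  # expected count in sample a
    mu_b = cpm_b * depth_b  # expected count in sample b
    log_fc = math.log(cpm_b / cpm_a)  # true log fold change; does not depend on depth
    se = math.sqrt(1.0 / mu_a + 1.0 / mu_b + 2.0 * phi)
    return log_fc / se

def two_sided_p(z):
    """Two-sided normal p-value, 2 * (1 - Phi(|z|)), written via erfc."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Same hypothetical gene, same 1.5-fold change (2 -> 3 CPM), depths from the question.
z_ab = approx_wald_z(2.0, 3.0, depth_a=10, depth_b=30)  # a) 10M vs b) 30M
z_ac = approx_wald_z(2.0, 3.0, depth_a=10, depth_b=12)  # a) 10M vs c) 12M
print(f"a vs b: z = {z_ab:.2f}, p = {two_sided_p(z_ab):.3f}")
print(f"a vs c: z = {z_ac:.2f}, p = {two_sided_p(z_ac):.3f}")
```

Under these assumptions, the a-vs-b comparison gets the larger z for an identical fold change. With phi > 0, the statistic can never exceed log_fc / sqrt(2 * phi) however deep the libraries are, so extra reads mostly help genes whose counts are low.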
Thanks a lot,
Bogdan
Hi Gordon,
I have a similar question. I have two RNA-Seq experiments that were sequenced at different times using exactly the same protocol; only the library sizes differ.
In the first experiment, I have 4 sample groups, each with three replicates, and an average library size of 11 million reads:
Untreated A and B cells.
Drug1-treated A and B cells.
In the second experiment, I have 4 sample groups, each with two replicates, and an average library size of 23 million reads:
Untreated A and B cells.
Drug2-treated A and B cells.
I analysed Experiment1 and Experiment2 separately.
Now, when I compare Experiment1 with Experiment2, I run into the following problem.
For untreated A cells (Experiment1) vs untreated A cells (Experiment2), I get 3633 differentially expressed genes [abs(logFC) >= 1.0 and FDR < 0.1]. The comparison between untreated B cells gives a similar result (4550 genes). I expected some differences, but these numbers are very high. Could this be due to the different library sizes? Do you have any suggestions for normalization?
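For reference, edgeR adjusts for sequencing depth through the library sizes stored with the counts, optionally multiplied by normalization factors from calcNormFactors (TMM by default). The sketch below is a simplified Python illustration of the two ideas: plain counts-per-million scaling and a TMM-style composition factor. `tmm_like_factor` is a hypothetical helper; it uses the default trim fractions of calcNormFactors (logratioTrim = 0.3, sumTrim = 0.05) but omits the precision weights, so its values will not match edgeR exactly. Both adjustments target depth and RNA composition only; they do not by themselves remove other systematic differences between experiments run at different times.

```python
import math

def cpm(counts):
    """Counts per million: divide each count by the library size, times 1e6."""
    lib_size = sum(counts)
    return [c / lib_size * 1e6 for c in counts]

def tmm_like_factor(obs, ref, m_trim=0.3, a_trim=0.05):
    """Simplified, unweighted TMM-style scaling factor for `obs` relative to `ref`.

    Hypothetical helper: edgeR's calcNormFactors additionally weights genes by
    precision, so results will differ from it.
    """
    n_obs, n_ref = sum(obs), sum(ref)
    m_vals, a_vals = [], []
    for o, r in zip(obs, ref):
        if o == 0 or r == 0:
            continue  # log-ratios are undefined for zero counts
        p_obs, p_ref = o / n_obs, r / n_ref
        m_vals.append(math.log2(p_obs / p_ref))        # M: log2 ratio of proportions
        a_vals.append(0.5 * math.log2(p_obs * p_ref))  # A: average log2 abundance
    k = len(m_vals)
    by_m = sorted(range(k), key=lambda i: m_vals[i])
    by_a = sorted(range(k), key=lambda i: a_vals[i])
    n_m, n_a = int(k * m_trim), int(k * a_trim)
    # Keep genes that survive trimming of both the M extremes and the A extremes.
    keep = set(by_m[n_m:k - n_m]) & set(by_a[n_a:k - n_a])
    if not keep:
        raise ValueError("no genes left after trimming")
    mean_m = sum(m_vals[i] for i in keep) / len(keep)
    return 2.0 ** mean_m  # multiply sum(obs) by this to get an effective library size

# Hypothetical toy data: 20 genes, one strongly induced gene in `obs`.
ref = [100] * 20
obs = [100] * 19 + [10000]
f = tmm_like_factor(obs, ref)
print("raw library sizes:", sum(obs), sum(ref))
print("effective library size of obs:", round(sum(obs) * f))
```

In this toy case, CPM scaling alone would make the 19 unchanged genes look roughly 6-fold lower in `obs`, because one gene takes up most of the reads; the trimmed factor shrinks the effective library size of `obs` so that the unchanged genes line up with `ref` again.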
Best,
ilyas.