Dear Bogdan,
Given that library size may affect FDR but will not affect FC (it might even increase it slightly), it seems more natural to me to relax the FDR cutoff rather than the FC cutoff. I would use the same FC cutoff regardless of library size.
This is especially so because, once counts reach a certain size, the p-value under the negative binomial model depends only on the fold change, and further increases in count size make little or no difference. This is because the sequencing variability becomes negligible for large counts, after which biological inter-library variability is the only source of variation.
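To illustrate the point about large counts, here is a minimal sketch in Python. It assumes the usual negative binomial parameterisation in which variance = mean + dispersion * mean^2, so the squared coefficient of variation is 1/mean + dispersion. The function name and the dispersion value of 0.04 are hypothetical, chosen only for illustration.

```python
import math

def nb_cv_squared(mean_count, dispersion):
    """Squared coefficient of variation of a negative binomial count.

    Assumes variance = mean + dispersion * mean**2, so that
    CV^2 = 1/mean (sequencing noise) + dispersion (biological variation).
    """
    return 1.0 / mean_count + dispersion

# Assumed dispersion for illustration only (biological CV of 0.2).
dispersion = 0.04

for mean_count in (10, 100, 1000, 10000, 100000):
    total_cv = math.sqrt(nb_cv_squared(mean_count, dispersion))
    # As the mean count grows, total_cv approaches sqrt(dispersion).
    print(f"mean count {mean_count:>6}: total CV = {total_cv:.4f}")
```

The 1/mean term shrinks as counts grow, so the total variability levels off at the biological term, and extra sequencing depth changes the test result very little.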
What is a sensible analysis for your current data might of course depend on many things that we don't know from your email.
Best wishes
Gordon
Hi Gordon,
I have a similar question. I have two RNA-Seq experiments that were sequenced at different times with exactly the same protocol, except that the library sizes differ.
In the first experiment, I have 4 samples with three replicates each and an average library size of 11 million reads:
untreated A and B cells.
Drug1 treated A and B cells.
In the second experiment, I have 4 samples with two replicates each and an average library size of 23 million reads:
untreated A and B cells.
Drug2 treated A and B cells.
I have analysed Experiment1 and Experiment2 separately.
Now, when I compare Experiment1 with Experiment2, I run into the following problem.
Comparing untreated A cells between Experiment1 and Experiment2, I find 3633 differentially expressed genes [abs(logFC) >= 1.0 and FDR < 0.1]. The comparison between untreated B cells gives a similar result (4550 genes). I expect some differences, but these numbers are very high. Could this be caused by the difference in library sizes? Do you have any suggestions for normalization?
best,
ilyas.