I have the following scenario. We performed MACE experiments (like RNA-seq but you produce only one read per transcript) for three developmental stages of tomato pollen (3 biological replicates for each stage). On the one hand I produced RPM values for all genes by normalizing simply for the differing library size. On the other I performed an analysis with DESeq2. When comparing the RPM values of stage 1 and stage 3 I see for some genes nearly identical RPM values, whereas according to DESeq2 these genes are differentially expressed. In total I have around 16,000 (of around 19,000) differentially expressed genes when performing DESeq2 analysis with LRT test on all three stages
When looking at the venn diagram for identified genes between the 3 stages I see the following:
My problem is that I do not know whether I should trust the RPM values or DESeq2 because DESeq2 has for normalization the assumption that: "normalization is working under the assumption that most of the genes are not DE and only a small subset of genes are". But during the development of pollen there are drastic physiological and morphological changes going along with strong changes in expressed genes (as you can see from the venn diagram).
Or is DESeq2 fine and I have strong carry-over effects within the RPM values by genes exclusively expressed in one of the stages.
I further checked one of our housekeeping genes from qRT-PCRs. This gene has for the RPM values a log2FC between stage3 and 1 of ~-2.3 and for DESeq2 a log2FC of -0.89 (here I made a Wald test between stage3 and 1 and it is DE). So for the housekeeping gene DESeq2 is performing slightly better.
I really hope someone has any suggestions for me. I'm really desperate.
DESeq2's normalization will minimize the amount of differential expression. Essentially, any truly global changes across stage is normalized away (unless you have an idea about genes which do not change -- controlGenes -- and then you can input this information).
You may find it useful to look at the plotCounts for some of the DE genes. And the MA plots. If you are not interested in genes which are changing only slightly, you can use the lfcThreshold argument of results() to specify that the genes should change by a certain amount (on the log2 fold change scale).