Hi all,
I will have to perform a differential expression analysis on RNA-seq data obtained from purified chloroplasts.
This means that the reads will be mapped to the chloroplast genome (around 80 genes), so the DE analysis will be limited to those genes as well.
I was wondering whether there is a minimum number of genes needed for a valid DESeq2 analysis, and whether any specific parameters should be changed in this case.
Thanks a lot for your help
Stefanie
I can't say for sure. There will probably be some genes with an LFC around 0. It is a new experiment and we have no idea how these genes are going to behave between the two conditions... Do you think we should add some "fake" counts with an LFC of 0 to better fit the null hypothesis of DESeq2?
Adding fake data isn't going to solve anything, it's just going to make the data worse. If you do so, you'll effectively be making an arbitrary determination of the size factors with no evidence while also disrupting the dispersion estimation.
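To make the size factor point concrete, here is a minimal sketch of median-of-ratios normalisation, the method DESeq2 uses by default for size factors. It is written in plain Python for illustration and is not DESeq2's own code; the function name `size_factors` and all counts are made up for the example.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors; counts[g][s] is the count of gene g in sample s."""
    n_samples = len(counts[0])
    # Keep only genes with non-zero counts in every sample, so the log geometric mean is finite
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    # For each sample, take the median log ratio to the per-gene geometric mean
    return [
        math.exp(median(math.log(row[s]) - lgm for row, lgm in zip(usable, log_geo_means)))
        for s in range(n_samples)
    ]

# Hypothetical toy data: 5 genes x 2 samples, sample 2 has twice the counts of sample 1
real = [[10, 20], [50, 100], [200, 400], [30, 60], [80, 160]]
# Hypothetical "fake" genes with identical counts in both samples (LFC = 0)
fake = [[100, 100]] * 10

sf_real = size_factors(real)
sf_fake = size_factors(real + fake)
print(sf_real)  # about [0.707, 1.414]: the two-fold difference is normalised away
print(sf_fake)  # [1.0, 1.0]: the fake genes now decide the normalisation
```

Once the fake genes outnumber the real ones, the median is set entirely by them, so the size factors reflect the choice of fake counts rather than anything measured.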
Oh so bad idea :(
So in summary, I will check whether there is a peak of genes around LFC 0, and if so I can assume the analysis will be OK. If not, I will have to find another way to normalise the data to compare the samples.
Thanks a lot for your quick answers.
You can’t tell from the data if global scaling is inappropriate.
You’d need to know some biology about the genes.
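A toy illustration of why the counts alone cannot settle this, using made-up abundances and a deliberately simplified model in which the observed count is abundance times sequencing depth (the helper `observed` is hypothetical):

```python
# Hypothetical true abundances for 4 genes in condition A
abundance_a = [10, 50, 200, 30]

def observed(abundance, depth):
    """Simplified model: observed count = true abundance x sequencing depth."""
    return [a * depth for a in abundance]

# Scenario 1: no biological change, but condition B is sequenced twice as deeply
scenario_1 = (observed(abundance_a, 1), observed(abundance_a, 2))

# Scenario 2: every gene truly doubles in condition B, sequenced at equal depth
abundance_b = [2 * a for a in abundance_a]
scenario_2 = (observed(abundance_a, 1), observed(abundance_b, 1))

identical = scenario_1 == scenario_2
print(identical)  # True: both scenarios give exactly the same count table
```

Global scaling would treat both tables as "no change", which is right for the first scenario and wrong for the second. Only outside knowledge about the genes can say which one applies.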