I am using edgeR to perform differential expression analysis. I have four groups: three of them contain three samples each and one contains two samples. I am interested in comparing each group against the average of the other three. For example, to compare group1 with the average of group2, group3 and group4, I set the contrast parameter to c(1,-1/3,-1/3,-1/3). This produced more or less the expected results, but I ended up with a lot of NA p-values.
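To make the setup concrete, here is a minimal base-R sketch of the contrast described above. It assumes a no-intercept design with one column per group; the names `group`, `design`, `contrast` and `beta`, and the coefficient values, are made up for illustration and are not taken from my actual analysis.

```r
# Group labels: three groups of three samples and one group of two.
group <- factor(rep(c("group1", "group2", "group3", "group4"), times = c(3, 3, 3, 2)))

# Assumed no-intercept design: one column per group, so each coefficient
# corresponds to one group.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# group1 versus the average of the other three groups.
contrast <- c(group1 = 1, group2 = -1/3, group3 = -1/3, group4 = -1/3)

# A comparison contrast should sum to zero.
stopifnot(isTRUE(all.equal(sum(contrast), 0)))

# Illustration with hypothetical coefficients: the contrast gives
# group1 minus the mean of group2, group3 and group4.
beta <- c(group1 = 5, group2 = 2, group3 = 3, group4 = 4)
sum(contrast * beta)  # 5 - mean(c(2, 3, 4)) = 2
```

Note that this contrast weights the three other groups equally, so the two-sample group counts as much as each three-sample group.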
I can think of several possible causes, but I would like to know the exact reasons for the NA p-values, so that I can see whether I should do something differently and perhaps improve the results. I visually inspected the count matrices, although that is not strong enough evidence. For example, all genes with NA p-values have lower counts than most of the others (fewer than 200 per sample), but I also have genes with equally low expression whose p-values are not NA. I am aware that "low" is not an objective criterion and that low counts might not be the only reason.
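The kind of inspection I mean could be made more systematic along these lines. This is only a sketch: `counts` and `pvalue` are toy stand-ins with hypothetical values, not my real data, and the chosen summaries are just guesses at what might separate the two sets of genes.

```r
# Toy stand-ins (hypothetical values, for illustration only):
# `counts` is a genes x samples matrix, `pvalue` holds the per-gene p-values.
counts <- matrix(c(150, 0, 900, 20,
                   180, 0, 850, 35,
                   120, 0, 990, 15),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))
pvalue <- c(gene1 = 0.03, gene2 = NA, gene3 = 0.20, gene4 = NA)

is_na <- is.na(pvalue)

# How many genes have NA p-values?
table(is_na)

# Per-gene summaries that might distinguish the NA genes from the rest.
summary_df <- data.frame(
  total_count = rowSums(counts),
  max_count   = apply(counts, 1, max),
  n_zero      = rowSums(counts == 0),
  na_pvalue   = is_na
)

# Compare the medians of each summary between NA and non-NA genes.
aggregate(cbind(total_count, max_count, n_zero) ~ na_pvalue,
          data = summary_df, FUN = median)
```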
My questions are:
- What are some possible reasons for p-values being NA?
- At what points of the analysis are p-values set to NA?
- What algorithms does edgeR use to decide whether a gene gets an NA p-value?