Hi! I am not sure whether the following question has already been answered; I haven't found it. Sorry if this is a repeated question.
I am currently analysing a bulk RNA-seq dataset. It contains 18 samples from 3 different patients (6 conditions per patient: 3 cell types, each with and without treatment). While exploring the data, I can see on a plot of PC1 vs PC2 that 2 of those conditions are much more similar to each other than to any of the other conditions. As these 2 conditions are the ones we are most interested in, I performed differential gene expression analysis with both edgeR and DESeq2, either including all 18 samples or only the 6 of interest. The results differed, as expected, but I was surprised to find very few differentially expressed genes between the two conditions when all samples were used to estimate the variance, especially with edgeR. I imagine this is due to an increase in the BCV (biological coefficient of variation) when the more variable samples are included. Is this correct?
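To illustrate the intuition (not edgeR's actual method), here is a minimal sketch on made-up counts for a single gene. It assumes the negative binomial relationship Var = mu + phi * mu^2, estimates phi per condition by the method of moments, and pools the estimates weighted by degrees of freedom. Conditions "A" and "B" are hypothetical stand-ins for the two similar conditions; "C" to "F" stand in for the more variable ones. edgeR's estimateDisp works differently (likelihood-based, sharing information across genes through the design matrix), so only the direction of the effect carries over.

```python
from statistics import mean, variance

# Hypothetical counts for ONE gene: 3 patients per condition (made-up numbers).
# A and B play the role of the two similar conditions of interest;
# C-F play the role of the more variable remaining conditions.
counts = {
    "A": [80, 100, 120],
    "B": [160, 200, 240],
    "C": [50, 150, 250],
    "D": [20, 60, 100],
    "E": [300, 500, 700],
    "F": [10, 40, 70],
}

def group_dispersion(values):
    """Moment estimate of NB dispersion phi, from Var = mu + phi * mu^2."""
    mu = mean(values)
    phi = (variance(values) - mu) / mu ** 2
    return max(phi, 0.0)  # a negative estimate means no extra-Poisson variation

def pooled_dispersion(groups):
    """Average the per-condition dispersions, weighted by residual degrees of freedom."""
    total_df = sum(len(v) - 1 for v in groups.values())
    return sum((len(v) - 1) * group_dispersion(v) for v in groups.values()) / total_df

subset = {k: counts[k] for k in ("A", "B")}
phi_subset = pooled_dispersion(subset)
phi_all = pooled_dispersion(counts)

# BCV is the square root of the dispersion.
print(f"BCV using A and B only: {phi_subset ** 0.5:.3f}")
print(f"BCV using all six conditions: {phi_all ** 0.5:.3f}")
```

With a single dispersion shared across all conditions, the noisier groups inflate the estimate used to test A vs B, which shrinks the test statistics for that comparison.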
My questions are: would it make sense to run the analysis using only the samples of interest for the BCV calculation? And what if I were interested in comparing how the change from before to after treatment differs between 2 cell types? Could I compute the ratio of the counts (or subtract the log2 counts) manually and then calculate the BCV from those values?
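For concreteness, the manual computation described above (a per-patient difference of log2 fold changes between two cell types) can be sketched as follows. Everything here is hypothetical: the cell-type labels "X" and "Y", the already library-size-normalised values, and the pseudocount of 1. Note that once counts are turned into log ratios they are no longer counts, so the negative binomial dispersion (BCV) that edgeR models does not apply directly to these values; in a log-linear model with patient as a blocking factor, this difference of differences is the quantity a cell type by treatment interaction term estimates.

```python
from math import log2

# Hypothetical library-size-normalised counts for ONE gene.
# Key: (cell_type, treatment); value: one entry per patient (P1, P2, P3).
norm_counts = {
    ("X", "untreated"): [100.0, 120.0, 90.0],
    ("X", "treated"): [400.0, 500.0, 330.0],
    ("Y", "untreated"): [80.0, 95.0, 70.0],
    ("Y", "treated"): [160.0, 200.0, 150.0],
}

PSEUDOCOUNT = 1.0  # assumption: avoids log2(0); the choice matters for low counts

def log2_fc(cell_type):
    """Per-patient log2(treated / untreated) for one cell type."""
    treated = norm_counts[(cell_type, "treated")]
    untreated = norm_counts[(cell_type, "untreated")]
    return [log2(t + PSEUDOCOUNT) - log2(u + PSEUDOCOUNT)
            for t, u in zip(treated, untreated)]

# Difference of differences: how much more cell type X responds than Y, per patient.
interaction = [x - y for x, y in zip(log2_fc("X"), log2_fc("Y"))]
print([round(v, 2) for v in interaction])
```

Pairing by patient keeps the patient effect out of each per-patient value, but only 3 values remain per gene, so any variance estimated from them alone would be very unstable.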
Thanks for any help!