Hi,
Thank you for your continued development of DESeq2 and your active help with it. I have used DESeq2 for a long time, and occasionally I come across datasets with strange dispersion plots. Often these are due to outliers, or to single-cell data aggregated to pseudobulk with low counts.
In my current analysis I am performing a pseudobulk analysis of a public scRNA-seq dataset. For every cell type, I removed samples with fewer than 20 cells prior to analysis. I get the following dispersion plot, which looks similar for every cell type.
I have never seen these 'pitchfork'-looking high dispersions before. Do you think this indicates a poor fit to the DESeq2 model? I notice that it moves the maximum likelihood line higher, which worries me. Would you consider the DE results trustworthy?
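For context, the pseudobulk aggregation and 20-cell filter described above amount to roughly the following. This is only a minimal sketch in Python with NumPy, not the actual pipeline: the function name `pseudobulk` and its arguments are hypothetical, and it assumes the filter is applied per cell type (a sample is dropped for a given cell type when it contributes fewer than 20 cells of that type).

```python
import numpy as np

def pseudobulk(counts, sample_ids, cell_types, min_cells=20):
    """Sum raw counts over cells for each (cell type, sample) pair.

    counts     : (n_cells, n_genes) array of raw counts
    sample_ids : length n_cells array of sample labels
    cell_types : length n_cells array of cell type labels
    min_cells  : pairs with fewer cells than this are dropped (assumed rule)

    Returns a dict mapping cell type -> (kept sample names,
    (n_kept_samples, n_genes) matrix of summed counts).
    """
    counts = np.asarray(counts)
    sample_ids = np.asarray(sample_ids)
    cell_types = np.asarray(cell_types)

    result = {}
    for ct in np.unique(cell_types):
        kept, rows = [], []
        for s in np.unique(sample_ids):
            mask = (cell_types == ct) & (sample_ids == s)
            if mask.sum() < min_cells:
                continue  # too few cells of this type in this sample
            kept.append(str(s))
            rows.append(counts[mask].sum(axis=0))
        if rows:
            result[str(ct)] = (kept, np.vstack(rows))
    return result
```

Each per-cell-type matrix (samples by genes here, so it would need transposing to genes by samples) is what would then go into DESeq2 as a count matrix.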
I have looked at PCA. On PC1, about half of my control samples and a few disease samples fall on one side, while the majority (~10 disease samples and the other half of the controls) fall on the other. I do not know what is driving PC1, but PC2 separates the samples well by disease, and there are no obvious outliers. If I remove the samples on the right-hand side of PC1 (half of my controls and a few disease samples), the dispersions look as expected. I am still trying to work out what PC1 represents, but I may not succeed.
Any thoughts would be helpful.
Thank you, Mary