Dear all,
I would like to use DESeq2 for differential abundance analysis of the cell clusters in my CyTOF data. Previously I tried diffcyt, which uses edgeR. However, I often found significant results that were actually driven by only a few outlying samples, so I would like to see how DESeq2 performs.
My question: is there anything in particular I have to keep in mind when using DESeq2 for CyTOF data? Can I just run the default sequence of DESeq(), results() and then lfcShrink()? As an example, with edgeR I can pass the total number of cells per sample to the lib.size argument when constructing the DGEList. How can I do the same in DESeq2 (and are there other particulars to consider)?
Thank you in advance.
Regards, Mikhael
Dear Michael,
Thank you for the fast reply! I have tried estimateGLMRobustDisp() as well as estimateDisp() with robust=TRUE, but the outlying cell clusters are still there. In addition, the LFCs of these clusters are quite high, which further prompted me to look for an alternative. Good to know that I am on the right track!
EDIT: Would it be reasonable to assign the total number of cells per sample to sizeFactors(dds)?
A few more reasons why I would like to try DESeq2 are the LFC shrinkage and the s-values. So far, cell clusters that are statistically significant and have a high (shrunken) LFC looked convincing when I plotted the cell frequencies as boxplots, i.e. the significance is not driven by outliers.
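For readers following along, here is a minimal sketch of what assigning total cell counts to sizeFactors(dds) could look like. It is not a recommendation from this thread. The count matrix is simulated, the assumption that every cell belongs to a cluster (so column sums equal total cells) is mine, and rescaling the totals to a geometric mean of 1 is a choice made here to keep them on a scale similar to the usual size factors. The DESeq2 part only runs if the package is installed.

```r
# Hypothetical toy data: 40 cell clusters x 6 samples (3 per condition).
set.seed(1)
n_clusters <- 40
n_samples <- 6
mu <- 2^seq(5, 12, length.out = n_clusters)  # mean cell count per cluster
counts <- matrix(
  rnbinom(n_clusters * n_samples, mu = rep(mu, n_samples), size = 5),
  nrow = n_clusters,
  dimnames = list(paste0("cluster", seq_len(n_clusters)),
                  paste0("sample", seq_len(n_samples)))
)

# Assumption: every cell is assigned to a cluster, so column sums are the totals.
total_cells <- colSums(counts)

# Rescale so the size factors have geometric mean 1 (a choice made for this sketch).
sf <- total_cells / exp(mean(log(total_cells)))

if (requireNamespace("DESeq2", quietly = TRUE)) {
  suppressPackageStartupMessages(library(DESeq2))
  coldata <- data.frame(
    condition = factor(rep(c("A", "B"), each = 3)),
    row.names = colnames(counts)
  )
  dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                                design = ~ condition)
  sizeFactors(dds) <- sf  # DESeq() then uses these instead of estimating its own
  dds <- DESeq(dds)
  res <- results(dds)

  # Shrunken LFCs with s-values; type = "apeglm" needs the apeglm package.
  if (requireNamespace("apeglm", quietly = TRUE)) {
    res_shrunk <- lfcShrink(dds, coef = "condition_B_vs_A",
                            type = "apeglm", svalue = TRUE)
  }
}
```

Whether total cell count is the right normalization for a given CyTOF experiment is a separate question that this sketch does not settle.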
I think to the extent that some cell clusters are present in roughly stable proportions across samples, a median-ratio-based analysis will give reasonable results (and you can seed the median ratio with controlGenes). However, if composition is changing dramatically across samples and no such clusters are found, then I would consider multinomial modeling.
I am not sure about the extent of the changes in composition across samples. However, there are indeed cases where clusters of cells are highly abundant in only one of the groups. That being said, is this current approach a good enough way to analyse CyTOF data? I can imagine that edgeR and limma-voom, which are RNA-seq tools being applied to CyTOF, would encounter similar problems.
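To make the median-ratio idea with controlGenes mentioned above concrete, here is a small base R sketch of median-of-ratios size factors computed from a chosen set of "stable" clusters. The helper median_ratio_sf and the toy counts are hypothetical, and the helper only mirrors the idea behind estimateSizeFactors() rather than reproducing its exact implementation. Which clusters count as stable is an assumption the analyst has to justify.

```r
# Hypothetical helper: median-of-ratios size factors using only control rows.
median_ratio_sf <- function(counts, control = seq_len(nrow(counts))) {
  sub <- counts[control, , drop = FALSE]
  log_geo_means <- rowMeans(log(sub))   # log geometric mean per cluster
  keep <- is.finite(log_geo_means)      # drop clusters with any zero count
  apply(sub[keep, , drop = FALSE], 2, function(col) {
    exp(median(log(col) - log_geo_means[keep]))
  })
}

# Toy data: three clusters in stable proportions, one that changes strongly.
# Sample s2 contains twice as many cells as s1.
counts <- matrix(
  c(100, 200, 100,
     50, 100,  50,
    400, 800, 400,
     10,  20, 500),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("stable1", "stable2", "stable3", "changing"),
                  c("s1", "s2", "s3"))
)

sf <- median_ratio_sf(counts, control = 1:3)
sf["s2"] / sf["s1"]  # 2, matching the twofold difference in depth

# The corresponding DESeq2 call, if the package is installed.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  suppressPackageStartupMessages(library(DESeq2))
  coldata <- data.frame(row.names = colnames(counts))
  dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                                design = ~ 1)
  dds <- estimateSizeFactors(dds, controlGenes = 1:3)
  print(sizeFactors(dds))
}
```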
I am not familiar with multinomial modelling, so I guess it is time to consult with a statistician.
"is this current approach a good enough way to analyse CyTOF data" => I really don't have any experience in this domain, so can't give solid recommendations. I can tell that diffcyt was extensively evaluated in its publication, and is now widely used.