Question

CyTOF Differential Abundance Analysis using DESeq2

mikhael.manurung ▴ 280

@mikhaelmanurung-17423

Last seen 3.1 years ago

Netherlands

Dear all,

I would like to use DESeq2 for differential abundance analysis of the cell clusters from my CyTOF data. I previously tried diffcyt, which uses edgeR. However, I often found significant results that were actually driven by only a few outlying samples. Therefore, I would like to see how DESeq2 performs.

My question: is there anything in particular that I have to keep in mind if I want to use DESeq2 for CyTOF data? Can I just use the default sequence of DESeq(), results() and then lfcShrink()? For example, when using edgeR, I can pass the total number of cells per sample to the lib.size argument when constructing the DGEList. How can I do the same in DESeq2 (and are there other particulars to be aware of)?

Thank you in advance.

Regards, Mikhael

deseq2 mass cytometry CyTOF • 1.5k views

ADD COMMENT • link updated 4.8 years ago by Michael Love 43k • written 4.8 years ago by mikhael.manurung ▴ 280

score 1 · Answer 1 · 2020-10-06

Michael Love 43k

@mikelove

Last seen 21 days ago

United States

If you are using edgeR, I believe you could use estimateGLMRobustDisp [1] to minimize the effect of outlier samples, followed by glmFit and glmLRT or their QL counterparts.

If you are trying DESeq2, you can manually set the size factors, but I'm not sure exactly what you would set them to in this case for a direct comparison. The paper has:

Normalization for the total number of cells per sample (library sizes) is automatically performed by the edgeR functions.

And it looks like the standard correction for library size is performed [2], so I'm not sure you would need to modify any settings. The size factor estimation in DESeq2 normalizes based on the median ratio of each sample to a geometric mean sample. To the degree that this is an appropriate scaling for the cell counts, I think you could use this approach.

[1] https://pubmed.ncbi.nlm.nih.gov/24753412/
[2] https://github.com/lmweber/diffcyt/blob/master/R/testDA_edgeR.R#L174-L184
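
To make the median-ratio idea above concrete, here is a minimal sketch in plain Python. It is not DESeq2's code, only an illustration of the calculation on a hypothetical table of cluster counts. The function name and the example counts are made up, and rows containing a zero are simply skipped because they have no finite log ratio.

```python
import math
from statistics import median


def median_ratio_size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows (one per cell cluster), each a list of
    per-sample cell counts. Returns one size factor per sample.
    """
    n_samples = len(counts[0])
    usable_rows = []
    log_geo_means = []
    for row in counts:
        # Rows with a zero count have no finite log geometric mean; skip them.
        if all(c > 0 for c in row):
            usable_rows.append(row)
            log_geo_means.append(sum(math.log(c) for c in row) / n_samples)
    if not usable_rows:
        raise ValueError("every row contains a zero count")

    size_factors = []
    for j in range(n_samples):
        # Log ratio of sample j to the "geometric mean sample", per cluster.
        log_ratios = [
            math.log(row[j]) - lgm
            for row, lgm in zip(usable_rows, log_geo_means)
        ]
        size_factors.append(math.exp(median(log_ratios)))
    return size_factors


# Hypothetical counts: sample 2 has twice as many cells in every cluster.
example_counts = [
    [100, 200],
    [50, 100],
    [10, 20],
    [30, 0],  # skipped because of the zero
]
example_sf = median_ratio_size_factors(example_counts)
print(example_sf)  # roughly [0.707, 1.414]
```

In this toy case the second size factor is twice the first, which matches the two-fold difference in total cells, so here the median-ratio scaling and a library-size scaling agree.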

ADD COMMENT • link 4.8 years ago Michael Love 43k

Dear Michael,

Thank you for the fast reply! I have tried estimateGLMRobustDisp() as well as estimateDisp() with robust=TRUE, but the outlying cell clusters are still there. In addition, the LFCs of these clusters are quite high, which further prompted me to look for an alternative.

Good to know that I am on the right track! EDIT: Would it be reasonable to assign the total number of cells per sample to sizeFactors(dds)?

A few more reasons why I would like to try DESeq2 are its LFC shrinkage and s-values. So far, cell clusters that are statistically significant and have a high (shrunken) LFC have looked convincing when I plotted the cell frequencies as boxplots, i.e. the significance is not driven by outliers.
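
As a sketch of what assigning total cell counts as size factors could look like numerically: one option, which is an assumption here and not something confirmed in this thread, is to divide each sample's total by the geometric mean of all totals. The factors then sit around 1, so normalized counts stay on roughly the same scale as the raw counts. The function name and the totals below are hypothetical, and the code is plain Python, not DESeq2.

```python
import math


def size_factors_from_totals(total_cells):
    """Scale per-sample total cell counts so their geometric mean is 1."""
    log_geo_mean = sum(math.log(t) for t in total_cells) / len(total_cells)
    geo_mean = math.exp(log_geo_mean)
    return [t / geo_mean for t in total_cells]


# Hypothetical total cells per sample.
totals = [100_000, 200_000, 400_000]
total_sf = size_factors_from_totals(totals)
print(total_sf)  # roughly [0.5, 1.0, 2.0]
```

Values computed this way could then be assigned with sizeFactors(dds) before running DESeq(). Whether total cell count is the right scaling for a given data set is a separate question.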

ADD REPLY • link 4.8 years ago mikhael.manurung ▴ 280

I think that, to the extent that some cell clusters are present in roughly stable proportions across samples, median-ratio-based analysis will give reasonable results (and you can seed the median ratio with controlGenes). However, if composition is changing dramatically across samples and no such clusters are found, then I would consider multinomial modeling.
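
The effect of restricting the median ratio to stable clusters can be sketched in plain Python. This illustrates the idea behind controlGenes and is not DESeq2's implementation. The function name, the counts, and the choice of which clusters count as stable are all hypothetical.

```python
import math
from statistics import median


def size_factors_from_controls(counts, control_rows):
    """Median-of-ratios size factors using only the rows in control_rows."""
    n_samples = len(counts[0])
    per_sample_log_ratios = [[] for _ in range(n_samples)]
    for i in control_rows:
        row = counts[i]
        if min(row) <= 0:
            continue  # a zero count has no finite log ratio
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            per_sample_log_ratios[j].append(math.log(c) - log_geo_mean)
    if not per_sample_log_ratios[0]:
        raise ValueError("no usable control rows")
    return [math.exp(median(r)) for r in per_sample_log_ratios]


# Hypothetical counts: rows are clusters, columns are two samples.
# Clusters 0 and 1 differ only by a two-fold difference in cells acquired;
# clusters 2 to 4 expand strongly in sample 2.
cluster_counts = [
    [100, 200],
    [50, 100],
    [10, 200],
    [20, 300],
    [5, 150],
]
sf_controls = size_factors_from_controls(cluster_counts, control_rows=[0, 1])
sf_all = size_factors_from_controls(cluster_counts, control_rows=range(5))

print(sf_controls[1] / sf_controls[0])  # about 2
print(sf_all[1] / sf_all[0])            # about 15
```

When most clusters shift, the median over all clusters absorbs the compositional change into the size factors (a ratio of about 15 here), while the control-only estimate recovers the two-fold difference in acquired cells.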

ADD REPLY • link 4.8 years ago Michael Love 43k

I am not sure about the extent of changes in composition across samples. However, there are indeed cases where clusters of cells are highly abundant in only one of the groups. That being said, is this current approach a good enough way to analyse CyTOF data? I can imagine that edgeR and limma-voom, which are RNA-Seq tools being applied to CyTOF data, would encounter similar problems.

I am not familiar with multinomial modelling, so I guess it is time to consult with a statistician.

ADD REPLY • link 4.8 years ago mikhael.manurung ▴ 280

"Is this current approach a good enough way to analyse CyTOF data?" I really don't have any experience in this domain, so I can't give solid recommendations. I can say that diffcyt was extensively evaluated in its publication and is now widely used.

ADD REPLY • link 4.8 years ago Michael Love 43k