Question

Variance for paired design - DESeq2

gthm ▴ 30

Hi,

I have RNA-Seq and H3K27Ac ChIP-Seq data with a paired design for 7 human primary tissues in two conditions. I used DESeq2 and performed a paired analysis. For RNA-Seq, gene expression levels were used for the differential analysis; for H3K27Ac, the number of reads mapping to peaks was used.

One of the reviewers is not satisfied with the differential analysis (though we used padj < 0.05) and keeps insisting that the number of samples is too low for differential analysis (samples from 7 cadaveric organ donors, cultured in 2 conditions (paired), for mRNA and ChIP-Seq). The reviewer asked for a measure of variance. We provided all the results (mean signal, lfcSE, padj, etc.), but he/she came back and again asked for a measure of variance.

I would like to know which measure is best to provide to show that the differential results are robust. I could provide the normalized expression levels for each sample (or the mean or median per group), but it's a "paired" design, so the "basal" levels might not be directly comparable. I saw that "mcols(dds)" has all the information we can extract, but I am not sure which measure to use for a paired design.

Thanks, G

deseq2 • 497 views

updated 5.0 years ago by Michael Love 41k • written 5.1 years ago by gthm ▴ 30

score 0 · Answer 1 · 2019-04-09

Given that you have only 7 subjects, by definition your results aren't robust, where by robust I mean 'are representative of the underlying population rather than idiosyncratic results that are likely only to apply to the 7 subjects under study'. In other words, if you designed a study to see if a dietary intervention was 'good' for some definition of 'good', would you really enlist just 7 people and then try to convince people that the results were 'robust'? I think you would be hard pressed to get anybody to agree that those results were even preliminary.

Anyway, your question really has nothing to do with Bioconductor, nor even statistical analysis. You submitted a paper and a reviewer asked you for 'a measure of variance', which is sort of nonsensical, given you have measured like tens of thousands of things. But how would anybody here know what the reviewer wants? Isn't that a question for the reviewer?

score 0 · Answer 2 · 2019-04-09

If the reviewer means the variance in the observed counts with respect to the expected counts from the fitted model (again, echoing James, I have no idea what they mean from what you've given us), then you can report the square root of the dispersion.

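For a negative binomial (NB) count K with mean mu:

Var(K) = mu + dispersion * mu^2

Re-arranging, for large mu (where the mu term is small relative to dispersion * mu^2) we have:

dispersion ~= Var(K) / mu^2

sqrt(dispersion) ~= SD(K) / mu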

So the square root of the dispersion is approximately the coefficient of variation of the counts for large counts. You can report the mean counts per group and this coefficient of variation statistic. The edgeR group refers to this as the biological coefficient of variation (BCV), which has been picked up in the genomics literature.
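To see how good the approximation above is, here is a minimal numeric sketch in plain Python (not DESeq2 code). It compares the exact NB coefficient of variation, sqrt(Var) / mu, with sqrt(dispersion) at several mean counts. The dispersion value 0.04 and the mean counts are hypothetical, chosen only for illustration, and the function names are my own.

```python
import math


def nb_variance(mu, dispersion):
    """Variance of an NB count with mean mu: Var = mu + dispersion * mu^2."""
    return mu + dispersion * mu ** 2


def nb_cv(mu, dispersion):
    """Exact coefficient of variation, SD / mu, under the NB model."""
    return math.sqrt(nb_variance(mu, dispersion)) / mu


# Hypothetical dispersion for one gene (an assumption, not from any dataset).
dispersion = 0.04
approx_cv = math.sqrt(dispersion)  # large-mu approximation of the CV

for mu in (5, 50, 500, 5000):
    exact_cv = nb_cv(mu, dispersion)
    print(f"mu={mu:>5}  exact CV={exact_cv:.4f}  sqrt(dispersion)={approx_cv:.4f}")
```

The exact CV equals sqrt(1/mu + dispersion), so the gap to sqrt(dispersion) shrinks as mu grows. For low-count genes the Poisson term 1/mu still contributes noticeably, so sqrt(dispersion) understates the total CV there.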