Variance for paired design - DESeq2
gthm ▴ 30
@gthm-8377
spain

Hi,

I have RNA-Seq and H3K27Ac ChIP-Seq data with a paired design for 7 human primary tissues in two conditions. I used DESeq2 and performed a paired analysis. For RNA-Seq, gene expression levels were used for the differential analysis; for H3K27Ac, the number of reads mapping to peaks.

One of the reviewers is not satisfied with the differential analysis (even though we used padj < 0.05) and keeps insisting that the number of samples is too low for differential analysis (samples from 7 cadaveric organ donors, cultured in 2 conditions (paired), for mRNA and ChIP-Seq). The reviewer asked for a measure of variance. We provided all the results (mean signal, lfcSE, padj, etc.), but he/she came back and again asked for a measure of variance.

I would like to know which measure is best to provide to show that the differential results are robust. I could provide the normalized expression levels for each sample (or the mean or median per group), but it's a "paired" design, so the "basal" levels might not be directly comparable. I saw that "mcols(dds)" has all the information we can extract, but I'm not sure which measure to use for a paired design.

Thanks, G

@james-w-macdonald-5106
United States

Given that you have only 7 subjects, by definition your results aren't robust, where by robust I mean 'are representative of the underlying population rather than idiosyncratic results that are likely only to apply to the 7 subjects under study'. In other words, if you designed a study to see if a dietary intervention was 'good' for some definition of 'good', would you really enlist just 7 people and then try to convince people that the results were 'robust'? I think you would be hard pressed to get anybody to agree that those results were even preliminary.

Anyway, your question really has nothing to do with Bioconductor, nor even statistical analysis. You submitted a paper and a reviewer asked you for 'a measure of variance', which is sort of nonsensical, given you have measured like tens of thousands of things. But how would anybody here know what the reviewer wants? Isn't that a question for the reviewer?

@mikelove
United States

If the reviewer means the variance in the observed counts with respect to the expected counts from the fitted model (again, echoing James, I have no idea what they mean from what you've given us), then you can report the square root of the dispersion.

For NB count, K:

Var = mu + dispersion * mu^2

Re-arranging, for large mu the dispersion * mu^2 term dominates the mu term, so we have:

dispersion ~= Var / mu^2

sqrt(dispersion) ~= SD / mu

So the square root of the dispersion is approximately the coefficient of variation of the counts for large counts. You can report the mean counts per group and this coefficient of variation statistic. The edgeR group refers to this as the biological coefficient of variation (BCV), which has been picked up in the genomics literature.
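As a rough numerical check of the approximation above, here is a minimal Python sketch (not DESeq2 code). It compares the exact NB coefficient of variation, computed from Var = mu + dispersion * mu^2, with sqrt(dispersion) at several mean counts. The function name `nb_cv` and the dispersion value 0.04 are made up for illustration.

```python
import math


def nb_cv(mu: float, dispersion: float) -> float:
    """Exact coefficient of variation (SD / mean) of a negative binomial
    count with mean mu and Var = mu + dispersion * mu^2."""
    variance = mu + dispersion * mu ** 2
    return math.sqrt(variance) / mu


# Hypothetical dispersion, chosen only for illustration.
dispersion = 0.04
approx_cv = math.sqrt(dispersion)

# As mu grows, the exact CV should approach sqrt(dispersion).
for mu in (5, 50, 500, 5000):
    exact = nb_cv(mu, dispersion)
    print(f"mu={mu:>5}  exact CV={exact:.4f}  sqrt(dispersion)={approx_cv:.4f}")
```

For small mean counts the exact CV is noticeably larger than sqrt(dispersion), because the Poisson-like mu term in the variance still matters; the two only agree closely once the counts are large.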

Hi Michael, thanks for the answer. Sorry for not being clear before posting this, but I also think the reviewer is not familiar with statistical terminology. They may mean the biological variation across the different replicates.

Should I use dispersions() for each group separately to get the CV?
