Question

DESeq2 for analyzing differential expression based on dispersion across samples within groups, instead of based on expression means.

emilio.marmol • 0

@emiliomarmol-19545

Last seen 6.9 years ago

Hello,

I have a somewhat technical statistical question that came up while using DESeq2 to determine differences in the expression of certain miRNAs between groups.

We performed a small RNA-seq experiment with 4 groups of 12 animals each, to detect possible differences in miRNA gene expression across our different states.

When analyzing our data, we realised that many of the miRNAs were very lowly expressed, as expected, with only a few being highly expressed. We calculated the Biological Coefficient of Variation (BCV) for each miRNA in each group, following the formula reported by the edgeR developers, as the square root of the dispersion estimated for each gene with the estimateDispersions() function. As expected, miRNAs with low expression tended to have higher BCV values than highly expressed miRNAs. However, some miRNAs had BCVs higher than expected given their expression level and, more interestingly, some of these miRNAs that combined a high BCV with high expression in one group behaved normally in the other groups. To sum up, some miRNAs behaved strangely with respect to their BCV and expression level in some groups, while behaving normally in others.
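To make the BCV definition above concrete, here is a minimal Python sketch. It is not the edgeR or DESeq2 estimator: it assumes a simple method-of-moments estimate of the negative binomial dispersion (variance = mean + dispersion × mean²), and the function names and counts are hypothetical, chosen only for illustration.

```python
import math
from statistics import mean, variance


def moment_dispersion(counts):
    """Method-of-moments NB dispersion, assuming var = mu + phi * mu^2.

    Illustrative only: real tools shrink gene-wise estimates towards a trend.
    """
    mu = mean(counts)
    var = variance(counts)  # sample variance (n - 1 denominator)
    if mu == 0:
        return 0.0
    # Clip at zero: data less variable than Poisson gives a negative estimate.
    return max((var - mu) / mu ** 2, 0.0)


def bcv(dispersion):
    """BCV is the square root of the dispersion."""
    return math.sqrt(dispersion)


# Hypothetical normalized counts for one miRNA in two groups (same mean of 100).
group_a = [80, 120, 100, 90, 110, 100]
group_b = [20, 250, 60, 180, 10, 80]

bcv_a = bcv(moment_dispersion(group_a))  # about 0.10
bcv_b = bcv(moment_dispersion(group_b))  # about 0.95
print(f"BCV group A: {bcv_a:.2f}, BCV group B: {bcv_b:.2f}")
```

Both groups have the same mean, so a test on means would see no difference, while the BCVs differ by roughly an order of magnitude. This is the kind of pattern described above.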

To follow up on this, we tried to check whether these differences between groups were significant for certain miRNAs. This is much like the usual contrast of gene expression between groups, except that it compares the dispersion of the data rather than the means: a test of differences in variance, not in expression level.

To do so, we calculated, for each sample, the deviation of each miRNA from its mean expression in the group, using this formula:

abs(normalized count in sample - mean normalized count in group)

The further a sample's expression value is from the mean expression in its group, the higher the dispersion value; taking the absolute value makes the deviation positive when the expression is below the mean. This new matrix of per-sample dispersion values followed a negative binomial distribution, similar to the gene expression matrix, with most genes concentrated at low dispersion values and a tail of genes with high dispersion values.
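As a rough illustration of the transformation described above, the following Python sketch builds the matrix of absolute deviations from the group mean. The function name, the gene name and the counts are hypothetical; only the group labels ALT0 and ALT2 come from this thread.

```python
from statistics import mean


def abs_deviation_matrix(counts, groups):
    """Return |normalized count - group mean| for every gene and sample.

    counts: dict mapping gene name -> list of normalized counts (one per sample)
    groups: list of group labels, in the same sample order as the counts
    """
    labels = set(groups)
    deviations = {}
    for gene, values in counts.items():
        # Mean normalized count of this gene within each group.
        group_mean = {
            g: mean([v for v, lab in zip(values, groups) if lab == g])
            for g in labels
        }
        # Absolute distance of each sample from the mean of its own group.
        deviations[gene] = [
            abs(v - group_mean[lab]) for v, lab in zip(values, groups)
        ]
    return deviations


# Hypothetical data: one miRNA, three samples in each of two groups.
groups = ["ALT0", "ALT0", "ALT0", "ALT2", "ALT2", "ALT2"]
counts = {"miR-example": [10, 20, 30, 100, 100, 130]}

dev = abs_deviation_matrix(counts, groups)
print(dev["miR-example"])
```

With real data the normalized counts are rarely whole numbers, so the deviations are generally not integers either, whereas DESeq2 expects a matrix of non-negative integer counts.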

This new matrix was fed into the canonical DESeq2 differential expression pipeline, and the results were interpreted as genes showing differential "dispersion" rather than expression, with fold change values, p-values and FDR statistics.

My question is: is this a statistically correct approach? I have not been able to find any software that performs this kind of analysis, contrasting gene dispersion across groups instead of gene expression values.

Is there an alternative approach to do this?

Many thanks

deseq2 dispersion • 2.0k views

ADD COMMENT • link updated 6.9 years ago by Michael Love 43k • written 7.0 years ago by emilio.marmol • 0

score 0 · Answer 1 · 2019-01-23

Michael Love 43k

@mikelove

Last seen 10 days ago

United States

Short answer: no, this is not a correct approach.

There are in fact methods for detecting differences in distribution or dispersion across groups, and this is what I would recommend.

Here, for example, is a method for detecting differences in distribution for scRNA-seq data. I'm asking the author whether it would be appropriate for bulk RNA-seq with n=12 per group:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5080738/

ADD COMMENT • link 6.9 years ago Michael Love 43k

I am not really interested in assessing differences in the global BCV distribution across groups, but in determining whether gene-wise BCV values differ significantly across groups.

As you can see in the plotted BCVs for each analyzed gene across groups, I am interested not in differences in the overall distribution, but in BCV values that fit the overall trend in one group and do not fit it in another.

Applying the approach I described in the first message, I obtained these two genes as "Differentially Dispersed" across groups, among a few others. The BCV value for ssc-miR-1285 is 0.6 in the ALT2 group and 1.20 in the ALT0 group. These are the differences I am interested in evaluating.

Similarly, BCV for ssc-miR-122-5p was around 0.5 in ALT0 and 1.30 in ART0 group.

I obtained a table with one BCV value for each gene (miRNA in this case) in each group, and wanted to test whether these values were significantly different. Because I had no gene-wise replicates of the BCV values, I opted to calculate a dispersion value for each gene in each sample, so that I could apply a hypothesis test.

My actual question was: what kind of hypothesis test would be most recommended for this case?

Many thanks.

ADD REPLY • link 6.9 years ago emilio.marmol • 0

I’m not sure what packages or methods will do this, but DESeq2 does not have such a test.

ADD REPLY • link 6.9 years ago Michael Love 43k

I know, which is why I took a somewhat alternative approach and calculated a dispersion matrix with one value for each gene in each sample. The distribution of this new matrix fits a negative binomial distribution, at least according to my data. I then used this new matrix as input to the classical DESeq2 differential expression test.

I know this is a non-canonical and surely not recommended approach, as DESeq2 was not designed for this purpose, but I have not been able to find an alternative, better-suited way to test my hypothesis...

ADD REPLY • link 6.9 years ago emilio.marmol • 0