Question

Suggestion: report group means and differences in counts in edgeR

Yongqing • 0

Hong Kong

Hi,

I am posting because I ran into an interesting question while using edgeR. I want to analyze RNA-seq data from samples in two groups: control and treatment (an inhibitor of a kinase). I want to perform a differential expression analysis to find out what the main role of this kinase is.

I know many people would rank genes by fold change. However, I noticed that some genes have a large fold change even though their counts are very small (for example, 30-50 counts). I am more interested in the difference in counts than in the fold change, because a gene with a large difference in counts (for example, 30000 counts) might have a huge influence on the cell even without a large fold change.

There is also a biological reason. A living cell is essentially a large number of chemical reactions running at once. A molecule that is already very abundant may be prevented from increasing further by the regulatory networks within the cell, and if its abundance dropped, say, four-fold, the cell would die. Such a molecule is therefore much less likely to show a large fold change than a molecule with a very low expression level, yet it is still important.

I know edgeR needs the data to be normalized (TMM) first. However, if the software can report a fold change, it must already have estimated the mean of the control group, the mean of the treatment group, and the difference between them. With these numbers, I could calculate the quantity I want for my analysis. So I am writing to ask whether edgeR could report these numbers. I believe it would help a lot of downstream lab work and benefit future biomedical discoveries.

Many thanks,

Yongqing

edgeR • 591 views

Posted 16 months ago • updated 15 months ago by Yongqing

score 0 · Answer 1 · 2022-12-27

edgeR already reports results in an appropriate way. You can already get the average CPM or average logCPM per group by using the cpmByGroup function.
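As a rough illustration of the cpmByGroup suggestion, here is a minimal sketch on simulated counts. The count matrix, the sample layout (3 control, 3 treatment) and the variable names are assumptions made up for this example; only DGEList, calcNormFactors and cpmByGroup are edgeR functions. The last step subtracts the two group averages on the unlogged CPM scale, which is roughly the "difference" asked about in the question.

```r
library(edgeR)

# Simulated counts for illustration only (not real data):
# 1000 genes, 3 control samples followed by 3 treatment samples.
set.seed(1)
counts <- matrix(rnbinom(1000 * 6, mu = 100, size = 10), nrow = 1000)
rownames(counts) <- paste0("gene", seq_len(1000))
group <- factor(rep(c("control", "treatment"), each = 3))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)  # TMM normalization factors

# Average CPM per group: one column per level of `group`
cpm_group <- cpmByGroup(y)

# Average log2-CPM per group
logcpm_group <- cpmByGroup(y, log = TRUE)

# Difference in average CPM, treatment minus control.
# Assumes the columns follow levels(group), i.e. control first.
cpm_diff <- cpm_group[, 2] - cpm_group[, 1]

head(cbind(cpm_group, diff = cpm_diff))
```

Note that these are differences in CPM, not in raw counts, so they are already adjusted for library size and normalization factors.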

You seem to assume that logFC values are always computed as the difference between group means, but that is not correct. For many linear models (paired comparisons are one common example) there are no group means.
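To make the paired case concrete, here is a hedged sketch of a paired design on simulated data. The three patients, the factor names and the counts are hypothetical. The point is only that the treatment logFC is a coefficient of the fitted model, estimated after adjusting for patient, so there is no single "control mean" or "treatment mean" behind it.

```r
library(edgeR)

# Simulated paired experiment, for illustration only:
# 3 hypothetical patients, each measured under control and treated conditions.
set.seed(2)
counts <- matrix(rnbinom(600 * 6, mu = 100, size = 10), nrow = 600)
rownames(counts) <- paste0("gene", seq_len(600))
patient   <- factor(rep(1:3, times = 2))
treatment <- factor(rep(c("control", "treated"), each = 3))

# Additive model: patient baseline plus a treatment effect
design <- model.matrix(~ patient + treatment)

y <- DGEList(counts = counts)
y <- calcNormFactors(y)
y <- estimateDisp(y, design)

fit <- glmQLFit(y, design)

# The last column of the design is the treatment effect; its estimate is
# the logFC, adjusted for patient, not a difference of two group averages.
res <- glmQLFTest(fit, coef = ncol(design))
topTags(res)
```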

edgeR already solves the problem that you identify, that large fold-changes are often associated with small counts, in a sophisticated and principled manner. The DE list that edgeR presents already balances count sizes and fold-changes in an appropriate way, requiring larger fold-changes from genes with small counts before they are assessed as significantly DE. edgeR already compares counts rather than relying on the size of the fold-change. That's why we have always strongly recommended that DE be assessed by p-values rather than by fold-change.
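A small sketch of what assessing DE by p-value rather than fold change looks like in practice. The data are simulated, and the two hand-edited genes (gene1 and gene2) are hypothetical: both have about the same two-fold change, but one at a handful of counts and the other at thousands. Comparing their rows in the results table shows how the same logFC is weighed against the count size; no particular output is claimed here.

```r
library(edgeR)

# Simulated two-group data, for illustration only.
set.seed(3)
counts <- matrix(rnbinom(1000 * 6, mu = 200, size = 10), nrow = 1000)
rownames(counts) <- paste0("gene", seq_len(1000))

# Two hypothetical genes with a similar two-fold change
# at very different count levels.
counts["gene1", ] <- c(4, 6, 5, 9, 11, 10)
counts["gene2", ] <- c(4000, 6000, 5000, 9000, 11000, 10000)

group <- factor(rep(c("control", "treatment"), each = 3))
y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
y <- estimateDisp(y, model.matrix(~ group))

# Exact test between the two groups; genes are then ranked by p-value
et  <- exactTest(y)
tab <- topTags(et, n = Inf)$table

# Same fold change, different evidence: compare PValue and FDR
tab[c("gene1", "gene2"), c("logFC", "logCPM", "PValue", "FDR")]
```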