Question

Calculation of logCPM in edgeR

1

H ▴ 20

@H-24669

Last seen 3.8 years ago

I am very new to bioinformatics, R, and edgeR. I used the following code:

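library(edgeR)

# Targets file: assumed to list the count files and each sample's group (MG or MN)
targets <- read.delim("cell_line_M.txt", stringsAsFactors = FALSE)

# Read the count files into a DGEList and name the samples
d <- readDGE(targets)
colnames(d) <- c("MG1", "MG2", "MN1", "MN2")
# Estimate the common dispersion, then tagwise dispersions without a trend
d <- estimateCommonDisp(d, verbose = TRUE)
d <- estimateTagwiseDisp(d, trend = "none")

# Exact test of MG against MN (the first element of pair is the baseline)
et <- exactTest(d, pair = c("MN", "MG"))
print(et)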

This is the first line of the output:

          logFC   logCPM       PValue
A1CF  0.20103589 4.718215 0.8603790511

My question is how the logCPM is calculated. The count file contains:

gene   MG1  MG2  MN1 MN2
A1CF    8   7    7   4

Assuming the library size of every column is 450000, the CPM values according to the formula

CPM = count * 1e6 / (library size of that sample)

are

gene   MG1   MG2   MN1  MN2
A1CF  17.7  15.5  15.5  8.8
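
As a quick check of the arithmetic above, here is a minimal sketch in base R. It assumes, as the question does, a library size of 450000 for every sample; the variable names are illustrative only.

```r
# Counts for gene A1CF in the four samples (from the count table above)
counts <- c(MG1 = 8, MG2 = 7, MN1 = 7, MN2 = 4)

# Assumption from the question: every library has 450000 reads
lib_size <- rep(450000, length(counts))

# Counts per million: scale each count by its own sample's library size
cpm_values <- counts * 1e6 / lib_size
print(round(cpm_values, 1))
```

Rounded to one decimal place this gives 17.8, 15.6, 15.6 and 8.9; the table above shows the same values truncated rather than rounded.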

I expected the logCPM to be calculated in the same way as the logFC, following the strategy suggested here:

geo_mean_CPM = sqrt(17.7 * 15.5) / sqrt(15.5 * 8.8) --> logCPM = log2(geo_mean_CPM)

Is this strategy correct, or is the calculation totally different?

logCPM edgeR • 1.7k views

ADD COMMENT • link updated 3.9 years ago by Kevin Blighe ★ 4.0k • written 3.9 years ago by H ▴ 20

0

Linking to duplicate post: Calculation of logFC in edgeR

ADD REPLY • link 3.9 years ago Kevin Blighe ★ 4.0k

score 1 · Answer 1 · 2021-01-28

1

Gordon Smyth 52k

@gordon-smyth

Last seen 15 hours ago

WEHI, Melbourne, Australia

The edgeR User's Guide explains extensively that edgeR uses negative binomial generalized linear models, so the simple calculations you give could not be correct.

The logCPM values in the topTags table are computed by aveLogCPM. See ?aveLogCPM.
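
To see how aveLogCPM differs from a simple average of CPMs, here is a minimal sketch, assuming the edgeR package is installed. The one-gene matrix, the equal library sizes of 450000 and the dispersion of 0.05 are assumptions for illustration only, so the result should not be expected to reproduce the 4.718 in the question's output, which presumably reflects the real library sizes and the estimated dispersion.

```r
library(edgeR)

# Hypothetical one-gene count matrix built from the question's A1CF row
counts <- matrix(c(8, 7, 7, 4), nrow = 1,
                 dimnames = list("A1CF", c("MG1", "MG2", "MN1", "MN2")))

# Assumed library sizes (the question's simplification, not the real totals)
lib_size <- rep(450000, 4)

# Naive value: log2 of the arithmetic mean of the raw CPMs
naive <- log2(mean(counts[1, ] * 1e6 / lib_size))

# edgeR's value: average log2-CPM using a prior count and an NB dispersion
# (prior.count = 2 is the documented default; 0.05 is an assumed dispersion)
ave <- aveLogCPM(counts, lib.size = lib_size, prior.count = 2, dispersion = 0.05)

print(c(naive = naive, aveLogCPM = ave))
```

The two numbers generally differ because aveLogCPM adds a prior count and averages under the negative binomial model rather than taking a plain mean; see ?aveLogCPM for the details.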

ADD COMMENT • link 3.9 years ago Gordon Smyth 52k