Question

CQN offset and edgeR

0

Entering edit mode

Akula, Nirmala NIH/NIMH [C] ▴ 190

@akula-nirmala-nihnimh-c-5007

Last seen 5.3 years ago

Hi,

I am trying to use cqn offset in EdgeR and getting very low p-values (FDR < 10e-100). Is the offset working correctly? Or I am missing some steps in the code?

> mycqn=cqn(countData,lengths=uCovar$GeneLength,x=uCovar$GCcontent,
+ sizeFactors=sizeFactors$V2,verbose=TRUE)
> offset=mycqn$glm.offset
> design <- model.matrix(~coldata$condition + race + rin + gc)
> y=estimateCommonDisp(y,design=design)
> y <- estimateTrendedDisp(y)
> y <- estimateTagwiseDisp(y)
> glmQlfitcqn=glmQLFit(y,design)
> glmQlfTestcqn=glmQLFTest(glmQlfitcqn,contrast = c(-1, 1, 0, 0, 0, 0, 0))
> topTags(glmQlfTestcqn)
Coefficient:  -1*(Intercept) 1*coldata$conditionBipolar 
                   logFC   logCPM        F        PValue          FDR
ENSG00000130560 17.57000 4.165947 2101.518 9.093832e-102 1.930439e-97
ENSG00000012061 15.78226 4.633156 1774.383  1.149911e-95 1.220516e-91
ENSG00000103549 15.10992 5.061084 1753.249  3.085519e-95 2.183313e-91

Thanks,

Nirmala

CQN and edgeR • 2.1k views

ADD COMMENT • link updated 6.5 years ago by Steve Lianoglou ★ 13k • written 6.5 years ago by Akula, Nirmala NIH/NIMH [C] ▴ 190

score 1 · Answer 1 · 2018-08-30

1

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 13 hours ago

WEHI, Melbourne, Australia

Well, the reason you have such small p-values is that you've tested for a contrast that isn't sensible. You have compared the Bipolar effect coefficient with the intercept term, something that doesn't make any sense to do. Did you perhaps simply intend:

glmQLFTest(glmQlfitcqn, coef=2)

which would have tested for DE in Bipolar vs whatever is your reference condition?

According to the code given, you don't use the cqn offsets. You just store them in a matrix called offset, but never use that variable. Why not follow the edgeR code example in the cqn vignette?

ADD COMMENT • link 6.5 years ago Gordon Smyth 52k

0

Entering edit mode

Hi Gordon,

Thanks for your response. I get the following error when I use offset command in glmFit

> mydataGlmFit=glmFit(y,design,offset=mycqn$glm.offset)

Error in glmFit.default(y = y$counts, design = design, dispersion = dispersion, :

formal argument "offset" matched by multiple actual arguments

Thanks for your help.

Nirmala

ADD REPLY • link 6.5 years ago Akula, Nirmala NIH/NIMH [C] ▴ 190

0

Entering edit mode

The glmFit() function doesn't accept an offset argument when y is a DGEList object (as you can see from the help page for glmFit).

See Steve Lianoglou's answer or the cqn vignette for correct code.

ADD REPLY • link 6.5 years ago Gordon Smyth 52k

score 1 · Answer 2 · 2018-08-31

1

Entering edit mode

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 24 months ago

United States

You can make your life a bit easier by following the code provided in the cqn vignette a bit more closely.

There you'll see that the cqn glm.offset matrix is added to the DGEList directly

So, modifying your code a bit and using Gordon's suggest contrast:

mycqn <- cqn(countData,lengths=uCovar$GeneLength,x=uCovar$GCcontent,
             sizeFactors=sizeFactors$V2,verbose=TRUE)

# This next line is where your life got harder:
# offset=mycqn$glm.offset (this is where your life got harder)

# The vignette rather instructs you to add this as an element
# to your DGEList and edgeR will use it in "all the right plaes",
# like so:
y$offset <- mycqn$glm.offset

design <- model.matrix(~coldata$condition + race + rin + gc)
y <- estimateCommonDisp(y,design=design)
y <- estimateTrendedDisp(y)
y <- estimateTagwiseDisp(y)
glmQlfitcqn <- glmQLFit(y,design)

# Gordon's suggested test
glmQlfTestcqn <- glmQLFTest(glmQlfitcqn, coef=2)

ADD COMMENT • link 6.5 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

Hi Steve,

Whether I use the cqn offset (y$offset <- mycqn$glm.offset) or not I get the same results

As CQN manual mentions that sometimes standard analysis gives the same results as when using CQN or whether the offset is working or not? Is there a way to test this?

Thanks,

Nirmala

ADD REPLY • link 6.5 years ago Akula, Nirmala NIH/NIMH [C] ▴ 190

0

Entering edit mode

Sure, why don't you try to scramble the values in your cqn offset matrix and see if those don't look any different. This is untested, but I think should work:

y2 <- y
y2$offset <- matrix(sample(y$offset), nrow = nrow(y$offset))
# ... redo the edgeR steps and test the same contrast

Also, how are you testing that the results are the same? What two measures/results are you specfically looking at and how are you testing for equality? In other words, please show the code you are using for that test.

ADD REPLY • link 6.5 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

Dear Nirmala,

That cannot be true. edgeR does use the offset matrix and the results certainly will change, at least slightly, if you set offsets from cqn.

However you should not expect big changes. The changes from cqn normalization may be very small.

ADD REPLY • link 6.5 years ago Gordon Smyth 52k

0

Entering edit mode

Hi Gordon and Steve,

As you suggested the results are slightly different (only in the decimals). Does this mean CQN normalization is not important in this particular analysis?

The code with offset:

> design <- model.matrix(~coldata$condition + race + rin + gc)
> y <- DGEList(counts=countData,group=coldata$condition)
> y$offset=mycqn$glm.offset
> y=estimateGLMCommonDisp(y,design=design)
> y <- estimateTrendedDisp(y)
> y <- estimateTagwiseDisp(y)

> glmQlfitcqn <- glmQLFit(y,design)

> glmQlfTestcqn <- glmQLFTest(glmQlfitcqn, coef=2)
> topTags(glmQlfTestcqn)
Coefficient: coldata$conditionBipolar
logFC logCPM F PValue FDR
ENSG00000262902 5.8271297 0.9704256 54.74150 4.965934e-12 1.054168e-07
ENSG00000242613 -6.9341328 -4.6776882 47.59587 1.447763e-10 1.536656e-06
ENSG00000240661 -5.7728950 -0.7280092 42.98372 5.577964e-10 3.946968e-06
ENSG00000275043 1.2199644 -4.9797565 32.79179 7.438676e-07 3.947705e-03
ENSG00000242529 1.7135376 -3.5847925 22.80545 3.687357e-06 1.565504e-02

The code without offset:

> design <- model.matrix(~coldata$condition + race + rin + gc)

> y <- DGEList(counts=countData,group=coldata$condition)

> y=estimateGLMCommonDisp(y,design=design)

> y <- estimateTrendedDisp(y)

> y <- estimateTagwiseDisp(y)

> glmQlfitcqn <- glmQLFit(y,design)

> glmQlfTestcqnBPcontrast <- glmQLFTest(glmQlfitcqn, contrast = c(0,1,0, 0, 0, 0, 0))

> topTags(glmQlfTestcqnBPcontrast)

Coefficient: 1*coldata$conditionBipolar

logFC logCPM F PValue FDR

ENSG00000262902 5.5077647 0.9706013 54.22068 6.033903e-12 1.280877e-07

ENSG00000242613 -6.7783525 -4.6776200 47.41311 1.537793e-10 1.632213e-06

ENSG00000240661 -5.6740717 -0.7279688 43.42691 4.617483e-10 3.267331e-06

ENSG00000230289 2.9165150 -5.1530384 22.47538 4.287052e-06 2.275138e-02

ENSG00000183688 0.4888003 1.9599803 21.25202 7.571841e-06 2.744876e-02

Thanks,

Nirmala

ADD REPLY • link 6.5 years ago Akula, Nirmala NIH/NIMH [C] ▴ 190

0

Entering edit mode

From your two results, out of the five genes you are showing, I see three genes whose logFC's have changed slightly, and two genes that are completely different in their identity.

ADD REPLY • link 6.5 years ago Steve Lianoglou ★ 13k