CQN offset and edgeR
2
0
Entering edit mode
@akula-nirmala-nihnimh-c-5007
Last seen 4.4 years ago

Hi,

I am trying to use cqn offset in EdgeR and getting very low p-values (FDR < 10e-100). Is the offset working correctly?  Or I am missing some steps in the code?

> mycqn=cqn(countData,lengths=uCovar$GeneLength,x=uCovar$GCcontent,
+ sizeFactors=sizeFactors$V2,verbose=TRUE)
> offset=mycqn$glm.offset
> design <- model.matrix(~coldata$condition + race + rin + gc)
> y=estimateCommonDisp(y,design=design)
> y <- estimateTrendedDisp(y)
> y <- estimateTagwiseDisp(y)
> glmQlfitcqn=glmQLFit(y,design)
> glmQlfTestcqn=glmQLFTest(glmQlfitcqn,contrast = c(-1, 1, 0, 0, 0, 0, 0))
> topTags(glmQlfTestcqn)
Coefficient:  -1*(Intercept) 1*coldata$conditionBipolar 
                   logFC   logCPM        F        PValue          FDR
ENSG00000130560 17.57000 4.165947 2101.518 9.093832e-102 1.930439e-97
ENSG00000012061 15.78226 4.633156 1774.383  1.149911e-95 1.220516e-91
ENSG00000103549 15.10992 5.061084 1753.249  3.085519e-95 2.183313e-91

Thanks,

Nirmala

CQN and edgeR • 1.7k views
ADD COMMENT
1
Entering edit mode
@gordon-smyth
Last seen 34 minutes ago
WEHI, Melbourne, Australia

Well, the reason you have such small p-values is that you've tested for a contrast that isn't sensible. You have compared the Bipolar effect coefficient with the intercept term, something that doesn't make any sense to do. Did you perhaps simply intend:

glmQLFTest(glmQlfitcqn, coef=2)

which would have tested for DE in Bipolar vs whatever is your reference condition?

According to the code given, you don't use the cqn offsets. You just store them in a matrix called offset, but never use that variable. Why not follow the edgeR code example in the cqn vignette?

ADD COMMENT
0
Entering edit mode

 

Hi Gordon,

Thanks for your response. I get the following error when I use offset command in glmFit

> mydataGlmFit=glmFit(y,design,offset=mycqn$glm.offset)

Error in glmFit.default(y = y$counts, design = design, dispersion = dispersion,  : 

  formal argument "offset" matched by multiple actual arguments


Thanks for your help.

Nirmala

ADD REPLY
0
Entering edit mode

The glmFit() function doesn't accept an offset argument when y is a DGEList object (as you can see from the help page for glmFit).

See Steve Lianoglou's answer or the cqn vignette for correct code.

ADD REPLY
1
Entering edit mode
@steve-lianoglou-2771
Last seen 13 months ago
United States

You can make your life a bit easier by following the code provided in the cqn vignette a bit more closely.

There you'll see that the cqn glm.offset matrix is added to the DGEList directly

So, modifying your code a bit and using Gordon's suggest contrast:

mycqn <- cqn(countData,lengths=uCovar$GeneLength,x=uCovar$GCcontent,
             sizeFactors=sizeFactors$V2,verbose=TRUE)

# This next line is where your life got harder:
# offset=mycqn$glm.offset (this is where your life got harder)

# The vignette rather instructs you to add this as an element
# to your DGEList and edgeR will use it in "all the right plaes",
# like so:
y$offset <- mycqn$glm.offset

design <- model.matrix(~coldata$condition + race + rin + gc)
y <- estimateCommonDisp(y,design=design)
y <- estimateTrendedDisp(y)
y <- estimateTagwiseDisp(y)
glmQlfitcqn <- glmQLFit(y,design)

# Gordon's suggested test
glmQlfTestcqn <- glmQLFTest(glmQlfitcqn, coef=2)
ADD COMMENT
0
Entering edit mode

Hi Steve,

Whether I use the cqn offset (y$offset <- mycqn$glm.offset) or not I get the same results

As CQN manual mentions that sometimes standard analysis gives the same results as when using CQN or whether the offset is working or not? Is there a way to test this?

Thanks,

Nirmala

ADD REPLY
0
Entering edit mode

Sure, why don't you try to scramble the values in your cqn offset matrix and see if those don't look any different. This is untested, but I think should work:

y2 <- y
y2$offset <- matrix(sample(y$offset), nrow = nrow(y$offset))
# ... redo the edgeR steps and test the same contrast

Also, how are you testing that the results are the same? What two measures/results are you specfically looking at and how are you testing for equality? In other words, please show the code you are using for that test.

 

ADD REPLY
0
Entering edit mode

Dear Nirmala,

That cannot be true. edgeR does use the offset matrix and the results certainly will change, at least slightly, if you set offsets from cqn.

However you should not expect big changes. The changes from cqn normalization may be very small.

ADD REPLY
0
Entering edit mode

Hi Gordon and Steve,

As you suggested the results are slightly different (only in the decimals). Does this mean CQN normalization is not important in this particular analysis?

The code with offset:

> design <- model.matrix(~coldata$condition + race + rin + gc)  
> y <- DGEList(counts=countData,group=coldata$condition)        
> y$offset=mycqn$glm.offset                                     
> y=estimateGLMCommonDisp(y,design=design)                      
> y <- estimateTrendedDisp(y)                                   
> y <- estimateTagwiseDisp(y)                                

> glmQlfitcqn <- glmQLFit(y,design)                             

> glmQlfTestcqn <- glmQLFTest(glmQlfitcqn, coef=2)
> topTags(glmQlfTestcqn)
Coefficient:  coldata$conditionBipolar 
                     logFC     logCPM        F       PValue          FDR
ENSG00000262902  5.8271297  0.9704256 54.74150 4.965934e-12 1.054168e-07
ENSG00000242613 -6.9341328 -4.6776882 47.59587 1.447763e-10 1.536656e-06
ENSG00000240661 -5.7728950 -0.7280092 42.98372 5.577964e-10 3.946968e-06
ENSG00000275043  1.2199644 -4.9797565 32.79179 7.438676e-07 3.947705e-03
ENSG00000242529  1.7135376 -3.5847925 22.80545 3.687357e-06 1.565504e-02
 

The code without offset:

> design <- model.matrix(~coldata$condition + race + rin + gc)                                                                                    

> y <- DGEList(counts=countData,group=coldata$condition)                                                                                          

> y=estimateGLMCommonDisp(y,design=design)                       

> y <- estimateTrendedDisp(y)                                    

> y <- estimateTagwiseDisp(y)                                    

> glmQlfitcqn <- glmQLFit(y,design)  

> glmQlfTestcqnBPcontrast <- glmQLFTest(glmQlfitcqn, contrast = c(0,1,0, 0, 0, 0, 0))

> topTags(glmQlfTestcqnBPcontrast)

Coefficient:  1*coldata$conditionBipolar 

                     logFC     logCPM        F       PValue          FDR

ENSG00000262902  5.5077647  0.9706013 54.22068 6.033903e-12 1.280877e-07

ENSG00000242613 -6.7783525 -4.6776200 47.41311 1.537793e-10 1.632213e-06

ENSG00000240661 -5.6740717 -0.7279688 43.42691 4.617483e-10 3.267331e-06

ENSG00000230289  2.9165150 -5.1530384 22.47538 4.287052e-06 2.275138e-02

ENSG00000183688  0.4888003  1.9599803 21.25202 7.571841e-06 2.744876e-02

Thanks,

Nirmala

ADD REPLY
0
Entering edit mode

From your two results, out of the five genes you are showing, I see three genes whose logFC's have changed slightly, and two genes that are completely different in their identity.

ADD REPLY

Login before adding your answer.

Traffic: 687 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6