I am trying to use cqn offset in EdgeR and getting very low p-values (FDR < 10e-100). Is the offset working correctly? Or I am missing some steps in the code?
Well, the reason you have such small p-values is that you've tested for a contrast that isn't sensible. You have compared the Bipolar effect coefficient with the intercept term, something that doesn't make any sense to do. Did you perhaps simply intend:
glmQLFTest(glmQlfitcqn, coef=2)
which would have tested for DE in Bipolar vs whatever is your reference condition?
According to the code given, you don't use the cqn offsets. You just store them in a matrix called offset, but never use that variable. Why not follow the edgeR code example in the cqn vignette?
You can make your life a bit easier by following the code provided in the cqn vignette a bit more closely.
There you'll see that the cqn glm.offset matrix is added to the DGEList directly
So, modifying your code a bit and using Gordon's suggest contrast:
mycqn <- cqn(countData,lengths=uCovar$GeneLength,x=uCovar$GCcontent,
sizeFactors=sizeFactors$V2,verbose=TRUE)# This next line is where your life got harder:# offset=mycqn$glm.offset (this is where your life got harder)# The vignette rather instructs you to add this as an element# to your DGEList and edgeR will use it in "all the right plaes",# like so:
y$offset <- mycqn$glm.offset
design <- model.matrix(~coldata$condition + race + rin + gc)
y <- estimateCommonDisp(y,design=design)
y <- estimateTrendedDisp(y)
y <- estimateTagwiseDisp(y)
glmQlfitcqn <- glmQLFit(y,design)# Gordon's suggested test
glmQlfTestcqn <- glmQLFTest(glmQlfitcqn, coef=2)
Whether I use the cqn offset (y$offset <- mycqn$glm.offset) or not I get the same results
As CQN manual mentions that sometimes standard analysis gives the same results as when using CQN or whether the offset is working or not? Is there a way to test this?
Sure, why don't you try to scramble the values in your cqn offset matrix and see if those don't look any different. This is untested, but I think should work:
y2 <- y
y2$offset <- matrix(sample(y$offset), nrow = nrow(y$offset))
# ... redo the edgeR steps and test the same contrast
Also, how are you testing that the results are the same? What two measures/results are you specfically looking at and how are you testing for equality? In other words, please show the code you are using for that test.
As you suggested the results are slightly different (only in the decimals). Does this mean CQN normalization is not important in this particular analysis?
The code with offset:
> design <- model.matrix(~coldata$condition + race + rin + gc)
> y <- DGEList(counts=countData,group=coldata$condition)
> y$offset=mycqn$glm.offset
> y=estimateGLMCommonDisp(y,design=design)
> y <- estimateTrendedDisp(y)
> y <- estimateTagwiseDisp(y)
From your two results, out of the five genes you are showing, I see three genes whose logFC's have changed slightly, and two genes that are completely different in their identity.
Hi Gordon,
Thanks for your response. I get the following error when I use offset command in glmFit
> mydataGlmFit=glmFit(y,design,offset=mycqn$glm.offset)
Error in glmFit.default(y = y$counts, design = design, dispersion = dispersion, :
formal argument "offset" matched by multiple actual arguments
Thanks for your help.
Nirmala
The glmFit() function doesn't accept an offset argument when y is a DGEList object (as you can see from the help page for glmFit).
See Steve Lianoglou's answer or the cqn vignette for correct code.