using offsets from EDASeq and cqn for edgeR

0

Entering edit mode

Kasper Daniel Hansen ★ 6.5k

@kasper-daniel-hansen-2979

Last seen 10 months ago

United States

We have (very late) fixed this. The output of cqn now includes a new component glm.offset, which fixes the issues Michael points out (and one other). From the NEWS file \item THe output object from cqn() now has an additional component: glm.offset which is an offset matrix which can directly be used in a GLM type model (specifically edgeR). The usage is explained in the vignette secion on Import into edgeR. Previously the vignette recommended using the offset component of the cqn output, which is wrong, due to a scaling issue. The offset component of cqn is unchanged. This bug was found by Mike Love \email{love@molgen.mpg.de} and fixed in CQN 1.5.1. Kasper On Wed, Feb 20, 2013 at 4:11 AM, Michael Love <love@molgen.mpg.de> wrote: > hi, > > re: EDASeq, this was my mistake. The vignette does in fact have the > multiplication by -1, but my eyes scanned over this: > > > library(edgeR) > > design <- model.matrix(~conditions, data=pData(dataOffset)) > > disp <- estimateGLMCommonDisp(exprs(dataOffset), > + design, offset=-offst(dataOffset)) > > fit <- glmFit(exprs(dataOffset), design, disp, > offset=-offst(dataOffset)) > > Then, I still believe that the cqn offsets should be multiplied by > -log(2) to be handed to edgeR, which is not currently in the cqn vignette: > > > design <- model.matrix(~ d.mont$sample$group) > > d.mont.cqn <- estimateGLMCommonDisp(d.mont, design = design, > + offset = cqn.subset$offset) > > But I think it would be desirable for packages on both sides of the > hand-off to provide an explicit formula for what is meant by "offset", > which could be one of: > > log(mu) = X * beta + offset > log2(mu) = X * beta + offset > log(mu) + offset = X * beta > log2(mu) + offset = X * beta > > best, > > Mike > > > On 02/14/2013 06:53 PM, Mike Love wrote: > > One more thing, another wrinkle is log or log2. For this reason, I > > shouldn't have used correlation in the example. I believe cqn is > > giving log2 offsets, EDASeq natural log, and edgeR is accepting > > natural log. > > > > thanks, > > > > Mike > > > > On 2/14/13 5:21 PM, Michael Love wrote: > >> hi, > >> > >> I could have things totally mixed up here. I was thinking that GLM > >> offsets would be positively correlated with log counts. > >> > >> for instance in edgeR, mglmLS.R: > >> > >> mu <- exp(beta %*% t(X) + offset) > >> > >> but it seems offsets from packages EDASeq and cqn are negative > >> correlated with log counts. > >> > >> for example some poisson data: > >> > >> > n <- 1000 > >> > covariate <- rnorm(n,6,.5) > >> > counts <- replicate(2,rpois(n,exp(covariate))) > >> > >> > library(EDASeq) > >> > edaseq.offset <- > >> withinLaneNormalization(x=counts,y=covariate,which="full",offset=TRUE) > >> > cor(edaseq.offset[,1],log(counts[,1]+1)) > >> [1] -0.9765923 > >> > >> > library(cqn) > >> > cqn.offset <- > >> > cqn(counts,x=covariate,lengths=rep(1,n),lengthMethod="fixed",sizeFac tors=c(1,1))$offset > >> > cor(cqn.offset[,1],log(counts[,1]+1)) > >> [1] -0.9950717 > >> > >> > >> So EDASeq and cqn are giving similar results here: > >> > >> > cor(edaseq.offset[,1],cqn.offset[,1]) > >> [1] 0.9847672 > >> > >> If I give these to the standard glm function as offsets I would > >> expect the coefficient for 'covariate' to go from 1 to 0. But it > >> gives me something like a coefficient of 2, and to get zero I have to > >> give it -1 * offset: > >> > >> without offset: > >> > >> > coef(glm(counts[,1] ~ covariate, family="poisson")) > >> (Intercept) covariate > >> -0.02735053 1.00467285 > >> > >> with offset: > >> > >> > coef(glm(counts[,1] ~ covariate, offset=edaseq.offset[,1], > >> family="poisson")) > >> (Intercept) covariate > >> -5.860094 1.954997 > >> > coef(glm(counts[,1] ~ covariate, offset=cqn.offset[,1], > >> family="poisson")) > >> (Intercept) covariate > >> -8.752786 2.457637 > >> > >> with -1 * offset: > >> > >> > coef(glm(counts[,1] ~ covariate, offset=-edaseq.offset[,1], > >> family="poisson")) > >> (Intercept) covariate > >> 5.8253511 0.0498681 > >> > coef(glm(counts[,1] ~ covariate, offset=-cqn.offset[,1], > >> family="poisson")) > >> (Intercept) covariate > >> 8.6975896 -0.4482184 > >> > >> > >> thanks for any guidance, > >> > >> Mike > >> > > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

GO edgeR cqn EDASeq GO edgeR cqn EDASeq • 1.9k views

ADD COMMENT • link 11.1 years ago Kasper Daniel Hansen ★ 6.5k

0

Entering edit mode

Kasper Daniel Hansen ★ 6.5k

@kasper-daniel-hansen-2979

Last seen 10 months ago

United States

I should probably add that our suggestion of using cqn$y + cqn$offset as the CQN corrected expression measures is unchanged and this is what we evaluate in our paper. Kasper On Tue, Apr 2, 2013 at 10:26 AM, Kasper Daniel Hansen <khansen@jhsph.edu>wrote: > > We have (very late) fixed this. The output of cqn now includes a new > component glm.offset, which fixes the issues Michael points out (and one > other). From the NEWS file > > \item THe output object from cqn() now has an additional component: > glm.offset which is an offset matrix which can directly be used in a > GLM type model (specifically edgeR). The usage is explained in the > vignette secion on Import into edgeR. Previously the vignette > recommended using the offset component of the cqn output, which is > wrong, due to a scaling issue. The offset component of cqn is > unchanged. This bug was found by Mike Love > \email{love@molgen.mpg.de} and fixed in CQN 1.5.1. > > Kasper > > > > On Wed, Feb 20, 2013 at 4:11 AM, Michael Love <love@molgen.mpg.de> wrote: > >> hi, >> >> re: EDASeq, this was my mistake. The vignette does in fact have the >> multiplication by -1, but my eyes scanned over this: >> >> > library(edgeR) >> > design <- model.matrix(~conditions, data=pData(dataOffset)) >> > disp <- estimateGLMCommonDisp(exprs(dataOffset), >> + design, offset=-offst(dataOffset)) >> > fit <- glmFit(exprs(dataOffset), design, disp, >> offset=-offst(dataOffset)) >> >> Then, I still believe that the cqn offsets should be multiplied by >> -log(2) to be handed to edgeR, which is not currently in the cqn vignette: >> >> > design <- model.matrix(~ d.mont$sample$group) >> > d.mont.cqn <- estimateGLMCommonDisp(d.mont, design = design, >> + offset = cqn.subset$offset) >> >> But I think it would be desirable for packages on both sides of the >> hand-off to provide an explicit formula for what is meant by "offset", >> which could be one of: >> >> log(mu) = X * beta + offset >> log2(mu) = X * beta + offset >> log(mu) + offset = X * beta >> log2(mu) + offset = X * beta >> >> best, >> >> Mike >> >> >> On 02/14/2013 06:53 PM, Mike Love wrote: >> > One more thing, another wrinkle is log or log2. For this reason, I >> > shouldn't have used correlation in the example. I believe cqn is >> > giving log2 offsets, EDASeq natural log, and edgeR is accepting >> > natural log. >> > >> > thanks, >> > >> > Mike >> > >> > On 2/14/13 5:21 PM, Michael Love wrote: >> >> hi, >> >> >> >> I could have things totally mixed up here. I was thinking that GLM >> >> offsets would be positively correlated with log counts. >> >> >> >> for instance in edgeR, mglmLS.R: >> >> >> >> mu <- exp(beta %*% t(X) + offset) >> >> >> >> but it seems offsets from packages EDASeq and cqn are negative >> >> correlated with log counts. >> >> >> >> for example some poisson data: >> >> >> >> > n <- 1000 >> >> > covariate <- rnorm(n,6,.5) >> >> > counts <- replicate(2,rpois(n,exp(covariate))) >> >> >> >> > library(EDASeq) >> >> > edaseq.offset <- >> >> withinLaneNormalization(x=counts,y=covariate,which="full",offset=TRUE) >> >> > cor(edaseq.offset[,1],log(counts[,1]+1)) >> >> [1] -0.9765923 >> >> >> >> > library(cqn) >> >> > cqn.offset <- >> >> >> cqn(counts,x=covariate,lengths=rep(1,n),lengthMethod="fixed",sizeFa ctors=c(1,1))$offset >> >> > cor(cqn.offset[,1],log(counts[,1]+1)) >> >> [1] -0.9950717 >> >> >> >> >> >> So EDASeq and cqn are giving similar results here: >> >> >> >> > cor(edaseq.offset[,1],cqn.offset[,1]) >> >> [1] 0.9847672 >> >> >> >> If I give these to the standard glm function as offsets I would >> >> expect the coefficient for 'covariate' to go from 1 to 0. But it >> >> gives me something like a coefficient of 2, and to get zero I have to >> >> give it -1 * offset: >> >> >> >> without offset: >> >> >> >> > coef(glm(counts[,1] ~ covariate, family="poisson")) >> >> (Intercept) covariate >> >> -0.02735053 1.00467285 >> >> >> >> with offset: >> >> >> >> > coef(glm(counts[,1] ~ covariate, offset=edaseq.offset[,1], >> >> family="poisson")) >> >> (Intercept) covariate >> >> -5.860094 1.954997 >> >> > coef(glm(counts[,1] ~ covariate, offset=cqn.offset[,1], >> >> family="poisson")) >> >> (Intercept) covariate >> >> -8.752786 2.457637 >> >> >> >> with -1 * offset: >> >> >> >> > coef(glm(counts[,1] ~ covariate, offset=-edaseq.offset[,1], >> >> family="poisson")) >> >> (Intercept) covariate >> >> 5.8253511 0.0498681 >> >> > coef(glm(counts[,1] ~ covariate, offset=-cqn.offset[,1], >> >> family="poisson")) >> >> (Intercept) covariate >> >> 8.6975896 -0.4482184 >> >> >> >> >> >> thanks for any guidance, >> >> >> >> Mike >> >> >> > >> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]]

ADD COMMENT • link 11.1 years ago Kasper Daniel Hansen ★ 6.5k

Login before adding your answer.