Question

Incorrect result in edgeR.calculateCommonDispersion

Jacob Silterra (@jacob-silterra-6587)

Hello all,

I've encountered an issue with edgeR when it estimates dispersions and a group has no samples. I believe it happens with both tagwise and common dispersion, for the same reason: splitIntoGroups returns an empty matrix for that group, which breaks the dispersion calculation. I think it would be better to ignore groups that have no data associated with them. Example attached.

This might seem unnecessary, but I have a situation where I read in a matrix with samples of different classes and then remove some groups entirely.

Thanks,
--
Jacob Silterra
Associate Computational Biologist
Broad Institute
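A minimal base-R sketch of the mechanism described in the question, runnable without edgeR. It assumes (the thread does not show edgeR's internals) that splitIntoGroups splits the count columns by the levels of the group factor; base `split()` stands in for it here, and the toy count matrix is made up for illustration.

```r
# Base-R illustration (no edgeR needed) of how an empty group arises.
groups <- factor(c("A", "A", "B", "B", "C", "C"))
sel_cols <- c(1, 2, 5, 6)

# Subsetting a factor keeps every original level, including "B".
sub_groups <- groups[sel_cols]
print(levels(sub_groups))   # still lists "A", "B" and "C"
print(table(sub_groups))    # level "B" is listed with a count of 0

# Toy count matrix with one column per remaining sample (made-up values).
counts <- matrix(1:8, nrow = 2, ncol = 4)

# split() keeps empty levels by default (drop = FALSE), so level "B"
# yields an empty set of columns, i.e. a 2 x 0 matrix.
col_groups <- split(seq_len(ncol(counts)), sub_groups)
pieces <- lapply(col_groups, function(j) counts[, j, drop = FALSE])
print(sapply(pieces, ncol))  # A = 2, B = 0, C = 2 columns
```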

Answer 1 · 2014-06-05

Gordon Smyth (@gordon-smyth)

WEHI, Melbourne, Australia

Dear Jacob,

There is no function called edgeR.calculateCommonDispersion in the edgeR package.

There also wasn't any attachment with your posting.

If you subset a DGEList in such a way that a group is removed entirely, you can prevent any problems by resetting the levels of the group factor:

    dge$samples$group <- factor(dge$samples$group)

Best wishes
Gordon

(Replying to Jacob Silterra's message of Wed Jun 4 19:45:50 CEST 2014.)
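The effect of resetting the levels can be checked in base R. In this sketch, the data.frame `samples` is a hypothetical stand-in for `dge$samples`, and `droplevels()` is a base-R alternative that the reply does not mention; both should leave only the levels that still have samples.

```r
# Hypothetical stand-in for dge$samples: a data.frame whose group column
# comes from a subset that removed every "B" sample.
samples <- data.frame(group = factor(c("A", "A", "B", "B", "C", "C")))
samples <- samples[c(1, 2, 5, 6), , drop = FALSE]
levels(samples$group)  # "A" "B" "C": the unused level survives the subset

# The fix from the reply: rebuild the factor so only observed levels remain.
samples$group <- factor(samples$group)
levels(samples$group)  # "A" "C"

# Base R's droplevels() gives the same levels for a factor and can also be
# applied to a whole data.frame (it drops unused levels in factor columns).
g <- factor(c("A", "A", "B", "B", "C", "C"))[c(1, 2, 5, 6)]
stopifnot(identical(levels(droplevels(g)), levels(factor(g))))
```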

Hi Gordon,

Thanks for the info. My apologies for being unclear; I meant the functions estimateCommonDisp (and estimateTagwiseDisp) in the edgeR package. I guess the attachment didn't go through, so I've pasted it below.

-Jacob

R script:

    library(edgeR)

    groups <- factor(c("A", "A", "B", "B", "C", "C"))
    rows <- 10
    cols <- 6
    counts <- matrix(rnorm(rows*cols, mean=100, sd=20), nrow=rows, ncol=cols)
    counts <- round(counts)

    # Everything runs smoothly
    y <- DGEList(counts=counts, group=groups)
    y <- calcNormFactors(y)
    y <- estimateCommonDisp(y)
    print(y$common.disp)
    # [1] 0.0310142

    # Take out samples from group "B"; estimating the dispersion fails
    sel_cols <- c(1,2,5,6)
    counts <- counts[,sel_cols]
    groups <- groups[sel_cols]
    y <- DGEList(counts=counts, group=groups)
    y <- calcNormFactors(y)
    y <- estimateCommonDisp(y)
    print(y$common.disp)
    # [1] 99.99477
    print(warnings())

(Replying to Gordon K Smyth's message of Wed, Jun 4, 2014 at 8:21 PM.)
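One way to catch this situation before calling estimateCommonDisp is to look for group levels that have zero samples. `empty_group_levels()` below is a hypothetical helper written for this sketch, not an edgeR function; it uses only base R's `table()`, which reports unused factor levels with a count of 0.

```r
# Hypothetical helper (not part of edgeR): return the names of group levels
# that have no samples, so the problem can be caught before dispersion
# estimation.
empty_group_levels <- function(group) {
  group <- as.factor(group)
  counts_per_level <- table(group)  # includes unused levels with count 0
  names(counts_per_level)[counts_per_level == 0]
}

groups <- factor(c("A", "A", "B", "B", "C", "C"))
sel_cols <- c(1, 2, 5, 6)

# Subsetting as in the script: level "B" is now empty.
bad <- groups[sel_cols]
empty_group_levels(bad)   # "B"

# After re-creating the factor, no empty levels remain.
good <- factor(groups[sel_cols])
empty_group_levels(good)  # character(0)

# With edgeR loaded, a guard before estimateCommonDisp might be (not run):
# stopifnot(length(empty_group_levels(y$samples$group)) == 0)
```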

Reply by Jacob Silterra

Hi Jacob,

Yes, I see the issue. The edgeR routines assume that y$samples$group doesn't have superfluous factor levels. The culprit is:

    groups <- groups[sel_cols]

If you change this to

    groups <- factor(groups[sel_cols])

all will be well.

Best wishes
Gordon

(Replying to Jacob Silterra's message of Wed, 4 Jun 2014.)
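To avoid forgetting the `factor()` call after each subset, the counts and the group labels can be subset together. `subset_samples()` is a hypothetical helper for illustration only; it simply wraps the fix from the reply. The random toy data mirror the dimensions of Jacob's script, and the edgeR call is left as a comment so the sketch runs with base R alone.

```r
# Hypothetical helper (not an edgeR function): subset count columns and the
# matching group labels in one step, re-creating the factor so that levels
# with no remaining samples are dropped.
subset_samples <- function(counts, group, keep) {
  stopifnot(ncol(counts) == length(group))
  list(
    counts = counts[, keep, drop = FALSE],
    group  = factor(group[keep])
  )
}

set.seed(1)  # arbitrary seed, only so the toy data are reproducible
groups <- factor(c("A", "A", "B", "B", "C", "C"))
counts <- matrix(round(rnorm(10 * 6, mean = 100, sd = 20)), nrow = 10, ncol = 6)

sub <- subset_samples(counts, groups, c(1, 2, 5, 6))
dim(sub$counts)    # 10 rows, 4 columns
levels(sub$group)  # "A" "C"

# With edgeR loaded, one would continue as in the script (not run here):
# y <- DGEList(counts = sub$counts, group = sub$group)
```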

Reply by Gordon Smyth