EdgeR classic vs. glm

0

Entering edit mode

Son Pham ▴ 60

@son-pham-6437

Last seen 9.9 years ago

United States

Hi All, I have an experiment that is exactly the same as the one described in section 3.5 (Comparisons Both Between and Within Subjects) in the manual<http: www.bioconductor.org="" packages="" release="" bioc="" vignettes="" edg="" er="" inst="" doc="" edgerusersguide.pdf=""> : The experiment has 18 RNA samples collected from 9 subjects. The samples correspond to cells from 3 healthy patients, either treated or not with a hormone; cells from 3 patients with disease 1, either treated or not with the hormone; and cells from 3 patients with disease 2, either treated or not with the hormone. I now find genes responding to the hormone in disease1 patients using two approachs glm and edgeR classic. The results are dramatically different !!! glm gives hundreds of differentially expressed genes, while edgeR classic gives 0. Could anyone tell me that is what one can expect or I'm totally wrong ! data.frame(Disease,Patient,Treatment) design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) colnames(design) y <- estimateGLMCommonDisp(y,design) y <- estimateGLMTrendedDisp(y,design) y <- estimateGLMTagwiseDisp(y,design) fit <- glmFit(y, design) lrt <- glmLRT(fit, coef="DiseaseDisease1:TreatmentHormone") topTags(lrt) detags <- rownames(topTags(lrt)$table) cpm(y)[detags, order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) This gives 612 (-1) and 1129 (+1) genes. I now try to just use 3 patients with Disease1, treated and non treated with hormone as 2 groups and apply the classic edgeR to find the effect of hormon to disease1: targets <-readTargets() x <-read.delim("new.counts", row.names="id") x_c_n <- x[c("d1_1_hormone", "d1_2_hormone", "d1_3_hormone", "d_1", "d_2", "d_3")] g <- factor(c(1,1,1,2,2,2)) y <- DGEList(counts=x_c_n, group=g) y <- calcNormFactors(y) y <- estimateCommonDisp(y) y <- estimateTagwiseDisp(y) et <- exactTest(y) topTags(et) summary(de <- decideTestsDGE(et)) This gives 0 (-1) and also 0 (+1) gene !!!! How can there be such a significant difference between these two approaches? Or I'm wrong? Thank you, Son. [[alternative HTML version deleted]]

edgeR edgeR • 2.7k views

ADD COMMENT • link updated 10.7 years ago by Yunshun Chen ▴ 900 • written 10.7 years ago by Son Pham ▴ 60

0

Entering edit mode

Yunshun Chen ▴ 900

@yunshun-chen-5451

Last seen 5 days ago

Australia

Hi Son, First of all, under the classic edgeR you estimated the dispersions only from a subset of the data. The dispersion estimates were not as reliable as those estimated under the GLM approach (since the information of the other 12 samples were totally ignored). Secondly, your classic edgeR analysis did not account for the patient effect at all. These 6 samples are actually paired (by patient) and cannot be treated as independent samples. This is the main reason why you didn't get any DE genes under the classic edgeR. As stated in the users' guide, the classic edgeR is only applicable on datasets with a single factor design. Hence, you have to use the GLM approach for your data. Cheers, Yunshun ---------------------------------------------------------------------- ------ ---------------------------------------------------------------- Hi All, I have an experiment that is exactly the same as the one described in section 3.5 (Comparisons Both Between and Within Subjects) in the manual<http: www.bioconductor.org="" packages="" release="" bioc="" vignettes="" edg="" er="" ins="" t="" doc="" edgerusersguide.pdf=""> : The experiment has 18 RNA samples collected from 9 subjects. The samples correspond to cells from 3 healthy patients, either treated or not with a hormone; cells from 3 patients with disease 1, either treated or not with the hormone; and cells from 3 patients with disease 2, either treated or not with the hormone. I now find genes responding to the hormone in disease1 patients using two approachs glm and edgeR classic. The results are dramatically different !!! glm gives hundreds of differentially expressed genes, while edgeR classic gives 0. Could anyone tell me that is what one can expect or I'm totally wrong ! data.frame(Disease,Patient,Treatment) design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) colnames(design) y <- estimateGLMCommonDisp(y,design) y <- estimateGLMTrendedDisp(y,design) y <- estimateGLMTagwiseDisp(y,design) fit <- glmFit(y, design) lrt <- glmLRT(fit, coef="DiseaseDisease1:TreatmentHormone") topTags(lrt) detags <- rownames(topTags(lrt)$table) cpm(y)[detags, order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) This gives 612 (-1) and 1129 (+1) genes. I now try to just use 3 patients with Disease1, treated and non treated with hormone as 2 groups and apply the classic edgeR to find the effect of hormon to disease1: targets <-readTargets() x <-read.delim("new.counts", row.names="id") x_c_n <- x[c("d1_1_hormone", "d1_2_hormone", "d1_3_hormone", "d_1", "d_2", "d_3")] g <- factor(c(1,1,1,2,2,2)) y <- DGEList(counts=x_c_n, group=g) y <- calcNormFactors(y) y <- estimateCommonDisp(y) y <- estimateTagwiseDisp(y) et <- exactTest(y) topTags(et) summary(de <- decideTestsDGE(et)) This gives 0 (-1) and also 0 (+1) gene !!!! How can there be such a significant difference between these two approaches? Or I'm wrong? Thank you, Son. ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}

ADD COMMENT • link 10.7 years ago Yunshun Chen ▴ 900

0

Entering edit mode

Thank you Yunshun! But there are some confusion remains: I did a similar comparison: finding DEGs in disease2 vs. control: while classic EdgeR gives some DEGs, GLM approach gives 0 DEG, which is strange. I believe that when using GLM approach, we also consider the patients information and thus, the DEGs list should be larger than disregarding such information (edgeR classic). Also, when I want to find the response to hormone in any disease group, it yields exceptions: Also, the #Genes that response to hormone in any disease group lrt <- glmLRT(fit, coef=10:12) topTags(lrt) summary(de <- decideTestsDGE(lrt)) pdf("hormoneEffectInAnyThing.pdf") detags <- rownames(y)[as.logical(de)] plotSmear(lrt, de.tags=detags) abline(h=c(-1, 1), col="blue") dev.off() It yields an exception: > plotSmear(lrt, de.tags=detags) Error in plotSmear(lrt, de.tags = detags) : table$logFC slot in DGELRT object is NULL. We cannot produce an MA (smear) plot if more than one coefficient from the GLM is being tested in the likelihood ratio test as this results in more one logFC value per gene---one for each coefficient. > abline(h=c(-1, 1), col="blue") Any suspect? The design is absolutely the same as the one in the example. Son. On Tue, Mar 11, 2014 at 6:54 PM, Yunshun Chen <yuchen@wehi.edu.au> wrote: > Hi Son, > > > > First of all, under the classic edgeR you estimated the dispersions only > from a subset of the data. > > The dispersion estimates were not as reliable as those estimated under the > GLM approach (since the information of the other 12 samples were totally > ignored). > > > > Secondly, your classic edgeR analysis did not account for the patient > effect at all. > > These 6 samples are actually paired (by patient) and cannot be treated as > independent samples. > > This is the main reason why you didn't get any DE genes under the classic > edgeR. > > > > As stated in the users' guide, the classic edgeR is only applicable on > datasets with a single factor design. > > Hence, you have to use the GLM approach for your data. > > > > Cheers, > > Yunshun > > > > > > > > > -------------------------------------------------------------------- ---------------------------------------------------------------------- -- > > > > > > Hi All, > > I have an experiment that is exactly the same as the one described in > > section 3.5 (Comparisons Both Between and Within Subjects) in the > > manual< > http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/in st/doc/edgeRUsersGuide.pdf > > > > : > > The experiment has 18 RNA samples collected from 9 subjects. The samples > > correspond to cells from 3 healthy patients, either treated or not with a > > hormone; cells from 3 patients with disease 1, either treated or not with > > the hormone; and cells from 3 patients with disease 2, either treated or > > not with the hormone. > > > > I now find genes responding to the hormone in disease1 patients using two > > approachs glm and edgeR classic. The results are dramatically different !!! > > glm gives hundreds of differentially expressed genes, while edgeR classic > > gives 0. > > Could anyone tell me that is what one can expect or I'm totally wrong ! > > > > > > data.frame(Disease,Patient,Treatment) > > design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) > > colnames(design) > > y <- estimateGLMCommonDisp(y,design) > > y <- estimateGLMTrendedDisp(y,design) > > y <- estimateGLMTagwiseDisp(y,design) > > fit <- glmFit(y, design) > > > > lrt <- glmLRT(fit, coef="DiseaseDisease1:TreatmentHormone") > > topTags(lrt) > > detags <- rownames(topTags(lrt)$table) > > cpm(y)[detags, order(y$samples$group)] > > summary(de <- decideTestsDGE(lrt)) > > This gives 612 (-1) and 1129 (+1) genes. > > > > I now try to just use 3 patients with Disease1, treated and non treated > > with hormone as 2 groups and apply the classic edgeR to find the effect of > > hormon to disease1: > > > > targets <-readTargets() > > x <-read.delim("new.counts", row.names="id") > > x_c_n <- x[c("d1_1_hormone", "d1_2_hormone", "d1_3_hormone", "d_1", "d_2", > > "d_3")] > > g <- factor(c(1,1,1,2,2,2)) > > y <- DGEList(counts=x_c_n, group=g) > > y <- calcNormFactors(y) > > y <- estimateCommonDisp(y) > > y <- estimateTagwiseDisp(y) > > et <- exactTest(y) > > topTags(et) > > summary(de <- decideTestsDGE(et)) > > This gives 0 (-1) and also 0 (+1) gene !!!! > > > > How can there be such a significant difference between these two > > approaches? Or I'm wrong? > > > > Thank you, > > > > Son. > > > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:10}}

ADD REPLY • link 10.7 years ago Son Pham ▴ 60

0

Entering edit mode

With regard to the exception that you get, it means what it says. You are testing 3 coefficients, so you have 3 logFC values, and it wouldn't make sense to just pick one at random and plot it. Try running glmLRT again with only one coef at a time, and you will be able to do plotSmear with the result. On Thu 13 Mar 2014 03:09:47 PM PDT, Son Pham wrote: > Thank you Yunshun! > But there are some confusion remains: > I did a similar comparison: finding DEGs in disease2 vs. control: while > classic EdgeR gives some DEGs, GLM approach gives 0 DEG, which is strange. > I believe that when using GLM approach, we also consider the patients > information and thus, the DEGs list should be larger than disregarding such > information (edgeR classic). > > > Also, when I want to find the response to hormone in any disease group, it > yields exceptions: > > Also, the > #Genes that response to hormone in any disease group > lrt <- glmLRT(fit, coef=10:12) > topTags(lrt) > summary(de <- decideTestsDGE(lrt)) > pdf("hormoneEffectInAnyThing.pdf") > detags <- rownames(y)[as.logical(de)] > plotSmear(lrt, de.tags=detags) > abline(h=c(-1, 1), col="blue") > dev.off() > > > It yields an exception: > >> plotSmear(lrt, de.tags=detags) > Error in plotSmear(lrt, de.tags = detags) : > table$logFC slot in DGELRT object is NULL. We cannot produce an MA > (smear) plot if more than one coefficient from the GLM is being tested in > the likelihood ratio test as this results in more one logFC value per > gene---one for each coefficient. >> abline(h=c(-1, 1), col="blue") > > Any suspect? The design is absolutely the same as the one in the example. > Son. > > > On Tue, Mar 11, 2014 at 6:54 PM, Yunshun Chen <yuchen at="" wehi.edu.au=""> wrote: > >> Hi Son, >> >> >> >> First of all, under the classic edgeR you estimated the dispersions only >> from a subset of the data. >> >> The dispersion estimates were not as reliable as those estimated under the >> GLM approach (since the information of the other 12 samples were totally >> ignored). >> >> >> >> Secondly, your classic edgeR analysis did not account for the patient >> effect at all. >> >> These 6 samples are actually paired (by patient) and cannot be treated as >> independent samples. >> >> This is the main reason why you didn't get any DE genes under the classic >> edgeR. >> >> >> >> As stated in the users' guide, the classic edgeR is only applicable on >> datasets with a single factor design. >> >> Hence, you have to use the GLM approach for your data. >> >> >> >> Cheers, >> >> Yunshun >> >> >> >> >> >> >> >> >> ------------------------------------------------------------------- ---------------------------------------------------------------------- --- >> >> >> >> >> >> Hi All, >> >> I have an experiment that is exactly the same as the one described in >> >> section 3.5 (Comparisons Both Between and Within Subjects) in the >> >> manual< >> http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/i nst/doc/edgeRUsersGuide.pdf >>> >> >> : >> >> The experiment has 18 RNA samples collected from 9 subjects. The samples >> >> correspond to cells from 3 healthy patients, either treated or not with a >> >> hormone; cells from 3 patients with disease 1, either treated or not with >> >> the hormone; and cells from 3 patients with disease 2, either treated or >> >> not with the hormone. >> >> >> >> I now find genes responding to the hormone in disease1 patients using two >> >> approachs glm and edgeR classic. The results are dramatically different !!! >> >> glm gives hundreds of differentially expressed genes, while edgeR classic >> >> gives 0. >> >> Could anyone tell me that is what one can expect or I'm totally wrong ! >> >> >> >> >> >> data.frame(Disease,Patient,Treatment) >> >> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) >> >> colnames(design) >> >> y <- estimateGLMCommonDisp(y,design) >> >> y <- estimateGLMTrendedDisp(y,design) >> >> y <- estimateGLMTagwiseDisp(y,design) >> >> fit <- glmFit(y, design) >> >> >> >> lrt <- glmLRT(fit, coef="DiseaseDisease1:TreatmentHormone") >> >> topTags(lrt) >> >> detags <- rownames(topTags(lrt)$table) >> >> cpm(y)[detags, order(y$samples$group)] >> >> summary(de <- decideTestsDGE(lrt)) >> >> This gives 612 (-1) and 1129 (+1) genes. >> >> >> >> I now try to just use 3 patients with Disease1, treated and non treated >> >> with hormone as 2 groups and apply the classic edgeR to find the effect of >> >> hormon to disease1: >> >> >> >> targets <-readTargets() >> >> x <-read.delim("new.counts", row.names="id") >> >> x_c_n <- x[c("d1_1_hormone", "d1_2_hormone", "d1_3_hormone", "d_1", "d_2", >> >> "d_3")] >> >> g <- factor(c(1,1,1,2,2,2)) >> >> y <- DGEList(counts=x_c_n, group=g) >> >> y <- calcNormFactors(y) >> >> y <- estimateCommonDisp(y) >> >> y <- estimateTagwiseDisp(y) >> >> et <- exactTest(y) >> >> topTags(et) >> >> summary(de <- decideTestsDGE(et)) >> >> This gives 0 (-1) and also 0 (+1) gene !!!! >> >> >> >> How can there be such a significant difference between these two >> >> approaches? Or I'm wrong? >> >> >> >> Thank you, >> >> >> >> Son. >> >> >> >> ______________________________________________________________________ >> The information in this email is confidential and inte...{{dropped:10}} > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 10.7 years ago Ryan C. Thompson ★ 7.9k

Login before adding your answer.