edgeR glm
2
0
Entering edit mode
@shreyartha-mukherjee-4378
Last seen 9.7 years ago
Hi Gondon,Bioconductor experts, I have RNA-seq data for 3 genotypes(each genotype having 3 biological replicates) and 30,000 genes. I was trying to find out differentially expressed genes across all genotypes. Here is the code I am using. library(edgeR) y<-read.table('test_12339.txt' ) d <- DGEList(y, group = rep(1:3, each = 3), lib.size = lib.sizes) d <- calcNormFactors(d) times <- rep(c("p1","p2","f1"),c(3,3,3)) times <- factor(times,levels=c("p1","p2","f1")) design <- model.matrix(~factor(times)) disp <- d2$tagwise.dispersion fit <- glmFit(d,design,dispersion=disp) lrt <- glmLRT(d,fit) topTags(lrt) Does this code provide me with tags differentially expressed across all conditions? (I am not looking for pairwise differential expression but across all conditions) Thanks, Shreyartha [[alternative HTML version deleted]]
• 1.3k views
ADD COMMENT
0
Entering edit mode
@steve-lianoglou-2771
Last seen 14 months ago
United States
Hi, On Tue, Apr 5, 2011 at 5:04 PM, Shreyartha Mukherjee <shreyartha at="" gmail.com=""> wrote: > Hi Gondon,Bioconductor experts, > > I have RNA-seq data for 3 genotypes(each genotype having 3 biological > replicates) and 30,000 genes. > I was trying to find out differentially expressed genes across all > genotypes. Here is the code I am using. > > ?library(edgeR) > y<-read.table('test_12339.txt' > ) > ?d <- DGEList(y, group = rep(1:3, each = 3), lib.size = lib.sizes) > ?d <- calcNormFactors(d) > ?times <- rep(c("p1","p2","f1"),c(3,3,3)) > ?times <- factor(times,levels=c("p1","p2","f1")) > ?design <- model.matrix(~factor(times)) > disp <- d2$tagwise.dispersion > ?fit <- glmFit(d,design,dispersion=disp) > ?lrt <- glmLRT(d,fit) > topTags(lrt) > > Does this code provide me with tags differentially expressed across all > conditions? (I am not looking for pairwise differential expression > but across all conditions) Out of curiosity, what does it mean for a gene to be differentially expressed in all (three) conditions? I mean -- what would its expression pattern look like across your three genotypes? If G1, G2, and G3 are your three different genotypes, are you looking for a gene A that's differentially expressed when you compare G1 vs. G2 AND G2 vs G3 AND G1 vs G3 (or something(?)) -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
ADD COMMENT
0
Entering edit mode
Hi, I want to test the effect of having three different genotypes as opposed to having no different genotypes. I hope this clarifies my question. Thanks, Shreyartha On Tue, Apr 5, 2011 at 4:52 PM, Steve Lianoglou < mailinglist.honeypot@gmail.com> wrote: > Hi, > > On Tue, Apr 5, 2011 at 5:04 PM, Shreyartha Mukherjee > <shreyartha@gmail.com> wrote: > > Hi Gondon,Bioconductor experts, > > > > I have RNA-seq data for 3 genotypes(each genotype having 3 biological > > replicates) and 30,000 genes. > > I was trying to find out differentially expressed genes across all > > genotypes. Here is the code I am using. > > > > library(edgeR) > > y<-read.table('test_12339.txt' > > ) > > d <- DGEList(y, group = rep(1:3, each = 3), lib.size = lib.sizes) > > d <- calcNormFactors(d) > > times <- rep(c("p1","p2","f1"),c(3,3,3)) > > times <- factor(times,levels=c("p1","p2","f1")) > > design <- model.matrix(~factor(times)) > > disp <- d2$tagwise.dispersion > > fit <- glmFit(d,design,dispersion=disp) > > lrt <- glmLRT(d,fit) > > topTags(lrt) > > > > Does this code provide me with tags differentially expressed across all > > conditions? (I am not looking for pairwise differential expression > > but across all conditions) > > Out of curiosity, what does it mean for a gene to be differentially > expressed in all (three) conditions? > > I mean -- what would its expression pattern look like across your > three genotypes? > > If G1, G2, and G3 are your three different genotypes, are you looking > for a gene A that's differentially expressed when you compare G1 vs. > G2 AND G2 vs G3 AND G1 vs G3 (or something(?)) > > -- > Steve Lianoglou > Graduate Student: Computational Systems Biology > | Memorial Sloan-Kettering Cancer Center > | Weill Medical College of Cornell University > Contact Info: http://cbio.mskcc.org/~lianos/contact > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia
Dear Shreyartha, You code is actually testing for genes that are DE between f1 and p1. By default, glmLRT tests for the last coefficient in the linear model and in your model this happens to represent f1-p1. See help("glmLRT"). What you probably want is lrt <- glmLRT(d,fit,coef=c(2,3)) This will test whether f1=p1 and p2=p1 simultaneously, i.e., whether all three genotypes have the same expression. This will find genes that are differentially expressed between any of the genotypes. This is analogous to doing genewise oneway ANOVA tests for differences between the three genotypes. Note that the above test does not guarantee that all three genotypes are different. There is actually no statistical test that can do that. Rather it detects genes for which the genotypes are not all equal. Unfortunately, the current release version of topTags() won't work properly when you test for several coefficients as above. We'll fix this for the next Bioconductor release in a couple of weeks time. In the meantime, you could use the following code: o <- order(lrt$table$p.value) ntags <- 10 lrt$table[o[1:ntags],] to see the most significant tags. Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, NHMRC Senior Research Fellow, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. Tel: (03) 9345 2326, Fax (03) 9347 0852, smyth at wehi.edu.au http://www.wehi.edu.au http://www.statsci.org/smyth On Tue, 5 Apr 2011, Shreyartha Mukherjee wrote: > Hi Gondon,Bioconductor experts, > > I have RNA-seq data for 3 genotypes(each genotype having 3 biological > replicates) and 30,000 genes. I was trying to find out differentially > expressed genes across all genotypes. Here is the code I am using. > > library(edgeR) > y<-read.table('test_12339.txt' > ) > d <- DGEList(y, group = rep(1:3, each = 3), lib.size = lib.sizes) > d <- calcNormFactors(d) > times <- rep(c("p1","p2","f1"),c(3,3,3)) > times <- factor(times,levels=c("p1","p2","f1")) > design <- model.matrix(~factor(times)) > disp <- d2$tagwise.dispersion > fit <- glmFit(d,design,dispersion=disp) > lrt <- glmLRT(d,fit) > topTags(lrt) > > Does this code provide me with tags differentially expressed across all > conditions? (I am not looking for pairwise differential expression > but across all conditions) > > Thanks, > Shreyartha ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENT
0
Entering edit mode
Hi Gordon, Thanks for your help. I had another question about the lrt table (obtained from glmlrt). How do we interpret the logConc values for 3 genotypes, I understand for a pairwise comparison it should be (logA + logB)/2. Is it (logA + logB+ logC)/3 ? Thanks, Shreyartha On Tue, Apr 5, 2011 at 9:11 PM, Gordon K Smyth <smyth@wehi.edu.au> wrote: > Dear Shreyartha, > > You code is actually testing for genes that are DE between f1 and p1. By > default, glmLRT tests for the last coefficient in the linear model and in > your model this happens to represent f1-p1. See help("glmLRT"). > > What you probably want is > > lrt <- glmLRT(d,fit,coef=c(2,3)) > > This will test whether f1=p1 and p2=p1 simultaneously, i.e., whether all > three genotypes have the same expression. This will find genes that are > differentially expressed between any of the genotypes. This is analogous to > doing genewise oneway ANOVA tests for differences between the three > genotypes. > > Note that the above test does not guarantee that all three genotypes are > different. There is actually no statistical test that can do that. Rather > it detects genes for which the genotypes are not all equal. > > Unfortunately, the current release version of topTags() won't work properly > when you test for several coefficients as above. We'll fix this for the > next Bioconductor release in a couple of weeks time. In the meantime, you > could use the following code: > > o <- order(lrt$table$p.value) > ntags <- 10 > lrt$table[o[1:ntags],] > > to see the most significant tags. > > Best wishes > Gordon > > --------------------------------------------- > Professor Gordon K Smyth, > NHMRC Senior Research Fellow, > Bioinformatics Division, > Walter and Eliza Hall Institute of Medical Research, > 1G Royal Parade, Parkville, Vic 3052, Australia. > Tel: (03) 9345 2326, Fax (03) 9347 0852, > smyth@wehi.edu.au > http://www.wehi.edu.au > http://www.statsci.org/smyth > > > On Tue, 5 Apr 2011, Shreyartha Mukherjee wrote: > > Hi Gondon,Bioconductor experts, >> >> I have RNA-seq data for 3 genotypes(each genotype having 3 biological >> replicates) and 30,000 genes. I was trying to find out differentially >> expressed genes across all genotypes. Here is the code I am using. >> >> library(edgeR) >> y<-read.table('test_12339.txt' >> ) >> d <- DGEList(y, group = rep(1:3, each = 3), lib.size = lib.sizes) >> d <- calcNormFactors(d) >> times <- rep(c("p1","p2","f1"),c(3,3,3)) >> times <- factor(times,levels=c("p1","p2","f1")) >> design <- model.matrix(~factor(times)) >> disp <- d2$tagwise.dispersion >> fit <- glmFit(d,design,dispersion=disp) >> lrt <- glmLRT(d,fit) >> topTags(lrt) >> >> Does this code provide me with tags differentially expressed across all >> conditions? (I am not looking for pairwise differential expression >> but across all conditions) >> >> Thanks, >> Shreyartha >> > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:10}}
ADD REPLY
0
Entering edit mode
Dear Shreyartha, logConc is a summary measure of average concentration for each tag over all treatment conditions. It is the same value regardless of what comparison is being made. In the version of edgeR about to be released, it is computed by mglmOneGroup() and is close to log(mean(count/lib.size)). Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, NHMRC Senior Research Fellow, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. Tel: (03) 9345 2326, Fax (03) 9347 0852, smyth at wehi.edu.au http://www.wehi.edu.au http://www.statsci.org/smyth On Wed, 6 Apr 2011, Shreyartha Mukherjee wrote: > Hi Gordon, > > Thanks for your help. I had another question about the lrt table (obtained > from glmlrt). > > How do we interpret the logConc values for 3 genotypes, I understand for a > pairwise comparison it should be (logA + logB)/2. > Is it (logA + logB+ logC)/3 ? > > Thanks, > Shreyartha > > > > > > On Tue, Apr 5, 2011 at 9:11 PM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: > >> Dear Shreyartha, >> >> You code is actually testing for genes that are DE between f1 and p1. By >> default, glmLRT tests for the last coefficient in the linear model and in >> your model this happens to represent f1-p1. See help("glmLRT"). >> >> What you probably want is >> >> lrt <- glmLRT(d,fit,coef=c(2,3)) >> >> This will test whether f1=p1 and p2=p1 simultaneously, i.e., whether all >> three genotypes have the same expression. This will find genes that are >> differentially expressed between any of the genotypes. This is analogous to >> doing genewise oneway ANOVA tests for differences between the three >> genotypes. >> >> Note that the above test does not guarantee that all three genotypes are >> different. There is actually no statistical test that can do that. Rather >> it detects genes for which the genotypes are not all equal. >> >> Unfortunately, the current release version of topTags() won't work properly >> when you test for several coefficients as above. We'll fix this for the >> next Bioconductor release in a couple of weeks time. In the meantime, you >> could use the following code: >> >> o <- order(lrt$table$p.value) >> ntags <- 10 >> lrt$table[o[1:ntags],] >> >> to see the most significant tags. >> >> Best wishes >> Gordon >> >> --------------------------------------------- >> Professor Gordon K Smyth, >> NHMRC Senior Research Fellow, >> Bioinformatics Division, >> Walter and Eliza Hall Institute of Medical Research, >> 1G Royal Parade, Parkville, Vic 3052, Australia. >> Tel: (03) 9345 2326, Fax (03) 9347 0852, >> smyth at wehi.edu.au >> http://www.wehi.edu.au >> http://www.statsci.org/smyth >> >> >> On Tue, 5 Apr 2011, Shreyartha Mukherjee wrote: >> >> Hi Gondon,Bioconductor experts, >>> >>> I have RNA-seq data for 3 genotypes(each genotype having 3 biological >>> replicates) and 30,000 genes. I was trying to find out differentially >>> expressed genes across all genotypes. Here is the code I am using. >>> >>> library(edgeR) >>> y<-read.table('test_12339.txt' >>> ) >>> d <- DGEList(y, group = rep(1:3, each = 3), lib.size = lib.sizes) >>> d <- calcNormFactors(d) >>> times <- rep(c("p1","p2","f1"),c(3,3,3)) >>> times <- factor(times,levels=c("p1","p2","f1")) >>> design <- model.matrix(~factor(times)) >>> disp <- d2$tagwise.dispersion >>> fit <- glmFit(d,design,dispersion=disp) >>> lrt <- glmLRT(d,fit) >>> topTags(lrt) >>> >>> Does this code provide me with tags differentially expressed across all >>> conditions? (I am not looking for pairwise differential expression >>> but across all conditions) >>> >>> Thanks, >>> Shreyartha >>> >> >> ______________________________________________________________________ >> The information in this email is confidential and intended solely for the >> addressee. >> You must not disclose, forward, print or use it without the permission of >> the sender. >> ______________________________________________________________________ >> > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD REPLY

Login before adding your answer.

Traffic: 382 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6