How to specify factors for a RCBD in edgeR
1
0
Entering edit mode
@gordon-smyth
Last seen 9 hours ago
WEHI, Melbourne, Australia
Dear Tilahun, There is no way to specify a random block in edgeR, nor in any other existing statistical software designed to analyse RNA-Seq counts. However there is no purpose in doing so for your experiment. The only reason that you might want to define block as random rather than fixed is to recover inter-block information about the experimental treatments. Your experiment is entirely balanced, in that all treatment conditions appear in each block, so there is no inter-block information to recover. Hence the edgeR analysis that I have suggested, which uses intra-block information only, is fully efficient. If you want to test for a treatment effect in each individual tissue: design <- model.matrix(~Tissue+Tissue:Treatment+Block) Then lrtA <- glmLRT(d, fit, coef="TissueA:Treatmentstressed") finds genes DE between stress and control in tissue A, lrtB <- glmLRT(d, fit, coef="TissueB:Treatmentstressed") finds genes DE between stress and control in tissue B. And so on. You will find it useful to examine colnames(design). Best wishes Gordon > Date: Tue, 31 Jan 2012 11:24:49 -0600 > From: Tilahun Abebe <tilahun.abebe at="" uni.edu=""> > To: bioconductor at r-project.org > Subject: Re: [BioC] Bioconductor Digest, Vol 107, Issue 31 > > Dear Gordon, > > Thank you for the tip. I think I am making some progress. I should have > phrased my last question better. I meant what code do I use to get genes > differentially expressed in one tissue under control and stress conditions. > > I have one more question. Is it possible to specify the random effect in > the design matrix in edgeR? I am thinking of something similar to the > RANDOM statement in SAS that will allow me to treat "block" as a random > effect? > > Cheers. > > Tilahun > ---------------------------- > >> Date: Tue, 31 Jan 2012 11:18:59 +1100 (AUS Eastern Daylight Time) >> From: Gordon K Smyth <smyth at="" wehi.edu.au=""> >> To: Tilahun Abebe <tilahun.abebe at="" uni.edu=""> >> Cc: Bioconductor mailing list <bioconductor at="" r-project.org=""> >> Subject: [BioC] Please help! How to specify factors for a RCBD in >> edgeR >> >> Dear Tilahun, >> >> The first step is that you need to create a data frame containing the >> experimental factors, just as you would for a SAS analysis. So you need >> to create three factors: >> >> Tissue >> Treatment >> Block >> >> each containing 24 values, one for each RNA sample. Then the design marix >> is formed by: >> >> design <- model.matrix(~Tissue*Treatment+Block) >> >> Type colnames(design) to see how the coefficients are defined. You will >> see that the interaction coefficients are coefficients 6 to 8. >> >> After fitting your linear model, you could find genes that show >> significant Tissue*Treatment interaction (on 3df) by >> >> lrt <- glmLRT(d, fit, coef=6:8) >> >> and so on. >> >> I don't understand your question "How do I obtain differentially expressed >> genes in each Tissue*Stress combination?", so I can't give specific advice >> on that. Differentially expressed compared to what? >> >> Best wishes >> Gordon >> >> ---------------- original message ------------------- >> [BioC] Please help! How to specify factors for a RCBD in edgeR >> Tilahun Abebe tilahun.abebe at uni.edu >> Wed Jan 25 22:16:41 CET 2012 >> >> Hi, >> >> I am learning how to use edgeR to analyze RNA-seq data generated from >> Illumina GAII. The experimental design is fairly complex. >> >> I have a mixed 4x2 factorial randomized complete block design (RCBD) >> consisting of: >> >> 4 tissues: A, B, C, D >> 2 treatments: control, stressed >> 3 blocks: Block1, Block2, Block3 >> >> Tissue and treatment are fixed effects and block is a random effect. >> >> Here is the code I tried to use in edgeR: >> >>> counts <- read.delim( file = "Mycounts.txt", header = TRUE) >>> rownames <-counts [ , 1 ] >>> d <- counts [, 2:25] # counts are in columns 2-25 >>> d >>> group <- c(rep("Control", 3), rep("Stress", 3), rep("Control", 3), >> rep("Stress", 3), rep("Control", 3), rep("Stress", 3), rep("Control", 3), >> rep("Stress", 3)) >>> d <- DGEList(counts = d, group = group) >>> design <- model.matrix(~group) >>> d <- calcNormFactors(d) >>> d$samples >>> d <- estimateGLMCommonDisp(d, design) >>> d <- estimateGLMTagwiseDisp(d, design) >>> d$common.dispersion >>> fit <- glmFit(d, design) >>> lrt <- glmLRT(d, fit, coef=2) >>> topTags(lrt, n=4) >> >> I am interested to know genes differentially expressed in each of the four >> tissues under stress. However, I feel like I am not specifying the factors >> correctly in the design statement. My questions are: >> >> 1) How do I specify the fixed effects Tissue, Stress, and Tissue*Stress >> interaction in the model? >> 2) How do I tell edgeR to use block as a random effect? >> 3) How do I obtain differentially expressed genes in each Tissue*Stress >> combination? >> >> I appreciate your help. >> >> Cheers. >> >> Tilahun Abebe, Ph.D. >> University of northern Iowa >> Cedar Falls, IA ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
edgeR edgeR • 1.5k views
ADD COMMENT
0
Entering edit mode
@tilahun-abebe-5072
Last seen 10.0 years ago
Dear Gordon, Thank you for the wonderful help. I will Run edgeR based on your suggestions and let you know if I have any problems/questions. Again, thank you for the wonderful support you are providing to the research community. Tilahun ----------------------- On Thu, Feb 2, 2012 at 7:16 PM, Gordon K Smyth <smyth@wehi.edu.au> wrote: > Dear Tilahun, > > There is no way to specify a random block in edgeR, nor in any other > existing statistical software designed to analyse RNA-Seq counts. However > there is no purpose in doing so for your experiment. The only reason that > you might want to define block as random rather than fixed is to recover > inter-block information about the experimental treatments. Your experiment > is entirely balanced, in that all treatment conditions appear in each > block, so there is no inter-block information to recover. Hence the edgeR > analysis that I have suggested, which uses intra-block information only, is > fully efficient. > > If you want to test for a treatment effect in each individual tissue: > > design <- model.matrix(~Tissue+Tissue:**Treatment+Block) > > Then > > lrtA <- glmLRT(d, fit, coef="TissueA:**Treatmentstressed") > > finds genes DE between stress and control in tissue A, > > lrtB <- glmLRT(d, fit, coef="TissueB:**Treatmentstressed") > > finds genes DE between stress and control in tissue B. And so on. > > You will find it useful to examine colnames(design). > > Best wishes > Gordon > > Date: Tue, 31 Jan 2012 11:24:49 -0600 >> From: Tilahun Abebe <tilahun.abebe@uni.edu> >> To: bioconductor@r-project.org >> Subject: Re: [BioC] Bioconductor Digest, Vol 107, Issue 31 >> >> Dear Gordon, >> >> Thank you for the tip. I think I am making some progress. I should have >> phrased my last question better. I meant what code do I use to get genes >> differentially expressed in one tissue under control and stress >> conditions. >> >> I have one more question. Is it possible to specify the random effect in >> the design matrix in edgeR? I am thinking of something similar to the >> RANDOM statement in SAS that will allow me to treat "block" as a random >> effect? >> >> Cheers. >> >> Tilahun >> ---------------------------- >> >> Date: Tue, 31 Jan 2012 11:18:59 +1100 (AUS Eastern Daylight Time) >>> From: Gordon K Smyth <smyth@wehi.edu.au> >>> To: Tilahun Abebe <tilahun.abebe@uni.edu> >>> Cc: Bioconductor mailing list <bioconductor@r-project.org> >>> Subject: [BioC] Please help! How to specify factors for a RCBD in >>> edgeR >>> >>> Dear Tilahun, >>> >>> The first step is that you need to create a data frame containing the >>> experimental factors, just as you would for a SAS analysis. So you need >>> to create three factors: >>> >>> Tissue >>> Treatment >>> Block >>> >>> each containing 24 values, one for each RNA sample. Then the design >>> marix >>> is formed by: >>> >>> design <- model.matrix(~Tissue***Treatment+Block) >>> >>> Type colnames(design) to see how the coefficients are defined. You will >>> see that the interaction coefficients are coefficients 6 to 8. >>> >>> After fitting your linear model, you could find genes that show >>> significant Tissue*Treatment interaction (on 3df) by >>> >>> lrt <- glmLRT(d, fit, coef=6:8) >>> >>> and so on. >>> >>> I don't understand your question "How do I obtain differentially >>> expressed >>> genes in each Tissue*Stress combination?", so I can't give specific >>> advice >>> on that. Differentially expressed compared to what? >>> >>> Best wishes >>> Gordon >>> >>> ---------------- original message ------------------- >>> [BioC] Please help! How to specify factors for a RCBD in edgeR >>> Tilahun Abebe tilahun.abebe at uni.edu >>> Wed Jan 25 22:16:41 CET 2012 >>> >>> Hi, >>> >>> I am learning how to use edgeR to analyze RNA-seq data generated from >>> Illumina GAII. The experimental design is fairly complex. >>> >>> I have a mixed 4x2 factorial randomized complete block design (RCBD) >>> consisting of: >>> >>> 4 tissues: A, B, C, D >>> 2 treatments: control, stressed >>> 3 blocks: Block1, Block2, Block3 >>> >>> Tissue and treatment are fixed effects and block is a random effect. >>> >>> Here is the code I tried to use in edgeR: >>> >>> counts <- read.delim( file = "Mycounts.txt", header = TRUE) >>>> rownames <-counts [ , 1 ] >>>> d <- counts [, 2:25] # counts are in columns 2-25 >>>> d >>>> group <- c(rep("Control", 3), rep("Stress", 3), rep("Control", 3), >>>> >>> rep("Stress", 3), rep("Control", 3), rep("Stress", 3), rep("Control", 3), >>> rep("Stress", 3)) >>> >>>> d <- DGEList(counts = d, group = group) >>>> design <- model.matrix(~group) >>>> d <- calcNormFactors(d) >>>> d$samples >>>> d <- estimateGLMCommonDisp(d, design) >>>> d <- estimateGLMTagwiseDisp(d, design) >>>> d$common.dispersion >>>> fit <- glmFit(d, design) >>>> lrt <- glmLRT(d, fit, coef=2) >>>> topTags(lrt, n=4) >>>> >>> >>> I am interested to know genes differentially expressed in each of the >>> four >>> tissues under stress. However, I feel like I am not specifying the >>> factors >>> correctly in the design statement. My questions are: >>> >>> 1) How do I specify the fixed effects Tissue, Stress, and Tissue*Stress >>> interaction in the model? >>> 2) How do I tell edgeR to use block as a random effect? >>> 3) How do I obtain differentially expressed genes in each Tissue*Stress >>> combination? >>> >>> I appreciate your help. >>> >>> Cheers. >>> >>> Tilahun Abebe, Ph.D. >>> University of northern Iowa >>> Cedar Falls, IA >>> >> > ______________________________**______________________________**____ ______ > The information in this email is confidential and inte...{{dropped:10}}
ADD COMMENT

Login before adding your answer.

Traffic: 522 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6