Search
Question: EdgeR additive linear model, errors
0
gravatar for Colleen Burge
5.7 years ago by
Colleen Burge30 wrote:
Dear all, I'm trying to set up my analysis in EdgeR to look at differential expression at 4 time points (J, N, M, N) in three samples (A,B,C). Our hypothesis is that environmental conditions at "N" should have an affect on gene expression, and make gene expression significantly different at this time point than the pre: J or post M &N time points. When I plot the MDS, I can see that my experiment is having a "batch" affect, where samples are grouping by genotype. I would like to test for differences, particularly between N and the other 3 time points accounting for the affect of the samples. I believe an additive linear model with genotype as the blocking factor is appropriate (but PLEASE correct me if I'm wrong), is the correct analysis. I have not been able to get to contrasts, due to error messages, and I may also need help with this :) I have tried the analysis a couple of different ways: model1: ~genotype +group or model ~genotype:genotype + group *Script for model1* #Consideration that treatment is the same for all genotypes setwd ("~/Desktop/CURRENT FILES/R Scripts/Bleaching_data") library(edgeR) # making a matrix of factors called "targets",Time=Time , SF = genotype targets <-readTargets("TargetsSF.txt") targets # importing raw data rawdata <- read.delim("SF_BLEACH.txt",row.names="BLContig") head(rawdata) # setting groups equal to time differences group <- targets$Time group <- as.factor(group) group # making my DGE list y <- DGEList(counts=rawdata,group=group) head(y$counts) # filtering out lowly expressed genes keep <- rowSums(cpm(y)>1) >= 4 y <- y[keep,] table(keep) dim(y) # retained 67416 genes #recompute the library sizes: y$samples$lib.size <- colSums(y$counts) #calculating normalization factors y <- calcNormFactors(y) y$samples # looks good n <- y$samples$lib.size # examining sample for outliers plotMDS(y) ## OK adding indiv SF targets genotype <- factor(targets$SF) genotype design <- model.matrix(~ genotype + group) design y <- estimateGLMCommonDisp(y, design, verbose=TRUE) # Disp = 0.19995 , BCV = 0.4472 y <- estimateGLMTrendedDisp(y,design) # Loading required package: splines y <- estimateGLMTagwiseDisp(y,design) plotBCV(y) fit <- glmFit(y, design) lrt <- glmLRT(y,fit, coef=5:6) #error here: Error in beta[k, ] <- betaj[decr, ] : NAs are not allowed in subscripted assignments *Script for model2: consideration that treatment affects are different between genotypes* setwd ("~/Desktop/CURRENT FILES/R Scripts/Bleaching_data") library(edgeR) # making a matrix of factors called "targets",Time=Time , SF = subject targets <-readTargets("TargetsSF.txt") targets # importing raw data rawdata <- read.delim("SF_BLEACH.txt",row.names="BLContig") head(rawdata) # setting groups equal to time differences group <- targets$Time group <- as.factor(group) group # making my DGE list y <- DGEList(counts=rawdata,group=group) head(y$counts) # filtering out lowly expressed genes keep <- rowSums(cpm(y)>1) >= 4 y <- y[keep,] dim(y) # retained 67416 genes #recompute the library sizes: y$samples$lib.size <- colSums(y$counts) #calculating normalization factors y <- calcNormFactors(y) y$samples # looks good n <- y$samples$lib.size # examining sample for outliers plotMDS(y) ## OK adding indiv SF targets genotype <- factor(targets$SF) genotype design <- model.matrix(~ genotype *group) design y <- estimateGLMCommonDisp(y, design, verbose=TRUE) #ERROR HERE: Warning message: In estimateGLMCommonDisp.default(y = y$counts, design = design, :No residual df: setting dispersion to NA #Can clearly see that the model is parameterizing each sample separately #Tried this iteration of the model design2 <- model.matrix (~ genotype + genotype:group) design2 y <- estimateGLMCommonDisp(y, design2, verbose=TRUE) #Same error as above!! Here is the version of R/edgeR R version 2.15.1 (2012-06-22) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] C/en_US.UTF-8/C/C/C/C attached base packages: [1] splines stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_2.6.12 limma_3.12.3 loaded via a namespace (and not attached): [1] tools_2.15.1 Thanks you!!!! Colleen Burge Postdoctoral Research Associate Dept. Ecol & Evol. Biology Cornell University [[alternative HTML version deleted]]
ADD COMMENTlink modified 5.7 years ago by Gordon Smyth35k • written 5.7 years ago by Colleen Burge30
0
gravatar for Colleen Burge
5.7 years ago by
Colleen Burge30 wrote:
Dear all, Sorry, my error, the time points should be J, N, O, and M. I wrote "N" twice in the top of my e-mail but not my data :) Colleen On Thu, Apr 4, 2013 at 9:49 AM, Colleen Burge <cab433@cornell.edu> wrote: > Dear all, > > I'm trying to set up my analysis in EdgeR to look at differential > expression at 4 time points (J, N, M, N) in three samples (A,B,C). Our > hypothesis is that environmental conditions at "N" should have an affect on > gene expression, and make gene expression significantly different at this > time point than the pre: J or post M &N time points. When I plot the MDS, > I can see that my experiment is having a "batch" affect, where samples are > grouping by genotype. I would like to test for differences, particularly > between N and the other 3 time points accounting for the affect of the > samples. I believe an additive linear model with genotype as the blocking > factor is appropriate (but PLEASE correct me if I'm wrong), is the correct > analysis. I have not been able to get to contrasts, due to error messages, > and I may also need help with this :) > > I have tried the analysis a couple of different ways: > > model1: ~genotype +group > > or > > model ~genotype:genotype + group > > *Script for model1* > > #Consideration that treatment is the same for all genotypes > > setwd ("~/Desktop/CURRENT FILES/R Scripts/Bleaching_data") > library(edgeR) > > # making a matrix of factors called "targets",Time=Time , SF = genotype > targets <-readTargets("TargetsSF.txt") > targets > > # importing raw data > rawdata <- read.delim("SF_BLEACH.txt",row.names="BLContig") > head(rawdata) > > # setting groups equal to time differences > group <- targets$Time > group <- as.factor(group) > group > > > # making my DGE list > y <- DGEList(counts=rawdata,group=group) > head(y$counts) > > # filtering out lowly expressed genes > keep <- rowSums(cpm(y)>1) >= 4 > y <- y[keep,] > table(keep) > dim(y) > # retained 67416 genes > > #recompute the library sizes: > y$samples$lib.size <- colSums(y$counts) > > #calculating normalization factors > y <- calcNormFactors(y) > y$samples > # looks good > n <- y$samples$lib.size > > # examining sample for outliers > plotMDS(y) > > > ## OK adding indiv SF > targets > genotype <- factor(targets$SF) > genotype > > design <- model.matrix(~ genotype + group) > design > > y <- estimateGLMCommonDisp(y, design, verbose=TRUE) > # Disp = 0.19995 , BCV = 0.4472 > y <- estimateGLMTrendedDisp(y,design) > # Loading required package: splines > y <- estimateGLMTagwiseDisp(y,design) > plotBCV(y) > fit <- glmFit(y, design) > lrt <- glmLRT(y,fit, coef=5:6) > #error here: Error in beta[k, ] <- betaj[decr, ] : NAs are not allowed in > subscripted assignments > > *Script for model2: consideration that treatment affects are different > between genotypes* > > setwd ("~/Desktop/CURRENT FILES/R Scripts/Bleaching_data") > library(edgeR) > > # making a matrix of factors called "targets",Time=Time , SF = subject > targets <-readTargets("TargetsSF.txt") > targets > > # importing raw data > rawdata <- read.delim("SF_BLEACH.txt",row.names="BLContig") > head(rawdata) > > # setting groups equal to time differences > group <- targets$Time > group <- as.factor(group) > group > > > # making my DGE list > y <- DGEList(counts=rawdata,group=group) > head(y$counts) > > # filtering out lowly expressed genes > keep <- rowSums(cpm(y)>1) >= 4 > y <- y[keep,] > dim(y) > # retained 67416 genes > > #recompute the library sizes: > y$samples$lib.size <- colSums(y$counts) > > #calculating normalization factors > y <- calcNormFactors(y) > y$samples > # looks good > n <- y$samples$lib.size > > # examining sample for outliers > plotMDS(y) > > > ## OK adding indiv SF > targets > genotype <- factor(targets$SF) > genotype > > design <- model.matrix(~ genotype *group) > design > y <- estimateGLMCommonDisp(y, design, verbose=TRUE) > > #ERROR HERE: Warning message: In estimateGLMCommonDisp.default(y = > y$counts, design = design, :No residual df: setting dispersion to NA > #Can clearly see that the model is parameterizing each sample separately > > > #Tried this iteration of the model > design2 <- model.matrix (~ genotype + genotype:group) > design2 > > > y <- estimateGLMCommonDisp(y, design2, verbose=TRUE) > #Same error as above!! > > > > Here is the version of R/edgeR > > R version 2.15.1 (2012-06-22) > > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > > > locale: > > [1] C/en_US.UTF-8/C/C/C/C > > > > attached base packages: > > [1] splines stats graphics grDevices utils datasets methods > base > > > > other attached packages: > > [1] edgeR_2.6.12 limma_3.12.3 > > > > loaded via a namespace (and not attached): > > [1] tools_2.15.1 > > > > Thanks you!!!! > > Colleen Burge > > Postdoctoral Research Associate > > Dept. Ecol & Evol. Biology > > Cornell University > [[alternative HTML version deleted]]
ADD COMMENTlink written 5.7 years ago by Colleen Burge30
0
gravatar for Gordon Smyth
5.7 years ago by
Gordon Smyth35k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth35k wrote:
Dear Colleen, Note also that your version of edgeR is fairly old. There have been many improvements to the package since the version you're using. Best wishes Gordon > Date: Thu, 4 Apr 2013 09:58:48 -0400 > From: Colleen Burge <cab433 at="" cornell.edu=""> > To: bioconductor at r-project.org > Subject: Re: [BioC] EdgeR additive linear model, errors > > Dear all, > > Sorry, my error, the time points should be J, N, O, and M. I wrote "N" > twice in the top of my e-mail but not my data :) > > Colleen > > > On Thu, Apr 4, 2013 at 9:49 AM, Colleen Burge <cab433 at="" cornell.edu=""> wrote: > >> Dear all, >> >> I'm trying to set up my analysis in EdgeR to look at differential >> expression at 4 time points (J, N, M, N) in three samples (A,B,C). Our >> hypothesis is that environmental conditions at "N" should have an affect on >> gene expression, and make gene expression significantly different at this >> time point than the pre: J or post M &N time points. When I plot the MDS, >> I can see that my experiment is having a "batch" affect, where samples are >> grouping by genotype. I would like to test for differences, particularly >> between N and the other 3 time points accounting for the affect of the >> samples. I believe an additive linear model with genotype as the blocking >> factor is appropriate (but PLEASE correct me if I'm wrong), is the correct >> analysis. I have not been able to get to contrasts, due to error messages, >> and I may also need help with this :) >> >> I have tried the analysis a couple of different ways: >> >> model1: ~genotype +group >> >> or >> >> model ~genotype:genotype + group >> >> *Script for model1* >> >> #Consideration that treatment is the same for all genotypes >> >> setwd ("~/Desktop/CURRENT FILES/R Scripts/Bleaching_data") >> library(edgeR) >> >> # making a matrix of factors called "targets",Time=Time , SF = genotype >> targets <-readTargets("TargetsSF.txt") >> targets >> >> # importing raw data >> rawdata <- read.delim("SF_BLEACH.txt",row.names="BLContig") >> head(rawdata) >> >> # setting groups equal to time differences >> group <- targets$Time >> group <- as.factor(group) >> group >> >> >> # making my DGE list >> y <- DGEList(counts=rawdata,group=group) >> head(y$counts) >> >> # filtering out lowly expressed genes >> keep <- rowSums(cpm(y)>1) >= 4 >> y <- y[keep,] >> table(keep) >> dim(y) >> # retained 67416 genes >> >> #recompute the library sizes: >> y$samples$lib.size <- colSums(y$counts) >> >> #calculating normalization factors >> y <- calcNormFactors(y) >> y$samples >> # looks good >> n <- y$samples$lib.size >> >> # examining sample for outliers >> plotMDS(y) >> >> >> ## OK adding indiv SF >> targets >> genotype <- factor(targets$SF) >> genotype >> >> design <- model.matrix(~ genotype + group) >> design >> >> y <- estimateGLMCommonDisp(y, design, verbose=TRUE) >> # Disp = 0.19995 , BCV = 0.4472 >> y <- estimateGLMTrendedDisp(y,design) >> # Loading required package: splines >> y <- estimateGLMTagwiseDisp(y,design) >> plotBCV(y) >> fit <- glmFit(y, design) >> lrt <- glmLRT(y,fit, coef=5:6) >> #error here: Error in beta[k, ] <- betaj[decr, ] : NAs are not allowed in >> subscripted assignments >> >> *Script for model2: consideration that treatment affects are different >> between genotypes* >> >> setwd ("~/Desktop/CURRENT FILES/R Scripts/Bleaching_data") >> library(edgeR) >> >> # making a matrix of factors called "targets",Time=Time , SF = subject >> targets <-readTargets("TargetsSF.txt") >> targets >> >> # importing raw data >> rawdata <- read.delim("SF_BLEACH.txt",row.names="BLContig") >> head(rawdata) >> >> # setting groups equal to time differences >> group <- targets$Time >> group <- as.factor(group) >> group >> >> >> # making my DGE list >> y <- DGEList(counts=rawdata,group=group) >> head(y$counts) >> >> # filtering out lowly expressed genes >> keep <- rowSums(cpm(y)>1) >= 4 >> y <- y[keep,] >> dim(y) >> # retained 67416 genes >> >> #recompute the library sizes: >> y$samples$lib.size <- colSums(y$counts) >> >> #calculating normalization factors >> y <- calcNormFactors(y) >> y$samples >> # looks good >> n <- y$samples$lib.size >> >> # examining sample for outliers >> plotMDS(y) >> >> >> ## OK adding indiv SF >> targets >> genotype <- factor(targets$SF) >> genotype >> >> design <- model.matrix(~ genotype *group) >> design >> y <- estimateGLMCommonDisp(y, design, verbose=TRUE) >> >> #ERROR HERE: Warning message: In estimateGLMCommonDisp.default(y = >> y$counts, design = design, :No residual df: setting dispersion to NA >> #Can clearly see that the model is parameterizing each sample separately >> >> >> #Tried this iteration of the model >> design2 <- model.matrix (~ genotype + genotype:group) >> design2 >> >> >> y <- estimateGLMCommonDisp(y, design2, verbose=TRUE) >> #Same error as above!! >> >> >> >> Here is the version of R/edgeR >> >> R version 2.15.1 (2012-06-22) >> >> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >> >> >> >> locale: >> >> [1] C/en_US.UTF-8/C/C/C/C >> >> >> >> attached base packages: >> >> [1] splines stats graphics grDevices utils datasets methods >> base >> >> >> >> other attached packages: >> >> [1] edgeR_2.6.12 limma_3.12.3 >> >> >> >> loaded via a namespace (and not attached): >> >> [1] tools_2.15.1 >> >> >> >> Thanks you!!!! >> >> Colleen Burge >> >> Postdoctoral Research Associate >> >> Dept. Ecol & Evol. Biology >> >> Cornell University > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENTlink written 5.7 years ago by Gordon Smyth35k
Dear Gordon and listers, Thank you so much, upgrading EdgeR helped a lot, no more strange errors! I'm wondering if you (or anyone else) can comment on the appropriateness of my model? I think, I have to use the first option I described design <- model.matrix(~ genotype + group) as opposed to: design2 <- model.matrix (~ genotype + genotype:group) as I don't have replicates for each genotype at time point, is this correct? My samples are grouped as follows: X SF Time 1 JULA A J 2 JULB B J 3 JULC C J 4 NOVA A N 5 NOVB B N 6 NOVC C N 7 MARA A M 8 MARB B M 9 MARC C M 10 OCTA A O 11 OCTB B O 12 OCTC C O Also, for contrasts, my samples were taken over a time series with July representing pre-treatment, November during treatment, March after treatment and October after treatment. I think my contrasts should be to compare each group to November I currently have the following as contrasts: my.contrasts <- makeContrasts(groupNvsgroupJ=groupN-groupJ, groupNvsgroupM=groupN-groupM, groupNvsgroupO=groupN-groupO, levels=design4) lrt.groupNvsgroupJ <- glmLRT(fit, contrast=my.contrasts[,"groupNvsgroupJ"]) lrt.groupNvsgroupM <- glmLRT(fit, contrast=my.contrasts [, "groupNvsgroupM"]) lrt.groupNvsgroupO <- glmLRT(fit, contrast=my.contrasts [, "groupNvsgroupO"] Does this makes sense? I essentially want to know if there are major differences in differential expression during the November time point. With these contrasts and the model: design <- model.matrix(~ genotype + group) I have a large # of down-reg genes in November versus each of the other groups. Thank you in advance, Colleen:) On Fri, Apr 5, 2013 at 6:05 PM, Gordon K Smyth <smyth@wehi.edu.au> wrote: > Dear Colleen, > > Note also that your version of edgeR is fairly old. There have been many > improvements to the package since the version you're using. > > Best wishes > Gordon > > Date: Thu, 4 Apr 2013 09:58:48 -0400 >> From: Colleen Burge <cab433@cornell.edu> >> To: bioconductor@r-project.org >> Subject: Re: [BioC] EdgeR additive linear model, errors >> >> Dear all, >> >> Sorry, my error, the time points should be J, N, O, and M. I wrote "N" >> twice in the top of my e-mail but not my data :) >> >> Colleen >> >> >> On Thu, Apr 4, 2013 at 9:49 AM, Colleen Burge <cab433@cornell.edu> wrote: >> >> Dear all, >>> >>> I'm trying to set up my analysis in EdgeR to look at differential >>> expression at 4 time points (J, N, M, N) in three samples (A,B,C). Our >>> hypothesis is that environmental conditions at "N" should have an affect >>> on >>> gene expression, and make gene expression significantly different at this >>> time point than the pre: J or post M &N time points. When I plot the >>> MDS, >>> I can see that my experiment is having a "batch" affect, where samples >>> are >>> grouping by genotype. I would like to test for differences, particularly >>> between N and the other 3 time points accounting for the affect of the >>> samples. I believe an additive linear model with genotype as the >>> blocking >>> factor is appropriate (but PLEASE correct me if I'm wrong), is the >>> correct >>> analysis. I have not been able to get to contrasts, due to error >>> messages, >>> and I may also need help with this :) >>> >>> I have tried the analysis a couple of different ways: >>> >>> model1: ~genotype +group >>> >>> or >>> >>> model ~genotype:genotype + group >>> >>> *Script for model1* >>> >>> #Consideration that treatment is the same for all genotypes >>> >>> setwd ("~/Desktop/CURRENT FILES/R Scripts/Bleaching_data") >>> library(edgeR) >>> >>> # making a matrix of factors called "targets",Time=Time , SF = genotype >>> targets <-readTargets("TargetsSF.txt") >>> targets >>> >>> # importing raw data >>> rawdata <- read.delim("SF_BLEACH.txt",**row.names="BLContig") >>> head(rawdata) >>> >>> # setting groups equal to time differences >>> group <- targets$Time >>> group <- as.factor(group) >>> group >>> >>> >>> # making my DGE list >>> y <- DGEList(counts=rawdata,group=**group) >>> head(y$counts) >>> >>> # filtering out lowly expressed genes >>> keep <- rowSums(cpm(y)>1) >= 4 >>> y <- y[keep,] >>> table(keep) >>> dim(y) >>> # retained 67416 genes >>> >>> #recompute the library sizes: >>> y$samples$lib.size <- colSums(y$counts) >>> >>> #calculating normalization factors >>> y <- calcNormFactors(y) >>> y$samples >>> # looks good >>> n <- y$samples$lib.size >>> >>> # examining sample for outliers >>> plotMDS(y) >>> >>> >>> ## OK adding indiv SF >>> targets >>> genotype <- factor(targets$SF) >>> genotype >>> >>> design <- model.matrix(~ genotype + group) >>> design >>> >>> y <- estimateGLMCommonDisp(y, design, verbose=TRUE) >>> # Disp = 0.19995 , BCV = 0.4472 >>> y <- estimateGLMTrendedDisp(y,**design) >>> # Loading required package: splines >>> y <- estimateGLMTagwiseDisp(y,**design) >>> plotBCV(y) >>> fit <- glmFit(y, design) >>> lrt <- glmLRT(y,fit, coef=5:6) >>> #error here: Error in beta[k, ] <- betaj[decr, ] : NAs are not allowed >>> in >>> subscripted assignments >>> >>> *Script for model2: consideration that treatment affects are different >>> between genotypes* >>> >>> setwd ("~/Desktop/CURRENT FILES/R Scripts/Bleaching_data") >>> library(edgeR) >>> >>> # making a matrix of factors called "targets",Time=Time , SF = subject >>> targets <-readTargets("TargetsSF.txt") >>> targets >>> >>> # importing raw data >>> rawdata <- read.delim("SF_BLEACH.txt",**row.names="BLContig") >>> head(rawdata) >>> >>> # setting groups equal to time differences >>> group <- targets$Time >>> group <- as.factor(group) >>> group >>> >>> >>> # making my DGE list >>> y <- DGEList(counts=rawdata,group=**group) >>> head(y$counts) >>> >>> # filtering out lowly expressed genes >>> keep <- rowSums(cpm(y)>1) >= 4 >>> y <- y[keep,] >>> dim(y) >>> # retained 67416 genes >>> >>> #recompute the library sizes: >>> y$samples$lib.size <- colSums(y$counts) >>> >>> #calculating normalization factors >>> y <- calcNormFactors(y) >>> y$samples >>> # looks good >>> n <- y$samples$lib.size >>> >>> # examining sample for outliers >>> plotMDS(y) >>> >>> >>> ## OK adding indiv SF >>> targets >>> genotype <- factor(targets$SF) >>> genotype >>> >>> design <- model.matrix(~ genotype *group) >>> design >>> y <- estimateGLMCommonDisp(y, design, verbose=TRUE) >>> >>> #ERROR HERE: Warning message: In estimateGLMCommonDisp.default(**y = >>> y$counts, design = design, :No residual df: setting dispersion to NA >>> #Can clearly see that the model is parameterizing each sample separately >>> >>> >>> #Tried this iteration of the model >>> design2 <- model.matrix (~ genotype + genotype:group) >>> design2 >>> >>> >>> y <- estimateGLMCommonDisp(y, design2, verbose=TRUE) >>> #Same error as above!! >>> >>> >>> >>> Here is the version of R/edgeR >>> >>> R version 2.15.1 (2012-06-22) >>> >>> Platform: x86_64-apple-darwin9.8.0/x86_**64 (64-bit) >>> >>> >>> >>> locale: >>> >>> [1] C/en_US.UTF-8/C/C/C/C >>> >>> >>> >>> attached base packages: >>> >>> [1] splines stats graphics grDevices utils datasets methods >>> base >>> >>> >>> >>> other attached packages: >>> >>> [1] edgeR_2.6.12 limma_3.12.3 >>> >>> >>> >>> loaded via a namespace (and not attached): >>> >>> [1] tools_2.15.1 >>> >>> >>> >>> Thanks you!!!! >>> >>> Colleen Burge >>> >>> Postdoctoral Research Associate >>> >>> Dept. Ecol & Evol. Biology >>> >>> Cornell University >>> >> >> > ______________________________**______________________________**____ ______ > The information in this email is confidential and inte...{{dropped:10}}
ADD REPLYlink written 5.7 years ago by Colleen Burge30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 357 users visited in the last hour