Question

EdgeR additive linear model, errors

0

Entering edit mode

Colleen Burge ▴ 30

@colleen-burge-5872

Last seen 9.6 years ago

Dear all, I'm trying to set up my analysis in EdgeR to look at differential expression at 4 time points (J, N, M, N) in three samples (A,B,C). Our hypothesis is that environmental conditions at "N" should have an affect on gene expression, and make gene expression significantly different at this time point than the pre: J or post M &N time points. When I plot the MDS, I can see that my experiment is having a "batch" affect, where samples are grouping by genotype. I would like to test for differences, particularly between N and the other 3 time points accounting for the affect of the samples. I believe an additive linear model with genotype as the blocking factor is appropriate (but PLEASE correct me if I'm wrong), is the correct analysis. I have not been able to get to contrasts, due to error messages, and I may also need help with this :) I have tried the analysis a couple of different ways: model1: ~genotype +group or model ~genotype:genotype + group *Script for model1* #Consideration that treatment is the same for all genotypes setwd ("~/Desktop/CURRENT FILES/R Scripts/Bleaching_data") library(edgeR) # making a matrix of factors called "targets",Time=Time , SF = genotype targets <-readTargets("TargetsSF.txt") targets # importing raw data rawdata <- read.delim("SF_BLEACH.txt",row.names="BLContig") head(rawdata) # setting groups equal to time differences group <- targets$Time group <- as.factor(group) group # making my DGE list y <- DGEList(counts=rawdata,group=group) head(y$counts) # filtering out lowly expressed genes keep <- rowSums(cpm(y)>1) >= 4 y <- y[keep,] table(keep) dim(y) # retained 67416 genes #recompute the library sizes: y$samples$lib.size <- colSums(y$counts) #calculating normalization factors y <- calcNormFactors(y) y$samples # looks good n <- y$samples$lib.size # examining sample for outliers plotMDS(y) ## OK adding indiv SF targets genotype <- factor(targets$SF) genotype design <- model.matrix(~ genotype + group) design y <- estimateGLMCommonDisp(y, design, verbose=TRUE) # Disp = 0.19995 , BCV = 0.4472 y <- estimateGLMTrendedDisp(y,design) # Loading required package: splines y <- estimateGLMTagwiseDisp(y,design) plotBCV(y) fit <- glmFit(y, design) lrt <- glmLRT(y,fit, coef=5:6) #error here: Error in beta[k, ] <- betaj[decr, ] : NAs are not allowed in subscripted assignments *Script for model2: consideration that treatment affects are different between genotypes* setwd ("~/Desktop/CURRENT FILES/R Scripts/Bleaching_data") library(edgeR) # making a matrix of factors called "targets",Time=Time , SF = subject targets <-readTargets("TargetsSF.txt") targets # importing raw data rawdata <- read.delim("SF_BLEACH.txt",row.names="BLContig") head(rawdata) # setting groups equal to time differences group <- targets$Time group <- as.factor(group) group # making my DGE list y <- DGEList(counts=rawdata,group=group) head(y$counts) # filtering out lowly expressed genes keep <- rowSums(cpm(y)>1) >= 4 y <- y[keep,] dim(y) # retained 67416 genes #recompute the library sizes: y$samples$lib.size <- colSums(y$counts) #calculating normalization factors y <- calcNormFactors(y) y$samples # looks good n <- y$samples$lib.size # examining sample for outliers plotMDS(y) ## OK adding indiv SF targets genotype <- factor(targets$SF) genotype design <- model.matrix(~ genotype *group) design y <- estimateGLMCommonDisp(y, design, verbose=TRUE) #ERROR HERE: Warning message: In estimateGLMCommonDisp.default(y = y$counts, design = design, :No residual df: setting dispersion to NA #Can clearly see that the model is parameterizing each sample separately #Tried this iteration of the model design2 <- model.matrix (~ genotype + genotype:group) design2 y <- estimateGLMCommonDisp(y, design2, verbose=TRUE) #Same error as above!! Here is the version of R/edgeR R version 2.15.1 (2012-06-22) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] C/en_US.UTF-8/C/C/C/C attached base packages: [1] splines stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_2.6.12 limma_3.12.3 loaded via a namespace (and not attached): [1] tools_2.15.1 Thanks you!!!! Colleen Burge Postdoctoral Research Associate Dept. Ecol & Evol. Biology Cornell University [[alternative HTML version deleted]]

Normalization edgeR Normalization edgeR • 1.6k views

ADD COMMENT • link updated 11.0 years ago by Gordon Smyth 50k • written 11.0 years ago by Colleen Burge ▴ 30

score 0 · Answer 1 · 2013-04-04

Dear all, Sorry, my error, the time points should be J, N, O, and M. I wrote "N" twice in the top of my e-mail but not my data :) Colleen On Thu, Apr 4, 2013 at 9:49 AM, Colleen Burge <cab433@cornell.edu> wrote: > Dear all, > > I'm trying to set up my analysis in EdgeR to look at differential > expression at 4 time points (J, N, M, N) in three samples (A,B,C). Our > hypothesis is that environmental conditions at "N" should have an affect on > gene expression, and make gene expression significantly different at this > time point than the pre: J or post M &N time points. When I plot the MDS, > I can see that my experiment is having a "batch" affect, where samples are > grouping by genotype. I would like to test for differences, particularly > between N and the other 3 time points accounting for the affect of the > samples. I believe an additive linear model with genotype as the blocking > factor is appropriate (but PLEASE correct me if I'm wrong), is the correct > analysis. I have not been able to get to contrasts, due to error messages, > and I may also need help with this :) > > I have tried the analysis a couple of different ways: > > model1: ~genotype +group > > or > > model ~genotype:genotype + group > > *Script for model1* > > #Consideration that treatment is the same for all genotypes > > setwd ("~/Desktop/CURRENT FILES/R Scripts/Bleaching_data") > library(edgeR) > > # making a matrix of factors called "targets",Time=Time , SF = genotype > targets <-readTargets("TargetsSF.txt") > targets > > # importing raw data > rawdata <- read.delim("SF_BLEACH.txt",row.names="BLContig") > head(rawdata) > > # setting groups equal to time differences > group <- targets$Time > group <- as.factor(group) > group > > > # making my DGE list > y <- DGEList(counts=rawdata,group=group) > head(y$counts) > > # filtering out lowly expressed genes > keep <- rowSums(cpm(y)>1) >= 4 > y <- y[keep,] > table(keep) > dim(y) > # retained 67416 genes > > #recompute the library sizes: > y$samples$lib.size <- colSums(y$counts) > > #calculating normalization factors > y <- calcNormFactors(y) > y$samples > # looks good > n <- y$samples$lib.size > > # examining sample for outliers > plotMDS(y) > > > ## OK adding indiv SF > targets > genotype <- factor(targets$SF) > genotype > > design <- model.matrix(~ genotype + group) > design > > y <- estimateGLMCommonDisp(y, design, verbose=TRUE) > # Disp = 0.19995 , BCV = 0.4472 > y <- estimateGLMTrendedDisp(y,design) > # Loading required package: splines > y <- estimateGLMTagwiseDisp(y,design) > plotBCV(y) > fit <- glmFit(y, design) > lrt <- glmLRT(y,fit, coef=5:6) > #error here: Error in beta[k, ] <- betaj[decr, ] : NAs are not allowed in > subscripted assignments > > *Script for model2: consideration that treatment affects are different > between genotypes* > > setwd ("~/Desktop/CURRENT FILES/R Scripts/Bleaching_data") > library(edgeR) > > # making a matrix of factors called "targets",Time=Time , SF = subject > targets <-readTargets("TargetsSF.txt") > targets > > # importing raw data > rawdata <- read.delim("SF_BLEACH.txt",row.names="BLContig") > head(rawdata) > > # setting groups equal to time differences > group <- targets$Time > group <- as.factor(group) > group > > > # making my DGE list > y <- DGEList(counts=rawdata,group=group) > head(y$counts) > > # filtering out lowly expressed genes > keep <- rowSums(cpm(y)>1) >= 4 > y <- y[keep,] > dim(y) > # retained 67416 genes > > #recompute the library sizes: > y$samples$lib.size <- colSums(y$counts) > > #calculating normalization factors > y <- calcNormFactors(y) > y$samples > # looks good > n <- y$samples$lib.size > > # examining sample for outliers > plotMDS(y) > > > ## OK adding indiv SF > targets > genotype <- factor(targets$SF) > genotype > > design <- model.matrix(~ genotype *group) > design > y <- estimateGLMCommonDisp(y, design, verbose=TRUE) > > #ERROR HERE: Warning message: In estimateGLMCommonDisp.default(y = > y$counts, design = design, :No residual df: setting dispersion to NA > #Can clearly see that the model is parameterizing each sample separately > > > #Tried this iteration of the model > design2 <- model.matrix (~ genotype + genotype:group) > design2 > > > y <- estimateGLMCommonDisp(y, design2, verbose=TRUE) > #Same error as above!! > > > > Here is the version of R/edgeR > > R version 2.15.1 (2012-06-22) > > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > > > locale: > > [1] C/en_US.UTF-8/C/C/C/C > > > > attached base packages: > > [1] splines stats graphics grDevices utils datasets methods > base > > > > other attached packages: > > [1] edgeR_2.6.12 limma_3.12.3 > > > > loaded via a namespace (and not attached): > > [1] tools_2.15.1 > > > > Thanks you!!!! > > Colleen Burge > > Postdoctoral Research Associate > > Dept. Ecol & Evol. Biology > > Cornell University > [[alternative HTML version deleted]]

score 0 · Answer 2 · 2013-04-06

Dear Colleen, Note also that your version of edgeR is fairly old. There have been many improvements to the package since the version you're using. Best wishes Gordon > Date: Thu, 4 Apr 2013 09:58:48 -0400 > From: Colleen Burge <cab433 at="" cornell.edu=""> > To: bioconductor at r-project.org > Subject: Re: [BioC] EdgeR additive linear model, errors > > Dear all, > > Sorry, my error, the time points should be J, N, O, and M. I wrote "N" > twice in the top of my e-mail but not my data :) > > Colleen > > > On Thu, Apr 4, 2013 at 9:49 AM, Colleen Burge <cab433 at="" cornell.edu=""> wrote: > >> Dear all, >> >> I'm trying to set up my analysis in EdgeR to look at differential >> expression at 4 time points (J, N, M, N) in three samples (A,B,C). Our >> hypothesis is that environmental conditions at "N" should have an affect on >> gene expression, and make gene expression significantly different at this >> time point than the pre: J or post M &N time points. When I plot the MDS, >> I can see that my experiment is having a "batch" affect, where samples are >> grouping by genotype. I would like to test for differences, particularly >> between N and the other 3 time points accounting for the affect of the >> samples. I believe an additive linear model with genotype as the blocking >> factor is appropriate (but PLEASE correct me if I'm wrong), is the correct >> analysis. I have not been able to get to contrasts, due to error messages, >> and I may also need help with this :) >> >> I have tried the analysis a couple of different ways: >> >> model1: ~genotype +group >> >> or >> >> model ~genotype:genotype + group >> >> *Script for model1* >> >> #Consideration that treatment is the same for all genotypes >> >> setwd ("~/Desktop/CURRENT FILES/R Scripts/Bleaching_data") >> library(edgeR) >> >> # making a matrix of factors called "targets",Time=Time , SF = genotype >> targets <-readTargets("TargetsSF.txt") >> targets >> >> # importing raw data >> rawdata <- read.delim("SF_BLEACH.txt",row.names="BLContig") >> head(rawdata) >> >> # setting groups equal to time differences >> group <- targets$Time >> group <- as.factor(group) >> group >> >> >> # making my DGE list >> y <- DGEList(counts=rawdata,group=group) >> head(y$counts) >> >> # filtering out lowly expressed genes >> keep <- rowSums(cpm(y)>1) >= 4 >> y <- y[keep,] >> table(keep) >> dim(y) >> # retained 67416 genes >> >> #recompute the library sizes: >> y$samples$lib.size <- colSums(y$counts) >> >> #calculating normalization factors >> y <- calcNormFactors(y) >> y$samples >> # looks good >> n <- y$samples$lib.size >> >> # examining sample for outliers >> plotMDS(y) >> >> >> ## OK adding indiv SF >> targets >> genotype <- factor(targets$SF) >> genotype >> >> design <- model.matrix(~ genotype + group) >> design >> >> y <- estimateGLMCommonDisp(y, design, verbose=TRUE) >> # Disp = 0.19995 , BCV = 0.4472 >> y <- estimateGLMTrendedDisp(y,design) >> # Loading required package: splines >> y <- estimateGLMTagwiseDisp(y,design) >> plotBCV(y) >> fit <- glmFit(y, design) >> lrt <- glmLRT(y,fit, coef=5:6) >> #error here: Error in beta[k, ] <- betaj[decr, ] : NAs are not allowed in >> subscripted assignments >> >> *Script for model2: consideration that treatment affects are different >> between genotypes* >> >> setwd ("~/Desktop/CURRENT FILES/R Scripts/Bleaching_data") >> library(edgeR) >> >> # making a matrix of factors called "targets",Time=Time , SF = subject >> targets <-readTargets("TargetsSF.txt") >> targets >> >> # importing raw data >> rawdata <- read.delim("SF_BLEACH.txt",row.names="BLContig") >> head(rawdata) >> >> # setting groups equal to time differences >> group <- targets$Time >> group <- as.factor(group) >> group >> >> >> # making my DGE list >> y <- DGEList(counts=rawdata,group=group) >> head(y$counts) >> >> # filtering out lowly expressed genes >> keep <- rowSums(cpm(y)>1) >= 4 >> y <- y[keep,] >> dim(y) >> # retained 67416 genes >> >> #recompute the library sizes: >> y$samples$lib.size <- colSums(y$counts) >> >> #calculating normalization factors >> y <- calcNormFactors(y) >> y$samples >> # looks good >> n <- y$samples$lib.size >> >> # examining sample for outliers >> plotMDS(y) >> >> >> ## OK adding indiv SF >> targets >> genotype <- factor(targets$SF) >> genotype >> >> design <- model.matrix(~ genotype *group) >> design >> y <- estimateGLMCommonDisp(y, design, verbose=TRUE) >> >> #ERROR HERE: Warning message: In estimateGLMCommonDisp.default(y = >> y$counts, design = design, :No residual df: setting dispersion to NA >> #Can clearly see that the model is parameterizing each sample separately >> >> >> #Tried this iteration of the model >> design2 <- model.matrix (~ genotype + genotype:group) >> design2 >> >> >> y <- estimateGLMCommonDisp(y, design2, verbose=TRUE) >> #Same error as above!! >> >> >> >> Here is the version of R/edgeR >> >> R version 2.15.1 (2012-06-22) >> >> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >> >> >> >> locale: >> >> [1] C/en_US.UTF-8/C/C/C/C >> >> >> >> attached base packages: >> >> [1] splines stats graphics grDevices utils datasets methods >> base >> >> >> >> other attached packages: >> >> [1] edgeR_2.6.12 limma_3.12.3 >> >> >> >> loaded via a namespace (and not attached): >> >> [1] tools_2.15.1 >> >> >> >> Thanks you!!!! >> >> Colleen Burge >> >> Postdoctoral Research Associate >> >> Dept. Ecol & Evol. Biology >> >> Cornell University > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}