Question

Paired comparisons in edgeR

Jussi Salmi

Hello! Thank you for the nice software and clear user guide. I am using edgeR to analyse RNA-seq data. The experiment has several cell cultures with different knock-downs, and the cultures are stimulated in two different ways. In the end, I want to compare the treated cultures to the untreated ones: each untreated knock-down culture is compared to the same knock-down culture after treatment. I think the study design calls for paired comparisons, because the same cultures are first measured untreated and are then treated and compared against their untreated state.

I have come up with the following code, based on the user guide:

x<-read.delim("htseqout.edger", sep=" ",row.names="Symbol")
group<-factor(c(21,21,19,19,6,6,8,8,12,12,18,18,20,20,11,11,9,9,23...
y<-DGEList(counts=x,group=group)
design<-model.matrix(~0+group, data=y$samples)
colnames(design)<-levels(y$samples$group)
y<-estimateGLMTrendedDisp(y,design)
y<-estimateGLMTagwiseDisp(y,design)
fit<-glmFit(y,design)
lrt0203<-glmLRT(fit,contrast=c(0,1,-1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))
tt0203<-topTags(lrt0203, n=13000)
write.table(tt0203,file="tt0203")
[several other comparisons]

Is this a good way to analyse paired samples, or is there a better way?

Thanks,
Jussi Salmi
PhD, Computer Science
Centre for Biotechnology, Turku, Finland

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=fi_FI.UTF-8 LC_NUMERIC=C LC_TIME=fi_FI.UTF-8
[4] LC_COLLATE=fi_FI.UTF-8 LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=fi_FI.UTF-8
[7] LC_PAPER=fi_FI.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=fi_FI.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] splines stats graphics grDevices utils datasets methods base

other attached packages:
[1] edgeR_3.4.2 limma_3.18.13

loaded via a namespace (and not attached):
[1] tools_3.1.0

-- 
Jussi Salmi, PhD
http://www.btk.fi/index.php?id=12&sort=&pid=282
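With a `~0+group` design, each column of the design matrix is one level of `group`, so a positional contrast such as `c(0,1,-1,0,...)` means "second level minus third level" in the order of `levels(group)`. The minimal base-R sketch below uses a hypothetical eight-sample toy factor in place of the elided real one. It shows that `factor()` sorts numeric labels, so column positions do not follow the order in which labels first appear, and it builds the same contrast by column name, which is harder to miscount with 24 levels. Note that this cell-means design contains no term identifying which samples come from the same culture, so by itself it does not model the pairing.

```r
# Hypothetical toy labels standing in for the elided group vector in the post
group <- factor(c(21, 21, 19, 19, 6, 6, 8, 8))

# Cell-means design: one column per group level, no intercept
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)

# factor() sorts numeric labels, so the column order is 6, 8, 19, 21
print(colnames(design))

# The positional contrast c(0, 1, -1, 0) therefore means "group 8 minus group 19"
positional <- c(0, 1, -1, 0)

# The same contrast built by column name instead of by position
by_name <- setNames(numeric(ncol(design)), colnames(design))
by_name["8"] <- 1
by_name["19"] <- -1
print(by_name)
```

In edgeR, a vector like this would be supplied as the `contrast` argument of `glmLRT()`; `unname(by_name)` gives the plain numeric form if the names are unwanted.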

score 0 · Answer 1 · 2014-05-08

Hi Jussi,

On 5/8/2014 7:24 AM, Jussi Salmi wrote:
> [...]
> y<-DGEList(counts=x,group=group)
> design<-model.matrix(~0+group, data=y$samples)

You want to include a factor for treatment as well. Something like

# assumes the samples alternate untreated/treated within each pair of group labels
trt <- factor(rep(1:2, length(group)/2))
design <- model.matrix(~group+trt)
# re-estimate the dispersions for the new design before fitting
y <- estimateGLMTrendedDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
fit <- glmFit(y, design)
lrt <- glmLRT(fit)
topTags(lrt)

will give you the genes that change between treatments, after controlling for the paired nature of your experiment.

Best,

Jim

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
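Jim's suggestion can be checked without edgeR. The base-R sketch below rebuilds his paired design for a hypothetical eight-sample layout, assuming (as `rep(1:2, ...)` implies) that the two samples sharing a `group` label are ordered untreated, then treated. It confirms that the design has full column rank and that `trt2` is the last column. When neither `coef` nor `contrast` is given, `glmLRT()` tests the last coefficient, so the default test is treated versus untreated, with each culture's baseline absorbed by the `group` columns.

```r
# Hypothetical toy layout: four cultures, each sequenced untreated and then treated
group <- factor(c(21, 21, 19, 19, 6, 6, 8, 8))

# Treatment alternates 1 (untreated), 2 (treated) within each pair, as in rep(1:2, ...)
trt <- factor(rep(1:2, length(group) / 2))

# Additive model: one baseline per culture (the blocking term) plus a common treatment effect
design <- model.matrix(~group + trt)
print(colnames(design))

# Full column rank means every coefficient, including trt2, is estimable
stopifnot(qr(design)$rank == ncol(design))

# The last column is trt2, which glmLRT() tests by default
print(colnames(design)[ncol(design)])
```

Because the culture effects are fitted as separate coefficients, the `trt2` estimate is driven by within-culture differences, which is what makes the comparison paired.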

score 0 · Answer 2 · 2014-05-12

Thank you for the advice! There is a further problem, though. I got the error

Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset, :
Design matrix not of full rank. The following coefficients not estimable:
treatment2 treatment3

My code is now:

x<-read.delim("..
group<-factor(c(21,19,6,8,12,18,20,11,9,...
treatment<-factor(c(3,3,3,1,1,2,3,1,1,...
y<-DGEList(counts=x,group=group)
design<-model.matrix(~group+treatment, data=y$samples)
y<-estimateGLMTrendedDisp(y,design)
y<-estimateGLMTagwiseDisp(y,design)
fit<-glmFit(y,design)
lrt <- glmLRT(fit)
tt<-topTags(lrt)
write.table(tt,file="tt")

I have been looking at this all day, but it seems I am too stupid to understand this. I have 72 samples in 24 groups (3 replicates in each group), and the factor "group" gives the group of each sample. There are 3 treatments: no treatment (1), treatment A (2) and treatment B (3); the factor "treatment" has length 72. In the design matrix there is an intercept column, then the 24 groups, and finally two treatment columns.

Jussi
PhD, Computer Science
University of Turku, Finland

On 08.05.2014 17:23, James W. MacDonald wrote:
> You want to include a factor for treatment as well. [...]

-- 
Jussi Salmi, PhD
http://www.btk.fi/index.php?id=12&sort=&pid=282
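This error is what one would expect if `group` already encodes the treatment, for example one level per knock-down × treatment condition, which the "24 groups, 3 replicates each" description suggests. Every treatment column is then a sum of group columns, so `treatment2` and `treatment3` cannot be estimated. (With an intercept, `model.matrix()` drops the first group level, so there are 23 group columns rather than 24.) The base-R sketch below reproduces this on a hypothetical toy layout and shows one possible fix: use a factor identifying the culture, the unit that is actually paired, as the blocking term. It assumes every culture is measured untreated, with treatment A and with treatment B; the names `culture` and `knockdown` are illustrative and do not come from the post.

```r
# Hypothetical toy layout (not the real data): 2 knock-downs, 2 cultures each,
# every culture measured untreated (1), with treatment A (2) and with treatment B (3)
culture   <- factor(rep(c("c1", "c2", "c3", "c4"), each = 3))
knockdown <- factor(rep(c("kdA", "kdB"), each = 6))
treatment <- factor(rep(1:3, times = 4))

# A "group" with one level per knock-down x treatment condition
group <- interaction(knockdown, treatment)

# Treatment is already implied by group, so this design is rank deficient
bad <- model.matrix(~group + treatment)
qr_bad <- qr(bad)
print(c(rank = qr_bad$rank, columns = ncol(bad)))

# Columns pivoted past the rank are the non-estimable ones
# (the same idea as limma's nonEstimable())
print(colnames(bad)[qr_bad$pivot[-seq_len(qr_bad$rank)]])

# Blocking on culture instead keeps treatment estimable within each culture
good <- model.matrix(~culture + treatment)
stopifnot(qr(good)$rank == ncol(good))
print(colnames(good))
```

If the real cultures are not all measured under all three treatments, the same check with `qr()` on the actual culture and treatment factors would show whether the blocked design is still of full rank.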
