Question: some help requested for constructing an appropriate design matrix in LIMMA
0
7.5 years ago by
steven segbroek20 wrote:
Dear R-users, I want to analyse a single channel micro array experiment which looks like the following: > targets File cellline fenotype 1 A 1 R 2 B 2 R 3 C 3 R 4 D 1 S 5 E 2 S 6 F 3 S There are three different cell lines, each of which comes in two versions. Every cell line has a variant that is resistant to a specific drug and another variant that is sensitive to this drug. We treated both variant of the three cell lines with this drug and then extracted RNA which was then hybridised to a micro array. The question we want to resolve is: which genes are differentially regulated between resistant (R) and sensitive (S) versions of these cell lines. There is quite some biological variation between the cell lines, so grouping them by fenotype and then searching for differentially regulated genes would be a bad idea. So, the idea is to to construct a model that accounts for this biological variation between the cell lines and looks which genes are consistently up or down regulated between resistant and sensitive versions of these three cell lines. I am a bit puzzled on how to setup an appropriate design matrix for this particular setup. I have come up with the following code: > design celline fenotR fenotS 1 1 1 0 2 2 1 0 3 3 1 0 4 1 0 1 5 2 0 1 6 3 0 1 attr(,"assign") [1] 1 2 2 attr(,"contrasts") attr(,"contrasts")$fenot [1] "contr.treatment" >block<-c(1,2,3,1,2,3) >eset<-exprs(BSData.log2.quantile) >cor<-duplicateCorrelation(eset, ndups=1, block=block, design=design) >fit <- lmFit(eset, design, block=block, cor=cor$consensus) >fit<- eBayes(fit) >cont.matrix<-makeContrasts(resvssens= fenotR - fenotS, levels=design) >fit2<-contrasts.fit(fit,cont.matrix) >fit2<-eBayes(fit2) >topTable(fit2) However, this code results in an adj.p-value that is "0.9999541" for every gene. Is there a better way to analyse this? Kind regards, Steven [[alternative HTML version deleted]]
• 1.1k views
modified 7.5 years ago by Belinda Phipson130 • written 7.5 years ago by steven segbroek20
Answer: some help requested for constructing an appropriate design matrix in LIMMA
0
7.5 years ago by
Belinda Phipson130 wrote:
Hi Steven You could just include cell line in your linear model rather than using duplicateCorrelation(). > design <- model.matrix(~factor(targets$cellline)+factor(targets$fenotype)) > fit <- lmFit(eset,design) > fit <- eBayes(fit) This will test R vs S taking into account cell line. You could also filter out lowly expressed genes across all samples to improve your power to detect differentially expressed genes as your sample size is quite small. Cheers, Belinda -----Original Message----- From: bioconductor-bounces@r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of steven segbroek Sent: Friday, 15 June 2012 2:13 AM To: bioconductor at r-project.org Subject: [BioC] some help requested for constructing an appropriate design matrix in LIMMA Dear R-users, I want to analyse a single channel micro array experiment which looks like the following: > targets File cellline fenotype 1 A 1 R 2 B 2 R 3 C 3 R 4 D 1 S 5 E 2 S 6 F 3 S There are three different cell lines, each of which comes in two versions. Every cell line has a variant that is resistant to a specific drug and another variant that is sensitive to this drug. We treated both variant of the three cell lines with this drug and then extracted RNA which was then hybridised to a micro array. The question we want to resolve is: which genes are differentially regulated between resistant (R) and sensitive (S) versions of these cell lines. There is quite some biological variation between the cell lines, so grouping them by fenotype and then searching for differentially regulated genes would be a bad idea. So, the idea is to to construct a model that accounts for this biological variation between the cell lines and looks which genes are consistently up or down regulated between resistant and sensitive versions of these three cell lines. I am a bit puzzled on how to setup an appropriate design matrix for this particular setup. I have come up with the following code: > design celline fenotR fenotS 1 1 1 0 2 2 1 0 3 3 1 0 4 1 0 1 5 2 0 1 6 3 0 1 attr(,"assign") [1] 1 2 2 attr(,"contrasts") attr(,"contrasts")$fenot [1] "contr.treatment" >block<-c(1,2,3,1,2,3) >eset<-exprs(BSData.log2.quantile) >cor<-duplicateCorrelation(eset, ndups=1, block=block, design=design) >fit <- lmFit(eset, design, block=block, cor=cor$consensus) >fit<- eBayes(fit) >cont.matrix<-makeContrasts(resvssens= fenotR - fenotS, levels=design) >fit2<-contrasts.fit(fit,cont.matrix) >fit2<-eBayes(fit2) >topTable(fit2) However, this code results in an adj.p-value that is "0.9999541" for every gene. Is there a better way to analyse this? Kind regards, Steven [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
Hi Belinda, Thank you for the suggestion. I filtered out the probes with very low expression values and then used your design matrix. Next I used: >myresults<-topTable(fit, coef=4, number=50000) This list gives me not a single significant differentially expressed probe after correction for mult hypothesis testing. When I used multi dimensional plotting on my dataset, I noticed that there was a very big biological difference between the cell lines (first dimension), but that the difference between resistent and sensitive (2nd dimension) are quite small. Could this be the cause of the relatively high p-values? Is there a way to correct for this? 2012/6/18 Belinda Phipson <phipson@wehi.edu.au> > Hi Steven > > You could just include cell line in your linear model rather than using > duplicateCorrelation(). > > > design <- > model.matrix(~factor(targets$cellline)+factor(targets$fenotype)) > > fit <- lmFit(eset,design) > > fit <- eBayes(fit) > > This will test R vs S taking into account cell line. You could also filter > out lowly expressed genes across all samples to improve your power to > detect > differentially expressed genes as your sample size is quite small. > > Cheers, > Belinda > > -----Original Message----- > From: bioconductor-bounces@r-project.org > [mailto:bioconductor-bounces@r-project.org] On Behalf Of steven segbroek > Sent: Friday, 15 June 2012 2:13 AM > To: bioconductor@r-project.org > Subject: [BioC] some help requested for constructing an appropriate design > matrix in LIMMA > > Dear R-users, > > I want to analyse a single channel micro array experiment which looks like > the following: > > > targets > File cellline fenotype > 1 A 1 R > 2 B 2 R > 3 C 3 R > 4 D 1 S > 5 E 2 S > 6 F 3 S > > There are three different cell lines, each of which comes in two versions. > > Every cell line has a variant that is resistant to a specific drug and > another variant that is sensitive to this drug. > > We treated both variant of the three cell lines with this drug and then > extracted RNA which was then hybridised to a micro array. > > The question we want to resolve is: which genes are differentially > regulated between resistant (R) and sensitive (S) versions of these cell > lines. > > There is quite some biological variation between the cell lines, so > grouping them by fenotype and then searching for differentially regulated > genes would be a bad idea. > > So, the idea is to to construct a model that accounts for this biological > variation between the cell lines and looks which genes are consistently up > or down regulated between resistant and sensitive versions of these three > cell lines. > > I am a bit puzzled on how to setup an appropriate design matrix for this > particular setup. > > I have come up with the following code: > > design > celline fenotR fenotS > 1 1 1 0 > 2 2 1 0 > 3 3 1 0 > 4 1 0 1 > 5 2 0 1 > 6 3 0 1 > attr(,"assign") > [1] 1 2 2 > attr(,"contrasts") > attr(,"contrasts")$fenot > [1] "contr.treatment" > > >block<-c(1,2,3,1,2,3) > >eset<-exprs(BSData.log2.quantile) > >cor<-duplicateCorrelation(eset, ndups=1, block=block, design=design) > >fit <- lmFit(eset, design, block=block, cor=cor$consensus) > >fit<- eBayes(fit) > >cont.matrix<-makeContrasts(resvssens= fenotR - fenotS, levels=design) > >fit2<-contrasts.fit(fit,cont.matrix) > >fit2<-eBayes(fit2) > >topTable(fit2) > > However, this code results in an adj.p-value that is "0.9999541" for every > gene. > > Is there a better way to analyse this? > > Kind regards, > Steven > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:10}}
Hi Steven A common problem with small sample sizes! There are some things you can try: 1) You can try using a function called combat() in the sva package to remove the cell line effect. 2) You can have a look at what fit$df.prior is giving you. A larger value will result in more significant differential expression, and a small value will result in less differential expression. If you plot log(fit$sigma) on the y-axis and fit$Amean on the x-axis you may find some strange highly variable genes that may need to be filtered out. This can affect the estimate of the prior degrees of freedom. You may also decide to filter your genes based on a variance (fit$sigma) cut-off and re-run eBayes(). 3) You could try running eBayes with a trend: > fit <- eBayes(fit,trend=T) It may not make any difference, but you can try! Other than that have a closer look at the individual expression values of the top 10 genes, even if they're not significant, to determine whether there is a difference between R and S. They may still be worth following up. I find a barplot() helpful to visualise the individual sample values. Good luck! Cheers. Belinda > Hi Belinda, > > Thank you for the suggestion. > I filtered out the probes with very low expression values and then used > your design matrix. > Next I used: >>myresults<-topTable(fit, coef=4, number=50000) > > This list gives me not a single significant differentially expressed probe > after correction for mult hypothesis testing. > When I used multi dimensional plotting on my dataset, I noticed that there > was a very big biological difference between the cell lines (first > dimension), but that the difference between resistent and sensitive (2nd > dimension) are quite small. > Could this be the cause of the relatively high p-values? Is there a way to > correct for this? > > 2012/6/18 Belinda Phipson <phipson at="" wehi.edu.au=""> > >> Hi Steven >> >> You could just include cell line in your linear model rather than using >> duplicateCorrelation(). >> >> > design <- >> model.matrix(~factor(targets$cellline)+factor(targets$fenotype)) >> > fit <- lmFit(eset,design) >> > fit <- eBayes(fit) >> >> This will test R vs S taking into account cell line. You could also >> filter >> out lowly expressed genes across all samples to improve your power to >> detect >> differentially expressed genes as your sample size is quite small. >> >> Cheers, >> Belinda >> >> -----Original Message----- >> From: bioconductor-bounces at r-project.org >> [mailto:bioconductor-bounces at r-project.org] On Behalf Of steven segbroek >> Sent: Friday, 15 June 2012 2:13 AM >> To: bioconductor at r-project.org >> Subject: [BioC] some help requested for constructing an appropriate >> design >> matrix in LIMMA >> >> Dear R-users, >> >> I want to analyse a single channel micro array experiment which looks >> like >> the following: >> >> > targets >> File cellline fenotype >> 1 A 1 R >> 2 B 2 R >> 3 C 3 R >> 4 D 1 S >> 5 E 2 S >> 6 F 3 S >> >> There are three different cell lines, each of which comes in two >> versions. >> >> Every cell line has a variant that is resistant to a specific drug and >> another variant that is sensitive to this drug. >> >> We treated both variant of the three cell lines with this drug and then >> extracted RNA which was then hybridised to a micro array. >> >> The question we want to resolve is: which genes are differentially >> regulated between resistant (R) and sensitive (S) versions of these cell >> lines. >> >> There is quite some biological variation between the cell lines, so >> grouping them by fenotype and then searching for differentially >> regulated >> genes would be a bad idea. >> >> So, the idea is to to construct a model that accounts for this >> biological >> variation between the cell lines and looks which genes are consistently >> up >> or down regulated between resistant and sensitive versions of these >> three >> cell lines. >> >> I am a bit puzzled on how to setup an appropriate design matrix for this >> particular setup. >> >> I have come up with the following code: >> > design >> celline fenotR fenotS >> 1 1 1 0 >> 2 2 1 0 >> 3 3 1 0 >> 4 1 0 1 >> 5 2 0 1 >> 6 3 0 1 >> attr(,"assign") >> [1] 1 2 2 >> attr(,"contrasts") >> attr(,"contrasts")$fenot >> [1] "contr.treatment" >> >> >block<-c(1,2,3,1,2,3) >> >eset<-exprs(BSData.log2.quantile) >> >cor<-duplicateCorrelation(eset, ndups=1, block=block, design=design) >> >fit <- lmFit(eset, design, block=block, cor=cor$consensus) >> >fit<- eBayes(fit) >> >cont.matrix<-makeContrasts(resvssens= fenotR - fenotS, levels=design) >> >fit2<-contrasts.fit(fit,cont.matrix) >> >fit2<-eBayes(fit2) >> >topTable(fit2) >> >> However, this code results in an adj.p-value that is "0.9999541" for >> every >> gene. >> >> Is there a better way to analyse this? >> >> Kind regards, >> Steven >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> ______________________________________________________________________ >> The information in this email is confidential and intended solely for >> the >> addressee. >> You must not disclose, forward, print or use it without the permission >> of >> the sender. >> ______________________________________________________________________ >> > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}