Limma: matrix of microarray data and design matrix

john herbert

Dear Bioconductors, I have a six-column matrix of one-colour array data (the first 3 columns are case, the second 3 are control), quantile normalized. I would like to do a simple differential gene expression analysis using limma. Is there a line or two of code that generates a simple design matrix for this scenario? I usually use a design matrix created from a targets file, and I never really understand lines like design <- model.matrix(~0+f). What is ~0+f?

James W. MacDonald

Hi John,

On 6/3/2011 9:20 AM, john herbert wrote:
> I usually use a design matrix created from a targets file, and I never
> really understand lines like... design <- model.matrix(~0+f) (what is
> ~0+f)?

No idea what f is here (other than the obvious: it is a variable holding a factor). But constructing the design matrix is simple:

    library(limma)
    # y is your six-column normalized expression matrix (rows = probes)
    f <- factor(rep(c("case","control"), each = 3))  # ;-D  3 case, then 3 control
    design <- model.matrix(~f)
    fit <- lmFit(y, design)
    fit2 <- eBayes(fit)
    topTable(fit2, coef=2)

-OR-

    design <- model.matrix(~0+f)
    fit <- lmFit(y, design)
    fit2 <- contrasts.fit(fit, c(1,-1))  # case - control
    fit2 <- eBayes(fit2)
    topTable(fit2, coef=1)

These results will be identical, except that the signs of your coefficients will be flipped (and I would normally prefer the sign in the second case). It is worth your while to figure out why, and what the difference is between the two design matrices.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826

Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues
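
To make the difference between the two design matrices concrete, here is a minimal sketch that fits both to a single gene. It uses base R's lm.fit() rather than limma's lmFit() purely so it runs without Bioconductor; both estimate per-gene least-squares coefficients. The six expression values and the object names (y, d1, d2, b1, b2) are hypothetical.

```r
# Minimal base-R sketch (no limma); the six values in y are hypothetical
# log-expression values for one gene: three case arrays, then three control arrays.
f <- factor(rep(c("case", "control"), each = 3))
y <- c(5, 6, 7, 9, 10, 11)   # case mean = 6, control mean = 10

# Design 1: with intercept. Levels sort alphabetically, so "case" is the
# baseline and the second column is an indicator for "control".
d1 <- model.matrix(~f)
# Design 2: no intercept, one indicator column per group.
d2 <- model.matrix(~0 + f)
print(d1)
print(d2)

# Least-squares coefficients for each design.
b1 <- lm.fit(d1, y)$coefficients  # (Intercept) = case mean, fcontrol = control - case
b2 <- lm.fit(d2, y)$coefficients  # fcase = case mean, fcontrol = control mean

# Contrasts on design 2 are weighted sums of its coefficients.
ctrl_minus_case <- sum(c(-1, 1) * b2)  # matches b1["fcontrol"]
case_minus_ctrl <- sum(c(1, -1) * b2)  # same size, opposite sign

print(b1)
print(b2)
print(c(ctrl_minus_case = ctrl_minus_case, case_minus_ctrl = case_minus_ctrl))
```

Design 1 reports control - case directly as its second coefficient. Design 2 reports the two group means, so the sign of any comparison is set by the contrast weights: c(-1, 1) reproduces design 1, while c(1, -1) gives case - control.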

john herbert

Thank you, James, that is very helpful. As for why, I am not sure at the moment. To be honest, I have no idea about the statistics here. Take the tilde, for instance. Searching online finds:

1. In asymptotic notation (http://mathworld.wolfram.com/asymptoticnotation.html), $f \sim \phi$ is used to mean that $f/\phi \to 1$.
2. In statistics (http://en.wikipedia.org/wiki/statistics) and probability theory (http://en.wikipedia.org/wiki/probability_theory), "~" means "is distributed as".

How does that fit in with ~f? And what about the 0: is it being compared with the factor variables?

James W. MacDonald

Hi John,

On 6/6/2011 6:47 AM, john herbert wrote:
> How does that fit in with ~f?
> 0 compared factor variables?

No. The tilde has a different meaning within R. In a model formula it separates the response (left-hand side) from the model terms (right-hand side); ~f has no response, so it only specifies the right-hand side, i.e. the terms used to build the design matrix. The default in R is to fit an intercept in all linear models (which in the context of ANOVA is better thought of as a 'baseline' sample, to which all other samples are compared). So when you do something like

    f <- factor(rep(c("A","B"), each = 3))
    design <- model.matrix(~f)

you are by default setting the 'A' samples as the baseline sample, and the second coefficient in the model is the B - A comparison. To eliminate the intercept, you add either a 0 or a -1 to the right-hand side of the formula:

    design <- model.matrix(~0+f)

which will then compute the average expression of the A and B samples separately, so you have to explicitly create a contrasts matrix in order to compute the B - A contrast.

See the limma User's Guide (limmaUsersGuide() in R) and ?formula for more information. You might also consider looking at Julian Faraway's excellent book on using R to fit linear models (published as Linear Models with R). This used to be a PDF he gave away for free, but it is now published. However, a bit of googling might get you to the PDF if it is floating around on somebody's website.

Best,

Jim
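
The points above can be checked in a few lines of base R. This is a minimal sketch that does not use limma; the object names and the six expression values for one gene are hypothetical, chosen so the group means are easy to read off.

```r
# Minimal base-R sketch; the values in y are hypothetical (one gene, 6 arrays).
f <- factor(rep(c("A", "B"), each = 3))

# `~` builds a formula object; here it has only a right-hand side.
one_sided <- ~ f
no_int    <- ~ 0 + f    # intercept removed with 0 ...
no_int2   <- ~ -1 + f   # ... or equivalently with -1

print(class(one_sided))                     # "formula"
print(length(one_sided))                    # 2: `~` plus the right-hand side, no response
print(attr(terms(one_sided), "intercept"))  # 1: R includes an intercept by default
print(attr(terms(no_int), "intercept"))     # 0
print(attr(terms(no_int2), "intercept"))    # 0

# Design matrices: with intercept (A is the baseline) and without.
X1 <- model.matrix(one_sided)  # columns: (Intercept), fB
X0 <- model.matrix(no_int)     # columns: fA, fB

y <- c(2, 3, 4, 7, 8, 9)       # hypothetical: mean of A = 3, mean of B = 8

b_int   <- lm.fit(X1, y)$coefficients  # (Intercept) = mean of A, fB = B - A
b_means <- lm.fit(X0, y)$coefficients  # fA = mean of A, fB = mean of B

# Contrast matrix for the no-intercept design: rows match the columns of X0,
# one column per contrast. B - A has weights -1 (fA) and +1 (fB).
cont <- matrix(c(-1, 1), ncol = 1, dimnames = list(colnames(X0), "BvsA"))
b_vs_a <- drop(crossprod(cont, b_means))  # same number as b_int["fB"]

print(b_int)
print(b_means)
print(b_vs_a)
```

In limma the same contrast would usually be built with makeContrasts(fB - fA, levels = design) and passed to contrasts.fit(); the hand-built matrix here just makes the weights explicit.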
