RE: Design matrix with multiple genotypes + quantified variables (+cor/regression)

Gordon Smyth 50k

@gordon-smyth

WEHI, Melbourne, Australia

At 12:33 AM 24/08/2004, Matthew Hannah wrote:
>Again, sorry for initially posting without too much investigation, but
>lots on (haven't we all) and I was hoping someone's experience could save
>me a lot of time. So here's an update.
>
>There are 2 basic questions -
>1. Are the design and contrast matrices below correct? Is there a better
>way to design it? My hypothesis is that treatment N - treatment A will
>be similar between genotypes, but the genotypes will be different to
>each other. I'm looking for the global treatment contrast, but don't
>want the genotype differences getting in the way. Is this already taken
>care of in the design below or does the design need to be different, i.e.
>is the lm contrast comparing (ConA, MutA, Mut2A) vs. (ConN, MutN, Mut2N)
>OR averaging (ConA-ConN, MutA-MutN, Mut2A-Mut2N)?
>
>2. How is it best to compare a variable to find genes that correlate to
>it? I've done a fair bit on this now but still need some pointers. The
>obvious thing to do was a genewise Pearson correlation. However, in 'Intro
>stats with R' there is the statement - "The reader should be warned that
>there are many incorrect uses of correlation coefficients, particularly when
>they are used in regression-type settings". Well, I'm duly warned but not
>sure what a regression-type setting is. Also it seems that regression
>and Pearson give the same result.
>
>For the correlation I used cor, and then it suggests testing that the
>correlation is significantly different from zero using cor.test. From
>comparing these it seems that there is a strict relationship between the
>p-value and Pearson coefficient that only varies with sample number (#
>of arrays). The p-value just gives an indication of which Pearson
>coefficient is significant - but surely you don't need to get it for all
>genes as it just seems to rely on sample #?
>
>So I then proceeded with regression analysis using lm(). The output
>values that appear to be useful are p-value and Rsquared.
>The former is the same as from cor.test, and the latter is the squared
>Pearson coefficient, which I've just discussed. Am I missing something, or is
>there a better way?
>
>Finally, as Limma uses lm functions, can I do the regression using it, to
>provide access to the other tools such as eBayes, classifyTests or
>toptable? Or are they fundamentally different?

Yes, you can do the regressions using limma. No, the approaches are not fundamentally different.

Gordon

>Thanks for your time,
>Matt
>
>
>-----Original Message-----
>From: Matthew Hannah
>Sent: Thursday, 19 August 2004 14:56
>To: 'bioconductor@stat.math.ethz.ch'
>Subject: Design matrix with multiple genotypes + quantified variables
>
>Hi,
>
>After I asked before, this design and contrast matrix was suggested and it
>worked well. But now it gets complicated?
>2 genotypes - Con, Mut
>2 treatments - A, N.
>4 replicates
>
>treatments <- factor(c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4))
>design <- model.matrix(~ 0+treatments)
>colnames(design) <- c("ConA","ConN","MutA","MutN")
>fit <- lmFit(esetgcrma, design)
>
>cont.matrix <- makeContrasts(ConA-MutA, ConN-MutN,
>Gen=(ConN+ConA-MutN-MutA)/2, ConA-ConN, MutA-MutN,
>treatment=(ConA+MutA-ConN-MutN)/2,levels=design)
>con.fit <- contrasts.fit(fit, cont.matrix)
>
>So what if I add a third genotype - Mut2?
>Is it the obvious: add treatments <- .....5,5,5,5,6,6,6,6)) and then for
>the contrasts treatment=(ConA+MutA+Mut2A-ConN-MutN-Mut2N)/3?
>Or am I misunderstanding how to design contrasts? Is there an easier way
>of writing this when you have more genotypes?
>
>Also, logically the lm is treating all samples as independent when they
>are not; does this matter? Is it possible to fit the original lm using a
>design taking genotype and treatment into account? Would this be a
>better approach, especially if you have more genotypes (e.g. 5-10)?
>What would the design matrix then look like?
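
A minimal sketch of the three-genotype case asked about above, in base R only. The array layout (3 genotypes x 2 treatments x 4 replicates, ordered ConA, ConN, MutA, MutN, Mut2A, Mut2N) and the simulated gene are assumptions for illustration, not data from the thread. It suggests that, with one coefficient per genotype-treatment group and balanced replication, the contrast (ConA+MutA+Mut2A-ConN-MutN-Mut2N)/3 is the average of the within-genotype A - N differences, and that this has the same estimate as the pooled A mean minus the pooled N mean.

```r
# Assumed layout (not stated in the thread): 3 genotypes x 2 treatments x
# 4 replicates, arrays ordered ConA, ConN, MutA, MutN, Mut2A, Mut2N.
groups <- c("ConA", "ConN", "MutA", "MutN", "Mut2A", "Mut2N")
group  <- factor(rep(groups, each = 4), levels = groups)

# Group-means design: one column per genotype-treatment combination.
design <- model.matrix(~ 0 + group)
colnames(design) <- groups

# Global treatment contrast: average of the three within-genotype A - N differences.
treatment.contrast <- c(ConA = 1, ConN = -1, MutA = 1, MutN = -1,
                        Mut2A = 1, Mut2N = -1) / 3

# One simulated gene (made-up numbers): large genotype shifts and a
# common treatment effect of 1.
set.seed(1)
true.means <- c(ConA = 8, ConN = 7, MutA = 10, MutN = 9, Mut2A = 5, Mut2N = 4)
y <- unname(true.means[as.character(group)]) + rnorm(length(group), sd = 0.2)

# Least-squares fit; the coefficients are the six group means.
fit <- lm.fit(design, y)
avg.of.diffs <- sum(treatment.contrast * fit$coefficients[groups])

# Pooled comparison: all A arrays versus all N arrays.
is.A <- grepl("A$", as.character(group))
pooled.diff <- mean(y[is.A]) - mean(y[!is.A])

print(c(avg.of.diffs = avg.of.diffs, pooled.diff = pooled.diff))
```

Because each group has its own mean in this design, genotype differences are absorbed by the fit and do not enter the residual variance used to test the treatment contrast. The same contrast can be written with makeContrasts and passed to contrasts.fit, as in the quoted code.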
>
>Finally, what if you have a quantified variable for each genotype, like a
>measure of growth before and after the treatment? Can you specify this
>in any way (in the design matrix?) so you take this into account during
>the fit? I thought this was possible using lm or rlm, or am I confusing
>something? Alternatively, does anyone have a different approach, such as
>an efficient way of doing a gene-by-gene regression or correlation
>analysis against the growth measure, and extracting the genes that
>correlate best with the growth measure?
>
>Perhaps there is a good (biologist simple?) book that would cover
>design and contrast of lms; anyone know of one?
>
>Thanks again,
>Matt
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor@stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
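
A rough sketch of regressing expression on a quantitative covariate, as discussed above. The growth values and expression matrix are simulated, and the names growth, expr and y1 are hypothetical. For a single gene it checks the relationships noted in the question: the cor.test p-value matches the lm slope p-value, R-squared is the squared Pearson coefficient, and the test statistic depends only on r and the number of arrays. The limma part runs only if the package is installed.

```r
# Made-up data: 16 arrays, one growth measurement per array, 100 genes.
set.seed(2)
n.arrays <- 16
growth <- rnorm(n.arrays, mean = 10, sd = 2)            # hypothetical covariate
expr   <- matrix(rnorm(100 * n.arrays), nrow = 100)     # hypothetical log-expression
expr[1, ] <- 0.5 * growth + rnorm(n.arrays, sd = 0.5)   # gene 1 tracks growth

# Gene 1: Pearson test versus simple linear regression.
y1 <- expr[1, ]
ct <- cor.test(y1, growth)
sm <- summary(lm(y1 ~ growth))

r     <- unname(ct$estimate)
p.cor <- ct$p.value
p.lm  <- sm$coefficients["growth", "Pr(>|t|)"]
r2    <- sm$r.squared

# The t-statistic is a function of r and the number of arrays only.
t.from.r <- r * sqrt((n.arrays - 2) / (1 - r^2))

# The same regression for every gene with limma, if it is available.
design <- model.matrix(~ growth)
if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::eBayes(limma::lmFit(expr, design))
  print(limma::topTable(fit, coef = "growth", number = 5))
}
```

Note that eBayes moderates the genewise variances, so the limma p-values will generally differ a little from the ordinary cor.test or lm p-values even though the fitted slopes are the same.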

Tags: Regression, limma
