RE: Design matrix with multiple genotypes + quantified variables (+cor/regression)
@gordon-smyth
WEHI, Melbourne, Australia
At 12:33 AM 24/08/2004, Matthew Hannah wrote:
> Again, sorry for initially posting without too much investigation, but there's lots on (haven't we all) and I was hoping someone's experience could save me a lot of time. So here's an update.
>
> There are two basic questions:
>
> 1. Are the design and contrast matrices below correct? Is there a better way to design it? My hypothesis is that treatment N - treatment A will be similar between genotypes, but the genotypes will be different from each other. I'm looking for the global treatment contrast, but don't want the genotype differences getting in the way. Is this already taken care of in the design below, or does the design need to be different? i.e. is the lm contrast comparing (ConA, MutA, Mut2A) vs. (ConN, MutN, Mut2N), OR averaging (ConA-ConN, MutA-MutN, Mut2A-Mut2N)?
>
> 2. How is it best to compare a variable to find genes that correlate with it? I've done a fair bit on this now but still need some pointers. The obvious thing to do was a genewise Pearson correlation; however, in 'Intro stats with R' there is the statement: "The reader should be warned that there are many incorrect uses of correlation coefficients, particularly when they are used in regression-type settings". Well, I'm duly warned, but not sure what a regression-type setting is. Also it seems that regression and Pearson correlation give the same result.
>
> For the correlation I used cor, and then it suggests testing whether the correlation is significantly different from zero using cor.test. From comparing these it seems that there is a strict relationship between the p-value and the Pearson coefficient that only varies with sample number (# of arrays). The p-value just indicates which Pearson coefficients are significant, but surely you don't need to compute it for all genes, as it just seems to depend on the sample number?
>
> So I then proceeded with regression analysis using lm(). The output values that appear to be useful are the p-value and R-squared.
> The former is the same as from cor.test, and the latter is the squared Pearson coefficient, which I've just discussed. Am I missing something, or is there a better way?
>
> Finally, as limma uses lm functions, can I do the regression using it, to get access to the other tools such as eBayes, classifyTests or toptable? Or are they fundamentally different?

Yes, you can do the regressions using limma. No, the approaches are not fundamentally different.

Gordon

> Thanks for your time,
> Matt
>
> -----Original Message-----
> From: Matthew Hannah
> Sent: Thursday, 19 August 2004 14:56
> To: 'bioconductor@stat.math.ethz.ch'
> Subject: Design matrix with multiple genotypes + quantified variables
>
> Hi,
>
> After asking before, this design and contrast matrix was suggested and it worked well. But now it gets complicated:
> 2 genotypes - Con, Mut
> 2 treatments - A, N
> 4 replicates
>
> treatments <- factor(c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4))
> design <- model.matrix(~ 0+treatments)
> colnames(design) <- c("ConA","ConN","MutA","MutN")
> fit <- lmFit(esetgcrma, design)
>
> cont.matrix <- makeContrasts(ConA-MutA, ConN-MutN,
>     Gen=(ConN+ConA-MutN-MutA)/2, ConA-ConN, MutA-MutN,
>     treatment=(ConA+MutA-ConN-MutN)/2, levels=design)
> con.fit <- contrasts.fit(fit, cont.matrix)
>
> So what if I add a third genotype, Mut2? Is it the obvious: add treatments <- .....5,5,5,5,6,6,6,6)) and then for the contrasts treatment=(ConA+MutA+Mut2A-ConN-MutN-Mut2N)/3? Or am I misunderstanding how to design contrasts? Is there an easier way of writing this when you have more genotypes?
>
> Also, logically the lm is treating all samples as independent when they are not; does this matter? Is it possible to fit the original lm using a design that takes genotype and treatment into account? Would this be a better approach, especially if you have more genotypes (e.g. 5-10)? What would the design matrix then look like?
>
> Finally, what if you have a quantified variable for each genotype, like a measure of growth before and after the treatment? Can you specify this in any way (in the design matrix?) so that it is taken into account during the fit? I thought this was possible using lm or rlm, or am I confusing something? Alternatively, does anyone have a different approach, such as an efficient way of doing a gene-by-gene regression or correlation analysis against the growth measure and extracting the genes that correlate best with it?
>
> Perhaps there is a good (biologist-simple?) book that covers design and contrasts for linear models; does anyone know of one?
>
> Thanks again,
> Matt
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
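The equivalence Matt noticed between cor.test and lm can be checked directly. The sketch below uses base R only; the simulated matrix expr_mat and the made-up growth vector are hypothetical stand-ins for real data, not names from the thread. For one gene it compares the correlation test with the regression slope test and R-squared with the squared Pearson coefficient, then computes the same p-values for every gene at once from r and the number of arrays n.

```r
# Hypothetical, simulated stand-in for a gene-by-array log-expression matrix:
# 200 genes x 12 arrays, with a made-up growth measurement per array.
set.seed(1)
n_genes <- 200
growth <- c(1.2, 0.8, 1.5, 2.1, 0.4, 1.9, 2.5, 0.9, 1.1, 3.0, 0.6, 1.7)
n <- length(growth)
expr_mat <- matrix(rnorm(n_genes * n), nrow = n_genes)
# Make the first 20 genes increase with growth so there is something to find.
expr_mat[1:20, ] <- expr_mat[1:20, ] + outer(rep(1, 20), growth)

# One gene: Pearson correlation test versus a simple linear regression.
ct <- cor.test(expr_mat[1, ], growth)
fit1 <- summary(lm(expr_mat[1, ] ~ growth))
p_cor <- unname(ct$p.value)
p_lm <- unname(coef(fit1)["growth", "Pr(>|t|)"])
r2_lm <- unname(fit1$r.squared)
r1 <- unname(cor(expr_mat[1, ], growth))

# All genes at once: the p-value depends only on r and n, through
# t = r * sqrt((n - 2) / (1 - r^2)) on n - 2 degrees of freedom.
r_all <- cor(t(expr_mat), growth)[, 1]
t_all <- r_all * sqrt((n - 2) / (1 - r_all^2))
p_all <- 2 * pt(-abs(t_all), df = n - 2)

# Gene indices ranked by evidence of a linear association with growth.
top_genes <- head(order(p_all), 20)

cat("cor.test p:", p_cor, " lm slope p:", p_lm, "\n")
cat("lm R^2:", r2_lm, " r^2:", r1^2, "\n")
```

Because the ordinary p-value is a fixed function of r and n, ranking genes by |r| or by p gives the same order here. To reach eBayes and topTable, as asked in the thread, the same regression could presumably be fitted in limma with a design such as model.matrix(~ growth) passed to lmFit, followed by eBayes and topTable(fit, coef = "growth"); that step is not run in this sketch.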
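For question 1, here is a base-R sketch assuming a balanced layout of 3 genotypes x 2 treatments with four replicates per cell; the factors genotype, treatment and group and the simulated gene y are illustrative only. It builds the cell-means design for the six groups, applies the contrast (ConA+MutA+Mut2A-ConN-MutN-Mut2N)/3, and checks that this equals the average of the three within-genotype differences, so the genotype offsets cancel. It also fits the additive model ~ genotype + treatment, whose treatment coefficient matches the same contrast (with the sign flipped) when every cell has the same number of arrays.

```r
# Hypothetical balanced layout: 3 genotypes x 2 treatments x 4 replicates,
# ordered like the "treatments" factor in the thread (1,1,1,1,2,2,2,2,...).
genotype <- factor(rep(c("Con", "Mut", "Mut2"), each = 8),
                   levels = c("Con", "Mut", "Mut2"))
treatment <- factor(rep(rep(c("A", "N"), each = 4), times = 3),
                    levels = c("A", "N"))
group <- factor(paste0(genotype, treatment),
                levels = c("ConA", "ConN", "MutA", "MutN", "Mut2A", "Mut2N"))

# Cell-means design: one column per genotype/treatment group.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# One simulated gene with genotype offsets and a common treatment shift.
set.seed(2)
y <- rnorm(24) + 2 * (genotype == "Mut") + 4 * (genotype == "Mut2") +
  1.5 * (treatment == "N")

# Overall treatment contrast (A - N), weighting the three genotypes equally.
# Coefficients come out in the same order as colnames(design).
cont <- c(ConA = 1, ConN = -1, MutA = 1, MutN = -1, Mut2A = 1, Mut2N = -1) / 3
fit_cells <- lm(y ~ 0 + design)
treat_cells <- sum(cont * coef(fit_cells))

# The same number computed as the mean of the within-genotype differences.
cell_means <- sapply(levels(group), function(g) mean(y[group == g]))
avg_diff <- mean(c(cell_means[["ConA"]] - cell_means[["ConN"]],
                   cell_means[["MutA"]] - cell_means[["MutN"]],
                   cell_means[["Mut2A"]] - cell_means[["Mut2N"]]))

# Additive alternative: genotype + treatment. With equal replicates per cell,
# its treatmentN coefficient (N - A) is the same contrast with opposite sign.
fit_add <- lm(y ~ genotype + treatment)
treat_add <- unname(coef(fit_add)["treatmentN"])

cat("cell-means contrast:", treat_cells, " average difference:", avg_diff,
    " additive treatmentN:", treat_add, "\n")
```

In limma the contrast would presumably be written as makeContrasts(treatment=(ConA+MutA+Mut2A-ConN-MutN-Mut2N)/3, levels=design) and passed to contrasts.fit, as in the quoted code. With unequal replicate numbers, the additive-model coefficient and a simple pooled A-versus-N mean comparison would generally no longer equal this equally weighted contrast.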