Search
Question: how to design matrix by edgeR
0
6.5 years ago by
wang peter2.0k
wang peter2.0k wrote:
HELLO ALL i have a question to ask, how to design a matrix i have such samples 0h, 3 replicate and 3 control 6h, 4 replicate and 1 control 12h, 4 replicate and 1 control 18h, 4 replicate and 1 control 24h, 3 replicate and 3 control 36h, 3 replicate and 1 control 48h, 3 replicate and 1 control thank you very much -- shan gao Room 231(Dr.Fei lab) Boyce Thompson Institute Cornell University Tower Road, Ithaca, NY 14853-1801 Office phone: 1-607-254-1267(day) Official email:sg839 at cornell.edu Facebook:http://www.facebook.com/profile.php?id=100001986532253
modified 6.5 years ago by Gordon Smyth35k • written 6.5 years ago by wang peter2.0k
0
6.5 years ago by
Gordon Smyth35k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth35k wrote:

Dear Shan Gao,

There are many legitimate ways to define a design matrix. The best way depends on what questions you want to answer or what hypotheses you want to test.

It looks like you have 35 libraries in total (24 treatment samples and 11 control samples).  Is this right?  Are all the libraries biologically
indendent?

I assume that the control samples have received some sort of inactive treatment for the same time period as the active samples.

What is it that you want to test?  Do you want to find DE genes between treatment and control at each time?  Or do you want to find genes are DE at each time vs 0h, adjusting for control differences?

In any case, you need to create a factors Time and Treatment. There are of length equal to the number of samples (35).  Time takes values from 0 up to 48. Treatment takes two values, Treatment or Control, say.

If you can respond to these questions, then I'll give you more help with the design matrix and so on.

Best wishes
Gordon

Dear Gordon, thank you very much. i want to find genes are DE at each time vs 0h, adjusting for control differences? i know i should do such work: treatment=factor(c(rep('treated',24),rep('control',11))) time=factor( c('oh','oh','oh','6h','6h','6h','6h','12h', '12h','12h','12h','18h','18h','18h','18h','24h','24h','24h','36h','36h ','36h','48h','48h','48h','oh', 'oh','oh','6h','12h','18h','24h','36h','48h')) but i donot understand how to design the matrix? design <- model.matrix(~treatment+tissue)? design <- model.matrix(~treatment*tissue) do you have some reference to tell me how to use the model.matrix function thank you very much -- shan gao Room 231(Dr.Fei lab) Boyce Thompson Institute Cornell University Tower Road, Ithaca, NY 14853-1801 Office phone: 1-607-254-1267(day) Official email:sg839 at cornell.edu Facebook:http://www.facebook.com/profile.php?id=100001986532253

Dear Shan Gao,

I always advise people to create a "targets file", laying out the treatments associated with each sample, rather than trying to create
factors on the fly in R.  This can often avoid problems that might otherwise not be noticed.

For example, the factors you've created below are of different lengths, and I suspect you mean "0h" rather than "oh".

Assuming you fix this, a quick way to create a design matrix would be:

time <- factor(time, levels=c("0h","6h","12h","18h","24h","36h","48h"))
design <- model.matrix(~time+time:treatment)

This will create a design matrix with 14 columns.  Coefficients 8-14 test for treatment effects at each time.

y <- estimateGLMCommonDisp(y, design)
y <- estimateGLMTrendedDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
fit <- glmFit(y, design)

To test for treatment effect at time 0:

lrt0 <- glmLRT(y,fit,coef=8)
topTags(lrt0)

To test for treatment effect at time 1:

lrt1 <- glmLRT(y,fit,coef=9)
topTags(lrt1)

and so on.

I don't know of a good reference on using model.matrix(), except perhaps for the limma User's Guide.  A very brief introduction to using model formula in R is given in Section 11 of the manual "An Introduction to R" that comes with R.

Best wishes
Gordon

dear Dr.Gordon: thank you for your patient reply, but i still donot understand what problem in my factor design?i have 3 questions as below: the 1st question: my experiments involves two factors, time and treatment. so i define: treatment=factor(c(rep('treated',24),rep('control',11))) time=factor( c('0h','0h','0h','6h','6h','6h','6h','12h', '12h','12h','12h','18h','18h','18h','18h','24h','24h','24h','36h','36 h','36h','48h','48h','48h','0h','0h','0h','6h','12h','18h','24h','36h' ,'48h')) the 2nd question: i want to get DE cross time considersing control. and my model.matrix is design <- model.matrix(~time+treatment)? or design <- model.matrix(~treatment+time)? and yours is design <- model.matrix(~time+time:treatment) which one is correct? the third question: i donot do what is the difference between test by glmLRT and test by pairedwised test functions? To test for treatment effect at time 0: lrt0 <- glmLRT(y,fit,coef=8) topTags(lrt0) To test for treatment effect at time 1: lrt1 <- glmLRT(y,fit,coef=9) topTags(lrt1) thank you very much if possible, can you give me some detailed material of glmLRT or glmFIT now i can only read your paper published recently shan gao

Dear Shao Gao,

I don't really understand your questions, and I'm afraid I have no more time to give you tutorials at the moment.

I will just say that the formula ~time+treatment (that you did not mention in earlier emails) estimates the treatment effect averaged over all the different times.  It estimates just one average treatment effect.

The formula I suggested, ~time+time:treatment estimates a separate treatment effect for each time.  In other words, it estimates seven
different time-specific treatment effects, one for 0h, one for 6h, etc. It seems clear from the experimental design that this is what is
required.

If you find that using model formula in R is too obscure to understand, then here is a longer method that you might find more intuitive.  It
treats your experiment as a oneway layout, from which you can extract all the comparisons you want.  You create a combined factor to label all your samples.  Lets say you use "T16" to mean treatment at 16h and C16 to mean control at 16h, and so on.  So create a combined factor with something like:

Group <- factor(c("T0","T0",...,"C48"))

Then

Group <- relevel(Group, ref="T0")
design <- model.matrix(~0+Group)
colnames(design) <- levels(Group)

Fit the full model (after estimating the dispersions):

fit <- glmFit(y,design)

Then later on you can make any comparison you want.  Suppose you want test for a treatment effect at time 16, that is T16-C16.  You can ask for this by:

T16vsC16 <- makeContrasts(T16-C16, levels=design)
lrt <- glmFit(y,fit,contrast=as.vector(T16vsC16))

This gives exactly the same results as the approach I suggested in my last email.

You can use this for any comparison at all.  For example, is the treatment effect larger at 6h than at 0h?

   con <- makeContrasts((T6-C6)-(T0-C0), levels=design)
lrt <- glmFit(y,fit,contrast=as.vector(con))

I hope this gives you enough to go on with.  If not, I hope that
others

Best wishes
Gordon

Thanks very much, Dr. Smyth, I start to understand what the manual of R is saying in terms of model formulae, to be honest, that manual seems to be drafted for the user of Limma or other R experts instead of a biologist.