**2.0k**wrote:

**37k**• written 6.9 years ago by wang peter •

**2.0k**

Question: how to design matrix by edgeR

0

wang peter • **2.0k** wrote:

HELLO ALL
i have a question to ask, how to design a matrix
i have such samples
0h, 3 replicate and 3 control
6h, 4 replicate and 1 control
12h, 4 replicate and 1 control
18h, 4 replicate and 1 control
24h, 3 replicate and 3 control
36h, 3 replicate and 1 control
48h, 3 replicate and 1 control
thank you very much
--
shan gao
Room 231(Dr.Fei lab)
Boyce Thompson Institute
Cornell University
Tower Road, Ithaca, NY 14853-1801
Office phone: 1-607-254-1267(day)
Official email:sg839 at cornell.edu
Facebook:http://www.facebook.com/profile.php?id=100001986532253

ADD COMMENT
• link
•
modified 6.9 years ago
by
Gordon Smyth ♦ **37k**
•
written
6.9 years ago by
wang peter • **2.0k**

Answer: how to design matrix by edgeR

0

6.9 years ago by

Gordon Smyth ♦ **37k**

Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia

Gordon Smyth ♦ **37k** wrote:

Dear Shan Gao,

There are many legitimate ways to define a design matrix. The best way depends on what questions you want to answer or what hypotheses you want to test.

It looks like you have 35 libraries in total (24 treatment samples and 11 control samples). Is this right? Are all the libraries biologically

indendent?

I assume that the control samples have received some sort of inactive treatment for the same time period as the active samples.

What is it that you want to test? Do you want to find DE genes between treatment and control at each time? Or do you want to find genes are DE at each time vs 0h, adjusting for control differences?

In any case, you need to create a factors Time and Treatment. There are of length equal to the number of samples (35). Time takes values from 0 up to 48. Treatment takes two values, Treatment or Control, say.

If you can respond to these questions, then I'll give you more help with the design matrix and so on.

Best wishes

Gordon

Dear Gordon,
thank you very much. i want to find genes are DE at
each time vs 0h, adjusting for control differences?
i know i should do such work:
treatment=factor(c(rep('treated',24),rep('control',11)))
time=factor( c('oh','oh','oh','6h','6h','6h','6h','12h',
'12h','12h','12h','18h','18h','18h','18h','24h','24h','24h','36h','36h
','36h','48h','48h','48h','oh',
'oh','oh','6h','12h','18h','24h','36h','48h'))
but i donot understand how to design the matrix?
design <- model.matrix(~treatment+tissue)?
design <- model.matrix(~treatment*tissue)
do you have some reference to tell me how to use the model.matrix
function
thank you very much
--
shan gao
Room 231(Dr.Fei lab)
Boyce Thompson Institute
Cornell University
Tower Road, Ithaca, NY 14853-1801
Office phone: 1-607-254-1267(day)
Official email:sg839 at cornell.edu
Facebook:http://www.facebook.com/profile.php?id=100001986532253

Dear Shan Gao,

I always advise people to create a "targets file", laying out the treatments associated with each sample, rather than trying to create

factors on the fly in R. This can often avoid problems that might otherwise not be noticed.

For example, the factors you've created below are of different lengths, and I suspect you mean "0h" rather than "oh".

Assuming you fix this, a quick way to create a design matrix would be:

time <- factor(time, levels=c("0h","6h","12h","18h","24h","36h","48h")) design <- model.matrix(~time+time:treatment)

This will create a design matrix with 14 columns. Coefficients 8-14 test for treatment effects at each time.

y <- estimateGLMCommonDisp(y, design) y <- estimateGLMTrendedDisp(y, design) y <- estimateGLMTagwiseDisp(y, design) fit <- glmFit(y, design)

To test for treatment effect at time 0:

lrt0 <- glmLRT(y,fit,coef=8) topTags(lrt0)

To test for treatment effect at time 1:

lrt1 <- glmLRT(y,fit,coef=9) topTags(lrt1)

and so on.

I don't know of a good reference on using model.matrix(), except perhaps for the limma User's Guide. A very brief introduction to using model formula in R is given in Section 11 of the manual "An Introduction to R" that comes with R.

Best wishes

Gordon

dear Dr.Gordon:
thank you for your patient reply, but i still donot understand
what problem in my factor design?i have 3 questions as below:
the 1st question:
my experiments involves two factors, time and treatment.
so i define:
treatment=factor(c(rep('treated',24),rep('control',11)))
time=factor( c('0h','0h','0h','6h','6h','6h','6h','12h',
'12h','12h','12h','18h','18h','18h','18h','24h','24h','24h','36h','36
h','36h','48h','48h','48h','0h','0h','0h','6h','12h','18h','24h','36h'
,'48h'))
the 2nd question:
i want to get DE cross time considersing control.
and my model.matrix is
design <- model.matrix(~time+treatment)?
or design <- model.matrix(~treatment+time)?
and yours is design <- model.matrix(~time+time:treatment)
which one is correct?
the third question:
i donot do what is the difference between test by glmLRT
and test by pairedwised test functions?
To test for treatment effect at time 0:
lrt0 <- glmLRT(y,fit,coef=8)
topTags(lrt0)
To test for treatment effect at time 1:
lrt1 <- glmLRT(y,fit,coef=9)
topTags(lrt1)
thank you very much
if possible, can you give me some detailed material
of glmLRT or glmFIT
now i can only read your paper published recently
shan gao

Dear Shao Gao,

I don't really understand your questions, and I'm afraid I have no more time to give you tutorials at the moment.

I will just say that the formula ~time+treatment (that you did not mention in earlier emails) estimates the treatment effect averaged over all the different times. It estimates just one average treatment effect.

The formula I suggested, ~time+time:treatment estimates a separate treatment effect for each time. In other words, it estimates seven

different time-specific treatment effects, one for 0h, one for 6h, etc. It seems clear from the experimental design that this is what is

required.

If you find that using model formula in R is too obscure to understand, then here is a longer method that you might find more intuitive. It

treats your experiment as a oneway layout, from which you can extract all the comparisons you want. You create a combined factor to label all your samples. Lets say you use "T16" to mean treatment at 16h and C16 to mean control at 16h, and so on. So create a combined factor with something like:

Group <- factor(c("T0","T0",...,"C48"))

Then

Group <- relevel(Group, ref="T0") design <- model.matrix(~0+Group) colnames(design) <- levels(Group)

Fit the full model (after estimating the dispersions):

fit <- glmFit(y,design)

Then later on you can make any comparison you want. Suppose you want test for a treatment effect at time 16, that is T16-C16. You can ask for this by:

T16vsC16 <- makeContrasts(T16-C16, levels=design) lrt <- glmFit(y,fit,contrast=as.vector(T16vsC16))

This gives exactly the same results as the approach I suggested in my last email.

You can use this for any comparison at all. For example, is the treatment effect larger at 6h than at 0h?

con <- makeContrasts((T6-C6)-(T0-C0), levels=design) lrt <- glmFit(y,fit,contrast=as.vector(con))

I hope this gives you enough to go on with. If not, I hope that

others

can help you.

Best wishes

Gordon

Please log in to add an answer.

Use of this site constitutes acceptance of our User
Agreement
and Privacy
Policy.

Powered by Biostar
version 16.09

Traffic: 222 users visited in the last hour