edgeR, multifactorial design
Mike Miller
Dear edgeR community,

I am new to edgeR and am still reading the vignette in detail so that I can use it for my data. I have a question about understanding model.matrix.

On page 27 (section 3.3.2, "Nested interaction formulas"), the design is defined as:

    > targets
        Sample   Treat Time
    1  Sample1 Placebo   0h
    2  Sample2 Placebo   0h
    3  Sample3 Placebo   1h
    4  Sample4 Placebo   1h
    5  Sample5 Placebo   2h
    6  Sample6 Placebo   2h
    7  Sample1    Drug   0h
    8  Sample2    Drug   0h
    9  Sample3    Drug   1h
    10 Sample4    Drug   1h
    11 Sample5    Drug   2h
    12 Sample6    Drug   2h

    > targets$Treat <- relevel(targets$Treat, ref="Placebo")
    > design <- model.matrix(~Treat + Treat:Time, data=targets)

and the coefficient names are:

    > colnames(design)
    [1] "(Intercept)"         "TreatDrug"
    [3] "TreatPlacebo:Time1h" "TreatDrug:Time1h"
    [5] "TreatPlacebo:Time2h" "TreatDrug:Time2h"

On page 28 (section 3.3.4, "Interaction at any time"), however, the design formula looks like this (I renamed "design" to "design2" to make it easier to follow):

    > design2 <- model.matrix(~Treat + Time + Treat:Time, data=targets)
    > colnames(design2)
    [1] "(Intercept)"      "TreatDrug"
    [3] "Time1h"           "Time2h"
    [5] "TreatDrug:Time1h" "TreatDrug:Time2h"

For design2, the vignette explains (top of page 29):

"The last two coefficients give the DrugvsPlacebo.1h and DrugvsPlacebo.2h contrasts, so that

    > lrt <- glmLRT(fit, coef=5:6)

is useful because it detects genes that respond differently to the drug, relative to the placebo, at either of the times."

My questions, if I have understood this correctly:

1. Why does design2 have no coefficients "TreatPlacebo:Time1h" and "TreatPlacebo:Time2h"?
2. Shouldn't "Time1h" and "Time2h" be the effects of time regardless of treatment, rather than what the vignette says:

"    > lrt <- glmLRT(fit, coef=3)
and
    > lrt <- glmLRT(fit, coef=4)
are the effects of the reference drug, i.e., the effects of the placebo at 1 hour and 2 hours"?

Thank you!
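Both questions can be explored by rebuilding the two design matrices in base R. The sketch below is only an illustration: it assumes the targets table can be recreated with rep() as shown (that construction is not from the vignette), and it uses model.matrix alone, so edgeR is not needed to run it.

```r
# Recreate the 12-row targets table from the post (assumed construction).
targets <- data.frame(
  Sample = rep(paste0("Sample", 1:6), times = 2),
  Treat  = factor(rep(c("Placebo", "Drug"), each = 6),
                  levels = c("Placebo", "Drug")),   # Placebo is the reference
  Time   = factor(rep(rep(c("0h", "1h", "2h"), each = 2), times = 2))
)

# Nested formula: a separate time effect within each treatment.
design  <- model.matrix(~ Treat + Treat:Time, data = targets)
# Factorial formula: time main effects plus treatment-by-time interaction.
design2 <- model.matrix(~ Treat + Time + Treat:Time, data = targets)

print(colnames(design))
print(colnames(design2))

# The Time1h column of design2 is the sum of the two nested 1h columns,
# so both matrices describe the same fitted model in different coordinates.
same_1h <- all(design2[, "Time1h"] ==
               design[, "TreatPlacebo:Time1h"] + design[, "TreatDrug:Time1h"])
same_space <- qr(cbind(design, design2))$rank == ncol(design)

# Row 3 is a Placebo sample at 1h: only the intercept and Time1h are switched on.
placebo_1h_row <- design2[3, ]
print(placebo_1h_row)
```

Because a Placebo sample at 1h involves only "(Intercept)" and "Time1h" in design2, "Time1h" is the 1h-versus-0h change in the reference (Placebo) group, which is what the vignette states. A Drug sample at 1h additionally picks up "TreatDrug" and "TreatDrug:Time1h", so the interaction coefficient is the amount by which the Drug 1h change differs from the Placebo 1h change. That is why no separate "TreatPlacebo:Time1h" column appears: with Time in the formula, its role is taken by "Time1h".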
------------------------------------

Why I need edgeR: I have an RNA-Seq experiment (~30 samples) in which I need to explore the influence of 3 factors with 2 levels each:

1. sex: f/m
2. disease_state: healthy/cancer
3. localization: blood/bones

The question I want to answer: which genes are differentially expressed between the 2 localizations in the 2 disease states (i.e. are bones more severely affected by cancer than blood), while accounting for sex?

I assume that my design formula should look like:

    design=~sex+disease+localization+disease:localization

Could anyone please tell me whether the formula is correct? And what should the output be? How could I tell whether the disease has different effects depending on the localization? By the number of genes affected (= differentially expressed)?

I would very much appreciate it if someone has time to help me with any of these questions.

Best,
Mike
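To see which coefficients the proposed formula produces, the design matrix can be built on a made-up sample sheet. This is a minimal sketch under stated assumptions: the 32 balanced samples, the `samples` data frame, and the choice of reference levels (f, healthy, blood) are all hypothetical stand-ins for the real experiment.

```r
# Hypothetical balanced sample sheet: 2 x 2 x 2 groups with 4 replicates each.
samples <- expand.grid(
  sex          = c("f", "m"),
  disease      = c("healthy", "cancer"),
  localization = c("blood", "bones"),
  replicate    = 1:4,
  stringsAsFactors = FALSE
)
# Set reference levels explicitly so the coefficient names are predictable.
samples$sex          <- factor(samples$sex, levels = c("f", "m"))
samples$disease      <- factor(samples$disease, levels = c("healthy", "cancer"))
samples$localization <- factor(samples$localization, levels = c("blood", "bones"))

design_real <- model.matrix(
  ~ sex + disease + localization + disease:localization,
  data = samples
)
print(colnames(design_real))

# The last column is the interaction: it is 1 only for cancer samples from bones.
interaction_col <- design_real[, "diseasecancer:localizationbones"]
print(table(interaction_col))
```

With these reference levels, "diseasecancer" is the cancer-versus-healthy effect in blood, and the interaction coefficient is how much that effect changes in bones. Testing that single coefficient gene by gene (for example with a glmLRT call on the last coefficient, as in the vignette passages quoted above) addresses whether the disease effect depends on localization for each gene, which is more direct than comparing the numbers of differentially expressed genes.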
@gordon-smyth
WEHI, Melbourne, Australia
Dear Mike,

You are asking basic questions about interaction formulas in R. Many non-statisticians find model formulas in R a bit confusing. It would be simpler and just as effective to take the alternative approach described in the section "Defining each treatment combination as a group" of the edgeR User's Guide.

For your real experiment, you might combine disease state and localization into one factor (dis.loc) and use

    model.matrix(~sex+dis.loc)

Best wishes
Gordon

> Date: Fri, 7 Feb 2014 14:34:04 +0100
> From: Mike Miller
> To: <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] edgeR, multifactorial design
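The combined-factor suggestion can be sketched as follows. Everything about the data here is an assumption for illustration: the balanced 32-sample sheet, the level order of dis.loc, and the hand-written contrast vector are hypothetical, and only base R is used.

```r
# Hypothetical balanced sample sheet: 2 x 2 x 2 groups with 4 replicates each.
samples <- expand.grid(
  sex          = c("f", "m"),
  disease      = c("healthy", "cancer"),
  localization = c("blood", "bones"),
  replicate    = 1:4,
  stringsAsFactors = FALSE
)
samples$sex <- factor(samples$sex, levels = c("f", "m"))

# Combine disease state and localization into a single four-level factor.
samples$dis.loc <- factor(
  paste(samples$disease, samples$localization, sep = "."),
  levels = c("healthy.blood", "cancer.blood", "healthy.bones", "cancer.bones")
)

design_grp <- model.matrix(~ sex + dis.loc, data = samples)
print(colnames(design_grp))

# Each dis.loc coefficient is that group minus healthy.blood, so the
# difference of differences
#   (cancer.bones - healthy.bones) - (cancer.blood - healthy.blood)
# reduces to: cancer.bones - healthy.bones - cancer.blood.
con <- c(0, 0, -1, -1, 1)
names(con) <- colnames(design_grp)
print(con)

# Sanity check with made-up group means (on any scale):
# healthy.blood = 1, cancer.blood = 2, healthy.bones = 3, cancer.bones = 7.
beta <- c(1, 0, 2 - 1, 3 - 1, 7 - 1)   # intercept, sexm, then group offsets
diff_of_diffs <- sum(con * beta)       # should equal (7 - 3) - (2 - 1)
print(diff_of_diffs)
```

In an edgeR analysis, a vector like `con` would typically be passed through the `contrast` argument of glmLRT after fitting the model with this design; that step is not run here. Simpler comparisons, such as cancer versus healthy within bones, are expressed the same way with a different contrast vector.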