Which formula should I use to set the best model matrix for my analysis ?
2
0
Entering edit mode
ZheFrench ▴ 60
@zhefrench-11689
Last seen 3 months ago
France

I 'm lost in the settings of the model.matrix for DESEQ2 or EDGER.

I have two biological replicates with a short timeline.

Treated Day1 * 2 // Treated Day 6 *2 // Treated Day 10 * 2

Untreated Day1 * 2 // Untreated Day 6 * 2 // Untreated Day 10 * 2

I want to compare each TimePoint - Treated vs all Untreated,but also all TimePoint between each others, for example Day1 vs Day10.

I'm not sure how to set the model matrix in DESEQ2 or EDGER.

I was wondering If I sould take account that the two biologicial replicates are linked in time. (BiolRep1 ->D0, D1,D6,D10 ,+ unT  , BiolRep2 -> D0,D1,D6,D10 + UnT) Or should I just compare the timepoints.

DESEQ2 : design <- model.matrix(~ TIMEPOINT )

DESEQ2 : design <- model.matrix(~ BIOLREP + TIMEPOINT ) # Time Point Adusted on biological replicates.

EDGER is more tricky on the formula ( my opinion )

EDGER : design <- model.matrix(~0 + TIMEPOINT ) # Intercept is a mystery for me , so I remove it with 0 in edgeR and i don't now if I could use it with DESEQ2.

EDGER : Don't see how to design as DESEQ2 with BIOLREP because then contrast will be hard to make. Input would be apreciate to design this formula and the contrasts to make then.

Thanks

DESEQ2 EDGER design • 2.9k views
ADD COMMENT
0
Entering edit mode

It's hard to see how edgeR can more tricky with the formula, since both packages use (or can use) the same model formulas.

ADD REPLY
4
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 44 minutes ago
The city by the bay

In other words, you have two replicate populations from which you have collected samples under various treatment conditions and/or at different timepoints:

timepoints <- rep(c(0, 1, 6, 10), 4)
treatment <- rep(c("treated", "untreated"), each=8)
population <- factor(rep(c(1,2,1,2), each=4))

I'm not entirely sure why your design doesn't have "treated day 0" or "untreated day 1", so I've just assumed that they're present; adjust the vectors above accordingly. My advice would be to use a simple additive model:

group <- paste0(treatment, ".", timepoints)
design <- model.matrix(~0 + group + population)

Then you can compare between pairs of treatment/time groups to your heart's content, e.g., between treated day 6 and untreated day 6, or between treated day 1 and treated day 10, and so on. The population blocking factor accounts for the differences between your two replicate populations and can be ignored when setting up contrasts.

ADD COMMENT
0
Entering edit mode

Does the order have an impact on the model ? 

Writting design <- model.matrix(~0 + population + group ) has not the same signifiance ? Do I need to set treatment <- relevel(treatment, ref="unTreated") before ?

ADD REPLY
0
Entering edit mode

The order of terms will affect the ease of interpretation of the coefficients; the way I've done it makes it easier to work with, so that's what I recommend. You don't have to do any re-leveling either.

ADD REPLY
0
Entering edit mode

I don't understand why when I print the design. The population1 doesn't appear. I only can see column population2.(even if I don't care about population...how it works?)

<font face="monospace">I see the different group. So I could do my contrast as follow : </font>

contrasts.matrix <- makeContrasts(
  day1Treated_vs_day0Treated = grouptreated.1-grouptreated.0,
  day6_Treated_vs_unt  = grouptreated.6-groupuntreated.6,
  levels=colnames(design)
  )

And last question I dont understand the use of ~0 to avoid intercept what does it mean biologically ?

ADD REPLY
0
Entering edit mode
  • population2 represents the log-fold change of population 2 over population 1. It doesn't make sense to have a population1 coefficient, because then it would just be the log-fold change of population 1 over itself.
  • Your contrasts look okay.
  • The ~0 just makes things easier because the coefficients represent average log-expression of each group. If your design contained an intercept, the coefficients would represent log-fold changes against a reference group (which would be the intercept). This requires some mental gymnastics to work with.
ADD REPLY
0
Entering edit mode

From which I read about Blocking (9.4.2) Blocking factor is set before group you compare , no ? https://www.bioconductor.org/packages/devel/bioc/vignettes/limma/inst/doc/usersguide.pdf

ADD REPLY
0
Entering edit mode

Doesn't really matter as long as you interpret the coefficients correctly.

ADD REPLY
0
Entering edit mode
@mikelove
Last seen 13 hours ago
United States

If you want to make equivalent comparisons across software packages, you can use DESeq2 wherein you provide the model matrix itself to the 'full' argument of DESeq() and specify betaPrior=FALSE. This will produce equivalent coefficient names as with edgeR. Of course the two software packages still have different internal methods and will produce different results tables. But you can at least then use the same contrasts.

When you say "compare each TimePoint - Treated vs all Untreated", what exactly do you want to test? You only have both treated and untreated samples for day 6 and 10.

ADD COMMENT
0
Entering edit mode

I want to see if there is a change in gene expression early or late in the time line. Is there a difference in expression in treated cells versus untreated ? Are Day1 and Day6 states different ? So I'd like to compare  :

Treated cells at Day1 versus Untreated (both samples untreated Day1 and Day6)

Treated cells at Day6 versus all Untreated 

Treatment at Day6 vs Treatment at Day1

I'm not sure to follow you on using betaPrior=FALSE. My idea was to use DESEQ2 and EDGER and keep the genes who show differential expression in  both software......but setting the model correctly is a big step first...

ADD REPLY
0
Entering edit mode

In my opinion, using multiple DE methods and then taking the intersection of the DE genes is a recipe for being unnecessarily conservative rather than a recipe for improved quality.

I would advise you to to use one package or the other. Spend your time learning how to use one of the packages properly, instead of expending effort worrying about the differences.

ADD REPLY
0
Entering edit mode
I agree with Gordon. It's best to just stick to one rather than trying different testing methods. There will be a temptation to go for the results that confirm or are closest to pre-existing hypotheses.
ADD REPLY
0
Entering edit mode

I already spent time with both of them :-/ ... So I'd like to finally understand how to make them work to answer the same question even if results are differents. But I heard your point. So at least an explanation on how to set one method would be great !

Using this sampleTable to describe my data : https://github.com/ZheFrenchKitchen/pics/blob/master/design.png. Could you give me a hand on the design formula ? To compare each condition. Sorry If I repeat myself .

I 'd like to compare each group of condition. I don't know if it's better to take account of biological replicate (emt column here, what is called population by  Aaron Lun ) or just simply compare conditions between each other.

With DESEQ2 :

My first idea was to do something like this.( by the way, betaPrior = False , equals to ~0 in edgeR or no at All )?

dds <- DESeqDataSetFromMatrix(countData=countdata,colData=sampleTable, design = ~ emt + condition)
# I was thinking that it will adjust on emt.

# Compare then each group of condition , for example early vs late treated

res <- results(dds, contrast=c("condition","early_treated","late_treated")

But  Aaron Lun propose (in edgeR) something like design = ~ 0 + condition  + emt .  The new order disapointed me a little bit... Is this the best way to go ?

With EDGER :

dge <- DGEList(counts=countdata)
nb_samples <- length(colnames(dge))

cpm   <- cpm(dge) 

keep.exprs <- rowSums(cpm>1)>= nb_samples

dge <-dge[keep.exprs, keep.lib.sizes=FALSE]

dge <- calcNormFactors(dge)

emt <- as.factor(sampleTable$emt) 
dge$samples$emt <- emt

condition <- as.factor(sampleTable$condition) 
dge$samples$condition <- condition

design <- model.matrix(~0 + condition + emt )
v <- voom(dge,design,plot=TRUE)

vfit <- lmFit(v, design)

contrasts.matrix <- makeContrasts(
  early_vs_unT = conditionearly_treated-conditionunT,
  late_vs_unT  = conditionlate_treated-conditionunT,
  mant_vs_unT  = conditionmant-conditionunT,
  mant_vs_early = conditionmant-conditionearly_treated,
  mant_vs_late = conditionmant-conditionlate_treated,
  late_vs_early = conditionlate_treated-conditionearly_treated,
  levels=colnames(design)
  )

vfit <- contrasts.fit(vfit, contrasts.matrix)

efit <- eBayes(vfit)

res <- topTable(efit,coef="late_vs_early",sort.by="p",n="Inf")

Thanks

ADD REPLY

Login before adding your answer.

Traffic: 755 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6