Search
Question: Controlling numerical covariates: differential gene expression with limma voom
0
gravatar for Onyi Ukay
13 days ago by
Onyi Ukay0
Onyi Ukay0 wrote:

Here is a snapshot of a data.frame, df

Subject | Cell_type | Condition | Viral_load
--------|-----------|-----------|------------
1       | A         | normal    | 10
1       | B         | normal    | 180
2       | A         | diseased  | 500 
2       | B         | diseased  | 90
3       | A         | normal    | 720
3       | B         | normal    | 63

I would like to find differentially expressed genes in the following comparisons:
      i. A.normal vs A.diseased
      ii. B.normal vs B.diseased
      iii. A.normal vs B.normal
      iv. A.diseased vs B.diseased

What is the right way to incorporate into my analysis an interactive design formula for Cell_type and Condition, while blocking for the Subject and controlling for Viral_load covariate?


My idea is to add a supplementary column to df called Group,

Subject | Cell_type | Condition | Viral_load | Group
--------|-----------|-----------|------------|-------
1       | A         | normal    | 10         | A.normal 
1       | B         | normal    | 180        | B.normal
2       | A         | diseased  | 500        | A.diseased
2       | B         | diseased  | 90         | B.diseased
3       | A         | normal    | 720        | A.normal
3       | B         | normal    | 63         | B.normal

Then run the following analysis:

design <- model.matrix(~ 0 + Group + Viral_load, data = df)
colnames(design) <- c(levels(df$Group), "Viral_load")
v <- voom(TMM, design = design)
dupcor <- duplicateCorrelation(v, design = v$design, block = df$Subject)
fit <- lmFit(v, block = df$Subject, correlation = dupcor$consensus.correlation)
cm <- makeContrasts(A.normal_vs_diseased = A.normal   - A.diseased,
                    B.normal_vs_diseased = B.normal   - B.diseased,
                    normal.A_vs_B        = A.normal   - B.normal,
                    diseased.A_vs_B      = A.diseased - B.diseased,
                    levels = v$design)
fit <- contrasts.fit(fit, cm)
fit <- eBayes(fit)

Have I accurately controlled for Viral_load as a covariate, blocked by Subject and extracted the appropriate contrasts?

ADD COMMENTlink modified 13 days ago by mikhael.manurung10 • written 13 days ago by Onyi Ukay0

Hi, I would really appreciate it if you upvote my answer if you find it helpful :)

ADD REPLYlink written 7 days ago by mikhael.manurung10
0
gravatar for mikhael.manurung
13 days ago by
mikhael.manurung10 wrote:

Hi Onyi,

Your design and contrast matrices are correct.

I am not sure about your variable naming convention, but please make sure the TMM input variable to your voom command is the expression set.

Last but not least, it is now recommended to do the duplicateCorrelation calculation twice each. Also keep in mind to include the block to your voom call! Here is an example:

v <- voom(TMM, design = design, block=df$Subject)
dupcor <- duplicateCorrelation(v, design = v$design, block = df$Subject)
v <- voom(TMM, design = design, correlation=dupcor$consensus.correlation, block=df$Subject)
dupcor <- duplicateCorrelation(v, design = v$design, block = df$Subject)

v <- voom(TMM, design = design, correlation=dupcor$consensus.correlation, block=df$Subject)

Further discussion on this can be seen at using duplicateCorrelation with limma+voom for RNA-seq data

I hope this answers your question.

Best, Mikhael

ADD COMMENTlink written 13 days ago by mikhael.manurung10

Thank you for your answer Mikhael. Could a limma developer, or someone with a higher reputation please affirm Mikhael's answer?

ADD REPLYlink written 13 days ago by Onyi Ukay0

I'm not technically a limma developer, but I hope my reputation is high enough for you.

As Mikhael said, your design matrices and contrasts are correct. I will just note that edgeR uses log-link GLMs, which means that your model assumes that log-expression scales linearly with viral load (i.e., raw expression increases exponentially with viral load). Perhaps a more reasonable relationship for most genes would be that log-expression scales linearly with log-viral load. Or you could be even more general and use a spline on the (log-)viral load, allowing you to block out any smooth relationship with viral load. All of this assumes, of course, that the viral load is not confounded with your conditions of interest, which will cause problems of varying levels to the different blocking strategies mentioned above.

The other thing I should point out is that the TMM should typically be a DGEList that has been run through calcNormFactors. If you give voom a raw matrix of counts, it won't do any normalization by default (aside from library size normalization when the log-CPMs are computed). You can set normalize.method but the available choices tend to be unnecessarily aggressive, being designed to deal with the horrors of microarrays.

ADD REPLYlink modified 12 days ago • written 12 days ago by Aaron Lun21k

Thanks for affirming Mikhael's answer Aaron.

TMM is indeed a DGEList, defined as:

dge  <- DGEList(counts=counts)
keep <- filterByExpr(dge, design, min.count=5) 
dge  <- dge[keep,,keep.lib.sizes=FALSE]
TMM  <- calcNormFactors(dge)

Thank you for also pointing out a different problem that I could have. On the suggested more general case, what exactly do you mean by "use a spline on the log-viral_load?" Does using a spline take care of the unknown relationship between log-cpm values and log-viral_load, or does it simply "block out any smooth relationship with viral load" [I assume you mean linear relationship here]?

 
ADD REPLYlink written 11 days ago by Onyi Ukay0

No, I meant any smooth relationship (depending on the number of d.f.). See section 9.6.2 of the limma user's guide.

ADD REPLYlink modified 10 days ago • written 10 days ago by Aaron Lun21k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 376 users visited in the last hour