Limma voom RNA-seq analysis with both categorical and continuous confounders
1
0
Entering edit mode
@prathap1708-14714
Last seen 4 months ago
Duke NUS medical school, Singapore

Hi,

I have few doubts while using design matrix and voom function in limma
1st doubt:
I'm working on RNA seq data to find out differentially expressed genes for a group where three conditions exist. I would like to include gender, age and one of the control gene expression as confounders in the design matrix so take out the effect of sex, age and control gene expression. How to model the equation when we have both categorical and continuous confounders?
2nd doubt:
when we do use model.matrix function. whether we have to consider intercept or not? How does it differs in results. Please find my code below

## Differential expression analysis
# with intercept
design.adipose=model.matrix(~0+AdiposeCPMCountGT1_min20percentsamples$samples$group.1+AdiposeCPMCountGT1_min20percentsamples$samples$Gender)
# With out intercept
design.adipose = model.matrix(~AdiposeCPMCountGT1_min20percentsamples$samples$group.1+AdiposeCPMCountGT1_min20percentsamples$samples$Gender)
# To rename column names
colnames(design.adipose)[1] = "MAO"
colnames(design.adipose)[2] = "MHNW"
colnames(design.adipose)[3] = "MHO"
colnames(design.adipose)[4] = "Gender"
# To do comparisons between groups
contr.matrix.Adipose = makeContrasts(MHOvsMHNW = MHO-MHNW,MAOvsMHNW = MAO-MHNW,MAOvsMHO = MAO-MHO,levels = colnames(design.adipose)[1:4])
# Voom function
v = voom(AdiposeDGE,design.adipose,lib.size = colSums(AdiposeDGE$counts)*AdiposeDGE$samples$norm.factors,normalize.method="quantile",plot = TRUE)

limma voom • 2.7k views
ADD COMMENT
3
Entering edit mode
@gordon-smyth
Last seen 6 hours ago
WEHI, Melbourne, Australia

There's no difficulty with including both categorical and continuous variables in the design matrix. For example, one might use

design <- model.matrix(~ A + x)

where A is a categorical factor and x is a continuous covariate.

It's up to you whether to include an intercept or not, the only difference it makes is to how you form the contrasts. See Section 9.2 of the limma User's Guide for a simple introduction to the underlying idea.

ADD COMMENT
0
Entering edit mode

Thanks a lot Smyth. As my data is now adjusted for covariates like age,sex and control gene expression. where will be this adjusted data was stored?  As per the R script mentioned below, it won't be v$E expression data. My doubt is , is it in vfit or efit?

# Usual limma pipeline
vfit <- lmFit(v, design.adipose)
vfit <- contrasts.fit(vfit, contrasts=contr.matrix.Adipose)
efit = eBayes(vfit)
ADD REPLY
0
Entering edit mode

It's not stored anywhere in the sense that you are thinking. These covariates are just accounted for inside the linear model.

If you want to explore what your data looks like when these covariates are accounted for, take a look at limma's removeBatchEffects function.

ADD REPLY
0
Entering edit mode

Thanks for that.

In removeBatchEffects function,

removeBatchEffect(x, batch=NULL, batch2=NULL, covariates=NULL, design=matrix(1,ncol(x),1), ...)

I have log cpm withTMM normalised data with covariates like sex, age and control gene expression. I don't have batch information. Could i keep batch argument as NULL? can i pass all my covariates into covariates argument?

Thanks in advance,
Pratap.

ADD REPLY
0
Entering edit mode

You do have batch information. For this purpose, gender is the batch and age and control gene expression are the covariates. The batch arguments are for categorical confounders and the covarate arguments are for continuous confounders.

However, you should not be using removeBatchEffect() because it is not the correct way to do anything you asked about in your original question.

ADD REPLY
0
Entering edit mode
Ok. I'm going to use removeBatchEffect() to retrieve data which will be adjusted for confounders like gender, age and control gene expression. Could it make sense?
ADD REPLY
0
Entering edit mode

I think you are mis-understanding the process somewhat. The statistical tests for differential expression have been adjusted for the confounders, but this does not require that the data itself be adjusted in any way. The tests are adjusted, not the data.

ADD REPLY
0
Entering edit mode

Thanks Gordan for your prompt reply. 

I have one more doubt. In my case, is the 1st coefficient is comparison of MHO with MHNW, 2nd coefficient is comparison  of  MAO with MHNW and 3rd coefficient is comparison of MAO with MHW . Please let me know, if I understood wrong. Please find my code below.
 

design.adipose = model.matrix(~AdiposeCPMCountGT1_min20percentsamples$samples$group.1+AdiposeCPMCountGT1_min20percentsamples$samples$Gender)

# To rename column names
colnames(design.adipose)[1] = "MAO"
colnames(design.adipose)[2] = "MHNW"
colnames(design.adipose)[3] = "MHO"
colnames(design.adipose)[4] = "Gender"
# To do comparisons between groups
contr.matrix.Adipose = makeContrasts(MHOvsMHNW = MHO-MHNW,MAOvsMHNW = MAO-MHNW,MAOvsMHO = MAO-MHO,levels = colnames(design.adipose)[1:4])
# Voom function
v = voom(AdiposeDGE,design.adipose,lib.size = colSums(AdiposeDGE$counts)*AdiposeDGE$samples$norm.factors,normalize.method="quantile",plot = TRUE)

 

ADD REPLY
0
Entering edit mode

You are now asking a new question about contrasts that was not included in your original post. It would better to post a new question about how to form contrasts rather than just tacking additional questions onto replies and comments.

ADD REPLY
0
Entering edit mode
Thanks gordon. I will post as new question.
ADD REPLY

Login before adding your answer.

Traffic: 447 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6