Question: Regressing effect of treatment on RNA-seq expected count from rsem.
gravatar for hrishi27n
18 months ago by
hrishi27n0 wrote:

Hello All,

I have RNA-seq data collected by sequencing around 30 individuals(similar phenotypes) and my goal is to  group/cluster these patients based upon their expected counts obtained from rsem.  Most of these patients are on a few different medications and some are untreated, my PCA shows a clear separation between these patients based upon their medication types. I was wondering if there was some way of regressing the medication effect, so that both medicated and unmedicated individuals look relatively consistent. Any input or suggestion is highly appreciated. 


ADD COMMENTlink modified 18 months ago by James W. MacDonald48k • written 18 months ago by hrishi27n0
gravatar for James W. MacDonald
18 months ago by
United States
James W. MacDonald48k wrote:

You will want to first convert those data to something more amenable (than counts) for clustering. Alternatives include voom or cpm in edgeR, or rlog or vst in DESeq2. I won't go into arguments for or against any of those choices other than to say that they exist.

Once you have converted using the tool of your choice, you could use either removeBatchEffect from limma, or you could use ComBat from sva to regress out the medication effect.

ADD COMMENTlink written 18 months ago by James W. MacDonald48k

Thank you James. 

ADD REPLYlink written 18 months ago by hrishi27n0

James, so I used both removeBatchEffect and ComBat separately to see what worked better for me, it seems that removeBatchEffect regressed most of the medication effect.  Just to be sure and to see if we can improve this, I am pasting my code snippet below.  For removeBatchEffect is it necessary to provide a design matrix? considering the fact that I don't have any grouping and any other condition worth regressing?            

 myDGElist <-DGEList(counts=CountFrame) # CountFrame is my rsem dataFrame
 myNormalized <- calcNormFactors(myDGElist)
 design <- model.matrix(~1, data=myPheno) # myPheno includes treatment and other information
  v <-voom(myNormalized)
  treatment <- myPheno$Medications
  combatProcess <- ComBat(dat=v$E,batch=treatment,design,par.prior=TRUE, prior.plots=FALSE)
   usinglimma <- removeBatchEffect(v$E, treatment)


ADD REPLYlink modified 18 months ago • written 18 months ago by hrishi27n0

From ?removeBatchEffect:

 design: optional design matrix relating to treatment conditions to be
ADD REPLYlink written 18 months ago by James W. MacDonald48k

James, thank you for responding. 

My medication vector is something like below, after running removeBatchEffect it seems from the PCA that the medication effect is gone but the untreated points have also switched a little bit which I think should not have happened. Is there a way to prevent this from happening? (Also, does removeBatchEffect protects the biological variability?)

meds <-c("medication1","medication2",...."untreated","untreated","medicationX","untreated")

ADD REPLYlink modified 18 months ago • written 18 months ago by hrishi27n0

All removeBatchEffect does is regress out the mean expression for each of the batches you specify. It cannot 'protect' anything, as it doesn't know what should or should not be protected. If you have a poorly designed experiment, then removeBatchEffect may do things you don't like, which is why it is important to have a well designed experiment.

You can use a design matrix to attempt to preserve treatment conditions, but if the treatment conditions are confounded with the batch effects, then you have problems that no amount of statistical wizardry will be able to fix.

ADD REPLYlink written 18 months ago by James W. MacDonald48k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 240 users visited in the last hour