Question

Regressing effect of treatment on RNA-seq expected count from rsem.

hrishi27n ▴ 20

@hrishi27n-11821

Last seen 2.6 years ago

United States

Hello All,

I have RNA-seq data from around 30 individuals with similar phenotypes, and my goal is to group/cluster these patients based on their expected counts obtained from rsem. Most of the patients are on a few different medications and some are untreated; my PCA shows a clear separation between patients by medication type. I was wondering whether there is some way of regressing out the medication effect, so that medicated and unmedicated individuals look relatively consistent. Any input or suggestions would be highly appreciated.

Thanks.

RNA rna-seq science bioinformatics • 1.9k views

ADD COMMENT • link updated 7.0 years ago by James W. MacDonald 65k • written 7.0 years ago by hrishi27n ▴ 20

score 3 · Accepted Answer · 2017-04-21

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 13 hours ago

United States

You will want to first convert those data to something more amenable (than counts) for clustering. Alternatives include voom or cpm in edgeR, or rlog or vst in DESeq2. I won't go into arguments for or against any of those choices other than to say that they exist.
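As a rough illustration of what such a conversion produces, here is a minimal base-R sketch of a log2 counts-per-million transform on a made-up count matrix. It uses the same offset form as voom's log-CPM values, as far as I know, but with raw library sizes and no normalization factors, so treat it as a picture of the idea rather than a replacement for voom, cpm, rlog or vst. The toy counts and variable names are assumptions.

```r
# Hypothetical toy data: 4 genes x 3 samples.
# RSEM expected counts can be fractional, so non-integers are used here.
counts <- matrix(c(10.5, 0,   250, 33,
                   20,   3.2, 400, 51,
                   5,    1,   120, 18),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# Library size per sample (column sums)
lib_size <- colSums(counts)

# log2 counts per million with a small offset so zero counts stay finite.
# t(counts) puts samples in rows, so each sample is divided by its own library size.
log_cpm <- log2(t((t(counts) + 0.5) / (lib_size + 1)) * 1e6)

print(round(log_cpm, 2))
```

The result has the same shape as the count matrix, but on a log scale where distances between samples are less dominated by a handful of highly expressed genes.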

Once you have converted using the tool of your choice, you could use either removeBatchEffect from limma, or you could use ComBat from sva to regress out the medication effect.

ADD COMMENT • link 7.0 years ago James W. MacDonald 65k

Thank you James.

ADD REPLY • link 7.0 years ago hrishi27n ▴ 20

James, I tried both removeBatchEffect and ComBat separately to see which worked better for me; it seems that removeBatchEffect regressed out most of the medication effect. Just to be sure, and to see whether this can be improved, I am pasting my code snippet below. For removeBatchEffect, is it necessary to provide a design matrix, given that I don't have any grouping or any other condition worth regressing?

 library(edgeR)  # DGEList, calcNormFactors; also attaches limma (voom, removeBatchEffect)
 library(sva)    # ComBat

 myDGElist <- DGEList(counts = CountFrame)    # CountFrame is my rsem data frame
 myNormalized <- calcNormFactors(myDGElist)   # TMM normalization factors
 design <- model.matrix(~1, data = myPheno)   # myPheno includes treatment and other information
 v <- voom(myNormalized)                      # log2-CPM values are in v$E
 treatment <- myPheno$Medications
 combatProcess <- ComBat(dat = v$E, batch = treatment, mod = design, par.prior = TRUE, prior.plots = FALSE)
 usinglimma <- removeBatchEffect(v$E, batch = treatment)

ADD REPLY • link 7.0 years ago hrishi27n ▴ 20

From ?removeBatchEffect:

 design: optional design matrix relating to treatment conditions to be
          preserved
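A minimal sketch of how that argument might be used, on simulated data. The `subtype` factor is purely hypothetical (the thread does not mention such a variable) and stands in for any condition you would want to keep while the medication effect is removed. The call is wrapped in a check so the snippet does nothing if limma is not installed.

```r
if (requireNamespace("limma", quietly = TRUE)) {
  set.seed(2)

  # Hypothetical log-expression matrix: 5 genes x 8 samples
  expr <- matrix(rnorm(5 * 8), nrow = 5)

  # Effect to remove (medication) and a hypothetical condition to preserve
  meds    <- factor(rep(c("medication1", "untreated"), each = 4))
  subtype <- factor(rep(c("A", "B"), times = 4))

  # The design describes what should be preserved; batch describes what is removed
  design   <- model.matrix(~ subtype)
  adjusted <- limma::removeBatchEffect(expr, batch = meds, design = design)
}
```

With no condition to preserve, the default intercept-only design applies, so leaving `design` out is equivalent to passing `model.matrix(~1)`.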

ADD REPLY • link 7.0 years ago James W. MacDonald 65k

James, thank you for responding.

My medication vector looks something like the one below. After running removeBatchEffect, the PCA suggests that the medication effect is gone, but the untreated points have also shifted a little, which I think should not have happened. Is there a way to prevent this? (Also, does removeBatchEffect protect the biological variability?)

meds <- c("medication1", "medication2", ..., "untreated", "untreated", "medicationX", "untreated")

ADD REPLY • link 7.0 years ago hrishi27n ▴ 20

All removeBatchEffect does is regress out the mean expression for each of the batches you specify. It cannot 'protect' anything, as it doesn't know what should or should not be protected. If you have a poorly designed experiment, then removeBatchEffect may do things you don't like, which is why it is important to have a well designed experiment.
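To make that concrete, here is a minimal base-R sketch of regressing out per-batch means on simulated data. It illustrates the idea and is not limma's code: the toy matrix, the shift values and the variable names are all made up, and the sum-to-zero coding of the batch factor reflects my understanding of how removeBatchEffect parameterizes batches.

```r
set.seed(1)

# Hypothetical batches: two medications plus untreated samples
batch <- factor(rep(c("medication1", "medication2", "untreated"), times = c(4, 3, 5)))

# Simulated log-expression (6 genes x 12 samples) with an artificial shift per batch
expr <- matrix(rnorm(6 * 12), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("s", 1:12)))
shift <- c(medication1 = 2, medication2 = -1, untreated = 0)
expr <- sweep(expr, 2, shift[as.character(batch)], "+")

# Batch indicators with sum-to-zero coding: the intercept is then the
# unweighted mean of the batch means, not the mean of a reference batch
X <- model.matrix(~ batch, contrasts.arg = list(batch = "contr.sum"))

# Fit all genes at once (samples in rows), then subtract only the batch terms
fit      <- lm.fit(X, t(expr))
beta     <- fit$coefficients[-1, , drop = FALSE]
adjusted <- expr - t(X[, -1, drop = FALSE] %*% beta)

# Per-gene batch means after adjustment
batch_means <- sapply(levels(batch), function(b) {
  rowMeans(adjusted[, batch == b, drop = FALSE])
})
print(round(batch_means, 3))
```

In this toy example every batch ends up with the same per-gene mean, and the untreated samples are shifted as well, because each batch is moved toward a common centre rather than toward the untreated group.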

You can use a design matrix to attempt to preserve treatment conditions, but if the treatment conditions are confounded with the batch effects, then you have problems that no amount of statistical wizardry will be able to fix.
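One way to check for this kind of confounding before adjusting is to cross-tabulate the two factors and look at the rank of the combined model matrix. The sketch below uses a hypothetical sample sheet in which `group` is fully determined by medication; both factors and their levels are invented for illustration.

```r
# Hypothetical sample sheet: every "case" is on medA, so group is determined by medication
meds  <- factor(c("medA", "medA", "medB", "medB", "untreated", "untreated"))
group <- factor(c("case", "case", "control", "control", "control", "control"))

# Empty cells in this table are the first warning sign
print(table(group, meds))

# Columns for the condition to preserve and for the batch to remove
design  <- model.matrix(~ group)
X_batch <- model.matrix(~ meds, contrasts.arg = list(meds = "contr.sum"))[, -1, drop = FALSE]

# If the combined matrix is rank deficient, the two effects cannot be separated
combined      <- cbind(design, X_batch)
rank_combined <- qr(combined)$rank
is_confounded <- rank_combined < ncol(combined)
print(is_confounded)
```

Here the combined matrix has four columns but only rank three, so no linear model can tell the group effect apart from the medication effect.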

ADD REPLY • link 7.0 years ago James W. MacDonald 65k