Question

Visualise and remove batch effects in DESeq2

Tash. • 0

Hi there,

I have 36 samples, 23 of which (both cases and controls) turned out to have 60-70% rRNA contamination. The sequencing facility re-sequenced these 23 samples on more lanes to increase sequencing depth, on the assumption that, after removal of the rRNA sequences, the remaining reads would match the number of reads in the 13 non-contaminated samples. I know it isn't ideal to still have a large number of rRNA sequences in there, but this is where I am now.

The issue I have now is that I see a clear separation on the PCA plot between the rRNA-contaminated samples and the non-contaminated ones, using:

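# rlog-transform the counts; blind = FALSE lets the transformation use the design
rld <- rlog(dds, blind = FALSE)

# PCA coloured by condition
plotPCA(rld, intgroup = "condition")
# Same PCA, returned as a data frame for custom plotting
(pcaData <- plotPCA(rld, intgroup = "condition", returnData = TRUE))
percentVar <- round(100 * attr(pcaData, "percentVar"))
# Take the sample labels from the returned data instead of hardcoding 8 + 8 labels,
# which would not match the 36 samples
pcaData$samples <- pcaData$condition
pcaData$name <- as.factor(pcaData$name)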

The group of samples on the right is the contaminated lot. I understand that this grouping could be due to either the rRNA contamination or a batch effect, but I think a batch effect is more likely, as I don't see exactly the same pattern when I plot the "old" contaminated samples. When I control for the batch effect in my design (dds <- DESeqDataSetFromMatrix(countData, colData, design = ~ batch + condition)) I do get more DEGs than when not controlling for it (56 DEGs). However, if I run a differential expression analysis on only the non-contaminated samples (4 controls and 9 cases) I get 109 DEGs, so I don't know if I should really be including the second batch.

My question is whether there is anything else I can do to salvage these contaminated samples. Can I control for this batch in any other way than including it in the design? I can try another tool, I guess, but first I would like to know if there is anything else I can do in terms of visualising or controlling for this batch effect in DESeq2.

Thanks in advance.
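
On the visualisation part of this question: a common approach is to keep batch in the design for testing and, for plotting only, remove the batch term from the transformed data before running PCA. The DESeq2 vignette does this with limma::removeBatchEffect on the transformed assay. The sketch below shows the same idea in base R on simulated log-scale data, so it is an illustration of the technique and not that function. The names logexpr and remove_batch, the batch labels b1/b2, and all simulated values are assumptions made for the example.

```r
# Simulated log-scale expression (a stand-in for assay(rld)); all values are made up
set.seed(1)
n_genes <- 200
condition <- factor(rep(c("control", "case"), times = 6),
                    levels = c("control", "case"))
batch <- factor(rep(c("b1", "b2"), each = 6))
logexpr <- matrix(rnorm(n_genes * 12, mean = 8), nrow = n_genes)
# Add a gene-specific shift to every sample in batch b2
batch_shift <- rnorm(n_genes, sd = 2)
logexpr[, batch == "b2"] <- logexpr[, batch == "b2"] + batch_shift

# Fit condition + batch for each gene, then subtract only the fitted batch term
remove_batch <- function(x, batch, condition) {
  X_cond <- model.matrix(~ condition)
  X_batch <- model.matrix(~ batch,
                          contrasts.arg = list(batch = "contr.sum"))[, -1, drop = FALSE]
  fit <- lm.fit(cbind(X_cond, X_batch), t(x))
  beta_batch <- fit$coefficients[-seq_len(ncol(X_cond)), , drop = FALSE]
  x - t(X_batch %*% beta_batch)
}

corrected <- remove_batch(logexpr, batch, condition)

# Compare how strongly PC1 tracks batch before and after the correction
pc_before <- prcomp(t(logexpr))
pc_after <- prcomp(t(corrected))
cor_before <- abs(cor(pc_before$x[, 1], as.numeric(batch)))
cor_after <- abs(cor(pc_after$x[, 1], as.numeric(batch)))
```

The corrected matrix is meant for plots only. For the differential expression test itself, the raw counts stay as they are and batch remains a term in the design.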

deseq2 rnaseq • 2.5k views

ADD COMMENT • link updated 5.6 years ago by Michael Love 41k • written 5.6 years ago by Tash. • 0

You might consider looking into RUV ('RUVSeq' in Bioconductor) with 1 nuisance variable, since you are trying to address one batch effect. I have had some success with this, but be careful: the RUV method can sometimes over-normalize your data and remove the biological variance you are interested in. You should be able to tell whether this is the case by looking at RLE plots before and after applying RUV.
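
For readers unfamiliar with RLE (relative log expression) plots: each gene's log count is compared with that gene's median across samples, and the differences are drawn as one box per sample. A minimal base R sketch of the calculation follows, on simulated counts; it is not the RUVSeq workflow itself, and the helper rle_matrix, the sample names, and the simulated values are assumptions for illustration.

```r
# Simulated counts: 6 samples, one (s1) sequenced about 3x deeper; values are made up
set.seed(2)
n_genes <- 500
counts <- matrix(rpois(n_genes * 6, lambda = 50), nrow = n_genes,
                 dimnames = list(NULL, paste0("s", 1:6)))
counts[, "s1"] <- rpois(n_genes, lambda = 150)

# RLE: log count minus the gene's median log count across samples
rle_matrix <- function(counts, pseudocount = 1) {
  logc <- log2(counts + pseudocount)
  sweep(logc, 1, apply(logc, 1, median))
}

rle <- rle_matrix(counts)

# One box per sample; comparable samples centre near zero with similar spread
boxplot(rle, outline = FALSE, ylab = "RLE")
abline(h = 0, lty = 2)
```

Running the same calculation on the counts before and after a correction makes it possible to compare the two plots side by side: boxes that stay shifted away from zero suggest remaining unwanted variation.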

ADD REPLY • link 5.6 years ago harry.smith ▴ 20

score 0 · Answer 1 · 2018-09-14

Michael Love 41k

I'd say the quality can't necessarily be assessed by the number of DE genes. Generally I'd go with more samples unless you have some external information to exclude those samples. Here you know that a large group of samples had rRNA contamination. I don't have any strong preference or opinion on including those or not.

ADD COMMENT • link 5.6 years ago Michael Love 41k

Thank you Michael.

ADD REPLY • link 5.6 years ago Tash. • 0

If it's alright to ask a follow-up question (this might be more appropriate for a different forum, but I'll give it a try): you mention that you don't have a strong preference about including the samples with rRNA contamination. Does that mean there probably isn't any biological effect that could be masked by a high level of rRNA during sequencing? I have enough reads for DEA (around 20M) despite having ~60% rRNA contamination.

ADD REPLY • link 5.6 years ago Tash. • 0

I guess I just don't know the answer. If the read counts are simply lower for these samples, then the library size normalization should take care of it. But I can't give a definite answer from this thread.
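
To illustrate what "the library size should take care of it" refers to: DESeq2 scales each sample by a size factor, estimated with a median-of-ratios approach in estimateSizeFactors(). Below is a base R sketch of that idea on toy counts. The helper size_factors and the toy values are assumptions for illustration, and the sketch is not DESeq2's own code.

```r
# Median-of-ratios size factors: each sample's median ratio to the
# per-gene geometric mean across samples
size_factors <- function(counts) {
  log_geo_means <- rowMeans(log(counts))
  apply(counts, 2, function(cnts) {
    keep <- is.finite(log_geo_means) & cnts > 0
    exp(median((log(cnts) - log_geo_means)[keep]))
  })
}

# Toy counts: s2 and s3 are s1 sequenced 2x and 4x deeper (made-up values)
gene_counts <- c(10, 25, 40, 80, 120, 300)
toy <- cbind(s1 = gene_counts, s2 = 2 * gene_counts, s3 = 4 * gene_counts)

sf <- size_factors(toy)
# Dividing each column by its size factor puts the samples on a common scale
normalized <- sweep(toy, 2, sf, "/")
```

This only addresses differences in sequencing depth. Whether a high rRNA fraction affects the data in other ways is the part left open in this thread.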

ADD REPLY • link 5.6 years ago Michael Love 41k