Question: Visualise and remove batch effects in DESeq2
0
8 months ago by
Tash.0
Tash.0 wrote:

Hi there,

I have 36 samples of which 23 (both cases and controls) of them turned out to have 60-70% rRNA contamination. The sequencing facility re-sequenced these 23 samples on more lanes  = more seq depth in the contaminated samples with the thought that the reads would after removal of rRNA sequences correspond to the amount of reads in the 13 non-contaminated samples. I know it isn't ideal to still have a large amount of rRNA sequences in there but now this is where I am at.

The issue I have now is that I see a clear distinction on the PCA plot between the rRNA contaminated samples and the non-contaminated using:

rld <- rlog(dds, blind=FALSE)

plotPCA(rld, intgroup=("condition"))
(pcaData <- plotPCA(rld, intgroup = c("condition"), returnData=TRUE))
percentVar <- round(100 * attr(pcaData, "percentVar"))
pcaData$samples <- c(rep("control", 8), rep("cases", 8)) pcaData$name <- as.factor(pcaData\$name)

The right group of samples are the contaminated lot. I understand that this grouping can be because of either the rRNA contamination or batch effect but I think it is more likely to be because of a batch effect as I don't see exactly the same pattern when I plotted it with the "old" contaminated samples. When I control for batch effect in my design (dds< DESeqDataSetFromMatrix(countData, colData, formula(~ batch +  ~ condition)) I do get more DEGs than when not controlling for it (56 DEGs) - however if I run a differential expression analysis on only the non-contaminated samples (4 controls and 9 cases) I get 109 DEGs - so I don't know if I should really be including the second batch.

My question is now if there is anything else I can do in terms of trying to salvage these contaminated samples - can I control for this batch in any other way than including it in the design? I can try another tool I guess but first I would like to know if there anything else I can do in terms of visualising or controlling for this batch effect in DESeq2?

rnaseq deseq2 • 400 views
modified 8 months ago by Michael Love23k • written 8 months ago by Tash.0
2

You might consider looking into RUV ('RUVSeq' in bioconductor)  with 1 nuisance variable since you are trying to address one batch effect. I have had some success with this, but be careful as the RUV method can sometimes over normalize your data and remove the biological variance you are interested in. You should be able to tell if this is the case by looking at RLE plots before and after applying RUV.

Answer: Visualise and remove batch effects in DESeq2
0
8 months ago by
Michael Love23k
United States
Michael Love23k wrote:

I’d say the quality can’t be assessed necessarily by number of DE. Generally I’d go with more samples unless you have some external information to exclude those samples. Here you know that a large group of samples had rRNA contamination. I don’t have any strong preference or opinion on including those or not.

Thank you Michael.

If its alright to ask a follow up question (although this might be more appropriate for a different forum but i'll give it a try). You're mentioning you don't have a strong preference to include the samples with rRNA contamination, meaning that there probably isn't is any biological effect that can be masked by having high level of rRNA when sequencing? I have enough reads for DEA (around 20M) despite having ~ 60% rRNA contamination.