Question: Visualise and remove batch effects in DESeq2
gravatar for Tash.
9 weeks ago by
Tash.0 wrote:

Hi there, 

I have 36 samples of which 23 (both cases and controls) of them turned out to have 60-70% rRNA contamination. The sequencing facility re-sequenced these 23 samples on more lanes  = more seq depth in the contaminated samples with the thought that the reads would after removal of rRNA sequences correspond to the amount of reads in the 13 non-contaminated samples. I know it isn't ideal to still have a large amount of rRNA sequences in there but now this is where I am at. 

The issue I have now is that I see a clear distinction on the PCA plot between the rRNA contaminated samples and the non-contaminated using: 

rld <- rlog(dds, blind=FALSE)

plotPCA(rld, intgroup=("condition"))
(pcaData <- plotPCA(rld, intgroup = c("condition"), returnData=TRUE))
percentVar <- round(100 * attr(pcaData, "percentVar"))
pcaData$samples <- c(rep("control", 8), rep("cases", 8))
pcaData$name <- as.factor(pcaData$name)

PCA all samples

The right group of samples are the contaminated lot. I understand that this grouping can be because of either the rRNA contamination or batch effect but I think it is more likely to be because of a batch effect as I don't see exactly the same pattern when I plotted it with the "old" contaminated samples. When I control for batch effect in my design (dds< DESeqDataSetFromMatrix(countData, colData, formula(~ batch +  ~ condition)) I do get more DEGs than when not controlling for it (56 DEGs) - however if I run a differential expression analysis on only the non-contaminated samples (4 controls and 9 cases) I get 109 DEGs - so I don't know if I should really be including the second batch. 

My question is now if there is anything else I can do in terms of trying to salvage these contaminated samples - can I control for this batch in any other way than including it in the design? I can try another tool I guess but first I would like to know if there anything else I can do in terms of visualising or controlling for this batch effect in DESeq2?

Thanks in advance. 






ADD COMMENTlink modified 9 weeks ago by Michael Love20k • written 9 weeks ago by Tash.0

You might consider looking into RUV ('RUVSeq' in bioconductor)  with 1 nuisance variable since you are trying to address one batch effect. I have had some success with this, but be careful as the RUV method can sometimes over normalize your data and remove the biological variance you are interested in. You should be able to tell if this is the case by looking at RLE plots before and after applying RUV. 

ADD REPLYlink written 8 weeks ago by harry.smith20
gravatar for Michael Love
9 weeks ago by
Michael Love20k
United States
Michael Love20k wrote:

I’d say the quality can’t be assessed necessarily by number of DE. Generally I’d go with more samples unless you have some external information to exclude those samples. Here you know that a large group of samples had rRNA contamination. I don’t have any strong preference or opinion on including those or not.

ADD COMMENTlink written 9 weeks ago by Michael Love20k

Thank you Michael. 

ADD REPLYlink written 9 weeks ago by Tash.0

If its alright to ask a follow up question (although this might be more appropriate for a different forum but i'll give it a try). You're mentioning you don't have a strong preference to include the samples with rRNA contamination, meaning that there probably isn't is any biological effect that can be masked by having high level of rRNA when sequencing? I have enough reads for DEA (around 20M) despite having ~ 60% rRNA contamination. 

ADD REPLYlink written 8 weeks ago by Tash.0

I guess I just don't know the answer. If the reads are simply lower for these samples, then the library size should take care of it. But I can't say from this thread a definite answer.

ADD REPLYlink written 8 weeks ago by Michael Love20k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 142 users visited in the last hour