Extracting a subset of samples following SVA

I am performing differential methylation analyses on different subgroups of a patient population. The data were assayed on Illumina EPIC arrays. We have considered using ComBat to correct for batch effects, or simply blocking on batch in limma, as previously discussed on this forum.

I have a question, however, regarding SVA and a singularity issue with one particular subset. It seems that there is only 1 sample on several of the plates, and only 1 sample in the wells. I tried merging the samples and running the SVA model again, but the singularity error message remained. I am now considering running the model plus batch effects with SVA on the entire set of samples in targets, and then extracting the subsets afterwards. Would this "fix" lead to false estimates in the downstream analyses? It seems I need a larger dataset in order to avoid those single samples on the plates and in the wells.

Thanks for your input.

Combat SVA Limma • 224 views

Hey, if you are implying that some of your batches have just a single sample and that there is evidence of a batch effect that involves these [batches], then there is not much that can be done. You should probably remove these samples from the study.
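To make the single-sample problem concrete, here is a minimal sketch in plain Python (not limma or sva code) of one way such a layout can produce a singular design. It assumes that plate and well both enter the model as factors; the sample sheet, the helper names `singleton_levels`, `design_matrix` and `matrix_rank`, and all values are hypothetical. When one sample is the only member of its plate and also the only member of its well, the two indicator columns are identical, so the design matrix loses rank. This is only one possible cause of the error described above.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical sample sheet: (sample_id, plate, well). All values are invented.
samples = [
    ("S1", "P1", "W1"),
    ("S2", "P1", "W2"),
    ("S3", "P2", "W1"),
    ("S4", "P2", "W2"),
    ("S5", "P3", "W3"),  # only sample on plate P3 and only sample in well W3
]

def singleton_levels(values):
    """Return the factor levels that occur exactly once."""
    counts = Counter(values)
    return sorted(level for level, n in counts.items() if n == 1)

def design_matrix(rows):
    """Intercept plus treatment-coded dummy columns for plate and well."""
    plates = sorted({p for _, p, _ in rows})[1:]  # first level is the reference
    wells = sorted({w for _, _, w in rows})[1:]
    names = ["Intercept"] + plates + wells
    matrix = [
        [Fraction(1)]
        + [Fraction(int(p == level)) for level in plates]
        + [Fraction(int(w == level)) for level in wells]
        for _, p, w in rows
    ]
    return names, matrix

def matrix_rank(matrix):
    """Rank by Gaussian elimination using exact fractions."""
    m = [row[:] for row in matrix]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it depends on earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

single_plates = singleton_levels(p for _, p, _ in samples)
single_wells = singleton_levels(w for _, _, w in samples)
names, X = design_matrix(samples)
rank = matrix_rank(X)

print("single-sample plates:", single_plates)
print("single-sample wells:", single_wells)
print("design columns:", len(names), "rank:", rank)
```

A rank lower than the number of columns means at least one coefficient cannot be estimated. Dropping the offending samples, or dropping one of the two confounded factors, restores full rank in this toy case.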

Over the years, I have noticed how many users are keen to adjust for what they assume to be batch or other unknown effects in their data. Do you have concrete evidence of a batch effect or is it just your intuition that there is some batch or other effect that must exist? The best strategy is obviously a solid experimental design, but we do not always have this, as we know.

  • There is in fact a very informative thread where another approach is suggested: https://support.bioconductor.org/p/109040/#109042
  • The approach suggested by Aaron using duplicateCorrelation() is actually one that I have recently employed in a study (2 days ago)

Hi Kevin,
Thank you for your response. The only evidence I have is two pronounced clusters in the PCA plots, together with scree plots showing that 98% of the variability is associated with dimension 1. At this point we are only considering batch effects from the EPIC array. I look forward to reading the approach suggested by Aaron using duplicateCorrelation. Were you satisfied with this option in your recent study? Best, Jonelle
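For readers unfamiliar with how a scree-plot percentage of this kind is obtained, the arithmetic can be sketched as follows. This is a generic illustration in plain Python, not the code used in this analysis: the function name `variance_explained` and the singular values are hypothetical, chosen only so that the first component dominates. Each component's share is its squared singular value divided by the sum of all squared singular values.

```python
def variance_explained(singular_values):
    """Proportion of total variance carried by each principal component."""
    squared = [d * d for d in singular_values]
    total = sum(squared)
    return [s / total for s in squared]

# Hypothetical singular values from a PCA of centred data (invented numbers).
singular_values = [70.0, 8.0, 5.0, 3.0]
proportions = variance_explained(singular_values)

for i, p in enumerate(proportions, start=1):
    print(f"PC{i}: {p:.1%}")
```

A dominant first component shows that one axis carries most of the variation; whether that axis corresponds to plate or to something biological still has to be checked against the sample annotation.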


Okay, I am convinced by 98% variation along PC1! Yes and no, with regard to the duplicateCorrelation approach: yes, because it allowed us to control for batch and Donor (Individual); however, I would have preferred a larger and differently-designed study.

Anyway, I trust that your analysis will go well.

