Using RUVs with known batches
@frederik-ziebell-14676
Heidelberg, Germany

I have a large bulk RNA-seq data set with ~1500 samples: ~30 multiplexing runs (library prep and sequencing) covering ~500 different conditions in triplicate, with conditions roughly randomized across runs. The ultimate goal is to test all 500 conditions against the wildtype (WT) controls, which are present in many but not all multiplexing runs, while accounting for 1) the batch effect originating from the multiplexing runs and 2) an additional unwanted biological source of variation for which we have negative controls: our cells are haploid by default but tend to become diploid, and this affects even the WT-controls. That's why we also have known haploid WT-controls and known diploid WT-controls.

Can I use RUVs with the known haploid and diploid WT-controls as negative controls to account for the diploidization effect and the multiplexing runs?

My approaches so far:

A) Running RUVs() with 'condition' as the indicator for replicate samples, where the known haploid and diploid WT-controls share the same condition label:

RUVs(x=counts(dds), cIdx=rownames(dds), k=10, scIdx=makeGroups(dds$condition))

This results in the first 8 latent factors being correlated with multiplexing runs, and factors 9 and 10 nicely separating known haploid from known diploid WT-controls.
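One way to put a number on "correlated with multiplexing runs" is to regress each latent factor on the run label and read off the R². The following is only a minimal sketch on simulated data: the matrix W stands in for the factor matrix that RUVs() returns, and the run labels, sample count, and noise levels are all made up for illustration.

```r
set.seed(42)

# Hypothetical sample annotation: 60 samples spread over 6 runs
n <- 60
run <- factor(rep(paste0("run", 1:6), each = 10))

# Hypothetical factor matrix (stand-in for the W returned by RUVs()):
# W_1 is built to track the run, W_2 is pure noise
W <- cbind(
  W_1 = as.numeric(run) + rnorm(n, sd = 0.3),
  W_2 = rnorm(n)
)

# R^2 of a one-way model "factor ~ run" for each column of W;
# values near 1 mean the factor is largely explained by run
r2_with_run <- apply(W, 2, function(w) summary(lm(w ~ run))$r.squared)
round(r2_with_run, 2)
```

The same call with a ploidy label in place of run would show which factors separate the known haploid from the known diploid WT-controls.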

B) Running RUVs() on batch-corrected vst-transformed data:

library(magrittr)  # provides the %<>% assignment pipe
vsd <- vst(dds)
assay(vsd) %<>% limma::removeBatchEffect(vsd$run)
RUVs(x=assay(vsd), cIdx=rownames(vsd), k=10, scIdx=makeGroups(vsd$condition), isLog=TRUE)

Here, the first 6 latent factors separate a few strong phenotypes, while factor 7 captures the diploidization.
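A side note on the batch-removal step in B): limma::removeBatchEffect() accepts a design argument describing the conditions to be preserved, which may matter here because the WT-controls are not present in every run. The sketch below only illustrates that argument on a simulated log-scale matrix; the condition names, run labels, and the size of the run shift are invented, and it makes no claim about which variant works better on the real data.

```r
library(limma)

set.seed(7)

# Simulated log-expression matrix: 100 genes x 12 samples (all made up)
n_genes <- 100
condition <- factor(rep(c("WT", "KO1", "KO2"), each = 4))
run <- factor(rep(c("r1", "r2"), times = 6))
x <- matrix(rnorm(n_genes * 12), nrow = n_genes,
            dimnames = list(paste0("g", 1:n_genes), paste0("s", 1:12)))

# Add an artificial shift to every sample from run r2
x[, run == "r2"] <- x[, run == "r2"] + 1

# Conditions to protect while the run effect is removed
design <- model.matrix(~ condition)

# Remove the run effect, keeping the condition structure in the model
x_corrected <- removeBatchEffect(x, batch = run, design = design)
```

In this balanced toy layout the run shift is removed completely; with the unbalanced layout described in the question the result depends on how conditions and runs overlap.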

What is a good design for differential testing?

  • A) with design ~condition + run + factor9 + factor10
  • A) with design ~condition + factor1 + ... + factor10
  • B) with design ~condition + run + factor7
  • something else
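For concreteness, the first option (run plus selected RUV factors in the design) could be wired into DESeq2 roughly as follows. This is a sketch under stated assumptions, not a recommendation of that design: the data come from DESeq2's makeExampleDESeqDataSet(), the run labels are invented, k is 2 instead of 10, and W_2 merely stands in for whichever factors turn out to capture diploidization.

```r
library(DESeq2)
library(RUVSeq)

set.seed(1)

# Toy stand-in for the real data: 500 genes, 12 samples, conditions A and B
dds <- makeExampleDESeqDataSet(n = 500, m = 12)
dds$run <- factor(rep(c("r1", "r2"), times = 6))  # hypothetical run labels

# Approach A: replicate groups defined by condition, all genes as controls
ruv <- RUVs(x = counts(dds), cIdx = rownames(dds), k = 2,
            scIdx = makeGroups(dds$condition))

# Attach the estimated factors to the sample annotation
dds$W_1 <- ruv$W[, 1]
dds$W_2 <- ruv$W[, 2]

# Design with the known batch plus one selected RUV factor
design(dds) <- ~ run + W_2 + condition
dds <- DESeq(dds)

# Test condition B against condition A
res <- results(dds, contrast = c("condition", "B", "A"))
```

The second option would list all factor columns in the design in place of run, and the third would use the factors estimated from the batch-corrected data of approach B.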
ruvseq deseq2 • 1.3k views
@mikelove
United States

Hi Frederik,

I don't have any specific recommendations from the DESeq2 side on whether it's better to have RUV detect the batches, or to remove them first so that RUV detects variation in batch-corrected data. In either case, the design will include either batch itself or a factor detected by RUV that is highly correlated with batch. Off the top of my head, I can't anticipate how these would differ in practice.

Given that you are at EMBL Heidelberg, you may want to connect with Wolfgang Huber's group or Bernd Klaus, who can give some pointers on working with large-scale RNA-seq datasets.
