Question

Using RUVs with known batches

Frederik Ziebell

@frederik-ziebell-14676

Heidelberg, Germany

I have a large (bulk) RNA-seq data set with ~1500 samples: ~30 multiplexing runs (library prep and sequencing) covering ~500 different conditions in triplicate, with conditions randomized across runs to some extent. The ultimate goal is to test all 500 conditions against the wild-type controls (WT-controls), which are present in many but not all multiplexing runs, while taking into account 1) the batch effect originating from the multiplexing runs and 2) an additional unwanted biological source of variation for which we have negative controls: our cells are haploid by default but tend to become diploid, and this affects even the WT-controls. That's why we also have known haploid WT-controls and known diploid WT-controls.

Can I use RUVs with the known haploid and diploid WT-controls as negative controls to account for the diploidization effect and the multiplexing runs?

My approaches so far:

A) Running RUVs() on the raw counts with 'condition' defining the replicate groups, where the known haploid and diploid WT-controls share the same condition label:

RUVs(x=counts(dds), cIdx=rownames(dds), k=10, scIdx=makeGroups(dds$condition))

This results in the first 8 latent factors being correlated with multiplexing runs, and factors 9 and 10 nicely separating known haploid from known diploid WT-controls.
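One way to put a number on "correlated with multiplexing runs" is the fraction of each factor's variance that is explained by run. The following is only a minimal sketch on simulated data: the matrix `W` stands in for the `W` element of the RUVs() output, and the names `run_shift` and `r2_by_run` are made up for illustration.

```r
# Toy stand-in for the RUVs() output: 'W' mimics the samples x k factor matrix.
set.seed(1)
n_runs <- 5
run <- factor(rep(seq_len(n_runs), each = 12))
run_shift <- rnorm(n_runs)  # hypothetical per-run offset

W <- cbind(
  W_1 = run_shift[as.integer(run)] + rnorm(length(run), sd = 0.1),  # run-driven factor
  W_2 = rnorm(length(run))                                           # factor unrelated to run
)

# R^2 of a one-way fit of each factor on run:
# values near 1 mean the factor mostly encodes the run.
r2_by_run <- apply(W, 2, function(w) summary(lm(w ~ run))$r.squared)
print(round(r2_by_run, 2))
```

The same idea could be applied to ploidy by restricting the fit to the known haploid and diploid WT-controls and replacing `run` with a ploidy label.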

B) Running RUVs() on batch-corrected, vst-transformed data:

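library(magrittr)  # provides the %<>% assignment pipe
vsd <- vst(dds)
# remove the multiplexing-run effect from the vst data before running RUVs
assay(vsd) %<>% limma::removeBatchEffect(vsd$run)
RUVs(x=assay(vsd), cIdx=rownames(vsd), k=10, scIdx=makeGroups(vsd$condition), isLog = TRUE)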

Here, the first 6 latent factors separate a few strong phenotypes, while factor 7 captures the diploidization.

What is a good design for differential testing?

A) with design ~condition + run + factor9 + factor10
A) with design ~condition + factor1 + ... + factor10
B) with design ~condition + run + factor7
something else
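Whichever design is chosen, it has to be estimable, which matters here because the WT-controls are not present in every run. As a rough sketch with a simulated sample table (the columns `W_9` and `W_10` are hypothetical names for factors 9 and 10 copied from the RUVs() output, and `design_a1` is just an illustrative name for the first candidate design), the model matrix can be checked for full column rank before any testing:

```r
# Toy sample table; in practice these columns would come from colData(dds),
# with W_9 and W_10 copied from the W matrix returned by RUVs().
set.seed(2)
coldata <- data.frame(
  condition = factor(rep(c("WT", "mutA", "mutB"), times = 6)),
  run       = factor(rep(c("r1", "r2", "r3"), each = 6))
)
coldata$condition <- relevel(coldata$condition, ref = "WT")  # WT as reference level
coldata$W_9  <- rnorm(nrow(coldata))  # simulated stand-in for factor 9
coldata$W_10 <- rnorm(nrow(coldata))  # simulated stand-in for factor 10

design_a1 <- ~ condition + run + W_9 + W_10
mm <- model.matrix(design_a1, data = coldata)

# The coefficients are only estimable if the model matrix has full column rank.
full_rank <- qr(mm)$rank == ncol(mm)
print(full_rank)

# With a real DESeqDataSet, the same formula could then be assigned, e.g.
#   dds$W_9 <- ruv_out$W[, 9]; dds$W_10 <- ruv_out$W[, 10]   # 'ruv_out' is hypothetical
#   design(dds) <- design_a1
```

The other candidate designs can be checked the same way by swapping the formula.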

ruvseq deseq2

updated 5.9 years ago by Michael Love • written 5.9 years ago by Frederik Ziebell

score 0 · Answer 1 · 2019-08-07

hi Frederik,

I don't have any specific recommendations from the DESeq2 side on whether it's better to have RUV detect the batches, or to remove them first so that RUV detects variation in batch-corrected data. In either case, the design will contain either batch itself or an RUV factor that is highly correlated with batch. Off the top of my head, I can't anticipate how the two approaches would differ in practice.

Given that you are at EMBL Heidelberg, you may want to connect with Wolfgang Huber's group or Bernd Klaus, who can give some pointers on working with large-scale RNA-seq datasets.