Question: Using RUVs with known batches
0
14 days ago by
EMBL Heidelberg
Frederik Ziebell0 wrote:

I have a large (bulk) RNA-seq data set with ~1500 samples, i.e. ~30 multiplexing runs (library prep and sequencing) with in total ~500 different conditions in triplicates (conditions are somehow randomized across runs). The ultimate goal is to test all 500 conditions against the wildtype-controls (which are present in many but not all multiplexing runs), taking into account 1) the batch effect originating from the multiplexing runs and 2) an additional unwanted biological source of variation for which we have negative controls: Our cells are haploid by default but tend to become diploid, even the WT-controls. That's why we also have known haploid WT-controls and known diploid WT-controls.

Can I use RUVs with the known haploid and diploid WT-controls as negative controls to account for the diploidization effect and the multiplexing runs?

My approaches so far:

A) Running RUVs() with 'condition' as indicator for the replicate samples, while the known haploid and diploid WT-controls have the same condition:

RUVs(x=counts(dds), cIdx=rownames(dds), k=10, scIdx=makeGroups(dds$condition))  This results in the first 8 latent factors being correlated with multiplexing runs, and factors 9 and 10 nicely separating known haploid from known diploid WT-controls. B) Running RUVs() on batch-corrected vst-transformed data vsd <- vst(dds) assay(vsd) %<>% limma::removeBatchEffect(vsd$run)
RUVs(x=assay(vsd), cIdx=rownames(vsd), k=10, scIdx=makeGroups(vsd\$condition), isLog = TRUE)


Here, the first 6 latent factors separate a few strong phenotypes, while factor 7 captures the diploidization.

What is a good design for differential testing?

• A) with design ~condition + run + factor 9 + factor10
• A) with design ~condition + factor1 + ... + factor10
• B) with design ~condition + run + factor7
• something else
deseq2 ruvseq • 76 views
modified 13 days ago by Michael Love24k • written 14 days ago by Frederik Ziebell0
Answer: Using RUVs with known batches
0
13 days ago by
Michael Love24k
United States
Michael Love24k wrote:

hi Frederik,

I don't have any specific recommendations from the DESeq2 side on whether it's better to have RUV detect the batches, or to remove them so that RUV detects variation on batch-corrected data. In either case, you will provide batch or something RUV detects that is highly correlated to batch in the design. Off the top of my head, I can't anticipate how these would differ in practice.

Given that you are at EMBL Heidelberg, you may want to connect with Wolfgang Huber's group or Bernd Klaus who can give some pointers on working with large scale RNA-seq datasets.