Highly variable gene analysis including RUVs batch correction
f_pardow



I am using DESeq2 to search for highly variable genes. My experiment includes cells from 3 different human donors. They were planned as biological replicates and were all treated under the same conditions. My PCA showed that the donors differ quite a lot in their genetic background. For now, I would like to see the effects of my treatment, so I used RUVs to remove the donor-related unwanted variation. I then included the estimated factors of unwanted variation (W_1 to W_6, i.e. k = 6) in my DESeq2 design:

# W_1 ... W_6 are the factors of unwanted variation estimated by RUVs (k = 6)
ddsRUVs <- DESeqDataSetFromMatrix(countData = counts(setRUVs),
                                  colData = pData(setRUVs),
                                  design = ~ W_1 + W_2 + W_3 + W_4 + W_5 + W_6 + treatment)

For the highly variable gene analysis, my problem is that the rlog transformation does not include the batch correction. As a result, the only genes I find as highly variable are donor-specific ones. The same happens with counts(ddsRUVs, normalized = TRUE): normalized counts are only divided by the size factors, and the W terms from the design are not applied to them. Is it possible to run the highly variable gene analysis on RUVs batch-corrected data?

Thanks a lot!
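One possible route is sketched below. This is an assumption and was not suggested in this thread. The idea is to regress the unwanted factors out of log-scale values while protecting the treatment effect, and then rank genes by variance. This is what limma's removeBatchEffect() does. For the data above, the call would presumably look something like removeBatchEffect(assay(vst(ddsRUVs)), covariates = W, design = model.matrix(~ treatment)), where W is the matrix of the W_1 to W_6 columns from pData(setRUVs). The base-R version below runs on simulated data. The layout (3 donors x 4 conditions), the simulated values and the helper remove_covariates() are all hypothetical, and sum-to-zero donor contrasts stand in for the W factors.

```r
set.seed(1)

# Hypothetical layout: 3 donors x 4 conditions (control + 3 treatments)
donor <- factor(rep(c("D1", "D2", "D3"), each = 4))
treatment <- factor(rep(c("ctrl", "T1", "T2", "T3"), times = 3),
                    levels = c("ctrl", "T1", "T2", "T3"))

# Simulated log-scale expression for 200 genes: every gene carries a strong
# donor offset, and only genes 1-10 respond to treatment T1
n_genes <- 200
donor_offset <- matrix(rnorm(n_genes * 3, sd = 2), nrow = n_genes)
expr <- donor_offset[, as.integer(donor)] +
  matrix(rnorm(n_genes * length(donor), sd = 0.3), nrow = n_genes)
expr[1:10, treatment == "T1"] <- expr[1:10, treatment == "T1"] + 3

# Hypothetical helper: fit the treatment design and the unwanted covariates
# jointly, then subtract only the fitted contribution of the covariates
remove_covariates <- function(y, covariates, design) {
  X <- cbind(design, covariates)
  fit <- lm.fit(X, t(y))  # samples x genes response, one fit per gene
  beta <- fit$coefficients[-seq_len(ncol(design)), , drop = FALSE]
  y - t(covariates %*% beta)
}

# Sum-to-zero donor contrasts play the role of the W_1 ... W_k factors here
donor_cov <- model.matrix(~ donor,
                          contrasts.arg = list(donor = "contr.sum"))[, -1]
corrected <- remove_covariates(expr, donor_cov, model.matrix(~ treatment))

# Rank genes by variance before and after removing the donor effect
raw_var <- apply(expr, 1, var)
corrected_var <- apply(corrected, 1, var)
top_raw <- order(raw_var, decreasing = TRUE)[1:10]
top_corrected <- order(corrected_var, decreasing = TRUE)[1:10]
```

In the raw matrix, the top-variance genes are driven by donor offsets. After correction, they are the treatment-responsive genes. The corrected values are meant for exploration (highly variable genes, PCA, heatmaps). For differential testing, the W terms should stay in the DESeq2 design instead.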

How many treatment conditions do you have? It is not clear if there is more than one.

I have 7 different treatments and one control (without treatment) per donor, so 21 samples in total.

tmms


If you have measured each condition in every patient, couldn't you just add patient as a blocking factor to the model? Something like ~ patient + treatment. Then you could compare, for example, treatment1 vs treatment2. The advantage is that each comparison is made within each patient, so the differences between donors do not inflate the variability. See page 39 of the edgeR User's Guide for more information (https://bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf).
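The base-R sketch below illustrates why blocking helps. It uses one simulated gene, and the layout (3 donors x 4 conditions) and all values are hypothetical. When donor is in the model, the treatment effect is estimated within donors, so its standard error is no longer inflated by the donor offsets. In DESeq2, the corresponding setup would be design = ~ donor + treatment followed by a contrast such as results(dds, contrast = c("treatment", "T1", "T2")). In edgeR, it would be a design matrix from model.matrix(~ donor + treatment).

```r
set.seed(2)

donor <- factor(rep(c("D1", "D2", "D3"), each = 4))
treatment <- factor(rep(c("ctrl", "T1", "T2", "T3"), times = 3),
                    levels = c("ctrl", "T1", "T2", "T3"))

# Hypothetical log-expression of one gene: large donor offsets,
# and T1 raises expression by 1 unit in every donor
donor_offset <- c(D1 = 0, D2 = 4, D3 = -3)
y <- unname(donor_offset[as.character(donor)]) +
  ifelse(treatment == "T1", 1, 0) + rnorm(12, sd = 0.3)

fit_plain   <- lm(y ~ treatment)          # ignores donor
fit_blocked <- lm(y ~ donor + treatment)  # donor as blocking factor

# Standard error of the T1-vs-control coefficient under each model
se_plain   <- coef(summary(fit_plain))["treatmentT1", "Std. Error"]
se_blocked <- coef(summary(fit_blocked))["treatmentT1", "Std. Error"]
c(plain = se_plain, blocked = se_blocked)
```

Without the blocking factor, the donor offsets end up in the residual variance. With it, only the within-donor noise remains.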


