Using RUVs with DESeq2 but clustering still seems wrong
Hello! Apologies if this question has been asked before. I am not very experienced in bioinformatics. I am analysing 15 samples: 3 biological replicates of the same cell line (A, B, C), each given 5 different treatments (1, 2, 3, 4, 5). I am interested in the effect of treatment 1 (ctl) versus each of the other treatments. However, once I import my data into DESeq2 and plot a PCA, the samples cluster by biological replicate rather than by treatment (even though both factors are in the design), so I wonder if this reflects a batch effect of the experiment itself.

I tried using RUVs with k=3 (after several trials), and the samples then cluster quite nicely by treatment. However, when I add these factors (W_1 + W_2 + W_3) to the design and repeat the PCA, the clustering doesn't change much, and neither does the heatmap of the most variable genes. Am I doing something wrong? Where and how will I see any effect of the RUV normalisation? What is the most appropriate way to show clustering if there is such a clear replicate bias? Thank you very much for your help! I really appreciate any feedback.

samples <- read.table(file.path(dir, "samplestest.txt"), header=TRUE, stringsAsFactors = TRUE)
rownames(samples) <- samples$sample
files <- file.path(dir, "salmon", paste(samples$sample, ".txt", sep=''))
names(files) <- samples$sample
txi <- tximport(files, type="salmon", tx2gene=tx2gene, ignoreTxVersion = TRUE)
# reading in files with read_tsv
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
# transcripts missing from tx2gene: 13365
# summarizing abundance
# summarizing counts
# summarizing length
ddsTxi <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ replicate + treatment)
# using counts and average transcript lengths from tximport
dds <- DESeq(ddsTxi)
# estimating size factors
# using 'avgTxLength' from assays(dds), correcting for library size
# estimating dispersions
# gene-wise dispersion estimates
# mean-dispersion relationship
# final dispersion estimates
# fitting model and testing
dds <- dds[ rowSums(counts(dds)) > 10, ]
rld <- rlog(dds, blind=FALSE)
plotPCA(rld, intgroup=c("replicate"))

PCA

counts <- counts(dds, normalized=FALSE)
set <- newSeqExpressionSet(counts = counts, phenoData = samples)
sets <- RUVs(set, unique(rownames(set)), k=3, differences)
library(RColorBrewer)
# one color per treatment level (5 treatments)
colors <- brewer.pal(5, "Set2")
par(mfrow = c(1,2))
plotPCA(set, col=colors[samples$treatment], cex=1.2, main = 'without RUVs')
plotPCA(sets, col=colors[samples$treatment], cex=1.2, main = 'with RUVs')

PCA plot RUVs

ddsTxiruv <- DESeqDataSetFromTximport(txi,
                                      colData = pData(sets),
                                      design = ~ W_1 + W_2 + W_3 + treatment)
# using counts and average transcript lengths from tximport

ddsruv <- DESeq(ddsTxiruv)
# estimating size factors
# using 'avgTxLength' from assays(dds), correcting for library size
# estimating dispersions
# gene-wise dispersion estimates
# mean-dispersion relationship
# final dispersion estimates
# fitting model and testing
# 104 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest
nc <- counts(ddsruv, normalized=TRUE)
filter <- rowSums(nc >= 50) >= 15
ddsruv <- ddsruv[filter,]
ddsruv <- DESeq(ddsruv)
# using pre-existing normalization factors
# estimating dispersions
# found already estimated dispersions, replacing these
# gene-wise dispersion estimates
# mean-dispersion relationship
# final dispersion estimates
# fitting model and testing
rldruv <- rlog(ddsruv, blind=FALSE)
plotPCA(rldruv, intgroup=c("replicate"))

PCA after RUV

topVarGenes <- head(order(rowVars(assay(rld)), decreasing = TRUE), 10)
mat <- assay(rld)[ topVarGenes, ]
anno <- as.data.frame(colData(rld)[, c("treatment", "replicate")])
pheatmap(mat, annotation_col = anno, show_rownames = FALSE)

Heatmap rld

topVarGenes <- head(order(rowVars(assay(rldruv)), decreasing = TRUE), 10)
mat <- assay(rldruv)[ topVarGenes, ]
anno <- as.data.frame(colData(rldruv)[, c("treatment", "replicate")])
pheatmap(mat, annotation_col = anno, show_rownames = FALSE)

Heatmap rld_ruv

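This is in fact asked a lot, and it is also covered as a Frequently Asked Question in the DESeq2 vignette. In short, rlog and vst use the design only to estimate dispersions when blind=FALSE; they do not use it to remove variation from the transformed data. Adding replicate or the RUV factors to the design therefore changes the differential expression results, but not the PCA or heatmap of the transformed values.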
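For visualisation only, one common approach is to remove the replicate effect from the transformed matrix with limma's removeBatchEffect before plotting. The following is a minimal sketch on simulated counts, not your data: the sample names, gene count, and effect sizes are made up for illustration, and the design passed to removeBatchEffect is assumed to contain only the treatment term you want to keep.

```r
library(DESeq2)
library(limma)

set.seed(1)

# Hypothetical layout mirroring the question: 3 replicates x 5 treatments
coldata <- data.frame(
  replicate = factor(rep(c("A", "B", "C"), times = 5)),
  treatment = factor(rep(1:5, each = 3)),
  row.names = paste0("s", 1:15)
)

# Simulated counts with a gene-specific replicate shift (illustration only)
n_genes <- 500
base    <- 2^runif(n_genes, 6, 10)
rep_fc  <- matrix(2^rnorm(n_genes * 3, sd = 1), ncol = 3)
mu      <- base * rep_fc[, as.integer(coldata$replicate)]
counts  <- matrix(rnbinom(n_genes * 15, mu = mu, size = 10),
                  nrow = n_genes,
                  dimnames = list(paste0("g", 1:n_genes), rownames(coldata)))

dds <- DESeqDataSetFromMatrix(counts, colData = coldata,
                              design = ~ replicate + treatment)
rld <- rlog(dds, blind = FALSE)
mat <- assay(rld)

# Keep the treatment signal, remove the replicate effect
design_keep <- model.matrix(~ treatment, data = coldata)
mat_adj <- limma::removeBatchEffect(mat, batch = rld$replicate,
                                    design = design_keep)

# Put the adjusted values back so plotPCA can be used as before
rld_adj <- rld
assay(rld_adj) <- mat_adj
plotPCA(rld_adj, intgroup = "treatment")
```

If you prefer to remove the RUV factors instead of replicate, removeBatchEffect also takes a covariates argument, so something like covariates = as.matrix(pData(sets)[, c("W_1", "W_2", "W_3")]) should work in place of batch. Use the adjusted matrix only for PCA and heatmaps; for testing, keep the original counts and put the factors in the design as you already did.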


