I would like to use DESeq2 on ATAC-seq data from human disease and control individuals. I know based on PCA plot that age is a big confounder. So, I want to first correct for age as a covariate, and then use RUVseq on the age-corrected count set to identify other unknown confounders and batch effects. What's the best way to do it?
Should I do this below or something else? 1) Using design(ddsruv) <- ~ W1 + age + disease
dds <- DESeqDataSetFromMatrix(countData = cts, colData = condition, design = ~ age + disease) dds$disease <- relevel(dds$disease, ref = "Control") #DE analysis dds<-DESeq(dds) res<-results(dds) #RUVseq set <- newSeqExpressionSet(counts(dds)) idx <- rowSums(counts(set) > 5) >= 2 set <- set[idx, ] set <- betweenLaneNormalization(set, which="upper") not.sig <- rownames(res)[which(res$pvalue > .95)] empirical <- rownames(set)[ rownames(set) %in% not.sig ] set <- RUVg(set, empirical, k=1) pData(set) #re-run DEseq w/ RUV factors added ddsruv<-dds ddsruv$W1 <-set$W_1 design(ddsruv) <- ~ W1 + age + disease ddsruv<-DESeq(ddsruv) ddsruvClean <- ddsruv[which(mcols(ddsruv)$betaConv),] resruv<-results(ddsruvClean) summary(resruv)
Alternatively, should I just use a design ~W1 + disease (perhaps using k=2, and adding W2), because RUVseq would take into account the known age-related confounding as well.