question about removing both known and unknown confounders
1
0
Entering edit mode
Lisa • 0
@2d183270
Last seen 7 weeks ago
United States

Hello,

I would like to use DESeq2 on ATAC-seq data from human disease and control individuals. I know based on PCA plot that age is a big confounder. So, I want to first correct for age as a covariate, and then use RUVseq on the age-corrected count set to identify other unknown confounders and batch effects. What's the best way to do it?

Should I do this below or something else? 1) Using design(ddsruv) <- ~ W1 + age + disease

dds <- DESeqDataSetFromMatrix(countData = cts,
colData = condition,
design = ~ age + disease)
dds$disease <- relevel(dds$disease, ref = "Control")

#DE analysis
dds<-DESeq(dds)
res<-results(dds)

#RUVseq
set <- newSeqExpressionSet(counts(dds))
idx  <- rowSums(counts(set) > 5) >= 2
set  <- set[idx, ]
set <- betweenLaneNormalization(set, which="upper")
not.sig <- rownames(res)[which(res$pvalue > .95)] empirical <- rownames(set)[ rownames(set) %in% not.sig ] set <- RUVg(set, empirical, k=1) pData(set) #re-run DEseq w/ RUV factors added ddsruv<-dds ddsruv$W1 <-set$W_1 design(ddsruv) <- ~ W1 + age + disease ddsruv<-DESeq(ddsruv) ddsruvClean <- ddsruv[which(mcols(ddsruv)$betaConv),]
resruv<-results(ddsruvClean)
summary(resruv)


Alternatively, should I just use a design ~W1 + disease (perhaps using k=2, and adding W2), because RUVseq would take into account the known age-related confounding as well.

Thanks, Lisa

DESeq2 RUVSeq • 184 views
0
Entering edit mode
@mikelove
Last seen 2 hours ago
United States

If my understanding of RUV is correct, I think you should do a LRT in the first DESeq() run (the one that defines not.sig). I would use there:

dds<-DESeq(dds, test="LRT", reduced=~1)
res<-results(dds)


In this way, not.sig will avoid age-related genes. Make sense?

0
Entering edit mode

Thank you, I think that makes sense. And should I still use the original dds matrix after I do the RUVseq analysis? Is this correct?

#input matrix for DESeq2
dds <- DESeqDataSetFromMatrix(countData = cts,
colData = condition,
design = ~age+disease)
dds$disease <- relevel(dds$disease, ref = "Control")

#DE analysis before RUVseq
dds_for_ruv<-DESeq(dds, test="LRT", reduced=~1)
res<-results(dds_for_ruv)
#RUVseq
library("RUVSeq")
set <- newSeqExpressionSet(counts(dds_for_ruv))
idx  <- rowSums(counts(set) > 5) >= 2
set  <- set[idx, ]
set <- betweenLaneNormalization(set, which="upper")
not.sig <- rownames(res)[which(res$pvalue > .2)] empirical <- rownames(set)[ rownames(set) %in% not.sig ] set <- RUVg(set, empirical, k=2) pData(set) #re-run DEseq with the original dds matrix (including age) w/ RUV factors added dds$W1 <-set$W_1 dds$W2 <-set$W_2 design(dds) <- ~ W1 + W2 + age + disease dds<-DESeq(dds) ddsruvClean <- dds[which(mcols(dds)$betaConv),]
resruv<-results(ddsruvClean, alpha=0.05)

0
Entering edit mode

This looks correct to me. You can also plot W1 and W2 against age and disease.