question about removing both known and unknown confounders
1
0
Entering edit mode
Lisa • 0
@2d183270
Last seen 7 weeks ago
United States

Hello,

I would like to use DESeq2 on ATAC-seq data from human disease and control individuals. I know based on PCA plot that age is a big confounder. So, I want to first correct for age as a covariate, and then use RUVseq on the age-corrected count set to identify other unknown confounders and batch effects. What's the best way to do it?

Should I do this below or something else? 1) Using design(ddsruv) <- ~ W1 + age + disease

dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = condition,
                              design = ~ age + disease) 
dds$disease <- relevel(dds$disease, ref = "Control")

#DE analysis
dds<-DESeq(dds)
res<-results(dds)

#RUVseq
set <- newSeqExpressionSet(counts(dds))
idx  <- rowSums(counts(set) > 5) >= 2
set  <- set[idx, ]
set <- betweenLaneNormalization(set, which="upper")
not.sig <- rownames(res)[which(res$pvalue > .95)] 
empirical <- rownames(set)[ rownames(set) %in% not.sig ]
set <- RUVg(set, empirical, k=1) 
pData(set)

#re-run DEseq w/ RUV factors added
ddsruv<-dds
ddsruv$W1 <-set$W_1

design(ddsruv) <- ~ W1 + age + disease

ddsruv<-DESeq(ddsruv)
ddsruvClean <- ddsruv[which(mcols(ddsruv)$betaConv),]
resruv<-results(ddsruvClean)
summary(resruv)

Alternatively, should I just use a design ~W1 + disease (perhaps using k=2, and adding W2), because RUVseq would take into account the known age-related confounding as well.

Thanks, Lisa

DESeq2 RUVSeq • 184 views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 2 hours ago
United States

If my understanding of RUV is correct, I think you should do a LRT in the first DESeq() run (the one that defines not.sig). I would use there:

dds<-DESeq(dds, test="LRT", reduced=~1)
res<-results(dds)

In this way, not.sig will avoid age-related genes. Make sense?

ADD COMMENT
0
Entering edit mode

Thank you, I think that makes sense. And should I still use the original dds matrix after I do the RUVseq analysis? Is this correct?

#input matrix for DESeq2
dds <- DESeqDataSetFromMatrix(countData = cts,
                          colData = condition,
                          design = ~age+disease)
dds$disease <- relevel(dds$disease, ref = "Control")

#DE analysis before RUVseq
dds_for_ruv<-DESeq(dds, test="LRT", reduced=~1)
res<-results(dds_for_ruv)
#RUVseq
library("RUVSeq")
set <- newSeqExpressionSet(counts(dds_for_ruv))
idx  <- rowSums(counts(set) > 5) >= 2
set  <- set[idx, ]
set <- betweenLaneNormalization(set, which="upper")
not.sig <- rownames(res)[which(res$pvalue > .2)] 
empirical <- rownames(set)[ rownames(set) %in% not.sig ]
set <- RUVg(set, empirical, k=2) 
pData(set)

#re-run DEseq with the original dds matrix (including age) w/ RUV factors added
dds$W1 <-set$W_1
dds$W2 <-set$W_2
design(dds) <- ~ W1 + W2 + age + disease
dds<-DESeq(dds)
ddsruvClean <- dds[which(mcols(dds)$betaConv),]
resruv<-results(ddsruvClean, alpha=0.05)
ADD REPLY
0
Entering edit mode

This looks correct to me. You can also plot W1 and W2 against age and disease.

ADD REPLY

Login before adding your answer.

Traffic: 466 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6