Differentially expressed genes detection between three classes with a batch effect in DESeq2
1
0
Entering edit mode
@francescotabaro-15672
Last seen 6.0 years ago

Hi,

I am trying to detect differentially expressed genes in a RNA-seq dataset affect by batch effect due to different isolation protocols used in the wet lab during dataset generation.

Below, a snippet of the code I used to model my data:

rna_dds <- DESeqDataSet(rna_se, design = ~ RNA.Isolation.method + Type)
rna_dds <- DESeq(rna_dds, test = "LRT",
                 fitType = "local",
                 full = design(rna_dds)
                 reduced = ~ RNA.Isolation.method,
                parallel = F,
                quiet = T)
rna_dds <- rna_dds[!rowRanges(rna_dds)$allZero,]

I can get results as follow:

rna_LTR   <- results(rna_dds, alpha = 1e-4)
rna_batch <- results(rna_dds, 
                     contrast = c("RNA.Isolation.method", "P1", "P2"), 
                     test = "Wald", 
                     lfcThreshold = .5, 
                     altHypothesis = "lessAbs", 
                     alpha = 0.05)
rna_trt1  <- results(rna_dds, 
                     contrast = c("Type", "trt1",   "control"), 
                     alpha = 1e-4, 
                     test = "Wald", 
                     lfcThreshold = 2)
rna_trt2  <- results(rna_dds, 
                     contrast = c("Type", "trt2", "control"), 
                     alpha = 1e-4, 
                     test = "Wald", 
                     lfcThreshold = 2)

If I understood correctly, now rna_LTR will contain the pvalues for differences in Type between the two models ~ RNA.Isolation.method + Type vs ~ RNA.Isolation.method, those should take into account the batch effect due to the extraction protocol. rna_batch on the other hand should contain pvalues for genes that are weakly affected by the extraction protocol (because I used an altHypothesis = lessAbs). rna_trt1 should contain pvalues (and log2FoldChanges) for the DEG between trt1 and control. rna_trt2 should contain pvalues (and log2FoldChanges) for the DEG between trt2 and control.

My question is, is it safe to use pvalues from rna_LTR and log2FoldChanges from rna_trt1 and rna_trt2 to detect differentially expressed genes? Should I take into account pvalues also from rna_batch? If yes, how? e.g.

res <- data.frame(
          trt1vctrl = rna_trt1$log2FoldChange, 
          trt2vctrl = rna_trt2$log2FoldChange, 
          LTR_padj  = rna_LTR$padj,
          row.names = rownames(rna_dds))

res <- res[!apply(res, 1, function (r) any( is.na( r))), ]

rna_res[ rna_res$LTR_padj < 1e-4 & 
         abs(rna_res$trt1vctrl) > 2 & 
         abs(rna_res$trt2vctrl) > 2, ]

Thanks for your answer! :)

deseq2 batch effect differential gene expression • 678 views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 7 hours ago
United States

If you want log2 fold changes from the original object you can just use "name" to specify which LFC to extract. See the LRT section of ?results. E.g.:

results(dds, name="condition_C_vs_A")
results(dds, name="condition_B_vs_A")

If dds involved an LRT, then the p-values won't change between these tables, only the LFC.

Why do you need to use rna_batch at all? The design controls for batch differences.

ADD COMMENT
0
Entering edit mode

Hi Michael, Thanks for your answer! I generated that object (rna_batch) because I was interested in visualizing the genes affected by the batch effect and being able to report a p-value for the difference. I was looking for genes with consistent expression across batches.

I am aware that, when using LRT, p-values refer to the test between more than two variable levels, and that's why I used it instead of the standard Wald test. I am looking for genes that behave differently across sample types albeit the RNA isolation protocol used. So if I interpreted your answer correctly, filtering for LRT p-value and pairwise LFC will return what I am looking for. Right?

Also, from your previous answer a question arise, what's the difference between using "contrast" and "name"? I have been always thinking that they would extract the same things.

rna_LTR_trt1 <- results(rna_dds, alpha = 1e-4, contrast = c("Type", "trt1", "ctrl")) # this will return LFC for trt1 vs ctrl and pvalues for LRT across the three sample type

rna_LTR_trt1 <- results(rna_dds, alpha = 1e-4, name = "Type_trt1_vs_ctrl")) # wouldn't this return the same?

ADD REPLY
0
Entering edit mode

Yes, you are right those two will return the same. I wasn't sure why you were generating the additional Wald test p-values but I suppose you want to do something with those. I typically would just consider the set of genes which are significant by the LRT, and then work with the LFCs of the two comparisons directly.

ADD REPLY

Login before adding your answer.

Traffic: 691 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6