is it necessary / recommended to separate samples when doing diff. exp. with DESeq2?
1
1
Entering edit mode
anton.kratz ▴ 60
@antonkratz-8836
Last seen 18 days ago
Japan, Tokyo, The Systems Biology Insti…

I have biological states called sample1, sample2 and negative control (NC) in duplicate. They are all from the same sequencing run.

I wrote R code using DESeq2 which generates all pairwise comparisons, i.e. sample1 vs NC, sample2 vs NC, sample1 vs sample2.

I have all raw reads in a file expr_table.csv, i.e. a file with seven columns.

I seems to me that when DESeq2 deals with one pairwise comparison, then the raw reads in the expression table of the state which is not considered somehow influences the result. In other words, it makes a difference if I compare sample1 to NC using read counts from a table containing only sample1 and NC, or doing the same comparison using read counts from a table which contains columns for sample1, sample2 and NC.

Which approach is to be preferred? For each test, should I make a new subset data frame and corresponding contrasts description which contains only the samples being compared?

Or is it recommended to put everything (all biological states) into one big data frame with all raw read counts?

P.S.: in actuality I have more more states than sample1, sample2, NC. There are sometime up to 12 or more states.

For reference, my contrasts definition for the above example is:

condition
sample1_rep1    sample1
sample1_rep2    sample1
sample2_rep1    sample2
sample2_rep2    sample2
negctrl_rep1    NC
negctrl_rep2    NC

And my R code is:

library("DESeq2")
library("ggplot2")
library("corrplot")

mycountdata <- read.delim("expr_table.csv", header = TRUE, sep = "\t")
mycoldata <- read.delim("contrast.txt", header = TRUE, sep = "\t")

# colnames(mycountdata) <- NULL
dds <- DESeqDataSetFromMatrix(countData = mycountdata, colData = mycoldata, design =~ condition)
dds <- DESeq(dds)

png(file="tmp/png/dispersion.png")
plotDispEsts(dds)
dev.off()

# Define the main test function
de_check <- function (this_dds, this_contrast, outfile) {
    res <- results(this_dds, contrast=this_contrast)
    write.table(as.data.frame(res),file=paste("tmp/tsv/", outfile, ".tsv", sep = ""), quote=FALSE, sep="\t")
    print(sum(res$padj < 0.1, na.rm=TRUE))
    print(sum(res$pvalue < 0.05, na.rm=TRUE))
    png(file=paste("tmp/png/", outfile, ".png", sep = ""))
    ylim_min <- min(as.data.frame(res)$log2FoldChange)
    ylim_max <- max(as.data.frame(res)$log2FoldChange)
    DESeq2::plotMA(res, ylim = c(ylim_min, ylim_max))
    dev.off()
}

mycombo <- combn(c("sample1", "sample2", "NC"),2)

myfunction <- function (x) {
    de_check(dds, c("condition",x[1],x[2]), paste(x[1],x[2],sep="_"))
    plotDispEsts(dds)
}

apply(mycombo, 2, myfunction)

deseq2 • 795 views
ADD COMMENT
3
Entering edit mode
@mikelove
Last seen 5 hours ago
United States

"I seems to me that when DESeq2 deals with one pairwise comparison, then the raw reads in the expression table of the state which is not considered somehow influences the result."

This is a frequently asked question in the vignette.

ADD COMMENT

Login before adding your answer.

Traffic: 737 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6