Question

is it necessary / recommended to separate samples when doing diff. exp. with DESeq2?

1

Entering edit mode

anton.kratz ▴ 60

@antonkratz-8836

Last seen 18 days ago

Japan, Tokyo, The Systems Biology Insti…

I have biological states called sample1, sample2 and negative control (NC) in duplicate. They are all from the same sequencing run.

I wrote R code using DESeq2 which generates all pairwise comparisons, i.e. sample1 vs NC, sample2 vs NC, sample1 vs sample2.

I have all raw reads in a file expr_table.csv, i.e. a file with seven columns.

I seems to me that when DESeq2 deals with one pairwise comparison, then the raw reads in the expression table of the state which is not considered somehow influences the result. In other words, it makes a difference if I compare sample1 to NC using read counts from a table containing only sample1 and NC, or doing the same comparison using read counts from a table which contains columns for sample1, sample2 and NC.

Which approach is to be preferred? For each test, should I make a new subset data frame and corresponding contrasts description which contains only the samples being compared?

Or is it recommended to put everything (all biological states) into one big data frame with all raw read counts?

P.S.: in actuality I have more more states than sample1, sample2, NC. There are sometime up to 12 or more states.

For reference, my contrasts definition for the above example is:

condition sample1_rep1 sample1 sample1_rep2 sample1 sample2_rep1 sample2 sample2_rep2 sample2 negctrl_rep1 NC negctrl_rep2 NC

And my R code is:

library("DESeq2")
library("ggplot2")
library("corrplot")

mycountdata <- read.delim("expr_table.csv", header = TRUE, sep = "\t")
mycoldata <- read.delim("contrast.txt", header = TRUE, sep = "\t")

# colnames(mycountdata) <- NULL
dds <- DESeqDataSetFromMatrix(countData = mycountdata, colData = mycoldata, design =~ condition)
dds <- DESeq(dds)

png(file="tmp/png/dispersion.png")
plotDispEsts(dds)
dev.off()

# Define the main test function
de_check <- function (this_dds, this_contrast, outfile) {
    res <- results(this_dds, contrast=this_contrast)
    write.table(as.data.frame(res),file=paste("tmp/tsv/", outfile, ".tsv", sep = ""), quote=FALSE, sep="\t")
    print(sum(res$padj < 0.1, na.rm=TRUE))
    print(sum(res$pvalue < 0.05, na.rm=TRUE))
    png(file=paste("tmp/png/", outfile, ".png", sep = ""))
    ylim_min <- min(as.data.frame(res)$log2FoldChange)
    ylim_max <- max(as.data.frame(res)$log2FoldChange)
    DESeq2::plotMA(res, ylim = c(ylim_min, ylim_max))
    dev.off()
}

mycombo <- combn(c("sample1", "sample2", "NC"),2)

myfunction <- function (x) {
    de_check(dds, c("condition",x[1],x[2]), paste(x[1],x[2],sep="_"))
    plotDispEsts(dds)
}

apply(mycombo, 2, myfunction)

deseq2 • 795 views

ADD COMMENT • link updated 6.8 years ago by Michael Love 41k • written 6.8 years ago by anton.kratz ▴ 60

score 3 · Accepted Answer · 2017-07-12

3

Entering edit mode

Michael Love 41k

@mikelove

Last seen 5 hours ago

United States

"I seems to me that when DESeq2 deals with one pairwise comparison, then the raw reads in the expression table of the state which is not considered somehow influences the result."

This is a frequently asked question in the vignette.

ADD COMMENT • link 6.8 years ago Michael Love 41k

0

Entering edit mode

Thank you Michael! Indeed it is there at https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#if-i-have-multiple-groups-should-i-run-all-together-or-split-into-pairs-of-groups

ADD REPLY • link 6.8 years ago anton.kratz ▴ 60