Question

How to know what genes are differentially expression in DESeq2 simulation?

0

Entering edit mode

megapode32559 • 0

@megapode32559-23129

Last seen 5.7 years ago

I see this count matrix. But it is not clear what rows correspond to differentially expressed genes. Is there a way to know what genes are set as differentially expressed in the simulation? Thanks.

R> library(DESeq2)
R> dds <- makeExampleDESeqDataSet()
R> head(dds@assays@data@listData$counts)
     sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8
sample9 sample10 sample11 sample12
[1,]      58      92     164      76      54      79     106     104
   87       53       45       52
[2,]       8      17      21       6       7       9      28      13
   10       15       24       24
[3,]       0       0       0       0       1       0       0       0
    2        0        0        6
[4,]      77     112      58      91      42      61      81      45
   93       79       98      114
[5,]      34     119      48      29      17      44      67      28
   35       26       28       32
[6,]       9       4       0       0       0       3       0       0
    1        0        1        1

deseq2 • 735 views

ADD COMMENT • link updated 5.7 years ago by Kevin Blighe ★ 4.0k • written 5.7 years ago by megapode32559 • 0

score 0 · Answer 1 · 2020-03-18

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 10 days ago

United States

Check out the vignette from this section:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#differential-expression-analysis

Or the workflow from here:

http://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#differential-expression-analysis

ADD COMMENT • link 5.7 years ago Michael Love 43k

0

Entering edit mode

I am not asking for the results of DESeq(). I am asking in the simulated data without going through DESeq() function. How to know what genes are set to differentially expressed during the simulation?

ADD REPLY • link 5.7 years ago megapode32559 • 0

0

Entering edit mode

These are 'random' negative binomial distributed integers, which simulate RNA-seq read counts. As their generation is random, the genes that are statistically significantly differentially expressed is also, therefore, random:

library(DESeq2)

subset(results(DESeq(makeExampleDESeqDataSet())), padj <= 0.05)
DataFrame with 0 rows and 6 columns

subset(results(DESeq(makeExampleDESeqDataSet())), padj <= 0.05)
DataFrame with 1 row and 6 columns
               baseMean   log2FoldChange            lfcSE             stat
              <numeric>        <numeric>        <numeric>        <numeric>
gene17 47.9143327226135 1.35538473381927 0.31715302071736 4.27359868984869
                     pvalue              padj
                  <numeric>         <numeric>
gene17 1.92343091564459e-05 0.019195840538133

subset(results(DESeq(makeExampleDESeqDataSet())), padj <= 0.05) 
DataFrame with 1 row and 6 columns
               baseMean   log2FoldChange            lfcSE             stat
              <numeric>        <numeric>        <numeric>        <numeric>
gene986 4.2586203423404 -5.5854115960524 1.22803660944642 -4.5482451851091
                      pvalue                padj
                   <numeric>           <numeric>
gene986 5.40950903472945e-06 0.00540409952569472

subset(results(DESeq(makeExampleDESeqDataSet())), padj <= 0.05)
DataFrame with 0 rows and 6 columns

The code:

makeExampleDESeqDataSet

function (n = 1000, m = 12, betaSD = 0, interceptMean = 4, interceptSD = 2, 
    dispMeanRel = function(x) 4/x + 0.1, sizeFactors = rep(1, 
        m)) 
{
    beta <- cbind(rnorm(n, interceptMean, interceptSD), rnorm(n, 
        0, betaSD))
    dispersion <- dispMeanRel(2^(beta[, 1]))
    colData <- DataFrame(condition = factor(rep(c("A", "B"), 
        times = c(ceiling(m/2), floor(m/2)))))
    x <- if (m > 1) {
        stats::model.matrix.default(~colData$condition)
    }
    else {
        cbind(rep(1, m), rep(0, m))
    }
    mu <- t(2^(x %*% t(beta)) * sizeFactors)
    countData <- matrix(rnbinom(m * n, mu = mu, size = 1/dispersion), 
        ncol = m)
    mode(countData) <- "integer"
    colnames(countData) <- paste("sample", 1:m, sep = "")
    rowRanges <- GRanges("1", IRanges(start = (1:n - 1) * 100 + 
        1, width = 100))
    names(rowRanges) <- paste0("gene", 1:n)
    design <- if (m > 1) {
        as.formula("~ condition", env = .GlobalEnv)
    }
    else {
        as.formula("~ 1", env = .GlobalEnv)
    }
    object <- DESeqDataSetFromMatrix(countData = countData, colData = colData, 
        design = design, rowRanges = rowRanges)
    trueVals <- DataFrame(trueIntercept = beta[, 1], trueBeta = beta[, 
        2], trueDisp = dispersion)
    mcols(trueVals) <- DataFrame(type = rep("input", ncol(trueVals)), 
        description = c("simulated intercept values", "simulated beta values", 
            "simulated dispersion values"))
    mcols(object) <- cbind(mcols(object), trueVals)
    return(object)
}

ADD REPLY • link 5.7 years ago Kevin Blighe ★ 4.0k

0

Entering edit mode

There is a column in mcols(dds) which gives the true effect size / LFC across the condition.

Note that betaSD can be specified as a vector, so you can use it to generate null NB genes if you want for the first X% of genes, etc.

ADD REPLY • link 5.7 years ago Michael Love 43k