How to know what genes are differentially expression in DESeq2 simulation?
1
0
Entering edit mode
@megapode32559-23129
Last seen 4.7 years ago

I see this count matrix. But it is not clear what rows correspond to differentially expressed genes. Is there a way to know what genes are set as differentially expressed in the simulation? Thanks.

R> library(DESeq2)
R> dds <- makeExampleDESeqDataSet()
R> head(dds@assays@data@listData$counts)
     sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8
sample9 sample10 sample11 sample12
[1,]      58      92     164      76      54      79     106     104
   87       53       45       52
[2,]       8      17      21       6       7       9      28      13
   10       15       24       24
[3,]       0       0       0       0       1       0       0       0
    2        0        0        6
[4,]      77     112      58      91      42      61      81      45
   93       79       98      114
[5,]      34     119      48      29      17      44      67      28
   35       26       28       32
[6,]       9       4       0       0       0       3       0       0
    1        0        1        1
deseq2 • 561 views
ADD COMMENT
0
0
Entering edit mode

I am not asking for the results of DESeq(). I am asking in the simulated data without going through DESeq() function. How to know what genes are set to differentially expressed during the simulation?

ADD REPLY
0
Entering edit mode

These are 'random' negative binomial distributed integers, which simulate RNA-seq read counts. As their generation is random, the genes that are statistically significantly differentially expressed is also, therefore, random:

library(DESeq2)

subset(results(DESeq(makeExampleDESeqDataSet())), padj <= 0.05)
DataFrame with 0 rows and 6 columns

subset(results(DESeq(makeExampleDESeqDataSet())), padj <= 0.05)
DataFrame with 1 row and 6 columns
               baseMean   log2FoldChange            lfcSE             stat
              <numeric>        <numeric>        <numeric>        <numeric>
gene17 47.9143327226135 1.35538473381927 0.31715302071736 4.27359868984869
                     pvalue              padj
                  <numeric>         <numeric>
gene17 1.92343091564459e-05 0.019195840538133

subset(results(DESeq(makeExampleDESeqDataSet())), padj <= 0.05) 
DataFrame with 1 row and 6 columns
               baseMean   log2FoldChange            lfcSE             stat
              <numeric>        <numeric>        <numeric>        <numeric>
gene986 4.2586203423404 -5.5854115960524 1.22803660944642 -4.5482451851091
                      pvalue                padj
                   <numeric>           <numeric>
gene986 5.40950903472945e-06 0.00540409952569472

subset(results(DESeq(makeExampleDESeqDataSet())), padj <= 0.05)
DataFrame with 0 rows and 6 columns

The code:

makeExampleDESeqDataSet

function (n = 1000, m = 12, betaSD = 0, interceptMean = 4, interceptSD = 2, 
    dispMeanRel = function(x) 4/x + 0.1, sizeFactors = rep(1, 
        m)) 
{
    beta <- cbind(rnorm(n, interceptMean, interceptSD), rnorm(n, 
        0, betaSD))
    dispersion <- dispMeanRel(2^(beta[, 1]))
    colData <- DataFrame(condition = factor(rep(c("A", "B"), 
        times = c(ceiling(m/2), floor(m/2)))))
    x <- if (m > 1) {
        stats::model.matrix.default(~colData$condition)
    }
    else {
        cbind(rep(1, m), rep(0, m))
    }
    mu <- t(2^(x %*% t(beta)) * sizeFactors)
    countData <- matrix(rnbinom(m * n, mu = mu, size = 1/dispersion), 
        ncol = m)
    mode(countData) <- "integer"
    colnames(countData) <- paste("sample", 1:m, sep = "")
    rowRanges <- GRanges("1", IRanges(start = (1:n - 1) * 100 + 
        1, width = 100))
    names(rowRanges) <- paste0("gene", 1:n)
    design <- if (m > 1) {
        as.formula("~ condition", env = .GlobalEnv)
    }
    else {
        as.formula("~ 1", env = .GlobalEnv)
    }
    object <- DESeqDataSetFromMatrix(countData = countData, colData = colData, 
        design = design, rowRanges = rowRanges)
    trueVals <- DataFrame(trueIntercept = beta[, 1], trueBeta = beta[, 
        2], trueDisp = dispersion)
    mcols(trueVals) <- DataFrame(type = rep("input", ncol(trueVals)), 
        description = c("simulated intercept values", "simulated beta values", 
            "simulated dispersion values"))
    mcols(object) <- cbind(mcols(object), trueVals)
    return(object)
}
ADD REPLY
0
Entering edit mode

There is a column in mcols(dds) which gives the true effect size / LFC across the condition.

Note that betaSD can be specified as a vector, so you can use it to generate null NB genes if you want for the first X% of genes, etc.

ADD REPLY

Login before adding your answer.

Traffic: 854 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6