Question

makeExampleDESeqDataSet simulation using NB model

0

Entering edit mode

maedakus ▴ 10

@maedakus-9484

Last seen 7.7 years ago

Hi, all

i am working in a pharmaceutical industry, mainly in charge of translational research in japan.

now i am planing to establish the simulation data via makeExampleDESeqDataSet.

when doing the simulation, the mathematical model of negative binomical model is applied.

but unfortunately, in details, i cannot identify the exact math equation.

for instance, i cannot understand what the item as below means.

betaSD, interceptMean, interceptSD,dispMeanRel

if you know, would you mind telling me the mathematical model behind this makeExampleDESeqDataSet ?

thanks in advance !

maeda in japan

deseq2 • 662 views

ADD COMMENT • link updated 8.3 years ago by Michael Love 41k • written 8.3 years ago by maedakus ▴ 10

score 1 · Answer 1 · 2016-02-02

So, I'd first just note that this function is mostly for simple tests within the package. I would recommend for scientific work you used a published RNA-seq simulator such as Polyester.

http://bioinformatics.oxfordjournals.org/content/31/17/2778.abstract

http://bioconductor.org/packages/release/bioc/html/polyester.html

In the man page ?makeExampleDESeqDataSet you have some description:

Arguments:

       n: number of rows

       m: number of columns

  betaSD: the standard deviation for non-intercept betas, i.e. beta ~
          N(0,betaSD)

interceptMean: the mean of the intercept betas (log2 scale)

interceptSD: the standard deviation of the intercept betas (log2 scale)

dispMeanRel: a function specifying the relationship of the dispersions
          on ‘2^trueIntercept’

sizeFactors: multiplicative factors for each sample

But maybe even easier to understand what is going on is to look at the code, which you can (usually) obtain by typing the function name.

I highlight relevant lines with bold.

> makeExampleDESeqDataSet
function (n = 1000, m = 12, betaSD = 0, interceptMean = 4, interceptSD = 2, 
    dispMeanRel = function(x) 4/x + 0.1, sizeFactors = rep(1, 
        m)) 
{
    beta <- cbind(rnorm(n, interceptMean, interceptSD), rnorm(n, 
        0, betaSD))
    dispersion <- dispMeanRel(2^(beta[, 1]))
    colData <- DataFrame(condition = factor(rep(c("A", "B"), 
        times = c(ceiling(m/2), floor(m/2)))))
    x <- if (m > 1) {
        model.matrix(~colData$condition)
    }
    else {
        cbind(rep(1, m), rep(0, m))
    }
    mu <- t(2^(x %*% t(beta)) * sizeFactors)
    countData <- matrix(rnbinom(m * n, mu = mu, size = 1/dispersion), 
        ncol = m)
    mode(countData) <- "integer"
    colnames(countData) <- paste("sample", 1:m, sep = "")
    rowRanges <- GRanges("1", IRanges(start = (1:n - 1) * 100 + 
        1, width = 100))
    names(rowRanges) <- paste0("gene", 1:n)
    design <- if (m > 1) {
        as.formula("~ condition", env = .GlobalEnv)
    }
    else {
        as.formula("~ 1", env = .GlobalEnv)
    }
    object <- DESeqDataSetFromMatrix(countData = countData, colData = colData, 
        design = design, rowRanges = rowRanges)
    trueVals <- DataFrame(trueIntercept = beta[, 1], trueBeta = beta[, 
        2], trueDisp = dispersion)
    mcols(trueVals) <- DataFrame(type = rep("input", ncol(trueVals)), 
        description = c("simulated intercept values", "simulated beta values", 
            "simulated dispersion values"))
    mcols(object) <- cbind(mcols(object), trueVals)
    return(object)
}
<environment: namespace:DESeq2>