makeExampleDESeqDataSet simulation using NB model
1
0
Entering edit mode
maedakus ▴ 10
@maedakus-9484
Last seen 7.7 years ago

Hi, all

i am working in a pharmaceutical industry, mainly in charge of translational research in japan.

now i am planing to establish the simulation data via makeExampleDESeqDataSet.

when doing the simulation, the mathematical model of negative binomical model is applied.

but unfortunately, in details, i cannot identify the exact math equation.

for instance, i cannot understand what the item as below means.  

betaSD, interceptMean, interceptSD,dispMeanRel

if you know, would you mind telling me the mathematical model behind this makeExampleDESeqDataSet ?

thanks  in advance !

maeda in japan

 

deseq2 • 662 views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 12 hours ago
United States

So, I'd first just note that this function is mostly for simple tests within the package. I would recommend for scientific work you used a published RNA-seq simulator such as Polyester.

http://bioinformatics.oxfordjournals.org/content/31/17/2778.abstract

http://bioconductor.org/packages/release/bioc/html/polyester.html

In the man page ?makeExampleDESeqDataSet you have some description:

Arguments:

       n: number of rows

       m: number of columns

  betaSD: the standard deviation for non-intercept betas, i.e. beta ~
          N(0,betaSD)

interceptMean: the mean of the intercept betas (log2 scale)

interceptSD: the standard deviation of the intercept betas (log2 scale)

dispMeanRel: a function specifying the relationship of the dispersions
          on ‘2^trueIntercept’

sizeFactors: multiplicative factors for each sample

 

But maybe even easier to understand what is going on is to look at the code, which you can (usually) obtain by typing the function name.

I highlight relevant lines with bold.

 

> makeExampleDESeqDataSet
function (n = 1000, m = 12, betaSD = 0, interceptMean = 4, interceptSD = 2, 
    dispMeanRel = function(x) 4/x + 0.1, sizeFactors = rep(1, 
        m)) 
{
    beta <- cbind(rnorm(n, interceptMean, interceptSD), rnorm(n, 
        0, betaSD))
    dispersion <- dispMeanRel(2^(beta[, 1]))
    colData <- DataFrame(condition = factor(rep(c("A", "B"), 
        times = c(ceiling(m/2), floor(m/2)))))
    x <- if (m > 1) {
        model.matrix(~colData$condition)
    }
    else {
        cbind(rep(1, m), rep(0, m))
    }
    mu <- t(2^(x %*% t(beta)) * sizeFactors)
    countData <- matrix(rnbinom(m * n, mu = mu, size = 1/dispersion), 
        ncol = m)
    mode(countData) <- "integer"
    colnames(countData) <- paste("sample", 1:m, sep = "")
    rowRanges <- GRanges("1", IRanges(start = (1:n - 1) * 100 + 
        1, width = 100))
    names(rowRanges) <- paste0("gene", 1:n)
    design <- if (m > 1) {
        as.formula("~ condition", env = .GlobalEnv)
    }
    else {
        as.formula("~ 1", env = .GlobalEnv)
    }
    object <- DESeqDataSetFromMatrix(countData = countData, colData = colData, 
        design = design, rowRanges = rowRanges)
    trueVals <- DataFrame(trueIntercept = beta[, 1], trueBeta = beta[, 
        2], trueDisp = dispersion)
    mcols(trueVals) <- DataFrame(type = rep("input", ncol(trueVals)), 
        description = c("simulated intercept values", "simulated beta values", 
            "simulated dispersion values"))
    mcols(object) <- cbind(mcols(object), trueVals)
    return(object)
}
<environment: namespace:DESeq2>

 

 

ADD COMMENT

Login before adding your answer.

Traffic: 616 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6