Question: Generate sample data based on DESeq2 result
Asked 20 months ago by bharata1803 (Japan):

Hello,

I want to generate dummy sample data based on a DESeq2 result.

Suppose I have a gene, gene A, and I have done a DESeq2 analysis between cancer and normal. The result is that gene A has a log fold change of 2.5, i.e. it is upregulated in cancer, with a p-value of 0.01. The baseMean is around 35.

What I want to do is generate dummy data for each category, cancer and normal, that has a baseMean of 35 and that, when analyzed again, gives a log fold change of 2.5 with a p-value of 0.01.

Is it possible to do that?

I imagine something like this: because DESeq2 uses the negative binomial distribution, I just need to draw random samples from a negative binomial with a mean of 35 for normal and 37.5 for cancer (35 + 2.5 fold change). Is that possible? I don't know the variance of the distribution, though. I don't think it is reported in the output of the DESeq2 DEG analysis.

 

Thank you.

Answered 20 months ago by Michael Love (United States):

I can give some notes, but you're on your own for implementing this.

The fold change is multiplicative, so it would be 35 * 2.5.

Also note that the baseMean is not the level in normal samples, but it is the average over all samples. You can figure out the normalized counts for normal samples like so:

normMean <- rowMeans(counts(dds, normalized=TRUE)[, dds$condition == "normal"])

It is not easy to reproduce a p-value. Instead I would encourage you to focus on the dispersion values. You can pick mean, dispersion values and LFC from the observed distribution:

data <- cbind(norm=normMean, disp=dispersions(dds), lfc=res$log2FoldChange)

 

Thank you! I thought the baseMean was in log format, so I added the LFC. I'm not familiar with the dispersion value. I need to check the paper once more, but is it similar to the variance?

Reply written 20 months ago by bharata1803

Sorry yes, I got confused because you had 2.5 as the "fold change" in the last sentence.

The base mean is not in log scale.

It should be: 

normMean * 2^log2FoldChange
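
As a rough worked example of this formula with the numbers from the question (not code from the thread): if the two groups have the same number of samples, which is an assumption here, the baseMean of 35 is the plain average of the normal and cancer means, so both group means can be recovered from baseMean and the log2 fold change. The variable names below are made up for illustration.

```r
# Values taken from the question; equal group sizes are an assumption.
base_mean <- 35   # DESeq2 baseMean: mean of normalized counts over all samples
lfc <- 2.5        # log2 fold change, cancer vs normal

fold <- 2^lfc     # multiplicative fold change, about 5.66

# With equal group sizes:
#   base_mean   = (normal_mean + cancer_mean) / 2
#   cancer_mean = normal_mean * fold
normal_mean <- 2 * base_mean / (1 + fold)
cancer_mean <- normal_mean * fold

round(c(normal = normal_mean, cancer = cancer_mean), 2)
```

Under these assumptions the normal mean comes out at roughly 10.5 and the cancer mean at roughly 59.5, quite different from 35 and 37.5. With real data, the per-group mean computed from the normalized counts is the safer starting point.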

Dispersion is the second parameter of the negative binomial. Yes you should check the paper again.

In R, the dispersion and mean can be used like so:

rnbinom(1, mu=mean, size=1/dispersion)
Reply written 20 months ago by Michael Love
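
Putting the notes above together, a minimal sketch for one gene might look like the following. This is only an illustration, not code from the thread: the dispersion of 0.1, the five samples per group, and all variable names are assumptions, and size factors are ignored (every sample is treated as having the same sequencing depth). In practice the mean, dispersion and LFC would be drawn from the observed values as described above.

```r
set.seed(1)  # make the random draws reproducible

# Hypothetical parameters for a single gene
normal_mean <- 10.5   # mean normalized count in normal samples (assumed)
lfc         <- 2.5    # log2 fold change, cancer vs normal
dispersion  <- 0.1    # assumed value; would come from dispersions(dds)
n_per_group <- 5      # assumed number of samples per condition

# The fold change is multiplicative on the count scale
cancer_mean <- normal_mean * 2^lfc

# rnbinom's 'size' argument is the inverse of the dispersion
normal_counts <- rnbinom(n_per_group, mu = normal_mean, size = 1 / dispersion)
cancer_counts <- rnbinom(n_per_group, mu = cancer_mean, size = 1 / dispersion)

# With this parameterization the variance is mu + dispersion * mu^2,
# which is how the dispersion relates to the variance
expected_var <- cancer_mean + dispersion * cancer_mean^2

sim <- data.frame(
  condition = rep(c("normal", "cancer"), each = n_per_group),
  count     = c(normal_counts, cancer_counts)
)
sim
```

Data simulated this way will match the chosen mean, dispersion and fold change only on average; as noted above, it will not reproduce a particular p-value.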
Thank you so much. I will try your suggestion.
Reply written 20 months ago by bharata1803