Question: How to use Splatter to simulate a known number of DE genes in only two groups of cells ?
chenxofhit (China/Changsha/CSU) wrote, 7 months ago:

I would like to simulate a dataset with two distinct groups of cells and, if possible, a specified number of DE genes. My goal is to test whether a method's performance is sensitive to an increasing number of DE genes. For example, I would like to simulate a dataset of 500 cells and 5000 genes with only two groups, containing 150 DE genes alongside non-DE genes. The number of non-DE genes does not have to be exactly 5000 - 150 = 4850. The genes can then be partitioned into two smaller datasets (DE genes and non-DE genes) that share the same cells. An increasing number of DE genes (50, 60, 70, ..., 150) would be gradually planted into the dataset that contains only non-DE genes, to check whether the performance of a given method is sensitive to the number of DE genes.

My question is: how can I use Splatter to do this? If that is not possible, can I combine Splatter with other packages to achieve it?


I’m removing the DESeq2 tag because this question doesn’t seem to involve DESeq2.

written 7 months ago by Michael Love
Answer: How to use Splatter to simulate a known number of DE genes in only two groups of
luke.zappia wrote, 7 months ago:

Hi chenxofhit

I'm not quite sure I understand the design you want but here is an example that might get us started.

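library(splatter)

# Simulate two groups of cells with the Splat model
sim <- splatSimulate(nGenes = 5000,               # total number of genes
                     batchCells = c(250, 250),    # two batches of 250 cells each
                     group.prob = c(0.5, 0.5),    # each cell is equally likely to be in either group
                     de.prob = c(0.01, 0.01),     # per-group probability that a gene is DE
                     method = "groups")           # required to simulate groups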

This would generate a dataset with:

  • 5000 genes
  • Two technical batches with 250 cells in each batch
  • Two groups (approximating cell types), each of which is equally likely (~250 cells in each, or ~125 from each batch)
  • Around 1% of genes (~50) differentially expressed in each group, so roughly 100 genes in total

Is that something along the lines of what you are looking for? This is using the Splat simulation but there are several other simulations in Splatter that might be useful and of course more in other packages.

modified 7 months ago • written 7 months ago by luke.zappia

Thanks for your instant answer. The example you have given is very clear!

Actually, I need to know which ~100 genes are the DE genes in order to continue with my experiments. I do not know whether this can be obtained from the Splatter package for this simulation. Or should I use another DE-related package such as DESeq2 (which I had put in the tags of the question before Michael removed it) to work out which ~100 of the 5000 genes are DE?

I am looking forward to your reply.

written 7 months ago by chenxofhit

All the intermediate parameters are stored in various slots of the SingleCellExperiment object that is returned. I suggest you take a look at these to see what will be useful for you.

To find which genes are DE, have a look at the DEFacGroupX columns in rowData(sim). These can be considered fold changes between each group and a fictional base cell: 1 is normal expression, < 1 is down-regulated and > 1 is up-regulated. To find DE genes between groups you need to look for differences in these factors.
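As a rough sketch of "looking for differences in these factors", the base R snippet below compares two factor columns and reports the genes where they differ. The helper name find_de_genes and the toy table are hypothetical and only for illustration; the toy values are made up and stand in for as.data.frame(rowData(sim)), which is assumed to carry gene names as row names.

```r
# Toy stand-in for as.data.frame(rowData(sim)); the values are invented.
gene_info <- data.frame(
  DEFacGroup1 = c(1, 2.0, 1, 0.5, 1),
  DEFacGroup2 = c(1, 1.0, 1, 0.5, 3),
  row.names = paste0("Gene", 1:5)
)

# Hypothetical helper (not part of splatter): a gene counts as DE between
# the two groups when its DE factors differ.
find_de_genes <- function(gene_info,
                          group_a = "DEFacGroup1",
                          group_b = "DEFacGroup2") {
  fac_a <- gene_info[[group_a]]
  fac_b <- gene_info[[group_b]]
  is_de <- fac_a != fac_b
  data.frame(
    gene = rownames(gene_info)[is_de],
    # Simulated log2 fold change of group_b relative to group_a
    log2FC = log2(fac_b[is_de] / fac_a[is_de]),
    stringsAsFactors = FALSE
  )
}

de_table <- find_de_genes(gene_info)
print(de_table)
```

In the toy table, Gene4 is a contrived case with the same factor in both groups, so it is not reported as DE between them. With a real simulation the call would presumably be find_de_genes(as.data.frame(rowData(sim))).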

P.S. The code above needs method = "groups" to simulate the groups I described. I will edit it to show this.
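Because de.prob is a probability rather than an exact count, one possible way to get the exact series asked about in the question (50, 60, ..., 150 DE genes) is to subset genes after simulating. The sketch below is base R only and uses placeholder gene names; the helper make_gene_series is hypothetical, and it assumes the DE and non-DE gene names have already been worked out (for instance from the DEFacGroupX columns). The simulation has to contain at least as many DE genes as the largest step, so de.prob would need to be higher than in the example above.

```r
set.seed(1)  # make the random ordering of DE genes reproducible

# Placeholder gene names standing in for the DE and non-DE genes of a simulation.
de_genes <- paste0("DE", 1:200)
non_de_genes <- paste0("NonDE", 1:4000)

# Hypothetical helper: for each requested number of DE genes, return the
# gene names of a dataset made of all non-DE genes plus that many DE genes.
make_gene_series <- function(de_genes, non_de_genes, n_de_steps) {
  stopifnot(max(n_de_steps) <= length(de_genes))
  # Shuffle once so that every step is a superset of the previous one,
  # i.e. DE genes are planted gradually rather than redrawn each time.
  de_order <- sample(de_genes)
  series <- lapply(n_de_steps, function(n) {
    c(non_de_genes, de_order[seq_len(n)])
  })
  names(series) <- paste0("nDE_", n_de_steps)
  series
}

gene_series <- make_gene_series(de_genes, non_de_genes, seq(50, 150, by = 10))
print(lengths(gene_series))
```

Each element is a character vector of gene names, so subsetting the simulated object by row names (for example sim[gene_series[["nDE_50"]], ]) should give datasets that share the same cells and differ only in how many DE genes they contain.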

written 7 months ago by luke.zappia

Thank you @luke.zappia.

written 7 months ago by chenxofhit