Question: How to use Splatter to simulate a known number of DE genes in only two groups of cells?
7 months ago, chenxofhit (China/Changsha/CSU) wrote:

I would like to simulate a dataset with two distinct groups of cells and, if possible, a specified number of DE genes. I want to test whether a method's performance is sensitive to an increasing number of DE genes. For example, I would like to simulate a dataset of 500 cells and 5000 genes with only two groups, containing 150 DE genes, with the remaining genes being non-DE. The number of non-DE genes does not have to equal 5000 - 150 = 4850. Partitioning the genes in this way splits the dataset into two smaller datasets (DE and non-DE) that share the same cells. An increasing number of DE genes (50, 60, 70, ..., 150) is then gradually added to the dataset that contains only non-DE genes, to check whether the performance of a given method is sensitive to the number of DE genes.

My question is how to use Splatter to solve this problem. If that is not possible, can I combine other packages with Splatter to achieve this?


I’m removing the DESeq2 tag because this question doesn’t seem to involve DESeq2.

Answer: How to use Splatter to simulate a known number of DE genes in only two groups of
7 months ago, luke.zappia wrote:

Hi chenxofhit,

I'm not quite sure I understand the design you want, but here is an example that might get us started.

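    library(splatter)

    # Simulate two groups of cells across two batches with the Splat model
    sim <- splatSimulate(nGenes = 5000,
                         batchCells = c(250, 250),  # two batches of 250 cells
                         group.prob = c(0.5, 0.5),  # two equally likely groups
                         de.prob = c(0.01, 0.01),   # ~1% of genes DE in each group
                         method = "groups")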

This would generate a dataset with:

• 5000 genes
• Two technical batches with 250 cells in each batch
• Two groups (approximating cell types), each of which is equally likely (~250 cells in each, or ~125 from each batch)
• Around 1% of genes (~50) differentially expressed in each group

Is that something along the lines of what you are looking for? This is using the Splat simulation but there are several other simulations in Splatter that might be useful and of course more in other packages.

Thanks for your instant answer. The example you have given is very clear!

Actually, I need to know which ~100 genes are DE in order to continue my experiments. Can this be obtained from the Splatter package for this simulation? Or should I use another DE-related package such as DESeq2 (which I had put in the question's tags before Michael removed it) to determine which ~100 of the 5000 genes are DE?

All the intermediate parameters are stored in various slots of the SingleCellExperiment object that is returned. I suggest you take a look at these to see what will be useful for you.

To find which genes are DE, have a look at the DEFacGroupX columns in rowData(sim). These can be considered fold changes between each group and a fictional base cell: 1 is normal expression, < 1 is down-regulated, and > 1 is up-regulated. To find DE genes between groups, you need to look for differences in these factors.
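
As a rough sketch of that comparison, something like the following might work. The helper name find_de_genes and the toy data frame are made up for illustration; the toy table only stands in for as.data.frame(rowData(sim)), and the column names Gene, DEFacGroup1 and DEFacGroup2 are assumed to match what your simulation returns (check colnames(rowData(sim)) first).

```r
# Sketch: flag genes whose DE factors differ between two groups.
# `gene_info` is assumed to look like as.data.frame(rowData(sim)).
find_de_genes <- function(gene_info,
                          col1 = "DEFacGroup1",
                          col2 = "DEFacGroup2") {
  fac1 <- gene_info[[col1]]
  fac2 <- gene_info[[col2]]
  data.frame(
    Gene   = as.character(gene_info$Gene),
    is_de  = fac1 != fac2,          # different factors => DE between groups
    log2fc = log2(fac2 / fac1),     # > 0 means higher in the second group
    stringsAsFactors = FALSE
  )
}

# Toy stand-in for rowData(sim), so the sketch runs without Splatter.
toy <- data.frame(
  Gene        = paste0("Gene", 1:5),
  DEFacGroup1 = c(1, 2, 1, 1, 0.5),
  DEFacGroup2 = c(1, 1, 1, 3, 0.5),
  stringsAsFactors = FALSE
)

de_table <- find_de_genes(toy)
de_genes <- de_table$Gene[de_table$is_de]
print(de_genes)

# With a real simulation this would presumably be:
# de_table <- find_de_genes(as.data.frame(rowData(sim)))
```

Note that a gene with the same factor in both groups (Gene5 in the toy table) is not DE between the groups, even though it differs from the base cell.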

P.S. The code above needs method = "groups" to simulate the groups I described, I will edit it to show this.
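
Coming back to the original design (non-DE genes plus 50, 60, ..., 150 DE genes), one possible approach is to simulate once and then subset the genes. This is only a sketch under my reading of the design: nested_gene_sets is a hypothetical helper, and is_de is assumed to be a logical vector with one entry per gene, for example derived from the DEFacGroupX columns. Because de.prob is a probability, the number of DE genes Splatter produces is random, so you would probably need to simulate somewhat more DE genes than the largest step and subsample.

```r
# Sketch: build nested gene sets with an increasing number of DE genes.
# `is_de` is a logical vector, one entry per gene (TRUE = DE between groups).
nested_gene_sets <- function(is_de, n_de_steps, seed = 1) {
  set.seed(seed)  # note: this resets the global RNG state
  de_idx     <- which(is_de)
  non_de_idx <- which(!is_de)
  stopifnot(max(n_de_steps) <= length(de_idx))
  # Shuffle the DE genes once so that each step extends the previous one.
  de_order <- de_idx[sample.int(length(de_idx))]
  # Each set = all non-DE genes plus the first k shuffled DE genes.
  lapply(n_de_steps, function(k) sort(c(non_de_idx, de_order[seq_len(k)])))
}

# Toy example: 5000 genes, of which the first 150 are DE.
is_de <- rep(c(TRUE, FALSE), times = c(150, 4850))
steps <- seq(50, 150, by = 10)
sets  <- nested_gene_sets(is_de, steps)
print(lengths(sets))

# With a real simulation, each dataset would presumably be a row subset:
# sim_50 <- sim[sets[[1]], ]
```

Shuffling once and taking the first k genes keeps the sets nested, so any change in a method's performance between steps comes from the added DE genes rather than from a different random draw.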