Question

How to use Splatter to simulate a known number of DE genes in only two groups of cells?

chenxofhit • 0

@chenxofhit-19781

Last seen 5.2 years ago

China/Changsha/CSU

I would like to simulate a dataset with two distinct groups of cells and, if possible, a specified number of DE genes. I want to test whether a method's performance is sensitive to an increasing number of DE genes. For example, I would like to simulate a dataset of 500 cells and 5000 genes with only two groups, containing 150 DE genes, with the remaining genes non-DE. The number of non-DE genes does not have to be exactly 5000 - 150 = 4850. The genes can then be partitioned into two smaller datasets (DE and non-DE) that share the same cells. An increasing number of DE genes (50, 60, 70, ..., 150) would then be added step by step to the dataset that contains only non-DE genes, to check whether the performance of a given method is sensitive to the number of DE genes.

How can I use Splatter to do this? If it is not possible, can I combine Splatter with other packages to achieve it?

splatter Splatter • 1.2k views

ADD COMMENT • link updated 5.2 years ago by luke.zappia ▴ 50 • written 5.2 years ago by chenxofhit • 0

I’m removing the DESeq2 tag because this question doesn’t seem to involve DESeq2.

ADD REPLY • link 5.2 years ago Michael Love 41k

score 0 · Answer 1 · 2019-02-07

luke.zappia ▴ 50

@lukezappia-11973

Last seen 22 months ago

Germany

Hi chenxofhit

I'm not quite sure I understand the design you want, but here is an example that might get us started.

library(splatter)

sim <- splatSimulate(nGenes = 5000,
                     batchCells = c(250, 250),
                     group.prob = c(0.5, 0.5),
                     de.prob = c(0.01, 0.01),
                     method = "groups")

This would generate a dataset with:

5000 genes
Two technical batches with 250 cells in each batch
Two groups (approximating cell types), each of which is equally likely (~250 cells in each, or ~125 from each batch)
Around 1% of genes (~50) differentially expressed in each group
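
As a rough check on how many genes end up differing between the two groups, here is a small base R sketch. It assumes that each gene is picked as DE independently in each group with probability de.prob (my reading of the Splat model, not something stated in this thread), so a gene differs between the groups if it was picked in at least one of them. The variable names are made up for illustration.

```r
n_genes <- 5000
de_prob <- 0.01  # same value for both groups, as in the call above

# Assumption: genes are picked independently per group, so this is the
# probability that a gene is picked as DE in at least one of the two groups
p_either <- 1 - (1 - de_prob)^2

# Expected number of genes that differ between group 1 and group 2
expected_de <- n_genes * p_either
expected_de  # 99.5
```

The realised count in any one simulation is random, so it will only be close to this value.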

Is that something along the lines of what you are looking for? This is using the Splat simulation but there are several other simulations in Splatter that might be useful and of course more in other packages.

ADD COMMENT • link 5.2 years ago luke.zappia ▴ 50

Thanks for your quick answer. The example you have given is very clear!

Actually, I need to know which ~100 genes are the DE genes in order to continue with my experiments. I do not know whether this can be obtained from the Splatter package in this simulation. Or should I use another DE-related package such as DESeq2 (which I had put in the question's tags before Michael removed it) to find which ~100 of these 5000 genes are DE?

I am looking forward to your reply.

ADD REPLY • link 5.2 years ago chenxofhit • 0

All the intermediate parameters are stored in various slots of the SingleCellExperiment object that is returned. I suggest you take a look at these to see what will be useful for you.

To find which genes are DE, have a look at the DEFacGroupX columns in rowData(sim) (DEFacGroup1 and DEFacGroup2 for two groups). These can be considered fold changes between each group and a fictional base cell: 1 is normal expression, < 1 is down-regulated and > 1 is up-regulated. To find DE genes between groups you need to look for differences in these factors.
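
A minimal sketch of that comparison is below. So that it runs without simulating anything, it uses a small hand-made data frame as a stand-in for the row data; the values are invented, and the Gene column name is an assumption about what rowData(sim) contains.

```r
# Stand-in for as.data.frame(rowData(sim)); the values here are made up
gene_info <- data.frame(
  Gene = paste0("Gene", 1:6),
  DEFacGroup1 = c(1, 1.8, 1, 0.5, 1, 1),
  DEFacGroup2 = c(1, 1, 1, 0.5, 2.3, 1),
  stringsAsFactors = FALSE
)

# A gene is DE between the groups when its two factors differ.
# Gene4 is shifted by the same factor in both groups, so it is not DE
# between them even though it differs from the base cell.
is_de <- gene_info$DEFacGroup1 != gene_info$DEFacGroup2
de_genes <- gene_info$Gene[is_de]

# Log2 fold change of group 2 relative to group 1
gene_info$log2FC <- log2(gene_info$DEFacGroup2 / gene_info$DEFacGroup1)

de_genes
```

With a real simulation, replacing the stand-in with gene_info <- as.data.frame(rowData(sim)) should give the same kind of table, and sum(is_de) is then the realised number of DE genes.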

P.S. The code above needs method = "groups" to simulate the groups I described; I have edited it to show this.
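
Once the DE genes are known, the series of datasets described in the original question could be assembled along the following lines. This is only a sketch of one possible reading of that design: the count matrix is a toy stand-in for counts(sim), the DE labels are hypothetical, and all object names are made up for illustration.

```r
set.seed(1)

# Toy stand-in for counts(sim): 20 genes x 8 cells, values are made up
counts_mat <- matrix(
  rpois(20 * 8, lambda = 5), nrow = 20,
  dimnames = list(paste0("Gene", 1:20), paste0("Cell", 1:8))
)

# Hypothetical DE labels; in practice take these from the DE factors
de_genes <- paste0("Gene", 1:6)
non_de_genes <- setdiff(rownames(counts_mat), de_genes)

# Fix one random order so each larger set contains the smaller ones
de_order <- sample(de_genes)
n_de_series <- c(2, 4, 6)  # would be seq(50, 150, by = 10) at full scale

# One dataset per step: all non-DE genes plus the first k DE genes
datasets <- lapply(n_de_series, function(k) {
  keep <- c(non_de_genes, de_order[seq_len(k)])
  counts_mat[keep, , drop = FALSE]
})
names(datasets) <- paste0("nDE_", n_de_series)

sapply(datasets, nrow)
```

Every dataset shares the same cells and the same non-DE genes, so only the number of planted DE genes changes between steps.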