RNA-seq read simulation for benchmarking differential expression/splicing variance
1
0
Entering edit mode
@endre-sebestyen-2707
Last seen 5 weeks ago
Hungary, Budapest

I was wondering if anybody knows of a package or some method to simulate RNA-seq sequencing reads from samples where we have difference in the variance of gene expression or more importantly, splicing as opposed to the classical difference in mean gene expression or mean splicing values.

I was thinking about getting read counts for transcripts using some of the distributions from the extraDistr package, and feeding these read counts to polyester.

Something like:

library("extraDistr")
library("polyester")

# different read counts simulated somehow
count <- c(10, 50, 100)

# different alpha values depending on uniform random splicing or a dominant transcript
alphas <- c(1, 2, 10)

# simulate read counts
count_random <- rdirmnom(10, count, alphas)

# run polyester with the simulated counts
simulate_experiment_countmat(fasta = NULL, gtf = NULL, seqpath = NULL,
  count_random, outdir = ".", paired = TRUE, seed = NULL, ...)

Anybody ever tried something similar? Are there tools for simulating reads with differential variance?

polyester simulation RNA-seq syntheticdata • 263 views
ADD COMMENT
0
Entering edit mode

Hey, polyester author here -- that's exactly why we made the function simulate_experiment_countmat! It's for generating reads with whatever kind of differential expression you want -- if you can make a matrix of it, you can simulate RNA-seq with that pattern.

When polyester was written, there wasn't a specific tool for simulating any kind of differential expression (means or variances), but it's been like six years, so something new may have come along in that time! I will let others chime in if they know of a better approach, but can confirm this is the type of situation we were thinking about when we wrote simulate_experiment_countmat.

ADD REPLY
0
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia

You can simulate expression values (in the form of TPMs, not counts) from any distribution you like, then simulate sequence reads as if you were sequencing an RNA sample with transcripts in those abundances using Rsubread::simReads.

ADD COMMENT

Login before adding your answer.

Traffic: 267 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6