Hi,
I need to simulate a count matrix. I have read the Polyester manual and vignette several times, but I find them very confusing. I have a count matrix from a real time course experiment, with 4 replicates at each of 4 time points. I have also downloaded the FASTA and GTF files from http://www.gencodegenes.org/releases/27.html. In a nutshell, I have a real count matrix, a genome reference FASTA file and a GTF file. I now want to simulate a count matrix for DE analysis with the same design as the real one (4 replicates at each of 4 time points), but I don't know which Polyester function or functions to use. Here is what I understand so far:
simulate_experiment_countmat: creates FASTA files containing RNA-seq reads simulated from provided transcripts, with optional differential expression between two groups (designated via read count matrix). ----> This is for two groups, whereas I need 4 dependent groups over time.
simulate_experiment_empirical: creates FASTA files representing reads from a two-group experiment, where abundances and differential expression are estimated from a real data set. ----> This is again for two groups.
create_read_numbers: generates a simulated data set (count data matrix) based on known model parameters. ----> The model parameters have to be known, so maybe I can use the following function to estimate them from my real data:
get_params: estimates the parameters of a zero-inflated negative binomial distribution from a real count data set, using the method of moments. The function also returns a spline fit of log mean to log size, which can be used when generating new simulated data.
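To make the "method of moments" idea above concrete, here is a minimal sketch in plain Python rather than R. It is not Polyester's get_params code, and the function names nb_moments and zero_fraction are hypothetical. It assumes the usual negative binomial parameterisation Var = mu + mu^2 / size and uses the observed fraction of zeros as a crude stand-in for the zero-inflation probability.

```python
from statistics import mean, variance


def nb_moments(counts):
    """Method-of-moments estimates (mu, size) for one gene's counts.

    Assumes Var = mu + mu^2 / size, hence size = mu^2 / (Var - mu).
    Returns size = None when there is no overdispersion (Var <= mu),
    because the formula is then undefined or negative.
    """
    mu = mean(counts)
    var = variance(counts)  # sample variance, n - 1 denominator
    if var <= mu:
        return mu, None
    return mu, mu * mu / (var - mu)


def zero_fraction(counts):
    """Observed fraction of zero counts for one gene (crude zero-inflation proxy)."""
    return sum(1 for c in counts if c == 0) / len(counts)


# Hypothetical counts for one gene across four samples.
gene_counts = [0, 2, 4, 10]
print(nb_moments(gene_counts), zero_fraction(gene_counts))
```

Applied row by row to a real count matrix, this gives one (mu, size) pair per gene, which is the kind of input a negative binomial simulator needs.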
There is an example of simulating time course data in the vignette: https://github.com/alyssafrazee/polyester#simulate_experiment_countmat-example but I still find it confusing. My questions are:
1- Even if I could use these functions, how would I know which genes are DE in the simulated count matrix?
2- How can I specify the replicates and groups of my real count matrix when I give it to Polyester as input, and where can I choose the number of groups and replicates I want in the simulated matrix?
3- Where can I choose the DE genes?
4- How can I tell Polyester to generate simulated time course data?
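To illustrate the kind of output these four questions are about, here is a minimal sketch in plain Python rather than R. It does not use Polyester and does not claim to reproduce how Polyester simulates counts. All names (simulate_time_course, n_de, the fold-change pattern, the baseline range) are assumptions made for illustration. The point is that the DE genes are chosen by the simulator, so they are known in advance, and the time points and replicates are encoded in the column layout.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) variate."""
    if lam > 500:
        # Normal approximation for large means, to avoid exp(-lam) underflow.
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    # Knuth's multiplication method, adequate for modest means.
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def neg_binomial(rng, mu, size):
    """Negative binomial draw with mean mu and size, as a gamma-Poisson mixture."""
    if mu <= 0:
        return 0
    lam = rng.gammavariate(size, mu / size)  # shape * scale = mu
    return poisson(rng, lam)


def simulate_time_course(n_genes=100, n_times=4, n_reps=4, n_de=10,
                         size=5.0, seed=1):
    """Return (counts, de_genes, labels) for a time points x replicates design."""
    rng = random.Random(seed)
    # The simulator picks the DE genes, so the truth is known afterwards.
    de_genes = set(rng.sample(range(n_genes), n_de))
    labels = [f"t{t}_rep{r + 1}" for t in range(n_times) for r in range(n_reps)]
    counts = []
    for g in range(n_genes):
        base = rng.uniform(20, 100)  # hypothetical baseline mean for this gene
        row = []
        for t in range(n_times):
            # Assumed pattern: DE genes rise by a factor sqrt(2) per time point.
            fold = 2.0 ** (0.5 * t) if g in de_genes else 1.0
            row.extend(neg_binomial(rng, base * fold, size) for _ in range(n_reps))
        counts.append(row)
    return counts, sorted(de_genes), labels


counts, de_genes, labels = simulate_time_course()
print(labels)
print("DE genes:", de_genes)
```

In this sketch the baseline means and size are invented. With a real data set they would instead come from per-gene estimates of the real count matrix.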
Could anyone help me understand where I am going wrong? I would greatly appreciate it.