Could an experienced user of Polyester help me with some questions?
@fagholizadeh89-12402
Last seen 5.6 years ago

Hi,

I need to simulate a count matrix. I have read the Polyester manual and vignette several times, but I still find them confusing. I have a count matrix from a real time course experiment, with 4 replicates at each of 4 time points. I have also downloaded the FASTA and GTF files from http://www.gencodegenes.org/releases/27.html . In short, I have a real count matrix, a genome reference FASTA file and a GTF file. I want to simulate a count matrix for DE analysis that has the same design as the real one: 4 replicates at each of 4 time points. The problem is that I don't know which Polyester function or functions to use. These are the ones I have looked at:

 

simulate_experiment_countmat creates FASTA files containing RNA-seq reads simulated from provided transcripts, with optional differential expression between two groups (designated via a read count matrix). As described, it handles two groups, while I need 4 dependent groups over time.

simulate_experiment_empirical creates FASTA files representing reads from a two-group experiment, where abundances and differential expression are estimated from a real data set. Again, this is for two groups.

create_read_numbers generates a simulated data set (a count matrix) based on known model parameters. The model parameters have to be known in advance, so maybe I can use the following function to estimate them:

get_params estimates the parameters of a zero-inflated negative binomial distribution from a real count data set using the method of moments. The function also returns a spline fit of log mean to log size, which can be used when generating new simulated data.
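To make the method-of-moments idea concrete, here is a minimal Python sketch for a single gene. It is not Polyester's code (Polyester is an R package), and it leaves out the spline fit. The function name `zinb_moments` and the toy counts are made up for illustration. The estimators are the textbook ones: the fraction of zeros for p0, the mean of the nonzero counts for mu, and size = mu^2 / (variance - mu), which follows from the negative binomial variance mu + mu^2 / size.

```python
import math

def zinb_moments(counts):
    """Rough method-of-moments estimates for one gene (hypothetical helper).

    Simplifying assumption: every zero is treated as a structural zero,
    so mu and the variance are computed from the nonzero counts only.
    """
    n = len(counts)
    nonzero = [c for c in counts if c > 0]
    if len(nonzero) < 2:
        raise ValueError("need at least two nonzero counts")
    p0 = (n - len(nonzero)) / n                      # zero-inflation fraction
    mu = sum(nonzero) / len(nonzero)                 # mean of nonzero counts
    s2 = sum((c - mu) ** 2 for c in nonzero) / (len(nonzero) - 1)
    # NB variance is mu + mu^2 / size; with no overdispersion the
    # estimate is undefined, so fall back to infinity (Poisson limit).
    size = mu * mu / (s2 - mu) if s2 > mu else math.inf
    return p0, mu, size

# Toy counts for one gene across six samples.
p0, mu, size = zinb_moments([0, 4, 6, 10, 0, 20])
```

A smaller size means more overdispersion relative to a Poisson model with the same mean.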

There is an example of simulating time course data in the vignette: https://github.com/alyssafrazee/polyester#simulate_experiment_countmat-example but I am still confused. My questions are:

 

1- Even if I could use these functions, how would I know which genes are DE in the simulated count matrix?

2- How can I specify the replicates and groups of my real count matrix when I give it to Polyester as input, and where can I choose the number of groups and replicates I want in the simulated matrix?

3- Where can I choose the DE genes?

4- How can I tell Polyester to generate simulated time course data?
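To show what the four questions are asking for, here is a minimal Python sketch of a time course simulation in which the DE genes are known by construction. It is not Polyester's API. The names `simulate_time_course`, `base_means` and `log2_fc`, the single shared `size` value and all the numbers are assumptions chosen for illustration. Each gene gets a baseline mean and one log2 fold change per time point. Counts are drawn from a negative binomial distribution (as a gamma-Poisson mixture), and any gene with a nonzero fold change is DE.

```python
import math
import random

def poisson(rng, lam):
    """Knuth's method, run in chunks so exp(-lam) never underflows."""
    total = 0
    while lam > 0:
        step = min(lam, 30.0)
        limit, prod, k = math.exp(-step), rng.random(), 0
        while prod > limit:
            prod *= rng.random()
            k += 1
        total += k
        lam -= step
    return total

def nb_count(rng, mu, size):
    """Negative binomial draw with mean mu and dispersion parameter size."""
    lam = rng.gammavariate(size, mu / size)  # shape=size, scale=mu/size
    return poisson(rng, lam)

def simulate_time_course(base_means, log2_fc, n_reps, size=5.0, seed=1):
    """Return (counts, de_genes); columns are ordered time point by time point."""
    rng = random.Random(seed)
    counts = []
    for mu0, fc_row in zip(base_means, log2_fc):
        row = []
        for fc in fc_row:
            mu = mu0 * 2.0 ** fc  # mean of this gene at this time point
            row.extend(nb_count(rng, mu, size) for _ in range(n_reps))
        counts.append(row)
    # The truth is known: a gene is DE if any time point has a nonzero change.
    de_genes = [g for g, fc_row in enumerate(log2_fc) if any(fc_row)]
    return counts, de_genes

# Five toy genes, 4 time points, 4 replicates per time point.
base_means = [50.0, 200.0, 80.0, 500.0, 20.0]
log2_fc = [
    [0, 0, 0, 0],
    [0, 1, 2, 3],     # steadily increasing over time
    [0, 0, 0, 0],
    [0, -1, -2, -1],  # transient decrease
    [0, 0, 0, 0],
]
counts, de_genes = simulate_time_course(base_means, log2_fc, n_reps=4)
```

In this toy setup `counts` has 5 rows and 16 columns (4 time points times 4 replicates), and `de_genes` lists the row indices (0-based) of the genes that were given a nonzero fold change.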

 

Could anyone help me understand where I am going wrong? I would greatly appreciate it.

rna-seq polyester simulation time course • 1.1k views