polyester: adding noise to RNASeq counts?
krishna312 • 0

I have raw RNA-seq count data. I want to generate several versions of it with varying levels of random noise for a method evaluation. For example, the version with the highest level of noise should have the fewest differentially expressed genes, and vice versa.

I estimated the parameters of the original count data using the `get_params` function in the `polyester` package.
The `create_read_numbers` function then uses the estimated parameters to generate count data with a similar distribution, but without the biological signal (no differentially expressed genes).

Is it possible to retain the biological signal of the original data in the artificial data, and then add varying levels of noise to the generated data?

I would appreciate your help!


Best wishes,

Krishna

rnaseq polyester • 1.6k views
Alyssa Frazee ▴ 210

Hey Krishna,

In the `create_read_numbers` function, there are arguments called `mod` and `beta`. You can specify differential expression using the `beta` argument: `mod` is generally used to specify which group a sample belongs to, and `beta` can then be the differential expression coefficient for each gene. `beta` should have one entry per gene, and its effect on the counts is multiplicative, since the outcome value is on the log scale in this function. One way to retain the DE signal from the original data would be to estimate the differential expression coefficients directly from the original data, and use those as inputs (`beta`).
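
As a rough illustration of estimating those coefficients, here is a minimal base-R sketch. The toy `counts` matrix, the two-group `group` vector, and the pseudocount of 1 are all made up for the example, and the log fold change is a deliberately crude estimator (a proper DE model would likely do better). Treating `mod` and `beta` as one-column matrices on the natural-log scale is my assumption about what `create_read_numbers` expects, so check `?create_read_numbers` before relying on it.

```r
# Toy raw counts: 4 genes x 6 samples (hypothetical values for illustration).
counts <- matrix(
  c( 10,  12,   9,  40,  38,  45,
    100,  95, 110, 105,  98, 102,
     50,  55,  48,  12,  10,  15,
      0,   1,   0,   0,   2,   1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:6))
)

# Group membership per sample: first three are group 0, last three are group 1.
group <- c(0, 0, 0, 1, 1, 1)

# Crude per-gene log fold change (natural log), with a pseudocount of 1
# added to each group mean so that all-zero genes do not produce -Inf.
lfc <- log(rowMeans(counts[, group == 1]) + 1) -
       log(rowMeans(counts[, group == 0]) + 1)

# Assumed shapes: one row per sample in mod, one row per gene in beta.
mod  <- matrix(group, ncol = 1)
beta <- matrix(lfc, ncol = 1)

# Untested sketch of the polyester step (requires the polyester package):
# params <- polyester::get_params(counts)
# sim <- polyester::create_read_numbers(params$mu, params$fit, params$p0,
#                                       mod = mod, beta = beta, seed = 1)
```

Shrinking `beta` toward zero before simulating would be one simple way to weaken the DE signal, though that is a different knob from adding random noise.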

Another way to do this would be to add the differing levels of random noise to the original data yourself (using whatever underlying distribution you'd like), and then use the `simulate_experiment_countmat` function in polyester to generate the simulated reads. We designed `simulate_experiment_countmat` with use cases like this in mind (where you already have a transcript-by-sample matrix of counts).
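
One possible way to add the noise yourself, sketched in base R: multiply each count by log-normal noise and then redraw from a Poisson so the result stays a non-negative integer matrix. The helper `add_noise`, the toy `counts` matrix, and the three `noise_sd` levels are hypothetical choices for illustration; any other noise distribution would slot in the same way.

```r
set.seed(42)

# Toy transcript-by-sample count matrix (hypothetical stand-in for real data).
counts <- matrix(rpois(20, lambda = 50), nrow = 5,
                 dimnames = list(paste0("tx", 1:5), paste0("sample", 1:4)))

# Hypothetical helper: perturb each count with multiplicative log-normal
# noise, then draw from a Poisson so the output is still integer counts.
add_noise <- function(counts, noise_sd) {
  noisy_mean <- counts * exp(rnorm(length(counts), mean = 0, sd = noise_sd))
  matrix(rpois(length(counts), lambda = noisy_mean),
         nrow = nrow(counts), dimnames = dimnames(counts))
}

# A larger noise_sd means more noise and, presumably, fewer detectable DE genes.
noise_levels <- c(low = 0.1, medium = 0.5, high = 1.0)
noisy_sets <- lapply(noise_levels, function(s) add_noise(counts, s))

# Each element of noisy_sets could then be supplied as the `readmat`
# argument of polyester::simulate_experiment_countmat(), together with
# your transcript FASTA (or GTF plus sequence path) and an output directory.
```

Note that the Poisson step adds some sampling noise of its own even at small `noise_sd`, so the lowest level is not a noise-free copy of the original.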

Hope this helps!
