Question

Please help with estimating dispersion value using housekeeping genes in edgeR

Sara ▴ 10

@sara-9865

Germany

Hi all experts,

I am very new to R and Bioconductor, so please be patient with me. I would like to do a DE analysis for samples without replicates as part of an RNA-seq project. As I understand the edgeR manual, we should first estimate the dispersion of the samples using housekeeping genes. However, I have no idea how to estimate the dispersion from housekeeping genes. Could you please help me with this analysis? A clear example would be highly appreciated.

Thanks

estimate dispersion value edgeR housekeeping genes • 1.1k views

updated 7.7 years ago by Ryan C. Thompson ★ 7.9k • written 7.7 years ago by Sara ▴ 10

score 1 · Answer 1 · 2016-08-04

I think you misunderstand the edgeR User's Guide. It doesn't say you should use housekeeping genes to estimate dispersions. It says (and I paraphrase here) that if you have a largish set of genes that you think are housekeeping genes, then you can use them to estimate the dispersions. This requires two assumptions: first, that the genes you think are housekeeping genes really do not change expression level to any great degree between samples, and second, that the housekeeping genes cover a reasonable spread of expression levels (high to low).

I assume that this is the fourth possibility listed there because it's the least likely to work well, given that it relies on some pretty strong assumptions. While it is a possibility, as a novice user, you are most likely better served by choosing one of the first two methods listed on p. 22 of the User's Guide.
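As a rough illustration of the second option (choosing a plausible dispersion from experience with similar data), here is a minimal sketch on simulated counts. The count matrix, the group labels, and the BCV of 0.4 are all assumptions made for the example. The User's Guide suggests values of roughly this size for human data and smaller ones for genetically identical model organisms, but the right value for a given experiment is a judgment call, and the resulting p-values depend heavily on it.

```r
library(edgeR)

# Simulated counts for illustration only: 1000 genes, 2 unreplicated samples.
set.seed(1)
bcv <- 0.4  # assumed biological coefficient of variation, not estimated from the data
counts <- matrix(rnbinom(2000, size = 1 / bcv^2, mu = 50), nrow = 1000, ncol = 2)
rownames(counts) <- paste0("gene", seq_len(1000))

y <- DGEList(counts = counts, group = c("control", "treated"))
y <- calcNormFactors(y)

# With no replicates the dispersion cannot be estimated from the data,
# so the assumed value (BCV squared) is passed to the test directly.
et <- exactTest(y, dispersion = bcv^2)
topTags(et)
```

Rerunning the test with a few different `bcv` values is a simple way to see how sensitive the gene ranking and p-values are to this choice.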

score 0 · Answer 2 · 2016-08-04

As James MacDonald says, this method is the least preferred because it relies on prior knowledge of, ideally, several hundred genes that are not changing between your conditions. That is not a common scenario, and the method requires some substantial logical leaps beyond that. If you really want to pursue this avenue rather than one of the more recommended approaches, your best bet might be to find a data set of similar samples (i.e. same species, same tissue, similar conditions) with replicates, perhaps a published data set on GEO. You could then either use the dispersion estimate from that data set directly, or use that data set to determine which genes are not changing, select those as housekeeping genes, and use them to estimate the dispersion from your own data.
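For completeness, a minimal sketch of the housekeeping-gene approach, following the pattern shown in the edgeR User's Guide. The counts are simulated, and `hk_ids` is a hypothetical list of gene IDs standing in for genes you have independent reason to believe are stable. With real data, the quality of the result depends entirely on that list.

```r
library(edgeR)

# Simulated counts for illustration only: 1000 genes, 2 unreplicated samples.
set.seed(2)
counts <- matrix(rnbinom(2000, size = 10, mu = 100), nrow = 1000, ncol = 2)
rownames(counts) <- paste0("gene", seq_len(1000))
group <- factor(c("control", "treated"))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# Hypothetical housekeeping set: IDs of genes assumed not to change between conditions.
hk_ids <- paste0("gene", 1:300)
housekeeping <- rownames(counts) %in% hk_ids

# Put both samples in a single group so that, for the housekeeping genes only,
# they are treated as replicates of each other.
y1 <- y
y1$samples$group <- 1
y0 <- estimateDisp(y1[housekeeping, ], trend = "none", tagwise = FALSE)

# Carry the common dispersion over to the full data set and test as usual.
y$common.dispersion <- y0$common.dispersion
design <- model.matrix(~group)
fit <- glmFit(y, design)
lrt <- glmLRT(fit)
topTags(lrt)
```

If the dispersion is instead borrowed from a comparable replicated data set, that value could be assigned to `y$common.dispersion` in the same way before calling `glmFit`.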

In general, though, trying to get meaningful p-values out of data with no replicates is like trying to get blood from a stone. Imagine if I said to you: "I have a control sample with a value of 10 and a treated sample with a value of 20. Is there a significant difference?" Without some measure of the variability within each condition, that question has no answer. Analyzing a no-replicate dataset is certainly not a task suitable for a beginning bioinformatician.