Please help with estimating dispersion value using housekeeping genes in edgeR
Sara ▴ 10

Hi all experts,

I am very new to R and Bioconductor, so please be patient with me. I would like to do a DE analysis on samples without replicates as part of an RNA-seq project. As I understand the edgeR manual, we should first estimate the dispersion of the samples using housekeeping genes. However, I have no idea how to estimate the dispersion from housekeeping genes. Could you please help me with this analysis? A clear example would be highly appreciated.




I think you misunderstand the edgeR User's Guide. It doesn't say you should use housekeeping genes to estimate dispersions. It says (and I paraphrase here) that if you have a largish set of genes that you think are housekeeping genes, then you can use them to estimate the dispersions. That requires two assumptions: first, that the genes you think are housekeeping genes really are not changing expression level to any great degree between samples, and second, that they have a reasonable spread of expression levels (high to low).
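For anyone who does want to try it, here is a minimal sketch of the housekeeping approach, closely following the pattern shown in the User's Guide. The counts are simulated and the `housekeeping` vector is a made-up placeholder (both are assumptions for illustration), so substitute your own count matrix and gene list.

```r
library(edgeR)

set.seed(1)
# Simulated stand-in data (assumption): 1000 genes, two unreplicated samples.
ngenes <- 1000
counts <- matrix(rnbinom(ngenes * 2, mu = 50, size = 1 / 0.04), ncol = 2,
                 dimnames = list(paste0("gene", seq_len(ngenes)),
                                 c("ctrl", "trt")))
group <- factor(c("ctrl", "trt"))
y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# Hypothetical housekeeping set: replace with your own vector of gene IDs.
housekeeping <- rownames(y)[1:300]

# Put both samples in a single group, so that any difference between them
# in the housekeeping genes is treated as biological variation.
y1 <- y
y1$samples$group <- 1
y0 <- estimateDisp(y1[housekeeping, ], trend = "none", tagwise = FALSE)

# Carry the common dispersion over to the full data set and test as usual.
y$common.dispersion <- y0$common.dispersion
design <- model.matrix(~group)
fit <- glmFit(y, design)
lrt <- glmLRT(fit)
topTags(lrt)
```

The result is only as good as the housekeeping list: if those genes do change between the samples, the dispersion will be overestimated and the test will be conservative.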

I assume that this is the fourth possibility listed there because it's the least likely to work well, given that it relies on some pretty strong assumptions. While it is a possibility, as a novice user, you are most likely better served by choosing one of the first two methods listed on p. 22 of the User's Guide.
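For reference, as I recall that section of the User's Guide, the first option is to settle for a descriptive analysis (fold changes, MDS plot) and the second is to pick a plausible dispersion from experience with similar data. A minimal sketch of the second option, along the lines of the guide's own example, is below. The BCV of 0.2 and the simulated counts are assumptions chosen purely for illustration.

```r
library(edgeR)

set.seed(42)
# Assumed biological coefficient of variation; this is chosen, not estimated.
bcv <- 0.2

# Simulated stand-in counts (assumption): 20 genes, two unreplicated samples.
counts <- matrix(rnbinom(40, size = 1 / bcv^2, mu = 10), nrow = 20, ncol = 2)
y <- DGEList(counts = counts, group = 1:2)

# Supply the dispersion (BCV squared) directly instead of estimating it.
et <- exactTest(y, dispersion = bcv^2)
topTags(et)
```

The guide suggests typical BCV values of roughly 0.4 for human data, 0.1 for genetically identical model organisms, and 0.01 for technical replicates. The p-values are sensitive to this choice, so treat them as a rough ranking rather than as firm evidence.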


As James MacDonald says, this method is the least preferred because it relies on prior knowledge of, ideally, several hundred genes that are not changing between your conditions. That is not a common scenario, and the method requires some substantial logical leaps beyond that. If you really want to pursue this avenue rather than one of the more recommended approaches, your best bet might be to find a data set of similar samples (i.e. same species, same tissue, similar conditions) with replicates, perhaps a published data set on GEO. You could then either use the dispersion estimate from that data set directly, or use that data set to determine which genes are not changing, treat those genes as housekeeping genes, and use them to estimate the dispersion from your own data.
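The first of those two routes (borrowing the dispersion from a replicated data set) might look roughly like the sketch below. Both data sets here are simulated placeholders, and the object names (`ref`, `ref_counts`, `my_counts`) are made up for illustration. In practice `ref_counts` would come from the similar published experiment.

```r
library(edgeR)

set.seed(2)
# Hypothetical reference data set WITH replicates (3 vs 3), simulated here.
ref_counts <- matrix(rnbinom(500 * 6, mu = 100, size = 1 / 0.1), ncol = 6)
ref_group <- factor(rep(c("A", "B"), each = 3))
ref <- DGEList(counts = ref_counts, group = ref_group)
ref <- calcNormFactors(ref)
ref <- estimateDisp(ref, design = model.matrix(~ref_group))

# Your own unreplicated data (also simulated here): one control, one treated.
my_counts <- matrix(rnbinom(500 * 2, mu = 100, size = 1 / 0.1), ncol = 2)
y <- DGEList(counts = my_counts, group = factor(c("ctrl", "trt")))
y <- calcNormFactors(y)

# Borrow the common dispersion estimated from the reference data.
et <- exactTest(y, dispersion = ref$common.dispersion)
topTags(et)
```

This assumes the biological variability in the reference experiment is comparable to yours, which is exactly the kind of leap mentioned above.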

In general, though, trying to get meaningful p-values out of data with no replicates is trying to get blood from a stone. Imagine if I said to you: "I have a control sample with a value of 10 and a treated sample with a value of 20. Is there a significant difference?" You could not answer without knowing how much those values vary between replicates, and that is exactly the information a no-replicate data set lacks. Analyzing a no-replicate data set is certainly not a task suitable for a beginning bioinformatician.
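To make the "10 versus 20" point concrete, here is a small sketch that tests that same pair of counts under three different assumed dispersions. The equal library sizes, the extra filler gene, and the BCV values are all assumptions made for illustration.

```r
library(edgeR)

# One gene with counts 10 (control) and 20 (treated), plus a filler gene.
counts <- rbind(geneX = c(10, 20), other = c(100, 100))
colnames(counts) <- c("control", "treated")

# Assume equal sequencing depth so the raw counts are directly comparable.
y <- DGEList(counts = counts, lib.size = c(1e6, 1e6),
             group = factor(c("control", "treated")))

# The p-value for geneX under three assumed BCV values.
bcvs <- c(0.01, 0.2, 0.4)
pvals <- sapply(bcvs, function(bcv) {
  exactTest(y, dispersion = bcv^2)$table["geneX", "PValue"]
})
data.frame(BCV = bcvs, PValue = pvals)
```

The same two counts give a larger p-value as the assumed BCV grows, and nothing in an unreplicated data set can tell you which BCV is the right one.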


Thank you very much for all comments. 

