How to use dispersions estimated from one dataset in another dataset in edgeR
lwc628

Hi.

I have RNA-seq data from ~30 biological replicates, and the plan is to estimate the variance (dispersion) of expression for each gene and use this information to improve inference on future datasets from the same system.

My questions are:

1) Is it okay to take the gene-specific dispersions estimated from this dataset, use them for another dataset, and bypass the estimateGLMCommonDisp, estimateGLMTrendedDisp and estimateGLMTagwiseDisp steps in edgeR? Or is there a way to use the dispersions from the 30 biological replicates as a prior?

2) If so, can I replace DGEList$tagwise.dispersion of the second dataset with the values obtained from the first one? Or is there a better way to achieve this?

3) If there is a better approach to this problem, please let me know!

Aaron Lun

I think that really depends on whether the variability amongst your original 30 replicates (i.e., the "reference" dataset) is comparable to the variability of your future datasets. Obviously, if the variabilities of the two datasets are very different, then it would be inappropriate to use the dispersion estimate of one dataset when analyzing the other. Some more detail on the nature of your system would be helpful.
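If the variabilities are comparable, the substitution asked about in question 2 could be sketched as below. This is a hedged illustration, not a recommendation: the counts are simulated (hypothetical gene IDs, means and a true dispersion of 0.1), and the key assumption is that genes are matched by ID before the reference tagwise dispersions are passed to `glmFit()` through its `dispersion` argument, which overrides whatever is stored in the future DGEList.

```r
library(edgeR)

set.seed(1)
ngenes <- 1000
mu <- rexp(ngenes, rate = 1 / 200) + 5   # hypothetical per-gene mean counts
simulate_counts <- function(nlib) {
  # Simulated NB counts, true dispersion 0.1 (size = 1 / 0.1), hypothetical gene IDs
  m <- matrix(rnbinom(ngenes * nlib, mu = mu, size = 10), nrow = ngenes)
  rownames(m) <- paste0("gene", seq_len(ngenes))
  m
}

# Reference dataset: 30 replicates of one condition, intercept-only design
ref <- DGEList(simulate_counts(30))
ref <- calcNormFactors(ref)
ref <- estimateDisp(ref, design = matrix(1, ncol(ref), 1))
ref_disp <- setNames(ref$tagwise.dispersion, rownames(ref))

# Future dataset: 2 control + 2 treated libraries
group <- factor(c("ctrl", "ctrl", "trt", "trt"))
fut <- DGEList(simulate_counts(4), group = group)
fut <- calcNormFactors(fut)
design <- model.matrix(~ group)

# Keep genes present in both datasets, then fit with the reference dispersions
shared <- intersect(rownames(fut), names(ref_disp))
fut <- fut[shared, , keep.lib.sizes = FALSE]
fit <- glmFit(fut, design, dispersion = ref_disp[shared])
res <- glmLRT(fit, coef = 2)
topTags(res)
```

Passing the vector explicitly, rather than overwriting `fut$tagwise.dispersion`, keeps the gene matching visible in the code.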

In general, to tell if the variabilities are similar, you can include biological replication in your future datasets. You can then estimate the trended dispersion separately for each future dataset, and compare it to the trend for the reference dataset. If the trends are similar in shape and scale, then it may be appropriate to combine the datasets. Don't compare the tagwise dispersions; these estimates will be less precise if the future datasets are smaller, so differences might be misleading.
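A minimal sketch of that comparison, assuming simulated data in place of real experiments (both datasets are generated with the same hypothetical true dispersion of 0.1, so the trends should agree). Each dataset gets its own `estimateDisp()` call; the future trend is then interpolated at the reference abundances so the two trends can be compared gene by gene. `plotBCV()` on each object gives the same comparison visually.

```r
library(edgeR)

set.seed(2)
ngenes <- 2000
mu <- rexp(ngenes, rate = 1 / 200) + 5   # hypothetical per-gene mean counts
simulate_dge <- function(nlib, disp) {
  # Simulated NB counts for nlib replicate libraries, then normalization and
  # dispersion estimation with an intercept-only design
  counts <- matrix(rnbinom(ngenes * nlib, mu = mu, size = 1 / disp), nrow = ngenes)
  y <- calcNormFactors(DGEList(counts))
  estimateDisp(y, design = matrix(1, nlib, 1))
}

ref <- simulate_dge(30, disp = 0.1)   # hypothetical reference dataset
fut <- simulate_dge(4, disp = 0.1)    # hypothetical future dataset with replicates

# Interpolate the future trend at the reference AveLogCPM values and take the ratio;
# values close to 1 across the abundance range suggest similar shape and scale
fut_trend <- approx(fut$AveLogCPM, fut$trended.dispersion,
                    xout = ref$AveLogCPM, rule = 2, ties = mean)$y
ratio <- fut_trend / ref$trended.dispersion
summary(ratio)

# Common dispersions give a quick check of overall scale
c(reference = ref$common.dispersion, future = fut$common.dispersion)
```

The tagwise dispersions are deliberately left out of the comparison, for the reason given above.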

If the variabilities are similar (and you do have biological replicates in the future datasets), you can re-estimate the dispersions using all libraries from both the future and reference datasets. This gives you more residual degrees of freedom and more precise estimates of the tagwise dispersions, i.e., better than the estimates from the reference dataset alone. Otherwise, if worst comes to worst, you can just analyze the future dataset separately with edgeR.
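The joint re-estimation could look roughly like this. It is a sketch on simulated counts, and it assumes (hypothetically) that the reference libraries are modelled as their own group alongside a future experiment with 3 control and 3 treated libraries. That design keeps systematic differences between datasets out of the residuals while all 36 libraries contribute to the dispersion estimates.

```r
library(edgeR)

set.seed(3)
ngenes <- 2000
mu <- rexp(ngenes, rate = 1 / 200) + 5   # hypothetical per-gene mean counts
sim <- function(nlib) {
  # Simulated NB counts, true dispersion 0.1 (size = 1 / 0.1)
  matrix(rnbinom(ngenes * nlib, mu = mu, size = 10), nrow = ngenes)
}

# 30 reference libraries plus 3 control and 3 treated future libraries
counts <- cbind(sim(30), sim(6))
group <- factor(c(rep("ref", 30), rep(c("fut_ctrl", "fut_trt"), each = 3)))
design <- model.matrix(~ 0 + group)   # columns: groupfut_ctrl, groupfut_trt, groupref

y <- DGEList(counts, group = group)
keep <- filterByExpr(y, design)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)
y <- estimateDisp(y, design)          # dispersions estimated from all 36 libraries

# Residual df for dispersion estimation: 36 - 3 = 33 (vs 6 - 2 = 4 for the future data alone)
ncol(y) - ncol(design)

# Test treated vs control within the future experiment only
fit <- glmQLFit(y, design)
res <- glmQLFTest(fit, contrast = c(-1, 1, 0))
topTags(res)
```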
