edgeR: estimating dispersion for nested designs
Mauve • 0
@mauve-7320
Last seen 9.8 years ago
Norway

Hi,

I have an RNA-seq experiment similar to the one in Section 3.5 of the edgeR User's Guide, i.e. a nested paired design, and I have used that approach to analyze my own data. Briefly, the experiment involves 60 RNA samples from two groups of bacterial strains, commensal (C) and disease-causing (D). Each group consists of 15 different strains, and each strain was either treated with a chemical (IND) or left untreated (CTR). My questions are about estimating dispersion in this type of scenario, which is skipped in the User's Guide:

  1. Can I correctly estimate common, trended and tagwise dispersions using estimateGLMCommonDisp, estimateGLMTrendedDisp and estimateGLMTagwiseDisp (relative to a design matrix), even though there are no true biological replicates?
  2. How do I calculate the prior degrees of freedom in this case?

Any help will be greatly appreciated.

edger nested estimatedispersions • 2.0k views

See Section 2.10 of the edgeR User's Guide, "What to do if you have no replicates".

So in the section 3.5 example, which option was used?

The Section 3.5 example has replicates. There are 18 samples, and the design matrix has only 12 columns, so there are 6 residual df for estimating the dispersion. Hence all the edgeR glm dispersion estimation methods work.

PS. Please be careful to post follow-up questions as comments rather than answers. I have moved our interchange so far to be comments on your original question.

@gordon-smyth
Last seen 6 minutes ago
WEHI, Melbourne, Australia

I think that you may have misunderstood the example in Section 3.5. You seem to be assuming that it has no replicates, but there are 18 samples, and the design matrix has only 12 columns, so there are 6 residual df for estimating the dispersion. Hence all the edgeR glm dispersion estimation methods work.
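The residual degrees of freedom quoted above can be checked by hand. The following is a minimal Python sketch, not edgeR code. It assumes the Section 3.5 layout is 3 groups of 3 patients, each sampled untreated and treated, and that the 12 design columns are one blocking column per patient plus one treatment effect per group. The function names are hypothetical.

```python
from fractions import Fraction


def nested_paired_design(n_groups, n_subjects_per_group):
    """Rows are samples; columns are subject blocks, then one treatment effect per group."""
    n_subjects = n_groups * n_subjects_per_group
    rows = []
    for group in range(n_groups):
        for subject in range(n_subjects_per_group):
            subject_index = group * n_subjects_per_group + subject
            for treated in (0, 1):
                row = [0] * (n_subjects + n_groups)
                row[subject_index] = 1             # blocking column for this subject
                row[n_subjects + group] = treated  # group-specific treatment effect
                rows.append(row)
    return rows


def matrix_rank(rows):
    """Rank by exact Gaussian elimination on rational numbers."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current pivot position with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def residual_df(rows):
    """Residual df = number of samples minus the rank of the design matrix."""
    return len(rows) - matrix_rank(rows)


design = nested_paired_design(n_groups=3, n_subjects_per_group=3)
print(len(design), len(design[0]), residual_df(design))  # 18 12 6
```

The rank is computed rather than assumed equal to the column count, so the same check would also flag a design with redundant columns.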

You have not entirely explained the purpose of your experiment. Do you want to find genes DE between IND and CTR, treating the different strains as biological replicates? In that case, your experiment is like Section 3.5 and you do have replicates.
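Under that reading, the same counting can be applied to the 60-sample experiment in the question. This is a sketch only: it assumes a Section 3.5-style design with one blocking column per strain and one IND-versus-CTR effect per group. The question does not show the actual design matrix, and the function name is hypothetical.

```python
def nested_paired_residual_df(n_groups, n_strains_per_group, n_conditions=2):
    """Return (samples, design coefficients, residual df) for a nested paired layout."""
    n_samples = n_groups * n_strains_per_group * n_conditions
    n_blocking = n_groups * n_strains_per_group   # one column per strain
    n_treatment = n_groups * (n_conditions - 1)   # one treatment effect per group
    n_coefficients = n_blocking + n_treatment
    return n_samples, n_coefficients, n_samples - n_coefficients


# 2 groups (C and D) x 15 strains x 2 conditions (CTR and IND)
print(nested_paired_residual_df(2, 15))  # (60, 32, 28)
```

A positive residual df is what the GLM dispersion estimators need; under these assumptions the strains within each group supply the replication.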

If you want to find genes that are DE between IND and CTR for each strain separately, then you don't have replicates, and I don't think you can do any formal statistical analysis. Just compute fold changes.
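For the per-strain case, a descriptive fold change could be computed along these lines. This is a hedged Python sketch with a simplified pseudocount, not the exact formula edgeR uses; the function name, the `prior_count` default and the example numbers are all hypothetical.

```python
import math


def log2_fold_change(count_ind, lib_size_ind, count_ctr, lib_size_ctr, prior_count=0.5):
    """Log2 ratio of counts-per-million for one gene in one strain (IND over CTR).

    prior_count is a small pseudocount that avoids taking the log of zero.
    """
    cpm_ind = (count_ind + prior_count) / lib_size_ind * 1e6
    cpm_ctr = (count_ctr + prior_count) / lib_size_ctr * 1e6
    return math.log2(cpm_ind / cpm_ctr)


# Hypothetical counts for one gene in one strain; library sizes are total reads per sample.
print(log2_fold_change(count_ind=240, lib_size_ind=2_000_000,
                       count_ctr=95, lib_size_ctr=1_800_000))
```

Such values describe the data but carry no p-value, which is consistent with the point that no formal test is available without replication.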

Mauve • 0
@mauve-7320
Last seen 9.8 years ago
Norway

Thank you for clarifying. I was under the impression that true biological replicates were required to estimate dispersion correctly, but I realize now that, since I am only interested in group-level differences in expression between CTR and IND, the strains can be treated as replicates.
