Question: edgeR- DE analysis with no replicates
3.4 years ago, knipehf wrote:


First of all, I would like to apologise if this is a straightforward question or similar to another thread.

I have been given some RNA-seq data for my dissertation (I say this because there is no possibility of more sequencing!) and have assembled a de novo transcriptome. Now I want to look at differentially expressed genes.

Unfortunately, there are no replicates, and the design is nested.

There are 6 individuals: normal male, infected male (intersex1), infected male (intersex2), normal female, normal female infected, infected female (intersex3). From these I want to look at the differences between males/females, infected/uninfected and normal males/females/intersex1/2/3. However, I also have 4 different tissue types from each of these individuals, which I also want to compare.

So my question is, what would be the best way to approach such analysis?

I can't really remove any explanatory factors, as they are all important, and even then I still probably would not have enough replicates. I am not entirely sure what a 'reasonable' dispersion value would be, as this is not a controlled experiment (environmental samples). So perhaps the method that appeals to me most would be to identify the non-DE genes and calculate the dispersion value from those?

Any advice/ suggestions would be greatly appreciated! 

Many thanks, 




Answer: edgeR- DE analysis with no replicates
3.4 years ago, Aaron Lun (Cambridge, United Kingdom) wrote:

Your idea of using non-DE genes is circular. To determine whether a gene is DE or not, you need an estimate for the dispersion, so you're back to the same problem as before. While you could do this in iterations (i.e., identify non-DE genes using an initial value for the dispersion, re-estimate the dispersion from the non-DE genes, and repeat using the new value), there's no guarantee that those iterations would converge to any sensible value for the dispersion. In fact, some back-of-the-envelope calculations suggest that they probably don't.

Anyway, I assume you've read the relevant section of the edgeR user's guide on this topic. The idea of dropping the least-important explanatory factors is only to estimate the dispersion; you still use the full design matrix to fit your GLM and to do your contrasts. If the dropped factor is important, then your dispersion will be overestimated (due to hidden DE inflating the variability between "replicates") and you'll be more conservative. This is probably the lesser of the two evils, with the alternative being that you detect a whole lot of false positives.
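
To illustrate why an overestimated dispersion makes the analysis more conservative, here is a minimal Python sketch (not edgeR code) of the negative binomial mean-variance relationship, var = mu + dispersion * mu^2. The function name and all numbers are made up for illustration; the point is only that a larger dispersion implies more expected variability between libraries, so a given fold change looks less surprising.

```python
import math

def nb_sd(mu, dispersion):
    """Standard deviation of a negative binomial count with mean mu.

    Uses the mean-variance relationship var = mu + dispersion * mu**2.
    """
    return math.sqrt(mu + dispersion * mu ** 2)

mu = 100.0                      # hypothetical mean count for one gene
for phi in (0.01, 0.1, 0.4):    # hypothetical dispersion values
    bcv = math.sqrt(phi)        # biological coefficient of variation
    print(f"dispersion={phi:<5} BCV={bcv:.2f} sd={nb_sd(mu, phi):.1f}")
```

The standard deviation grows with the dispersion at a fixed mean, which is why hidden DE that inflates the dispersion estimate costs power rather than producing false positives.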

Really, though, if I got this data set, I would complain loudly to the person who gave it to me, and then refuse to do anything more than a descriptive analysis of the log-fold changes. Why should I have to suffer due to someone else's bad experimental design?


The suffering part is an unfortunate consequence of the (sometimes perverse) power dynamic in academia, where graduate students often feel they have little power to assert their views/will "up the chain".



written 3.4 years ago by Steve Lianoglou


Thank you for the quick response!

I see what you mean, thank you. The only libraries that I can think of that would probably be the most similar/least influenced by infection or sex are the muscle and 'head' (I know, not technically a tissue!), although I cannot say this for sure. Would it work if I used either of those to estimate the dispersion and then applied it to the full design matrix? So I would ignore sex and infection for the estimation, and then just look at the other tissues and sexes for my main analysis (hepatopancreas, ovary, testes and muscle or head).


modified 3.4 years ago • written 3.4 years ago by knipehf

Yes, ignoring sex and/or infection seems like a good place to start (tissues will have too much DE between them to be sensibly ignored). In effect, you treat the individuals as replicates of each other, allowing you to block on the individual and tissue in your design matrix for dispersion estimation. You then use the full design matrix to do your contrasts - presumably this has 6*4 = 24 coefficients for all the different combinations of individual/tissue.
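
As a rough illustration of the degrees-of-freedom argument, here is a minimal Python sketch (not edgeR code). It assumes one reading of "block on the individual and tissue", namely an additive intercept + individual + tissue design for dispersion estimation; the labels `ind1`..`ind6` and `tissue1`..`tissue4` and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def matrix_rank(rows):
    """Rank of a matrix (list of lists) by Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def one_hot(value, levels):
    """Indicator columns for `value` over the given levels."""
    return [1 if value == level else 0 for level in levels]

def residual_df(design):
    """Number of libraries minus number of estimable coefficients."""
    return len(design) - matrix_rank(design)

individuals = [f"ind{i}" for i in range(1, 7)]   # hypothetical labels
tissues = [f"tissue{j}" for j in range(1, 5)]    # hypothetical labels
samples = list(product(individuals, tissues))    # 24 libraries, one per combination

# Full design: one coefficient per individual/tissue combination.
full = [one_hot(s, samples) for s in samples]

# Reduced design (assumed additive): intercept plus individual and tissue
# effects, with the first level of each factor as the baseline.
reduced = [[1] + one_hot(ind, individuals[1:]) + one_hot(tis, tissues[1:])
           for ind, tis in samples]

print("full design residual df:", residual_df(full))
print("reduced design residual df:", residual_df(reduced))
```

With 24 libraries, the full design uses all 24 degrees of freedom and leaves none for estimating variability, while the additive design has 1 + 5 + 3 = 9 coefficients and leaves 15.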

modified 3.4 years ago • written 3.4 years ago by Aaron Lun

That's great, thank you so much! 

written 3.4 years ago by knipehf