Question

edgeR: DE analysis with no replicates

Asked by knipehf

Hello,

First of all, I apologise if this is a straightforward question or similar to another thread.

I have been given some RNA-seq data for my dissertation (I mention this because there is no possibility of further sequencing!), and have assembled a de novo transcriptome. Now I want to look at differentially expressed genes.

Unfortunately, there are no replicates, and the design is nested.

There are six individuals: a normal male, an infected male (intersex1), an infected male (intersex2), a normal female, an infected normal female, and an infected female (intersex3). From these I want to look at the differences between males and females, between infected and uninfected individuals, and among normal males, normal females and intersex1/2/3. In addition, I have four tissue types from each of these individuals, which I also want to compare.

So my question is, what would be the best way to approach such analysis?

I can't really remove any explanatory factors, as they are all important, and even then I probably still would not have enough replicates. I am not entirely sure what a 'reasonable' dispersion value would be, as this is not a controlled experiment (these are environmental samples). So the approach that appeals to me most would be to identify the non-DE genes and calculate the dispersion value from those?

Any advice or suggestions would be greatly appreciated!

Many thanks,

Hazel


Answer by Aaron Lun (score 2, 2016-06-30)

Your idea of using non-DE genes is circular. To determine whether a gene is DE or not, you need an estimate for the dispersion, so you're back to the same problem as before. While you could do this in iterations (i.e., identify non-DE genes using an initial value for the dispersion, re-estimate the dispersion from the non-DE genes, and repeat using the new value), there's no guarantee that those iterations would converge to any sensible value for the dispersion. In fact, some back-of-the-envelope calculations suggest that they probably don't.
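To make the circularity concrete, here is a toy Python sketch of the iterative scheme described above. It is not edgeR and not a recommended method: the dispersion estimate is a crude method-of-moments average, the DE call is a Wald-type z-test at |z| > 1.96, the data are simulated with one library per condition, and all function names are hypothetical. Note that the set of "non-DE" genes at each step is chosen using the very dispersion value being re-estimated. Comparing runs from different starting values (`phi0`) is a quick way to check whether the loop settles on a value that does not depend on where it started.

```python
import math
import numpy as np

def simulate_pairs(n_genes=1000, true_phi=0.1, frac_de=0.1, fold=4.0, seed=1):
    """Hypothetical data: one library per condition, NB counts with a known dispersion."""
    rng = np.random.default_rng(seed)
    mu_a = rng.uniform(20, 500, n_genes)
    mu_b = mu_a.copy()
    n_de = int(frac_de * n_genes)
    mu_b[:n_de] *= fold                  # the first n_de genes are truly DE
    size = 1.0 / true_phi                # NB size; variance = mu + true_phi * mu**2
    y_a = rng.negative_binomial(size, size / (size + mu_a))
    y_b = rng.negative_binomial(size, size / (size + mu_b))
    return list(zip(y_a.tolist(), y_b.tolist()))

def z_stat(y1, y2, phi):
    """Toy Wald-type z for log(y2 / y1); 0.5 is added to avoid log(0)."""
    a, b = y1 + 0.5, y2 + 0.5
    return math.log(b / a) / math.sqrt(1.0 / a + 1.0 / b + 2.0 * phi)

def estimate_dispersion(pairs):
    """Crude method-of-moments dispersion, averaged over the supplied genes (assumption)."""
    estimates = []
    for y1, y2 in pairs:
        m = (y1 + y2) / 2.0
        if m == 0:
            continue
        s2 = (y1 - y2) ** 2 / 2.0                       # sample variance of two values
        estimates.append(max(0.0, (s2 - m) / m ** 2))   # solve s2 = m + phi * m**2
    return sum(estimates) / len(estimates) if estimates else 0.0

def iterate(pairs, phi0=0.1, max_iter=50, tol=1e-6, z_cut=1.96):
    """Call non-DE genes with the current phi, re-estimate phi from them, repeat."""
    phi, history = phi0, [phi0]
    for _ in range(max_iter):
        non_de = [p for p in pairs if abs(z_stat(p[0], p[1], phi)) <= z_cut]
        new_phi = estimate_dispersion(non_de)
        history.append(new_phi)
        if abs(new_phi - phi) < tol:
            return new_phi, history, True
        phi = new_phi
    return phi, history, False   # no convergence within max_iter

if __name__ == "__main__":
    pairs = simulate_pairs()
    for start in (0.01, 0.1, 1.0):
        phi, history, converged = iterate(pairs, phi0=start)
        print(f"phi0={start}: final={phi:.4f}, converged={converged}, steps={len(history) - 1}")
```

Even when the loop does stop, stopping only means the value is self-consistent under this toy test; it says nothing about whether it matches the true dispersion used in the simulation.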

Anyway, I assume you've read the relevant section of the edgeR user's guide on this topic. The idea of dropping the least-important explanatory factors is only to estimate the dispersion; you still use the full design matrix to fit your GLM and to do your contrasts. If the dropped factor is important, then your dispersion will be overestimated (due to hidden DE inflating the variability between "replicates") and you'll be more conservative. This is probably the lesser of the two evils, with the alternative being that you detect a whole lot of false positives.
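A rough numerical illustration of why an overestimated dispersion makes you conservative. This sketch assumes a negative binomial model with variance mu + phi * mu^2 and uses a simple delta-method Wald statistic for one library per condition; edgeR itself uses likelihood-ratio or quasi-likelihood F-tests, so treat this only as a sketch. The counts and dispersion values are made up, and the function names are hypothetical.

```python
import math

def wald_z(count_a, count_b, dispersion):
    """Approximate z-statistic for log(count_b / count_a), one library per condition.

    Assumes NB counts with variance mu + dispersion * mu**2, so by the delta
    method Var(log count) is roughly 1/mu + dispersion. Observed counts stand in for mu.
    """
    log_fc = math.log(count_b / count_a)
    var = 1.0 / count_a + 1.0 / count_b + 2.0 * dispersion
    return log_fc / math.sqrt(var)

def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

if __name__ == "__main__":
    # Hypothetical gene: 100 vs 300 reads (threefold change).
    for phi in (0.05, 0.2):   # a "true" dispersion vs an inflated estimate
        z = wald_z(100, 300, phi)
        print(f"dispersion={phi}: z={z:.2f}, p={two_sided_p(z):.3g}")
```

The same threefold change gives z of about 3.3 with the smaller dispersion but only about 1.7 with the inflated one, so it passes a 0.05 threshold in the first case and not in the second: fewer discoveries, but not more false positives.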

Really, though, if I got this data set, I would complain loudly to the person who gave it to me, and then refuse to do anything more than a descriptive analysis of the log-fold changes. Why should I have to suffer due to someone else's bad experimental design?


Comment by Steve Lianoglou:

The suffering part is an unfortunate consequence of the (sometimes perverse) power dynamic in academia, where graduate students often feel they have little power to assert their views or will "up the chain".

Reply by knipehf:

Thank you for the quick response!

I see what you mean, thank you. The only libraries I can think of that are probably the most similar (least influenced by infection or sex) are muscle and 'head' (I know, not technically a tissue!), although I cannot say this for sure. Would it work if I used either of those to estimate the dispersion and then applied it to the full design matrix? I would ignore sex and infection for the estimation, and then look at the other tissues and sexes for my main analysis (hepatopancreas, ovary, testes, and muscle or head).

Reply:

Yes, ignoring sex and/or infection seems like a good place to start (tissues will have too much DE between them to be sensibly ignored). In effect, you treat the individuals as replicates of each other, which allows you to block on individual and tissue in your design matrix for dispersion estimation. You then use the full design matrix to do your contrasts; presumably this has 6 × 4 = 24 coefficients, one for each individual/tissue combination.
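A minimal sketch (Python/NumPy rather than R) of the two design matrices described above. It assumes one library per individual/tissue combination and reads "block on the individual and tissue" as an additive individual + tissue model with treatment coding; the helper names are hypothetical. The point is the residual degrees of freedom: the additive design leaves 24 - 9 = 15 residual df for estimating the dispersion, while the full 24-coefficient design leaves none, which is why the dispersion has to come from the simpler model. In edgeR this would be done in R, for example by passing the additive design to `estimateDisp` and the full design to `glmFit` or `glmQLFit`.

```python
import numpy as np

N_INDIVIDUALS = 6
N_TISSUES = 4

# One library per individual/tissue combination: 6 * 4 = 24 samples.
individual = np.repeat(np.arange(N_INDIVIDUALS), N_TISSUES)
tissue = np.tile(np.arange(N_TISSUES), N_INDIVIDUALS)

def treatment_dummies(labels, n_levels):
    """Indicator columns for levels 1..n_levels-1 (level 0 is the baseline)."""
    return np.column_stack([(labels == k).astype(float) for k in range(1, n_levels)])

# Additive "blocking" design, used only for dispersion estimation:
# intercept + individual effects + tissue effects (no interaction).
additive = np.column_stack([
    np.ones(individual.size),
    treatment_dummies(individual, N_INDIVIDUALS),
    treatment_dummies(tissue, N_TISSUES),
])

# Full design for fitting and contrasts: one coefficient per individual/tissue group.
group = individual * N_TISSUES + tissue
full = np.column_stack(
    [(group == g).astype(float) for g in range(N_INDIVIDUALS * N_TISSUES)]
)

def residual_df(design):
    """Number of samples minus the rank of the design matrix."""
    return design.shape[0] - np.linalg.matrix_rank(design)

print(additive.shape, residual_df(additive))
print(full.shape, residual_df(full))
```

The additive model assumes that individual effects are the same in every tissue; any individual-by-tissue interaction (including real sex or infection effects that differ between tissues) ends up in the residuals and inflates the dispersion estimate, which is the conservative direction discussed earlier in the thread.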