edgeR: DE analysis with no replicates
knipehf • 0

Hello,

First of all, I would like to apologise if this is a straightforward question or similar to another thread.

I have been given some RNA-seq data for my dissertation (I say this because there is no possibility of more sequencing!), and have assembled a de novo transcriptome. Now I want to look at differentially expressed genes.

Unfortunately, there are no replicates, and the design is nested.

There are 6 individuals: normal male, infected male (intersex1), infected male (intersex2), normal female, normal female infected, infected female (intersex3). From this I want to look at the differences between males/females, infected/uninfected and normal males/females/intersex1/2/3. However, I also have 4 different tissue types from each of these individuals, which I also want to compare.

So my question is, what would be the best way to approach such analysis?

I can't really remove any explanatory factors, as they are all important, and even then I still probably would not have enough replicates. I am not entirely sure what a 'reasonable' dispersion value would be, as this is not a controlled experiment (environmental samples). So perhaps the method that appeals to me most would be to identify the non-DE genes and calculate the dispersion value from those?

Any advice/ suggestions would be greatly appreciated! 

Many thanks, 

Hazel

 

 

Aaron Lun ★ 28k

Your idea of using non-DE genes is circular. To determine whether a gene is DE or not, you need an estimate for the dispersion, so you're back to the same problem as before. While you could do this in iterations (i.e., identify non-DE genes using an initial value for the dispersion, re-estimate the dispersion from the non-DE genes, and repeat using the new value), there's no guarantee that those iterations would converge to any sensible value for the dispersion. In fact, some back-of-the-envelope calculations suggest that they probably don't.

Anyway, I assume you've read the relevant section of the edgeR user's guide on this topic. The idea of dropping the least-important explanatory factors is only to estimate the dispersion; you still use the full design matrix to fit your GLM and to do your contrasts. If the dropped factor is important, then your dispersion will be overestimated (due to hidden DE inflating the variability between "replicates") and you'll be more conservative. This is probably the lesser of the two evils, with the alternative being that you detect a whole lot of false positives.
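To make the "hidden DE inflates the dispersion" point concrete, here is a small worked calculation in Python. It is only a sketch under stated assumptions: it uses the negative binomial mean-variance relationship (variance = mu + phi * mu^2) and a simple method-of-moments calculation, which is not the estimator edgeR actually uses. The function names and the example means (100 and 200) are made up for illustration.

```python
def nb_variance(mu, phi):
    """Variance of a negative binomial count with mean mu and dispersion phi."""
    return mu + phi * mu ** 2


def apparent_dispersion(mu_a, mu_b, phi):
    """Dispersion implied when two libraries with true means mu_a and mu_b
    (each with true dispersion phi) are wrongly treated as replicates.

    Hypothetical helper: a population-level moment calculation, not edgeR code.
    """
    pooled_mean = (mu_a + mu_b) / 2
    # Average within-library variance ...
    within = (nb_variance(mu_a, phi) + nb_variance(mu_b, phi)) / 2
    # ... plus the extra variance caused by the difference in means (hidden DE).
    between = (mu_a - mu_b) ** 2 / 4
    pooled_var = within + between
    # Invert var = mu + phi * mu^2 to get the implied dispersion.
    return (pooled_var - pooled_mean) / pooled_mean ** 2


true_phi = 0.1
no_de = apparent_dispersion(100, 100, true_phi)      # ignored factor has no effect
hidden_de = apparent_dispersion(100, 200, true_phi)  # ignored factor doubles expression

print(f"true dispersion:            {true_phi:.3f}")
print(f"apparent, no hidden DE:     {no_de:.3f}")
print(f"apparent, 2-fold hidden DE: {hidden_de:.3f}")
```

With no hidden DE the implied dispersion equals the true value; with a two-fold difference hidden between the "replicates" it is roughly doubled in this toy case, which is why the tests become more conservative rather than less.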

Really, though, if I got this data set, I would complain loudly to the person who gave it to me, and then refuse to do anything more than a descriptive analysis of the log-fold changes. Why should I have to suffer due to someone else's bad experimental design?

The suffering part is an unfortunate consequence of the (sometimes perverse) power dynamic in academia, where graduate students often feel they have little power to assert their views or will "up the chain".

 

 

Thank you for the quick response!

I see what you mean, thank you. The only libraries I can think of that would probably be the most similar, or the least influenced by infection or sex, are the muscle and 'head' (I know, not technically a tissue!), although I cannot say this for sure. Would it work if I used either of those to estimate the dispersion and then applied that estimate with the full design matrix? So I would ignore sex and infection for the estimation, and then just look at the other tissues and sexes for my main analysis (hepatopancreas, ovary, testes and muscle or head).

 

Yes, ignoring sex and/or infection seems like a good place to start (tissues will have too much DE between them to be sensibly ignored). In effect, you treat the individuals as replicates of each other, allowing you to block on the individual and tissue in your design matrix for dispersion estimation. You then use the full design matrix to do your contrasts - presumably this has 6*4 = 24 coefficients for all the different combinations of individual/tissue.
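As a rough check on the arithmetic, the Python sketch below builds both design matrices by hand and counts the residual degrees of freedom. It rests on assumptions that are not stated in the thread: that "block on the individual and tissue" means an additive individual + tissue model, that all 24 individual/tissue combinations were sequenced, and that the sample labels are placeholders. It only illustrates the bookkeeping; it is not edgeR code.

```python
from fractions import Fraction
from itertools import product

# Placeholder labels (assumptions): 6 individuals x 4 tissues = 24 libraries.
individuals = ["normal_male", "intersex1", "intersex2",
               "normal_female", "infected_female", "intersex3"]
tissues = ["tissue1", "tissue2", "tissue3", "tissue4"]
samples = list(product(individuals, tissues))


def additive_row(ind, tis):
    """Reduced design row: intercept + individual + tissue, no interaction."""
    return ([1]
            + [int(ind == i) for i in individuals[1:]]
            + [int(tis == t) for t in tissues[1:]])


def full_row(ind, tis):
    """Full design row: one coefficient per individual/tissue combination."""
    return [int((ind, tis) == s) for s in samples]


def matrix_rank(rows):
    """Rank by exact Gaussian elimination, so no floating-point tolerance is needed."""
    m = [[Fraction(x) for x in r] for r in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            factor = m[i][col] / m[rank][col]
            if factor != 0:
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


reduced = [additive_row(ind, tis) for ind, tis in samples]
full = [full_row(ind, tis) for ind, tis in samples]

reduced_df = len(samples) - matrix_rank(reduced)
full_df = len(samples) - matrix_rank(full)

print("residual df, reduced (additive) design:", reduced_df)
print("residual df, full 24-coefficient design:", full_df)
```

Under these assumptions the full design leaves no residual degrees of freedom, so it cannot be used to estimate a dispersion, whereas the additive design leaves some. The dispersion would be estimated with the reduced design and then supplied when fitting the full design for the contrasts.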

That's great, thank you so much! 
