limma RNA-seq analysis of human data
chipolino • 0


I am trying to analyze human RNA-seq data with limma. I have two conditions: affected by the disease and not affected. More concretely, I have the mother (VAN012, not affected), the father (VAN013, affected), the paternal grandmother (VAN0092, affected), their child (VAN011, girl, affected) and two unrelated, unaffected controls (Ctl1 and Ctl2, both male). I also have 2 replicates for each sample. My goal is to find DE genes between these two conditions. After building an MDS plot, I see the following:

The two conditions do not cluster at all. For example, the unaffected mother (VAN012) is very close to the affected daughter (VAN011), and the affected father (VAN013) is close to an unaffected control (Ctl2).

I suspect I will not obtain any good results with this dataset (as I said, I am using limma, and my linear model also includes sex and family relation). So my question is: what is the proper way to do an RNA-seq analysis with such data?


Thank you



Aaron Lun ★ 28k

I don't think there's much you can do, for various reasons:

  • Your two "replicates" for each sample are technical, not biological. At best, this means that you can only claim replication across repeated sampling of the same individuals. You cannot make general statements about the effect of the disease on gene expression in the population, which is of much greater relevance. To do so, you would need independent biological replicates from a different set of (unrelated) individuals.
  • I'm not sure how you included "family relations" in the linear model. There is no obvious theoretical relationship between the expression of a gene in the parents and that in the child. If you had more families, you could block on the family factor using duplicateCorrelation() to account for correlations due to greater relatedness, but this is not possible here as you only have one family. This relates back to the first point; you need individuals from different families to make generalizable inferences.
  • Other factors are confounded with the disease condition, e.g., the controls are all male (not to mention ethnicity or age). Perhaps you could use the mother as an unaffected control for females, but then the comparisons between females are confounded by age.
  • Even putting aside all of the issues above, the MDS plot shows no clear separation by disease status. The second dimension represents the sex effect, but that's about it.
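
To make the confounding in the third point concrete, here is a minimal sketch in plain Python (an illustration only, not limma or R code) that tabulates the six individuals from the question by sex and disease status. The sex of the mother, father and grandmother is inferred from their family roles, and the "unrelated" flag is taken from the description of the controls; the variable names are made up for this example.

```python
from collections import Counter

# One entry per individual, as described in the question:
# (sex, disease status, relation to the family).
individuals = {
    "VAN012": ("female", "unaffected", "family"),    # mother
    "VAN013": ("male", "affected", "family"),        # father
    "VAN0092": ("female", "affected", "family"),     # paternal grandmother
    "VAN011": ("female", "affected", "family"),      # daughter
    "Ctl1": ("male", "unaffected", "unrelated"),     # control
    "Ctl2": ("male", "unaffected", "unrelated"),     # control
}

# Cross-tabulate sex against disease status.
sex_by_status = Counter((sex, status) for sex, status, _ in individuals.values())

for status in ("affected", "unaffected"):
    for sex in ("female", "male"):
        print(f"{status:<11}{sex:<7}{sex_by_status[(sex, status)]}")

# Every affected individual belongs to the single family,
# and every unrelated control is male.
affected_relations = {rel for _, status, rel in individuals.values() if status == "affected"}
control_sexes = {sex for sex, _, rel in individuals.values() if rel == "unrelated"}
print("relations among affected:", affected_relations)
print("sexes among unrelated controls:", control_sexes)
```

The table has only one unaffected female (the mother) and one affected male (the father), so any disease comparison within a sex rests on a single individual on one side, and disease status cannot be separated from family membership.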

Long story short, I don't think there's a good way of analyzing this dataset.


Thank you for your reply,

In RNA-seq, how do you usually handle technical replicates? Do you analyze them independently (as I did here), or should you sum the raw counts across these replicates? According to this question ("RNAseq with technical replicates: Does it sound right?"), you should sum them. Is that right?


Just sum them up; this won't discard any information, assuming that the counts are Poisson-distributed.
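
As a rough illustration of what the summation amounts to, here is a minimal sketch in plain Python (not limma or R code). The library names, gene names and counts are hypothetical, as is the `sum_technical_replicates` helper; it also assumes each library is named `<sample>_rep<N>`. The point is only that raw counts, not normalized values, are added per gene across the technical replicates of the same individual.

```python
from collections import defaultdict


def sum_technical_replicates(counts, library_to_sample):
    """Add raw counts per gene across libraries that map to the same sample.

    counts: dict mapping library name -> {gene: raw count}
    library_to_sample: dict mapping library name -> sample ID
    Returns a dict mapping sample ID -> {gene: summed raw count}.
    """
    summed = defaultdict(lambda: defaultdict(int))
    for library, gene_counts in counts.items():
        sample = library_to_sample[library]
        for gene, count in gene_counts.items():
            summed[sample][gene] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {sample: dict(genes) for sample, genes in summed.items()}


# Hypothetical raw counts for two individuals, two technical replicates each.
counts = {
    "VAN011_rep1": {"geneA": 10, "geneB": 0},
    "VAN011_rep2": {"geneA": 14, "geneB": 3},
    "Ctl1_rep1": {"geneA": 7, "geneB": 5},
    "Ctl1_rep2": {"geneA": 9, "geneB": 4},
}

# Assumed naming scheme: strip the trailing "_repN" to get the sample ID.
library_to_sample = {library: library.rsplit("_", 1)[0] for library in counts}

summed = sum_technical_replicates(counts, library_to_sample)
print(summed)
```

After summing, each individual contributes a single column of counts to the analysis, so the technical replicates are no longer treated as if they were independent samples.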

