Question: limma RNA-seq analysis of human data
7 months ago by
g.e.novakovsky0 wrote:

Hi,

I am trying to analyze human RNA-seq data with limma. I have two conditions: affected by the disease and not affected. More concretely, I have the mother (VAN012, not affected), the father (VAN013, affected), the paternal grandmother (VAN0092, affected), a child (VAN011, girl, affected) and two unrelated, unaffected controls (Ctl1 and Ctl2, both male). I also have 2 replicates for each sample. My goal is to find DE genes between these two conditions. After building an MDS plot, I see this:

The two conditions do not cluster at all. For example, the unaffected mother (VAN012) is very close to the affected daughter (VAN011), and the affected father (VAN013) is close to an unaffected control (Ctl2).

I suspect I will not obtain any good results with this dataset (as I said, I am using limma, and my linear model also includes sex and family relation). So my question is: what is the proper way to do an RNA-seq analysis with such data?

Thank you

modified 7 months ago by Aaron Lun21k • written 7 months ago by g.e.novakovsky0
7 months ago by
Aaron Lun21k (Cambridge, United Kingdom) wrote:

I don't think there's much you can do, for various reasons:

• Your two "replicates" for each sample are technical, not biological. At best, this means that you can only claim replication across repeated sampling of the same individuals. You cannot make general statements about the effect of the disease on gene expression in the population, which is of much greater relevance. To do so, you would need independent biological replicates from a different set of (unrelated) individuals.
• I'm not sure how you included "family relations" in the linear model. There is no obvious theoretical relationship between the expression of a gene in the parents and that in the child. If you had more families, you could block on the family factor using duplicateCorrelation() to account for correlations due to greater relatedness, but this is not possible here as you only have one family. This relates back to the first point; you need individuals from different families to make generalizable inferences.
• Other factors are confounded with the disease condition, e.g., the controls are all male (not to mention ethnicity or age). Perhaps you could use the mother as an unaffected control for females, but then the comparisons between females are confounded by age.
• Even putting aside all of the issues above, the MDS plot shows no clear separation by disease. The second dimension represents the sex effect, but that's about it.
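
The confounding described in the points above can be made concrete by cross-tabulating the sample annotation. This is only a minimal sketch in base R: the `samples` data frame is reconstructed by hand from the description in the question (one row per individual, technical replicates left out), and the column names and the `family`/`unrelated` labels are assumptions made for illustration.

```r
# Sample annotation reconstructed from the question.
# Column names and the "family"/"unrelated" labels are illustrative assumptions.
samples <- data.frame(
  id        = c("VAN012", "VAN013", "VAN0092", "VAN011", "Ctl1", "Ctl2"),
  family    = c("family", "family", "family", "family", "unrelated", "unrelated"),
  sex       = c("F", "M", "F", "F", "M", "M"),
  condition = c("unaffected", "affected", "affected", "affected",
                "unaffected", "unaffected"),
  stringsAsFactors = TRUE
)

# Family membership versus disease status:
# every affected individual comes from the single family.
fam_by_cond <- table(family = samples$family, condition = samples$condition)
print(fam_by_cond)

# Sex versus family membership:
# both unrelated controls are male.
sex_by_family <- table(sex = samples$sex, family = samples$family)
print(sex_by_family)
```

Empty cells in these tables (no affected unrelated individuals, no female unrelated controls) are the combinations for which the data carry no information.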

Long story short, I don't think there's a good way of analyzing this dataset.

In RNA-seq, how do you usually handle technical replicates? Do you analyze them independently (as I did here), or should you sum the raw counts from these replicates? According to this question (RNAseq with technical replicates: Does it sound right?), you should sum them. Am I right?
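
For reference, here is a minimal sketch of what summing technical replicates would look like in base R. The count matrix, the gene names and the `sample_id` vector are toy values invented for illustration, not data from this thread.

```r
# Toy raw counts: 3 genes x 4 libraries (two technical replicates per sample).
# All values and names here are made up for illustration.
counts <- matrix(
  c( 10, 12, 30, 28,
      0,  1,  5,  7,
    100, 90, 40, 60),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("S1_rep1", "S1_rep2", "S2_rep1", "S2_rep2"))
)

# The biological sample that each library (column) belongs to.
sample_id <- c("S1", "S1", "S2", "S2")

# rowsum() adds up rows by group, so transpose, sum per sample, transpose back.
summed <- t(rowsum(t(counts), group = sample_id))
print(summed)
```

The result has one column per biological sample, and it is this summed matrix that would go into the downstream analysis. The edgeR package also provides `sumTechReps()` for the same purpose.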