How to proceed with different degradation levels of RNA-seq samples?
aec ▴ 90

Dear all,

I am dealing with a mixture of high- and low-complexity RNA-seq libraries, all from human brain tissue. Some of the RNA samples had low RIN values (~4) and others high (>8). For the most degraded samples I found a high proportion of multimapping reads (~40%) and a different distribution of detected gene biotypes: 90% of the reads mapped to protein-coding genes in unaffected samples vs. 60% in highly degraded ones, with the rest mapping to ribosomal RNA and other biotypes (e.g. miscRNA). I am wondering how confident I can be in a differential expression analysis (control vs. patient) on such a variable dataset. Is there a way to control for the level of degradation, for example with a design like ~degradation + condition? Or should I use surrogate variable analysis to remove unwanted variation? When I plot a PCA, the most degraded samples cluster far apart.

Thanks,

degradation RNA-seq variability differential expression brain • 2.1k views
Aaron Lun ★ 28k

Your options are:

  1. Discard the low-quality libraries.
  2. Add a blocking factor corresponding to quality (high/low).
  3. Use array weights (sample-level quality weights, e.g. limma's voomWithQualityWeights) in a voom-limma pipeline.

Option 1 is the safest approach, given the systematic differences in the characteristics of the degraded libraries. Removing a few libraries is a minor cost compared with drawing the wrong conclusions because of the odd behaviour of low-quality samples. For example, we routinely discard up to 10% of libraries during quality control of single-cell RNA-sequencing data. However, if discarding the libraries would leave you with too few samples, you might consider combining options 2 and 3 to make the best of a bad situation.
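
To make the combination of options 2 and 3 concrete, here is a minimal sketch of the underlying idea for a single gene: a design with a high/low quality blocking factor plus the condition, fitted by weighted least squares. This is not the limma implementation. The function names, the toy log-expression values, and the hand-picked weight of 0.5 for low-quality samples are all assumptions for illustration; in limma the sample weights are estimated from the data. Note also that blocking only helps if quality is not confounded with condition.

```python
import numpy as np

def build_design(low_quality, patient):
    """Design matrix with columns: intercept, quality block (1 = low), condition (1 = patient)."""
    low_quality = np.asarray(low_quality, dtype=float)
    patient = np.asarray(patient, dtype=float)
    return np.column_stack([np.ones_like(patient), low_quality, patient])

def weighted_fit(y, design, weights):
    """Weighted least squares for one gene: minimise sum_i w_i * (y_i - x_i . b)^2."""
    root_w = np.sqrt(np.asarray(weights, dtype=float))
    y = np.asarray(y, dtype=float)
    # Scaling rows by sqrt(w) turns the weighted problem into an ordinary one.
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return coef

# Hypothetical toy data: 8 samples, quality and condition balanced (not confounded).
low_quality = [0, 0, 0, 0, 1, 1, 1, 1]
patient = [0, 1, 0, 1, 0, 1, 0, 1]
design = build_design(low_quality, patient)

# Assumed weights: low-quality samples count half as much as high-quality ones.
weights = np.where(np.array(low_quality) == 1, 0.5, 1.0)

# Noise-free toy log-expression for one gene: baseline 5, low-quality shift -2, patient effect +1.5.
log_expr = np.array([5.0, 6.5, 5.0, 6.5, 3.0, 4.5, 3.0, 4.5])

coef = weighted_fit(log_expr, design, weights)
print(dict(zip(["intercept", "low_quality", "patient"], np.round(coef, 3))))
```

The coefficient of interest is the last one (patient vs. control); the quality coefficient absorbs the systematic shift of the degraded libraries, and the weights reduce their influence on the fit.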


Thanks Aaron.

In my case PC1 separates the outliers from the rest (50% of the variance). Which is better: correcting for PC1 in my model design, or just adding a high/low quality covariate as you suggested?
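
One way to see how much these two choices differ is to check how closely PC1 tracks the quality flag. The sketch below is only an illustration on simulated data (the matrix size, the shift of 2 on half of the genes, and the noise level are assumptions, as is the helper name `first_pc`). It computes PC1 by SVD of the centred samples-by-genes matrix and correlates the scores with the high/low flag. It does not by itself say which covariate is preferable.

```python
import numpy as np

def first_pc(expr):
    """expr: samples x genes. Return PC1 scores and the fraction of variance PC1 explains."""
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, 0] * s[0]
    explained = s[0] ** 2 / np.sum(s ** 2)
    return scores, explained

# Simulated toy data (assumption): 8 samples x 50 genes, small noise,
# and a shift of 2 on the first 25 genes for the low-quality samples.
rng = np.random.default_rng(0)
low_quality = np.array([0, 0, 0, 0, 1, 1, 1, 1])
expr = rng.normal(0.0, 0.1, size=(8, 50))
expr[low_quality == 1, :25] += 2.0

scores, explained = first_pc(expr)
corr = np.corrcoef(scores, low_quality)[0, 1]
print(f"PC1 explains {explained:.0%} of the variance; |corr with quality flag| = {abs(corr):.2f}")
```

If the absolute correlation is close to 1, PC1 and the quality flag carry nearly the same information, so either term would remove a similar component of the variation.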

Another option would be to try this methodology:

http://biorxiv.org/content/early/2016/09/09/074245

They present 'quality surrogate variable analysis' (qSVA) for this purpose.

I think I will try different options (also removing outliers) and see what I get.
