Question

How to proceed with different degradation levels of RNA-seq samples?

aec ▴ 90

@aec-9409

Dear all,

I am dealing with a mixture of high- and low-complexity RNA-seq libraries, all from human brain tissue: some of the RNA samples had low RIN values (~4) and others high (>8). For the most degraded samples I found a high proportion of multimapping reads (~40%) and a different distribution of detected gene biotypes (90% of the reads mapped to protein-coding genes in the unaffected samples vs 60% in the highly degraded ones). The rest mapped to ribosomal RNA and other biotypes (e.g. miscRNA). I am wondering how confident I can be in a differential expression analysis (control vs patient) on such a variable dataset. Is there a way to control for the level of degradation, for example with a design like ~degradation+condition? Or should I use surrogate variable analysis to remove unwanted variation? When I plot a PCA, the most degraded samples cluster far apart.

Thanks,

degradation RNA-seq variability differential expression brain • 2.7k views

ADD COMMENT • link updated 8.9 years ago by Aaron Lun ★ 29k • written 8.9 years ago by aec ▴ 90

score 1 · Answer 1 · 2017-04-07

Aaron Lun ★ 29k

@alun

Your options are:

1. Discard the low-quality libraries.
2. Add a blocking factor corresponding to quality (high/low).
3. Use array weights in a voom-limma pipeline.

Option 1 is the safest approach, given the systematic differences in the characteristics of the degraded libraries. Losing a few libraries is a minor cost compared with drawing the wrong conclusions because of the erratic behaviour of low-quality samples. For example, we routinely discard up to 10% of libraries during quality control of single-cell RNA-sequencing data. However, if discarding the libraries would cost you too many samples, you might consider combining options 2 and 3 to make the best of a bad situation.
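
The idea behind combining a quality blocking factor with sample weights can be illustrated with a small weighted least-squares fit. This is only a sketch in plain Python on made-up numbers for a single gene, not limma's implementation: the expression values, the hand-picked weight of 0.5 (limma estimates sample weights from the data) and the helper names `solve` and `weighted_fit` are all assumptions for illustration.

```python
# Toy illustration only: made-up log-expression values for ONE gene.
# Design columns: intercept, low-quality indicator (blocking factor),
# patient indicator (the effect of interest).

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def weighted_fit(design, y, weights):
    """Weighted least squares: solve (X'WX) beta = X'Wy."""
    p = len(design[0])
    xtwx = [[sum(w * row[i] * row[j] for row, w in zip(design, weights))
             for j in range(p)] for i in range(p)]
    xtwy = [sum(w * row[i] * yi for row, yi, w in zip(design, y, weights))
            for i in range(p)]
    return solve(xtwx, xtwy)

# Hypothetical samples: (low_quality, patient, log_expression).
# Quality and condition are deliberately unbalanced.
samples = [
    (0, 0, 5.0), (0, 0, 5.1), (0, 0, 4.9), (0, 1, 6.0),
    (1, 0, 3.0), (1, 1, 4.6), (1, 1, 3.4), (1, 1, 4.6),
]
design = [[1.0, float(low), float(pat)] for low, pat, _ in samples]
y = [expr for _, _, expr in samples]

# Naive comparison that ignores quality: difference of group means.
patients = [expr for _, pat, expr in samples if pat == 1]
controls = [expr for _, pat, expr in samples if pat == 0]
naive_effect = sum(patients) / len(patients) - sum(controls) / len(controls)

# Blocking on quality, all samples weighted equally.
blocked = weighted_fit(design, y, [1.0] * len(y))

# Same design, low-quality samples down-weighted.
# The 0.5 is an arbitrary assumption, not an estimated weight.
weights = [0.5 if low else 1.0 for low, _, _ in samples]
blocked_weighted = weighted_fit(design, y, weights)

print("naive patient effect:     ", round(naive_effect, 3))
print("blocked patient effect:   ", round(blocked[2], 3))
print("blocked + weighted effect:", round(blocked_weighted[2], 3))
```

In this toy data most patients are low quality, so the naive comparison gives a much smaller patient effect than either blocked fit. Down-weighting the low-quality samples then pulls the estimate towards what the high-quality samples show.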

ADD COMMENT • link 8.9 years ago Aaron Lun ★ 29k

Thanks Aaron.

In my case PC1 separates the outliers from the rest (50% of the variance). Which is better: correcting for PC1 in my model design, or just adding a high/low quality covariate as you suggested?

Another option would be to try this methodology:

http://biorxiv.org/content/early/2016/09/09/074245

They present 'quality surrogate variable analysis' (qSVA) for this purpose.

I think I will try different options (also removing outliers) and see what I get.
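
Before choosing between PC1 and a high/low covariate, it may help to check how closely the two agree. The sketch below is a rough illustration in plain Python, not part of any Bioconductor package: the expression matrix, the quality labels and the helper names `pc1` and `pearson` are hypothetical, and PC1 is obtained by simple power iteration on the centred sample-by-sample cross-product matrix.

```python
import math

def pc1(rows, n_iter=100):
    """Return (unit-length PC1 sample scores, fraction of variance explained)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    x = [[r[j] - means[j] for j in range(p)] for r in rows]  # centre each gene
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)]
            for i in range(n)]
    u = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(n_iter):
        v = [sum(g * ui for g, ui in zip(row, u)) for row in gram]
        norm = math.sqrt(sum(t * t for t in v))
        u = [t / norm for t in v]
    eigenvalue = sum(ui * sum(g * uk for g, uk in zip(row, u))
                     for ui, row in zip(u, gram))
    total = sum(gram[i][i] for i in range(n))
    return u, eigenvalue / total

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((s - ma) * (t - mb) for s, t in zip(a, b))
    var_a = sum((s - ma) ** 2 for s in a)
    var_b = sum((t - mb) ** 2 for t in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical log-expression matrix: rows are samples, columns are genes.
expr = [
    [5.0, 7.0, 3.0, 6.0],
    [5.2, 6.9, 3.1, 6.1],
    [4.9, 7.1, 2.9, 5.9],
    [5.1, 7.0, 3.0, 6.0],
    [3.0, 5.2, 4.5, 4.1],  # low-quality sample
    [3.2, 5.0, 4.4, 3.9],  # low-quality sample
]
low_quality = [0, 0, 0, 0, 1, 1]  # hypothetical high/low quality label

scores, explained = pc1(expr)
r = pearson(scores, low_quality)  # sign of PC1 is arbitrary, so use abs(r)

print("variance explained by PC1:", round(explained, 3))
print("|correlation| of PC1 with quality label:", round(abs(r), 3))
```

If the absolute correlation is close to 1, PC1 and the quality label carry nearly the same information. A weaker correlation would suggest PC1 is also capturing something else, possibly biology of interest.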

ADD REPLY • link 8.9 years ago aec ▴ 90