Question: How to proceed with different degradation levels of RNA-seq samples?
18 months ago, aec wrote:

Dear all,

I am dealing with a mixture of high- and low-complexity RNA-seq libraries, all from human brain tissue: some of the RNA samples had low RIN values (~4) and others high (>8). For the most affected samples I found a high proportion of multimapping reads (~40%) and a different distribution of detected gene biotypes (90% of the reads mapped to protein-coding genes for unaffected samples vs 60% for highly degraded ones). The rest mapped to ribosomal RNA and other biotypes (e.g. miscRNA). I am wondering how confident I can be in performing a differential expression analysis (control vs patient) on such a variable dataset. Is there a way to control for the level of degradation, for example with a design like ~degradation+condition? Or should I use surrogate variable analysis to remove unwanted variation? When I plot a PCA, the most degraded samples cluster far apart.

18 months ago, Aaron Lun (Cambridge, United Kingdom) wrote:

Your options are:

  1. Discard the low-quality libraries.
  2. Add a blocking factor corresponding to quality (high/low).
  3. Use array weights in a voom-limma pipeline.

Option 1 is the safest approach, given the systematic differences in the characteristics of the degraded libraries. The removal of a few libraries is not a major issue, compared to the problems of getting the wrong conclusions due to the funny behaviours of low-quality samples. For example, we routinely discard up to 10% of libraries during quality control of single-cell RNA-sequencing data. However, if discarding the libraries would result in the loss of too many samples, you might consider combining options 2 and 3 to make the best of a bad situation.
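
To make the blocking idea in option 2 concrete, here is a toy sketch in plain Python. It is not limma: the sample layout, the expression values and the weights are all made up, and the helper names (`solve`, `weighted_fit`) are hypothetical. It fits an additive linear model for one gene where the low-quality libraries are unevenly split between control and patient, with and without a quality column in the design. In a real voom-limma analysis the sample weights would be estimated from the data (for example by `voomWithQualityWeights`) rather than set by hand as below.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination (exact fractions)."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]


def weighted_fit(design, y, weights):
    """Weighted least squares: solve (X' W X) beta = X' W y."""
    p = len(design[0])
    xtwx = [[sum(w * row[i] * row[j] for row, w in zip(design, weights))
             for j in range(p)] for i in range(p)]
    xtwy = [sum(w * row[i] * yi for row, yi, w in zip(design, y, weights))
            for i in range(p)]
    return solve(xtwx, xtwy)


# Made-up layout: (patient, low_quality) per library; low quality is mostly patients.
samples = [(0, 0)] * 3 + [(0, 1)] + [(1, 0)] + [(1, 1)] * 3

# Made-up noiseless log-expression for one gene:
# baseline 5, true patient effect +2, degradation shift -3.
y = [5 + 2 * patient - 3 * low for patient, low in samples]

equal = [1] * len(samples)

# Design ~condition: the degradation shift leaks into the condition coefficient.
naive = weighted_fit([[1, patient] for patient, _ in samples], y, equal)

# Design ~quality+condition: the quality column absorbs the shift.
blocked_design = [[1, patient, low] for patient, low in samples]
blocked = weighted_fit(blocked_design, y, equal)

# Same blocked design, with hand-picked weights that down-weight degraded libraries.
down = [Fraction(1, 4) if low else 1 for _, low in samples]
weighted = weighted_fit(blocked_design, y, down)

print("condition effect, ~condition:        ", naive[1])
print("condition effect, ~quality+condition:", blocked[1])
print("condition effect, blocked + weights: ", weighted[1])
```

In this toy setting the unblocked model underestimates the patient effect because most degraded libraries are patients, while the blocked model recovers it. The weights do not change the noiseless fit here; their role in practice is to reduce the influence of noisier libraries on the variance estimates and test statistics.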


Thanks Aaron.

In my case PC1 separates the outliers from the rest (50% of the variance). Which is better: correcting for PC1 in my model design, or just adding a covariate for high/low quality as you suggested?

Another option would be to try this methodology:

They present 'quality surrogate variable analysis' (qSVA) for this purpose.

I think I will try different options (also removing outliers) and see what I get.

Reply written 18 months ago by aec