Question: How to proceed with different degradation levels of RNA-seq samples?
9 months ago, aec wrote:

Dear all,

I am dealing with a mixture of high- and low-complexity RNA-seq libraries, all from human brain tissue. Some of the RNA samples had low RIN values (~4) and others high (>8). For the most affected samples I found a high proportion of multimapping reads (~40%) and a different distribution of detected gene biotypes: 90% of the reads mapped to protein-coding genes in unaffected samples vs 60% in highly degraded ones. The rest mapped to ribosomal RNA and other biotypes (e.g. miscRNA).

I am wondering how confident I can be performing a differential expression analysis (control vs patient) on such a variable dataset. Is there a way to control for the level of degradation, for example with a design like ~degradation+condition? Or should I use surrogate variable analysis to remove unwanted variation? When I plot a PCA, the most degraded samples cluster far apart from the rest.


9 months ago, Aaron Lun (Cambridge, United Kingdom) wrote:

Your options are:

  1. Discard the low-quality libraries.
  2. Add a blocking factor corresponding to quality (high/low).
  3. Use array weights in a voom-limma pipeline.

Option 1 is the safest approach, given the systematic differences in the characteristics of the degraded libraries. Removing a few libraries is a minor cost compared to drawing the wrong conclusions because of the funny behaviour of low-quality samples. For example, we routinely discard up to 10% of libraries during quality control of single-cell RNA-sequencing data. However, if discarding the libraries would result in the loss of too many samples, you might consider combining options 2 and 3 to make the best of a bad situation.
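
Options 2 and 3 can be illustrated with a small NumPy sketch. This is not limma's implementation: it fits one hypothetical gene by weighted least squares, with a high/low quality blocking factor in the design and hand-picked per-sample weights standing in for array weights (which limma estimates from the data). The function name `fit_gene`, the toy expression values and the weights are all assumptions made for illustration.

```python
import numpy as np

def fit_gene(y, columns, weights=None):
    """Weighted least squares fit of y on an intercept plus the given covariates.

    Returns the coefficients in the order [intercept, *columns].
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack(
        [np.ones(len(y))] + [np.asarray(c, dtype=float) for c in columns]
    )
    w = np.ones(len(y)) if weights is None else np.asarray(weights, dtype=float)
    sw = np.sqrt(w)  # scaling rows by sqrt(w) turns WLS into ordinary least squares
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

# Hypothetical toy data: 8 samples, low quality over-represented among patients.
condition = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 0 = control, 1 = patient
quality = np.array([0, 0, 0, 1, 0, 1, 1, 1])    # 0 = high RIN, 1 = low RIN

# Noise-free log-expression of one gene: true condition effect 1.0,
# degradation shift 2.0 (both values are made up).
y = 5.0 + 2.0 * quality + 1.0 * condition

# Ignoring quality: the degradation shift leaks into the condition estimate.
naive = fit_gene(y, [condition])

# Option 2: quality as a blocking factor in the design.
blocked = fit_gene(y, [quality, condition])

# Options 2 + 3: blocking factor plus (hand-picked) lower weights for
# the low-quality samples.
weights = np.where(quality == 1, 0.5, 1.0)
weighted = fit_gene(y, [quality, condition], weights)

print("condition effect, no blocking:      ", round(naive[1], 3))
print("condition effect, with blocking:    ", round(blocked[2], 3))
print("condition effect, blocking + weights:", round(weighted[2], 3))
```

In this toy setup the unblocked fit overestimates the condition effect because degraded samples are unevenly split between groups, while the blocked fits recover the simulated value. The data are noise-free, so the weights do not change the estimate here; with real data they would reduce the influence of the noisier samples.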


Thanks Aaron.

In my case, PC1 separates the outliers from the rest (50% of the variance). Which is better: correcting for PC1 in my model design, or just adding a covariate for high/low quality as you suggested?

Another option would be to try this methodology:

They present 'quality surrogate variable analysis' (qSVA) for this purpose.

I think I will try different options (also removing outliers) and see what I get.

Reply written 9 months ago by aec
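
One way to see how the two covariates raised in this reply relate (PC1 vs a high/low quality label) is to check how strongly the PC1 scores track each sample annotation. The sketch below does this on simulated data with NumPy; the helper `pc1_scores`, the simulated matrix and all effect sizes are assumptions for illustration, not values from the thread.

```python
import numpy as np

def pc1_scores(expr):
    """PCA via SVD on a samples x genes matrix of log-expression.

    Returns (PC1 score per sample, fraction of variance explained by PC1).
    """
    centered = expr - expr.mean(axis=0)  # centre each gene
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    var_explained = s**2 / np.sum(s**2)
    return u[:, 0] * s[0], var_explained[0]

rng = np.random.default_rng(0)

# Hypothetical annotation for 8 samples.
condition = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 0 = control, 1 = patient
quality = np.array([0, 0, 0, 1, 0, 1, 1, 1])    # 0 = high RIN, 1 = low RIN

# Simulated log-expression: small noise plus a gene-specific shift that
# only affects the low-quality samples (made-up effect sizes).
n_genes = 200
degradation_shift = rng.normal(0.0, 2.0, n_genes)
expr = rng.normal(0.0, 0.3, (len(quality), n_genes))
expr += np.outer(quality, degradation_shift)

scores, frac = pc1_scores(expr)

# The sign of a principal component is arbitrary, so compare absolute values.
r_quality = abs(np.corrcoef(scores, quality)[0, 1])
r_condition = abs(np.corrcoef(scores, condition)[0, 1])

print(f"variance explained by PC1: {frac:.2f}")
print(f"|cor(PC1, quality)|:       {r_quality:.2f}")
print(f"|cor(PC1, condition)|:     {r_condition:.2f}")
```

If PC1 is almost perfectly correlated with the quality label, the two covariates carry nearly the same information. Any correlation between PC1 and condition is worth checking as well, since adjusting for a component that tracks condition could also remove part of the signal of interest.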