Question: How should I handle RNA-seq samples with different levels of degradation?
aec wrote (7 months ago):

Dear all,

I am dealing with a mixture of high- and low-complexity RNA-seq libraries, all from human brain tissue: some of the RNA samples had low RIN values (~4) and others high ones (>8). The most affected libraries have a high proportion of multimapping reads (~40%) and a different distribution of gene biotypes: 90% of the reads map to protein-coding genes in the unaffected samples vs. 60% in the highly degraded ones, with the rest mapping to ribosomal RNA and other biotypes (e.g. misc_RNA).

I am wondering how confident I can be in a differential expression analysis (control vs. patient) on such a variable dataset. Is there a way to control for the level of degradation, e.g. design ~degradation + condition? Or should I use surrogate variable analysis to remove the unwanted variation? When I plot a PCA, the most degraded samples cluster far apart from the rest.

Thanks,
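Whether a design like ~degradation + condition can be fitted depends on the two factors not being confounded: if all degraded libraries came from one group, their effects cannot be separated. The minimal Python sketch below is only an illustration (in R the equivalent matrix would come from `model.matrix(~ quality + condition)`); it builds the treatment-coded design for a hypothetical sample sheet and checks that the matrix has full column rank. The helper names and the sample sheet are made up.

```python
from fractions import Fraction


def design_matrix(quality, condition):
    """Treatment-coded model matrix for ~ quality + condition.

    Columns: intercept, indicator for quality == "low", indicator for condition == "patient".
    """
    return [[1, int(q == "low"), int(c == "patient")] for q, c in zip(quality, condition)]


def rank(matrix):
    """Matrix rank by Gaussian elimination on exact fractions (no rounding issues)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it is a combination of earlier columns
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


# Hypothetical sample sheet: degraded libraries occur in both conditions.
quality = ["high", "high", "low", "high", "high", "low"]
condition = ["control", "control", "control", "patient", "patient", "patient"]
X = design_matrix(quality, condition)
print(rank(X) == len(X[0]))  # True: quality and condition effects are both estimable

# If every degraded library were a patient, the two columns would be identical.
confounded = design_matrix(["high"] * 3 + ["low"] * 3, condition)
print(rank(confounded) == len(confounded[0]))  # False: design is not full rank
```

A continuous covariate such as RIN could replace the high/low factor; the same rank check applies.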

Aaron Lun (Cambridge, United Kingdom) wrote:

Your options are:

  1. Discard the low-quality libraries.
  2. Add a blocking factor corresponding to quality (high/low).
  3. Use array weights in a voom-limma pipeline.
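Option 1 amounts to a library-level QC filter. A minimal sketch, assuming per-library metrics like the ones described in the question (RIN, multimapping fraction, protein-coding fraction); the thresholds and the sample values are hypothetical and chosen only for illustration, not recommended cut-offs.

```python
from dataclasses import dataclass


@dataclass
class LibraryQC:
    sample: str
    rin: float                  # RNA integrity number
    multimap_frac: float        # fraction of reads that multimap
    protein_coding_frac: float  # fraction of assigned reads on protein-coding genes


# Hypothetical thresholds, for illustration only.
MIN_RIN = 6.0
MAX_MULTIMAP = 0.25
MIN_PROTEIN_CODING = 0.75


def passes_qc(lib: LibraryQC) -> bool:
    """A library is kept only if it passes every threshold."""
    return (lib.rin >= MIN_RIN
            and lib.multimap_frac <= MAX_MULTIMAP
            and lib.protein_coding_frac >= MIN_PROTEIN_CODING)


# Made-up values in the range described in the question.
libs = [
    LibraryQC("ctrl_1", 8.4, 0.12, 0.90),
    LibraryQC("ctrl_2", 4.1, 0.41, 0.61),
    LibraryQC("pat_1", 8.1, 0.10, 0.89),
    LibraryQC("pat_2", 3.9, 0.38, 0.60),
]
kept = [lib.sample for lib in libs if passes_qc(lib)]
dropped = [lib.sample for lib in libs if not passes_qc(lib)]
print(kept)     # ['ctrl_1', 'pat_1']
print(dropped)  # ['ctrl_2', 'pat_2']
```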

Option 1 is the safest approach, given the systematic differences in the characteristics of the degraded libraries. Removing a few libraries is a minor issue compared with drawing the wrong conclusions from the unusual behaviour of low-quality samples. For example, we routinely discard up to 10% of libraries during quality control of single-cell RNA-sequencing data. However, if discarding the degraded libraries would cost you too many samples, you might consider combining options 2 and 3 to make the best of a bad situation.
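The idea behind combining options 2 and 3 is that the quality block absorbs the systematic shift, while per-sample weights down-weight libraries that are noisier overall. In R, limma provides `arrayWeights` and `voomWithQualityWeights` for this. The standard-library Python sketch below is NOT limma's algorithm (which uses REML); it is a simplified moment-based illustration: fit an ordinary least-squares model with the blocking factor to every gene, estimate each sample's leverage-adjusted residual variance across genes, and use the inverse as a weight. The simulated data and noise levels are assumptions.

```python
import random


def solve(a, b):
    """Solve a x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def sample_weights(X, Y):
    """Simplified per-sample weights (NOT limma's REML arrayWeights).

    Fits y = X beta by OLS for every gene, estimates each sample's residual variance
    across genes (divided by 1 - leverage), and returns inverse variances rescaled
    to average 1.
    """
    n, p = len(X), len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    # Leverage h_ii = x_i' (X'X)^{-1} x_i
    lev = [sum(u * v for u, v in zip(X[i], solve(xtx, X[i]))) for i in range(n)]
    ss = [0.0] * n
    for y in Y:
        beta = solve(xtx, [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)])
        for i in range(n):
            fitted = sum(X[i][a] * beta[a] for a in range(p))
            ss[i] += (y[i] - fitted) ** 2
    inv_var = [len(Y) * (1 - lev[i]) / ss[i] for i in range(n)]
    mean_w = sum(inv_var) / n
    return [w / mean_w for w in inv_var]


random.seed(1)
# Hypothetical design ~ quality + condition (columns: intercept, low quality, patient).
X = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1], [1, 1, 1]]
noise_sd = [0.2, 0.2, 0.8, 0.2, 0.2, 0.8]  # assumption: degraded libraries are noisier
Y = []
for _ in range(500):  # 500 simulated genes on a log-expression scale
    base = random.gauss(5, 1)
    Y.append([base - 0.3 * x[1] + 0.5 * x[2] + random.gauss(0, sd)
              for x, sd in zip(X, noise_sd)])

weights = sample_weights(X, Y)
# The 3rd and 6th (degraded) samples should receive the smallest weights.
print([round(w, 2) for w in weights])
```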

aec replied:

Thanks, Aaron.

In my case, PC1 (50% of the variance) separates the outliers from the rest. Is it better to include PC1 as a covariate in my design, or just to add a high/low quality factor as you suggested?

Another option would be to try this methodology:

http://biorxiv.org/content/early/2016/09/09/074245

They present 'quality surrogate variable analysis' (qSVA) for this purpose.

I think I will try the different options (including removing the outliers) and compare the results.
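On the question of a PC1 covariate versus a high/low quality factor: adjusting for PC1 is only safe if PC1 is driven by quality rather than by the condition of interest; otherwise the adjustment also removes biological signal. A minimal sketch of that check in standard-library Python, on simulated data where the sample layout and effect sizes are made up: PC1 scores are computed by power iteration on the centred sample-by-sample Gram matrix and then correlated with the quality and condition labels.

```python
import math
import random


def pc1(M, iters=200, seed=0):
    """Return (PC1 scores, fraction of variance explained) for M (samples x genes)."""
    n, g = len(M), len(M[0])
    means = [sum(row[j] for row in M) / n for j in range(g)]
    C = [[row[j] - means[j] for j in range(g)] for row in M]  # centre each gene
    # The top eigenvector of the n x n Gram matrix C C^T is proportional to PC1 scores.
    G = [[sum(a * b for a, b in zip(C[i], C[k])) for k in range(n)] for i in range(n)]
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(n)]  # random start (all-ones is in G's null space)
    lam = 0.0
    for _ in range(iters):
        w = [sum(G[i][k] * v[k] for k in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in w))  # converges to the top eigenvalue
        v = [x / lam for x in w]
    total = sum(G[i][i] for i in range(n))  # total sum of squares
    scores = [x * math.sqrt(lam) for x in v]  # the sign of PC1 is arbitrary
    return scores, lam / total


def corr(x, y):
    """Pearson correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)
low_quality = [0, 0, 1, 0, 0, 1]
patient = [0, 0, 0, 1, 1, 1]
M = [[0.0] * 300 for _ in range(6)]
for j in range(300):
    base = rng.gauss(5, 1)
    degr = rng.gauss(0, 1.5)                   # gene-specific degradation effect
    cond = rng.gauss(0, 1) if j < 30 else 0.0  # condition effect on 10% of genes
    for i in range(6):
        M[i][j] = base + degr * low_quality[i] + cond * patient[i] + rng.gauss(0, 0.2)

scores, frac = pc1(M)
print(f"PC1 explains {frac:.0%} of the variance")
print(f"|corr(PC1, low quality)| = {abs(corr(scores, low_quality)):.2f}")
print(f"|corr(PC1, condition)|   = {abs(corr(scores, patient)):.2f}")
```

If PC1 tracks quality almost perfectly, as in this simulation, a PC1 covariate and a high/low factor carry much the same information; if it also tracks condition, the quality factor is the safer choice.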
