Question: How to proceed with different degradation levels of RNA-seq samples?
18 months ago by
aec40
aec40 wrote:

Dear all,

I am dealing with a mixture of high- and low-complexity RNA-seq libraries, all from human brain tissue: some of the RNA samples had low RIN values (~4) and others high (>8). For the most affected samples I found a high proportion of multimapping reads (~40%) and a different distribution of detected gene biotypes (90% of the reads mapped to protein-coding genes for unaffected samples vs 60% for highly degraded ones). The rest mapped to ribosomal RNA and others (i.e. miscRNA). I am wondering how confident I can be in performing a differential expression analysis (control vs patient) on such a variable dataset. Is there a way to control for the level of degradation, such as a design like ~degradation+condition? Or should I use surrogate variable analysis to remove unwanted variation? When I plot a PCA, the most degraded samples cluster far apart from the rest.

Thanks,

modified 18 months ago by Aaron Lun21k • written 18 months ago by aec40
18 months ago by
Aaron Lun21k
Cambridge, United Kingdom
Aaron Lun21k wrote:

1. Remove the low-quality libraries from the analysis.
2. Add a blocking factor corresponding to quality (high/low).
3. Use array weights in a voom-limma pipeline.

Option 1 is the safest approach, given the systematic differences in the characteristics of the degraded libraries. The removal of a few libraries is not a major issue, compared to the problems of getting the wrong conclusions due to the funny behaviours of low-quality samples. For example, we routinely discard up to 10% of libraries during quality control of single-cell RNA-sequencing data. However, if discarding the libraries would result in the loss of too many samples, you might consider combining options 2 and 3 to make the best of a bad situation.
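
To make options 2 and 3 a bit more concrete, here is a toy sketch in plain Python (not limma itself) of what a blocking factor and sample weights do in a per-gene linear model. Everything in it is an assumption for illustration: the eight samples, the single gene, the log-expression values (built as 5 + 2*patient - 3*low_quality, with extra noise in the low-quality samples) and the hand-picked weight of 0.2 for low-quality samples. In a real voom-limma analysis the weights would be estimated from the data rather than chosen by hand.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def wls(design, y, weights):
    """Weighted least squares: minimise sum_i w_i * (y_i - x_i . beta)^2."""
    p = len(design[0])
    xtwx = [[sum(w * row[j] * row[k] for row, w in zip(design, weights))
             for k in range(p)] for j in range(p)]
    xtwy = [sum(w * row[j] * yi for row, yi, w in zip(design, y, weights))
            for j in range(p)]
    return solve(xtwx, xtwy)


# Hypothetical toy data for one gene (log-expression), 4 controls then 4 patients.
patient = [0, 0, 0, 0, 1, 1, 1, 1]
low_quality = [0, 0, 1, 1, 0, 0, 0, 1]
y = [5.0, 5.0, 3.0, 1.0, 7.0, 7.0, 7.0, 5.5]

equal = [1.0] * len(y)
# Assumed weights: down-weight the low-quality samples (hand-picked, not estimated).
down = [0.2 if q else 1.0 for q in low_quality]

# Design without blocking: intercept + condition.
naive_design = [[1.0, float(c)] for c in patient]
# Design with blocking: intercept + quality + condition.
blocked_design = [[1.0, float(q), float(c)] for q, c in zip(low_quality, patient)]

effect_naive = wls(naive_design, y, equal)[-1]
effect_blocked = wls(blocked_design, y, equal)[-1]
effect_weighted = wls(blocked_design, y, down)[-1]

print("condition effect, no blocking:        ", round(effect_naive, 3))
print("condition effect, blocking on quality:", round(effect_blocked, 3))
print("condition effect, blocking + weights: ", round(effect_weighted, 3))
```

In this toy data the condition effect used to build the values is 2. The fit without blocking is furthest from it, the fit with a quality blocking factor is closer, and the fit that also down-weights the low-quality samples is closest. This only illustrates the mechanics; it says nothing about how well either approach would work on the real dataset.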

Thanks Aaron.

In my case PC1 separates the outliers from the rest (50% of the variance). Which is better: correcting for PC1 in my model design, or just adding a high/low quality covariate as you suggested?

Another option would be to try this methodology:

http://biorxiv.org/content/early/2016/09/09/074245

They present 'quality surrogate variable analysis' (qSVA) for this purpose.

I think I will try different options (also removing outliers) and see what I get.