I am dealing with a mixture of high- and low-complexity RNA-seq libraries: some of the RNA samples had low RIN values (~4) and others high (>8), all coming from human brain tissue. I found a high proportion of multimapping reads (~40%) in the most affected samples and a different distribution of detected gene biotypes (90% of the reads mapped to protein-coding genes in unaffected samples vs 60% in highly degraded ones). The rest mapped to ribosomal RNA and other biotypes (e.g. miscRNA). I am wondering how confident I can be performing a differential expression analysis (control vs patient) on such a variable dataset. Is there a way to control for the level of degradation, for example with a design like ~degradation + condition? Or should I use surrogate variable analysis to remove unwanted variation? When I plot a PCA, the most degraded samples cluster far apart from the rest.
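To make the question concrete, this is roughly what I mean by ~degradation + condition, sketched in Python with NumPy for a single gene. Everything here is hypothetical: the RIN values, the effect sizes, and the function name `fit_condition_effect` are made up for illustration, and the data are noise-free. In a real analysis I would pass the design formula to the DE tool rather than fit per-gene least squares myself.

```python
import numpy as np

def fit_condition_effect(log_expr, rin, condition):
    """Least-squares fit of log_expr ~ 1 + rin + condition for one gene.

    Returns the coefficients as (intercept, rin_coef, condition_coef).
    """
    rin = np.asarray(rin, dtype=float)
    condition = np.asarray(condition, dtype=float)
    # Design matrix: intercept, degradation covariate (RIN), condition (0/1).
    design = np.column_stack([np.ones_like(rin), rin, condition])
    coef, *_ = np.linalg.lstsq(design, np.asarray(log_expr, dtype=float), rcond=None)
    return coef

# Hypothetical toy data: 6 samples, condition 0 = control, 1 = patient.
rin = np.array([4.0, 4.5, 8.5, 9.0, 4.2, 8.8])
condition = np.array([0, 1, 0, 1, 1, 0])

# Simulated log-expression with a known RIN effect (0.5 per RIN unit)
# and a known condition effect (1.0), no noise.
log_expr = 2.0 + 0.5 * rin + 1.0 * condition

# Naive estimate that ignores degradation: difference of group means.
naive = log_expr[condition == 1].mean() - log_expr[condition == 0].mean()

# Estimate adjusted for RIN.
coef = fit_condition_effect(log_expr, rin, condition)
adjusted = coef[2]
```

In this toy example the patients happen to have lower RIN on average, so the naive difference of means mixes the degradation effect into the condition effect, while the adjusted coefficient separates the two. This only works if degradation is not fully confounded with condition, which is part of what I am unsure about in my data.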