Question

RUV - Does my data need further normalisation?

0

BioinfGuru ▴ 70

@yagalbi-11519

Last seen 16 months ago

Ireland

Hi all,

I have a dataset of 23 bulk RNA-seq samples for differential expression analysis. Using both RUVs and RUVr, I have produced the following images to assess and mitigate unwanted technical variation. I have to raise k to 18 (RUVs) and 15 (RUVr) before I see any clustering by the variable of interest, and even then the RLE plot still shows evidence of technical variation.

I'm really not sure what to do here. There is no other batch effect I can account for. My next idea is to try SVA, but I imagine there won't be much of a difference. I could also split the single variable "trial_condition" into two variables, "trial" and "condition", but again, I don't see it making much of a difference.

Any advice is appreciated, I'm really not sure how to proceed.

Regards

Kenneth

Raw counts without RUV: enter image description here

Raw counts with RUVs: enter image description here

Raw counts with RUVr: enter image description here
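For readers unfamiliar with the RLE plots referred to above, here is a minimal sketch of how relative log expression is usually computed: each gene's log count minus that gene's median across samples, then summarised per sample. This is an illustration in Python with NumPy, not the EDASeq implementation. The function names, the log base, the pseudocount of 1 and the simulated counts are all assumptions made for the example.

```python
import numpy as np

def rle(counts):
    """Relative log expression for a genes x samples count matrix.

    Assumption: log2(count + 1) is used; a different log base only rescales the plot.
    """
    log_counts = np.log2(np.asarray(counts, dtype=float) + 1.0)
    # Median of each gene across all samples (one value per row).
    gene_medians = np.median(log_counts, axis=1, keepdims=True)
    return log_counts - gene_medians

def rle_summary(counts):
    """Per-sample median and interquartile range of the RLE values."""
    dev = rle(counts)
    med = np.median(dev, axis=0)
    q75, q25 = np.percentile(dev, [75, 25], axis=0)
    return med, q75 - q25

# Simulated data (hypothetical): 500 genes, 6 samples, last sample sequenced twice as deep.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=200, size=(500, 6)).astype(float)
counts[:, 5] *= 2.0

med, iqr = rle_summary(counts)
print(np.round(med, 2))  # the last sample's median sits well above zero
print(np.round(iqr, 2))
```

In a well-behaved dataset the per-sample boxes are centred near zero with similar spreads; a box shifted away from zero or with a much wider spread points to a sample-level technical effect.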

EDASeq plotRLE RNAseq • 1.4k views

ADD COMMENT • link 17 months ago BioinfGuru ▴ 70

score 2 · Accepted Answer · 2024-08-12

Gordon Smyth 53k

@gordon-smyth

Last seen 48 minutes ago

WEHI, Melbourne, Australia

Further normalization!? In my opinion, you have already over-normalized the data by a very long way. You start with data that shows no evidence of any treatment effect. You have 23 samples in 4 groups, so there are 23 - 4 = 19 residual degrees of freedom for replication. You add 18 RUVs columns, removing almost all those residual degrees of freedom. After removing almost all the replication variation, the groups now appear artificially separated. I would myself have no confidence in such an analysis.
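The arithmetic behind that point can be sketched as follows. This is only an illustration: the function name is hypothetical, and it assumes each RUV factor is linearly independent of the group design and of the other factors, so that each one costs exactly one residual degree of freedom.

```python
def residual_df(n_samples, n_groups, k):
    """Residual degrees of freedom after fitting group means plus k unwanted-variation factors.

    Assumption: the k factors are linearly independent of the group design.
    """
    return n_samples - n_groups - k

# 23 samples in 4 groups, as described in the thread.
for k in (0, 1, 3, 9, 15, 18):
    print(f"k = {k:2d}: residual df = {residual_df(23, 4, k)}")
```

With k = 18 only one residual degree of freedom remains, and with k = 15 only four, so almost nothing is left to estimate the replicate-to-replicate variation.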

RUV is a very good tool and less prone to over-fitting than most competitors, but I think you've pushed it outside its usage envelope.

Your original data showed some batch effect between the two trials, but that would already be handled by a trial effect in the model.
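One way to see why no extra adjustment is needed for the trial difference: if the model has one mean per trial_condition group, a trial indicator is already a linear combination of the group columns. The sketch below checks this numerically. The group sizes (6/6/6/5), the reading of T1 and T2 as the two trials, and the condition names are assumptions, since the thread gives only 23 samples in 4 groups.

```python
import numpy as np

# Hypothetical sample labels: 2 trials x 2 conditions, 23 samples in total.
trial = np.array(["T1"] * 12 + ["T2"] * 11)
condition = np.array(["ctrl"] * 6 + ["treat"] * 6 + ["ctrl"] * 6 + ["treat"] * 5)
group = np.array([f"{t}_{c}" for t, c in zip(trial, condition)])

def one_hot(labels):
    """One indicator column per distinct label (a group-means design matrix)."""
    levels = np.unique(labels)
    return (labels[:, None] == levels[None, :]).astype(float)

group_design = one_hot(group)                    # 23 x 4, one column per group
trial_indicator = (trial == "T2").astype(float)  # 1 for trial T2, 0 for T1

# Project the trial indicator onto the column space of the group design.
coef, *_ = np.linalg.lstsq(group_design, trial_indicator, rcond=None)
max_residual = np.max(np.abs(group_design @ coef - trial_indicator))

print(np.linalg.matrix_rank(group_design))  # number of independent group columns
print(max_residual)                         # effectively zero: trial is already in the model
```

Because the residual is zero, a separate trial covariate, or an RUV factor that merely tracks trial, adds nothing that the group design does not already capture.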

ADD COMMENT • link 17 months ago Gordon Smyth 53k

0

I knew I had something wrong. I realise now that all the examples of RUV/SVA I've seen start with data that already shows a treatment effect, with only 1 or 2 samples not clustering as expected. This makes much more sense now. Thank you for your assessment and explanation.

ADD REPLY • link 17 months ago BioinfGuru ▴ 70

0

Hi Gordon (or anyone),

As a sanity check of my understanding of the appropriate use of RUV, here are 2 images of data from a different tissue. Is this another example of artificially separating the groups? Where do I stop increasing the k value (and how long is a piece of string)?

1) When k is set to 1, RUVr and SVA both show an improvement in the clustering of the expected groups, and the only obvious changes in the RLE plot are the box sizes and whiskers of T1. When k = 3, T1 and T2 are separated by PC1.

2) When k is set to 9, for the first time one of the plots (SVA) clusters the 4 groups separately, and the RLE plot shows more uniform box sizes.

ADD REPLY • link 17 months ago BioinfGuru ▴ 70