RUV - Does my data need further normalisation?
BioinfGuru (@yagalbi-11519), Ireland

Hi all,

I have a dataset of 23 bulk RNA-seq samples for differential expression analysis. Using both RUVs and RUVr, I have produced the following images to assess and mitigate unwanted technical variation. I need k values of 18 (RUVs) and 15 (RUVr) before I see any clustering by the variable of interest, and even then the RLE plot still shows evidence of technical variation.
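To make concrete what k controls, here is a simplified sketch of the idea behind RUVr. It is not RUVSeq's implementation: it assumes log-scale expression and ordinary least-squares residuals, whereas the RUVSeq vignette computes deviance residuals from an edgeR GLM fit and lets you restrict to control genes. The function name `ruvr_like_factors` is hypothetical. The k unwanted factors are the leading left singular vectors of the residual matrix, and because those residuals are orthogonal to the design, at most n - rank(design) factors can be estimated at all.

```python
import numpy as np

def ruvr_like_factors(logexpr, design, k):
    """Simplified RUVr-style estimate of k unwanted factors (not RUVSeq's code).

    logexpr: genes x samples matrix of log-scale expression.
    design:  samples x p design matrix for the wanted effects.
    Returns a samples x k matrix W of estimated unwanted factors.
    """
    # Least-squares fit of every gene on the design; residuals are samples x genes.
    beta, *_ = np.linalg.lstsq(design, logexpr.T, rcond=None)
    resid = logexpr.T - design @ beta
    # Residuals live in the (n - rank(design))-dimensional space orthogonal
    # to the design, so no more than that many factors can carry any signal.
    max_k = design.shape[0] - np.linalg.matrix_rank(design)
    if k > max_k:
        raise ValueError(f"k={k} exceeds the {max_k} residual degrees of freedom")
    # Leading left singular vectors of the residuals = dominant unwanted directions.
    u, _, _ = np.linalg.svd(resid, full_matrices=False)
    return u[:, :k]

# Hypothetical example: 23 samples in 4 groups, 500 genes of pure noise.
rng = np.random.default_rng(0)
groups = np.arange(23) % 4
design = np.eye(4)[groups]          # one indicator column per group
W = ruvr_like_factors(rng.normal(size=(500, 23)), design, k=3)
print(W.shape)  # (23, 3)
```

With 23 samples and 4 groups the sketch refuses any k above 19, which is the same ceiling discussed in the answer below.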

I'm really not sure what to do here. There is no other batch effect I can account for. My next idea is to try SVA, but I imagine it won't make much of a difference. I could also split the single variable "trial_condition" into two variables, "trial" and "condition", but again, I don't expect that to make much of a difference.

Any advice is appreciated, I'm really not sure how to proceed.

Regards

Kenneth

Raw counts without RUV: [image]

Raw counts with RUVs: [image]

Raw counts with RUVr: [image]
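The RLE (relative log expression) plots referred to throughout this thread summarize, for each sample, how far its log expression sits from each gene's median across all samples. A minimal sketch of the calculation, assuming a log(count + 1) transform on a genes x samples matrix; this may differ in detail from EDASeq's own `plotRLE`, and the function names are hypothetical. Well-normalized samples give boxes centered on zero with similar widths.

```python
import numpy as np

def rle(counts):
    """Relative log expression: each gene's log count minus its median across samples.

    counts: genes x samples array of (normalized) counts.
    """
    logc = np.log1p(counts)                              # log(count + 1)
    return logc - np.median(logc, axis=1, keepdims=True)

def rle_box_stats(counts):
    """Per-sample median and interquartile range of the RLE values (one box each)."""
    r = rle(counts)
    q1, med, q3 = np.percentile(r, [25, 50, 75], axis=0)
    return med, q3 - q1

# Hypothetical example: 3 genes x 4 samples, sample 0 sequenced 4x deeper.
counts = np.tile(np.array([[10.0], [100.0], [1000.0]]), (1, 4))
counts[:, 0] *= 4
med, iqr = rle_box_stats(counts)
print(np.round(med, 2))  # sample 0's box is shifted above zero; the others sit at zero
```

A box whose median is off zero points to a normalization (library size) problem, while unusually wide boxes point to extra technical noise in that sample.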

Tags: EDASeq, plotRLE, RNAseq
@gordon-smyth, WEHI, Melbourne, Australia

Further normalization!? In my opinion, you have already over-normalized the data by a very long way. You start with data that shows no evidence of any treatment effect. You have 23 samples in 4 groups, so there are 23 - 4 = 19 residual degrees of freedom for replication. You add 18 RUVs columns, removing 18 of those 19 residual degrees of freedom and leaving only one. After removing almost all the replication variation, the groups now appear artificially separated. I would myself have no confidence in such an analysis.
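The degrees-of-freedom arithmetic can be checked directly: residual degrees of freedom are the number of samples minus the rank of the design matrix, and every linearly independent factor added to the design uses up one. A minimal sketch, assuming a hypothetical assignment of 23 samples to 4 groups and using random columns as stand-ins for estimated RUV factors (real factors are not random, but they are assumed here to be linearly independent of the group columns, so they add to the rank in the same way):

```python
import numpy as np

def residual_df(design):
    """Residual degrees of freedom = number of samples minus the rank of the design."""
    return design.shape[0] - np.linalg.matrix_rank(design)

rng = np.random.default_rng(1)
n_samples, n_groups = 23, 4
groups = np.arange(n_samples) % n_groups   # hypothetical group assignment (6, 6, 6, 5)
group_design = np.eye(n_groups)[groups]    # one indicator column per group

print(residual_df(group_design))           # 23 - 4 = 19
for k in (1, 9, 15, 18):
    W = rng.normal(size=(n_samples, k))    # stand-in for k estimated unwanted factors
    print(k, residual_df(np.hstack([group_design, W])))  # 19 - k left for replication
```

At k = 18 only one residual degree of freedom remains, so almost nothing is left to estimate the replicate variability that the differential expression tests depend on.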

RUV is a very good tool and less prone to over-fitting than most competitors, but I think you've pushed it outside its usage envelope.

Your original data showed some batch effect between the two trials, but that would already be handled by a trial effect in the model.
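Because trial_condition has one level per trial-by-condition combination, the trial indicator is simply the sum of two group columns, so a trial effect is already contained in the one-way model. Splitting it into an additive trial + condition model also contains the trial effect, at the cost of dropping the interaction. A small sketch with a hypothetical layout of 12 and 11 samples in the two trials: if appending the trial column leaves the rank unchanged, the model already accounts for it.

```python
import numpy as np

# Hypothetical layout: 12 samples in trial 0, 11 in trial 1, two conditions each.
trial = np.array([0] * 12 + [1] * 11)
condition = np.array([0] * 6 + [1] * 6 + [0] * 6 + [1] * 5)
group = 2 * trial + condition                  # 4-level trial_condition factor

rank = np.linalg.matrix_rank
trial_col = trial[:, None].astype(float)

# One-way model: one indicator column per trial_condition level.
group_design = np.eye(4)[group]
# Additive model: intercept + trial + condition (no interaction term).
additive_design = np.column_stack([np.ones(23), trial, condition])

for name, X in [("group", group_design), ("additive", additive_design)]:
    # Rank unchanged after adding the trial column => trial effect already in the model.
    print(name, rank(X), rank(np.hstack([X, trial_col])))
# prints:
# group 4 4
# additive 3 3
```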


I knew I had something wrong. I realise now that all the RUV/SVA examples I've seen start with data that already shows a treatment effect, where only 1 or 2 samples don't cluster as expected. This makes much more sense now. Thank you for your assessment and explanation.


Hi Gordon (or anyone),

As a sanity check of my understanding of the appropriate use of RUV, here are two images of data from a different tissue. Is this another example of artificially separating the groups? Where do I choose to stop increasing k (and how long is a piece of string)?

1) When k is set to 1, RUVr and SVA both improve the clustering of the expected groups, and the only obvious changes in the RLE plot are in the box sizes and whiskers of T1. When k = 3, T1 and T2 are separated by PC1.

[image: k1]

2) When k is set to 9, one of the plots (SVA) clusters the 4 groups separately for the first time, and the RLE plot shows more uniform box sizes.

[image: k9]
