Preprocessing Issue
MOHAMMAD • 0
@MOHAMMAD-24781

Hi all,

I have two RNA-seq count datasets, as follows:

dataset A contains 3 samples and 3 controls

dataset B contains 81 samples with no controls

What is the best workflow for preprocessing in this case?

A- Batch-effect removal (on the merged dataset) >>> quantile normalization.

B- Quantile normalization (on the merged dataset) >>> batch-effect removal.

C- Quantile normalization (on each dataset separately) >>> batch-effect removal.

Thank you in advance.

sva RNASeq Preprocessing Normalization BatchEffect

Having no controls in B means that controls are nested within dataset, so condition is confounded with dataset and you cannot correct for the batch effect. Also, since B has many more samples than A, the B samples would probably dominate whatever effect the A samples have, so the analysis essentially comes down to samplesB vs controlsA, which, as said above, is confounded by study. Summary: this comparison is probably not meaningful, as any DEGs you see could be entirely due to the batch effect, which you cannot remove with this setup. What you can do is compare samplesA vs controlsA, or try to define subtypes within samplesB and see whether you can find differences between them. Whether that makes sense depends on your project.

Edit: Can you clarify how dataset A differs from B? Are A and B from the same lab, with the same kits used and only the time of the experiment differing, or are they from completely independent studies?
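
To make the confounding concrete, here is a minimal sketch on simulated values for a single gene (the effect sizes, noise level, and variable names are all made up for illustration, not taken from the datasets in the question). It fits a condition + batch linear model to the 3/3/81 layout and shows two things: the naive samplesB vs controlsA difference equals condition effect plus batch effect, and once batch is in the model the condition estimate comes from dataset A alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated log-expression for one gene (all numbers are assumptions).
n_ctrl_a, n_samp_a, n_samp_b = 3, 3, 81
true_condition, true_batch = 1.0, 2.0
ctrl_a = rng.normal(5.0, 0.3, n_ctrl_a)
samp_a = rng.normal(5.0 + true_condition, 0.3, n_samp_a)
samp_b = rng.normal(5.0 + true_condition + true_batch, 0.3, n_samp_b)

# Design matrix: intercept, condition (sample = 1), batch (dataset B = 1).
y = np.concatenate([ctrl_a, samp_a, samp_b])
is_sample = np.r_[np.zeros(n_ctrl_a), np.ones(n_samp_a + n_samp_b)]
is_batch_b = np.r_[np.zeros(n_ctrl_a + n_samp_a), np.ones(n_samp_b)]
design = np.column_stack([np.ones_like(y), is_sample, is_batch_b])

# The three columns are linearly independent, so the model can be fitted.
rank = np.linalg.matrix_rank(design)
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# Naive comparison across datasets vs. comparison within dataset A.
naive_b_vs_ctrl = samp_b.mean() - ctrl_a.mean()
within_a = samp_a.mean() - ctrl_a.mean()

print("design rank:", rank)
print("condition estimate from the model:", coef[1])
print("samplesA - controlsA:", within_a)
print("samplesB - controlsA (condition + batch):", naive_b_vs_ctrl)
```

In this layout the fitted condition coefficient is identical to the samplesA minus controlsA difference, so the 81 samples in B add nothing to the estimate of the condition effect itself.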


I'm not quite as pessimistic here. You do have some controls in one dataset, which will at least let you parse the variation a bit, though you will want to be very careful not to over-interpret any results you get.

My normal workflow is to do quantile normalization across the merged dataset and then batch-effect removal. Typically, batch-effect removal techniques assume specific models for the systematic error, and those assumptions may be violated by quantile normalization. This is a good paper on when to use quantile normalization: https://www.biorxiv.org/content/10.1101/012203v1.full.pdf
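
As a rough sketch of that order of operations (quantile normalization on the merged matrix, then batch adjustment), the following runs on simulated log-expression values. The helper names `quantile_normalize` and `remove_batch` are hypothetical, and the batch step is a plain per-gene linear-model adjustment standing in for a dedicated tool such as ComBat from sva, not a reimplementation of it.

```python
import numpy as np

def quantile_normalize(x):
    """Force every column (sample) of a genes x samples matrix onto one distribution."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)  # rank of each value in its column
    reference = np.sort(x, axis=0).mean(axis=1)        # mean of the k-th smallest values
    return reference[ranks]                            # ties are broken arbitrarily here

def remove_batch(x, design, batch_col):
    """Fit a per-gene linear model and subtract only the fitted batch term."""
    coef, *_ = np.linalg.lstsq(design, x.T, rcond=None)          # columns x genes
    batch_fit = np.outer(design[:, batch_col], coef[batch_col])  # samples x genes
    return x - batch_fit.T

# Layout from the question: A = 3 controls + 3 samples, B = 81 samples.
is_sample = np.r_[np.zeros(3), np.ones(3 + 81)]
is_batch_b = np.r_[np.zeros(6), np.ones(81)]
design = np.column_stack([np.ones(87), is_sample, is_batch_b])

# Simulated data: per-gene baseline plus a per-gene shift in dataset B (assumed values).
rng = np.random.default_rng(1)
n_genes = 200
base = rng.normal(6.0, 1.5, size=(n_genes, 1))
batch_shift = rng.normal(0.0, 1.0, size=(n_genes, 1))
log_expr = base + batch_shift * is_batch_b + rng.normal(0.0, 0.3, size=(n_genes, 87))

normalized = quantile_normalize(log_expr)                 # step 1: merged matrix
adjusted = remove_batch(normalized, design, batch_col=2)  # step 2: batch term removed
```

Keeping the condition column in the design while subtracting only the batch term is what stops the adjustment from also removing the sample vs control difference. With this layout the batch term is estimated purely from samplesB vs samplesA, so it is only as trustworthy as the assumption that those two groups are biologically comparable.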
