Dear all,
Depending on the normalization approach (none, quantile, TMM or DESeq2) used with limma-voom, the number of surrogate variables found by SVA and the number of differentially expressed genes change a lot. My question is: does SVA replace the normalization step? For example, if the RNA-seq samples have different sequencing depths, is this a technical factor that SVA corrects for?
Thanks,
Gordon,
My problem is that I have been trying SVA with different normalization options as follows:
For option 6 I followed https://www.bioconductor.org/help/workflows/rnaseqGene/. Why does the number of surrogate variables depend so much on whether 'voom' is applied?
It is completely incorrect to input counts to num.sv() as if they were microarray expression values. You have to use voom() or cpm() to convert the counts to a scale on which a microarray-type analysis makes sense.
When you do use voom(), you get the same number of surrogate variables regardless of which normalization method you used, just as I suggested would happen.
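For readers following along, here is a minimal sketch of the point above: convert the counts to a log2-CPM scale with voom() or cpm() before calling num.sv(). The simulated counts, the two-group design and all object names are assumptions made for illustration only; they are not the data or code from this thread, and no particular number of surrogate variables is implied.

```r
library(limma)
library(edgeR)
library(sva)

set.seed(1)

# Hypothetical data: 1000 genes, 12 samples in two groups,
# with deliberately unequal sequencing depths.
n_genes   <- 1000
n_samples <- 12
group     <- factor(rep(c("A", "B"), each = 6))
depth     <- runif(n_samples, 0.5, 2)
mu        <- outer(rexp(n_genes, rate = 1 / 100) + 5, depth)
counts    <- matrix(rnbinom(n_genes * n_samples, mu = mu, size = 10),
                    nrow = n_genes)
rownames(counts) <- paste0("gene", seq_len(n_genes))
colnames(counts) <- paste0("sample", seq_len(n_samples))

design <- model.matrix(~ group)

# Normalize (TMM here, as one of the options discussed) and run voom.
dge <- DGEList(counts = counts)
dge <- calcNormFactors(dge, method = "TMM")
v   <- voom(dge, design)

# Estimate the number of surrogate variables on the log2-CPM scale.
n_sv_voom <- num.sv(v$E, design, method = "be")

# cpm() with log = TRUE gives a comparable scale without voom weights.
logcpm   <- cpm(dge, log = TRUE, prior.count = 2)
n_sv_cpm <- num.sv(logcpm, design, method = "be")
```

The only thing that matters here is what gets passed as the first argument of num.sv(): log-scale expression values (v$E or logcpm) rather than the count matrix itself.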
Ok, then option 5 is discarded. But what about the rest?
Option 4 uses TMM normalization and option 6 uses DESeq2 normalization; both give 57 surrogate variables.
Options 1, 2 and 3 use voom and give 1 surrogate variable.
Option 6 follows the code at https://www.bioconductor.org/help/workflows/rnaseqGene/ :
Option 4 is counts. Option 5 is counts. Option 6 is counts. They are all incorrect. Option 6 is not the same as the rnaseq workflow.
If you use voom, you get the correct result ( n.sv=1 ).
If you ignore voom and just use counts, you get the wrong result ( n.sv=57 ).
That's pretty clear, is it not?
You are right Gordon, if I add this step:
I get the same result as with 'voom'.
If I want to calculate n.sv using the rnaseq DESeq2 workflow, how should I do it? It is not specified. Should I use the 'rlog' or the 'vst' transformation?