I'm using voom and limma to analyse RNA-seq data. In general, I have followed the basic instructions regarding RNA-seq analysis presented in the limma manual.
The matrix of read counts is filtered so that I keep genes where cpm>=1 in at least 3 samples. This is done using the following code:
Next, I apply the TMM normalization and use the results as input for voom
DGE=DGEList(matrix) DGE=calcNormFactors(DGE,method =c("TMM")) v=voom(DGE,design,plot=T)
However, with one of my datasets, the plot produced by voom looks generally ok, but with a gap midway
Looking back at the manual for limma, I noticed the following note:
If the data are very noisy, one can apply the same between-array normalization methods as would be used for microarrays, for example: v <- voom(counts,design,plot=TRUE,normalize="quantile")
Although I am not certain that my dataset qualifies as "very noisy" (happy to read what people have to say about it or point me to examples) I decided to give it a whirl.
The plot looks better than before:
I should finish by saying that I went on and perform the differential expression analysis using the two outputs from voom. Things change marginally (1570 vs 1758 DEG out of 15729 genes at 5% FDR when using the "TMM" vs "quantile" strategy). Not surprisingly, the genes differentially expressed in one configuration only are associated with Q-values close to the 0.05 cut-off.
I am a bit uncertain about using the quantile normalization route (did not find a reference where people had used it), so I am eager to have inputs/guidelines about when to decide using TMM vs quantile norm.
Input on this most welcome!