Perplexing results with Limma with RNA-seq data
1
0
Entering edit mode
chris86 ▴ 400
@chris86-8408
Last seen 2.4 years ago
UCL, United Kingdom

Hi

I thought I would post my data because it is pretty strange & interesting. I am testing limma compared with DESEQ for the analysis of miRNA seq samples. Both are using the same design or grouping. On my other data set they roughly performed the same, limma being slightly less conservative.

For DESEQ I am using my standard code.

DE genes = 0. (7 reps group 1, 12, group 2). There is some moderate level of difference between groups judged qualitatively from PCA plots - but no DE genes is not surprising from that.

DESEQ

cds <- newCountDataSet( countTable2, group )
cds <- estimateSizeFactors( cds )
sizeFactors( cds )
cds <- estimateDispersions( cds )

res_1 <- nbinomTest( cds, "1", "2" )
res1sig <- subset(res_1, padj <= 0.05)

For Limma I have followed the manual and have used 3 types of normalisation.

design <- model.matrix(~group)
matrix <- data.matrix(countTable2)

dge <- DGEList(counts=matrix)
dge <- calcNormFactors(dge)

v <- voom(countTable2, design, plot=TRUE) ###### (method1)
v <- voom(dge, design, plot=TRUE) ######### (method2)
v <- voom(countTable2,design,plot=TRUE,normalize="quantile") ####### (method3)

fit <- lmFit(v,design)
fit <- eBayes(fit)
top2 <- topTable(fit,coef=2,number=Inf,sort.by="P")
sum(top2\$adj.P.Val<0.05)

So method 1 gives me 1663 DE miRNA genes, from 2576 total. This doesnt seem right to me especially when compared with DESEQ.

Method 2 gives me 1350, still pretty strange result. Why are there so many?

Method 3 gives me 13, this seems much more similar to DESEQ. This kind of figure we would expect.

I have also tested this on another cell type which has only two reps versus like 15 and it has about 1000 DE genes with limma and none for DESEQ, this is part of the same data set. The DESEQ results seem more belivable on this data set except for when I do quantile normalisation with limma. That seems to balance the two. I find it strange that the other methods that normalise according to depth generate such high numbers of DE genes compared to quantile normalisation. My understanding of these methods is that they should give roughly similar results. Any thoughts or helpful comments?

Thanks,

Chris

sequencing microarray • 1.8k views
1
Entering edit mode
@steve-lianoglou-2771
Last seen 4 days ago
United States

1. On the DESeq side, you should update your "analysis pipeline" to DESeq2. It's just better in every way, and the authors of DESeq are encouraging people migrate to it.
2. Of your three attempts to use voom, your #2 is perhaps the closest one to being "correct" (although still very wrong, see below). The key is that when you just pass voom a count matrix, it normalizes by total count per sample rather than TMM (which is the default method for calcNormFactors), which would be my default preference.

Perhaps most importantly, though, what you are failing to do completely in the voom analysis is to remove the rows of your count matrix that are lowly expressed, because these hose the mean/variance estimation, which you can see/diagnose on the plot generated with you call to voom(..., plot=TRUE). In brief, you want to ensure that the red fit line doesn't "slope/hook back down" (I'm talking as if you are "reading" the graph from right to left here) due to the drop/discretization in the observed variance from the black dots on the left side of the plot.

You can further diagnose this problem by looking at the MA plots you get from your voom results, are the majority of the "excessively" called differentially expressed genes at the lower end of the expression spectrum (ie. on the left hand side of the plot)?

Note that all of the voom examples in the limma user's guide filter out lowly expressed rows from DGEList before passing it down to voom.

Please also refer to the comments from Jim and Gordon here: Nanostring analysis with limma, to learn a bit more about the how's and why's of different normalization strategies you might try.