Search
Question: Limma RNAseq comparison with Cuffdiff
0
11 months ago by
nonCodingGene10 wrote:

I've the output of a differential expression analysis for RNA done with cuffdiff and I'm trying to check it with limma but I'm not getting the results that I would expect.

w164 <- read.table("COUNTfile.txt", header  = T, row.names = 1)

WM164_NEG_1 WM164_NEG_2 WM164_NEG_3 WM164_MDK_1 WM164_MDK_2 WM164_MDK_3
A1BG             125         152         145         127         121         114
A1BG-AS1          54          55          58          74          69          94
A1CF               0           0           0           0           0           0
A2LD1             91          85          81          56          58          69
A2M             6368        5305       10631        6560        6017        5833
A2ML1              2           1           8           1           4           4

Then I do try to do de differential expression analysis, I've tried to change the coef (1, 2) and the intercept (0, 1)

dge <- DGEList(counts=w164)
dge <- calcNormFactors(dge)
logCPM <- cpm(dge, log=TRUE, prior.count=1)

classes <- factor(c("NEG","NEG","NEG", "MDK","MDK","MDK"))
design <- model.matrix(~ 0 + classes)
#colnames(design) <- c("NEG", "MDK")

fit <- lmFit(logCPM, design)
fit <- eBayes(fit, trend=TRUE)
toptable <- topTable(fit, coef=2, number=nrow(fit\$p.value))

This is the output of our pipeline which used cuffdiff (just for the two more significant genes).

 gene sample_1 sample_2 FPKM_1 FPKM_2 log2(fold_change) test_stat q_value AASS WM164_NEG WM164_MDK 4.10347 2.48356 -0.724435 -2.3669 0.000702473 ABCB9 WM164_NEG WM164_MDK 5.60671 10.4515 0.898485 2.95674 0.000702473

This is the top table for both genes when coef = 1

        logFC  AveExpr        t      P.Value    adj.P.Val        B
AASS 4.366231 4.061036 31.65013 1.372641e-06 2.595976e-06 5.552753

ABCB9 4.292285 4.720068 18.24541 1.725563e-05 2.794191e-05 2.497849

When coef = 2

        logFC  AveExpr        t      P.Value    adj.P.Val        B
AASS 3.755841 4.061036 27.22551 2.747115e-06 4.929324e-06 4.717309
ABCB9 5.147851 4.720068 21.88221 7.501512e-06 1.272469e-05 3.50482

 The logFC seems to be the average of the values (from logCPM) from one of the conditions.

What I'm doing wrong?

Thanks

modified 11 months ago by Gordon Smyth34k • written 11 months ago by nonCodingGene10

Update, I've modify the classes vector such

classes <- factor(c("NEG","NEG","NEG", "MDK","MDK","MDK"), as.character(c("NEG", "MDK")))

Now logFC are quite close to the ones obtained in cuffdiff but none of the top 7 significant genes from cuffdiff are significant in limma.

3157 significant genes in cuffdiff

117 in limma

intersection of 29 genes

0
11 months ago by
Gordon Smyth34k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth34k wrote:

Two things:

First you need to define the design matrix as

design <- model.matrix(~ classes)

The use of "0+" in your formula is out of place and is causing you to test the wrong hypothesis. You are not actually testing for DE between MDK and NEG at all.

Second, you need to filter out genes for which the counts are all zero or all small before you run eBayes(). The limma pipeline assumes that you do some filtering first.