Limma RNAseq comparison with Cuffdiff
1
0
Entering edit mode
@noncodinggene-7018
Last seen 5.7 years ago

I've the output of a differential expression analysis for RNA done with cuffdiff and I'm trying to check it with limma but I'm not getting the results that I would expect.

w164 <- read.table("COUNTfile.txt", header  = T, row.names = 1)

head(w164)

         WM164_NEG_1 WM164_NEG_2 WM164_NEG_3 WM164_MDK_1 WM164_MDK_2 WM164_MDK_3
A1BG             125         152         145         127         121         114
A1BG-AS1          54          55          58          74          69          94
A1CF               0           0           0           0           0           0
A2LD1             91          85          81          56          58          69
A2M             6368        5305       10631        6560        6017        5833
A2ML1              2           1           8           1           4           4

 

Then I do try to do de differential expression analysis, I've tried to change the coef (1, 2) and the intercept (0, 1)

 

dge <- DGEList(counts=w164)
dge <- calcNormFactors(dge)
logCPM <- cpm(dge, log=TRUE, prior.count=1)

classes <- factor(c("NEG","NEG","NEG", "MDK","MDK","MDK"))
design <- model.matrix(~ 0 + classes)
#colnames(design) <- c("NEG", "MDK")

fit <- lmFit(logCPM, design)
fit <- eBayes(fit, trend=TRUE)
toptable <- topTable(fit, coef=2, number=nrow(fit$p.value))

This is the output of our pipeline which used cuffdiff (just for the two more significant genes).

gene
sample_1
sample_2
FPKM_1
FPKM_2
log2(fold_change)
test_stat
q_value
AASS
WM164_NEG
WM164_MDK
4.10347
2.48356
-0.724435
-2.3669
0.000702473
ABCB9
WM164_NEG
WM164_MDK
5.60671
10.4515
0.898485
2.95674
0.000702473

This is the top table for both genes when coef = 1

        logFC  AveExpr        t      P.Value    adj.P.Val        B
AASS 4.366231 4.061036 31.65013 1.372641e-06 2.595976e-06 5.552753

ABCB9 4.292285 4.720068 18.24541 1.725563e-05 2.794191e-05 2.497849

 

When coef = 2

        logFC  AveExpr        t      P.Value    adj.P.Val        B
AASS 3.755841 4.061036 27.22551 2.747115e-06 4.929324e-06 4.717309
ABCB9 5.147851 4.720068 21.88221 7.501512e-06 1.272469e-05 3.50482
 
The logFC seems to be the average of the values (from logCPM) from one of the conditions.

What I'm doing wrong?

Thanks

limma • 878 views
ADD COMMENT
0
Entering edit mode

Update, I've modify the classes vector such

classes <- factor(c("NEG","NEG","NEG", "MDK","MDK","MDK"), as.character(c("NEG", "MDK")))

Now logFC are quite close to the ones obtained in cuffdiff but none of the top 7 significant genes from cuffdiff are significant in limma.

3157 significant genes in cuffdiff

117 in limma

intersection of 29 genes

ADD REPLY
0
Entering edit mode
@gordon-smyth
Last seen 9 hours ago
WEHI, Melbourne, Australia

Two things:

First you need to define the design matrix as

design <- model.matrix(~ classes)

The use of "0+" in your formula is out of place and is causing you to test the wrong hypothesis. You are not actually testing for DE between MDK and NEG at all.

Second, you need to filter out genes for which the counts are all zero or all small before you run eBayes(). The limma pipeline assumes that you do some filtering first.

ADD COMMENT

Login before adding your answer.

Traffic: 904 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6