Search
Question: Limma RNAseq comparison with Cuffdiff
0
gravatar for Biorunner88
5 weeks ago by
Biorunner8810
Spain
Biorunner8810 wrote:

I've the output of a differential expression analysis for RNA done with cuffdiff and I'm trying to check it with limma but I'm not getting the results that I would expect.

w164 <- read.table("COUNTfile.txt", header  = T, row.names = 1)

head(w164)

         WM164_NEG_1 WM164_NEG_2 WM164_NEG_3 WM164_MDK_1 WM164_MDK_2 WM164_MDK_3
A1BG             125         152         145         127         121         114
A1BG-AS1          54          55          58          74          69          94
A1CF               0           0           0           0           0           0
A2LD1             91          85          81          56          58          69
A2M             6368        5305       10631        6560        6017        5833
A2ML1              2           1           8           1           4           4

 

Then I do try to do de differential expression analysis, I've tried to change the coef (1, 2) and the intercept (0, 1)

 

dge <- DGEList(counts=w164)
dge <- calcNormFactors(dge)
logCPM <- cpm(dge, log=TRUE, prior.count=1)

classes <- factor(c("NEG","NEG","NEG", "MDK","MDK","MDK"))
design <- model.matrix(~ 0 + classes)
#colnames(design) <- c("NEG", "MDK")

fit <- lmFit(logCPM, design)
fit <- eBayes(fit, trend=TRUE)
toptable <- topTable(fit, coef=2, number=nrow(fit$p.value))

This is the output of our pipeline which used cuffdiff (just for the two more significant genes).

gene
sample_1
sample_2
FPKM_1
FPKM_2
log2(fold_change)
test_stat
q_value
AASS
WM164_NEG
WM164_MDK
4.10347
2.48356
-0.724435
-2.3669
0.000702473
ABCB9
WM164_NEG
WM164_MDK
5.60671
10.4515
0.898485
2.95674
0.000702473

This is the top table for both genes when coef = 1

        logFC  AveExpr        t      P.Value    adj.P.Val        B
AASS 4.366231 4.061036 31.65013 1.372641e-06 2.595976e-06 5.552753

ABCB9 4.292285 4.720068 18.24541 1.725563e-05 2.794191e-05 2.497849

 

When coef = 2

        logFC  AveExpr        t      P.Value    adj.P.Val        B
AASS 3.755841 4.061036 27.22551 2.747115e-06 4.929324e-06 4.717309
ABCB9 5.147851 4.720068 21.88221 7.501512e-06 1.272469e-05 3.50482
 
The logFC seems to be the average of the values (from logCPM) from one of the conditions.

What I'm doing wrong?

Thanks

ADD COMMENTlink modified 5 weeks ago by Gordon Smyth32k • written 5 weeks ago by Biorunner8810

Update, I've modify the classes vector such

classes <- factor(c("NEG","NEG","NEG", "MDK","MDK","MDK"), as.character(c("NEG", "MDK")))

Now logFC are quite close to the ones obtained in cuffdiff but none of the top 7 significant genes from cuffdiff are significant in limma.

3157 significant genes in cuffdiff

117 in limma

intersection of 29 genes

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by Biorunner8810
0
gravatar for Gordon Smyth
5 weeks ago by
Gordon Smyth32k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth32k wrote:

Two things:

First you need to define the design matrix as

design <- model.matrix(~ classes)

The use of "0+" in your formula is out of place and is causing you to test the wrong hypothesis. You are not actually testing for DE between MDK and NEG at all.

Second, you need to filter out genes for which the counts are all zero or all small before you run eBayes(). The limma pipeline assumes that you do some filtering first.

ADD COMMENTlink modified 5 weeks ago • written 5 weeks ago by Gordon Smyth32k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 287 users visited in the last hour