Question

LIMMA gives different logFC

0

Entering edit mode

beginner • 0

@beginner-10655

Last seen 8.8 years ago

Hello,

I'm facing serious problem trying to recalculate the results of a given Series from GEO. For better understanding I will just describe the experiment briefly

There are 22 samples, two samples are always a replicate. And the first two belong to the control group. I managed to calculate the mean value of each two replicates, logFC , the logratio in comparison to control and also the fold change. In order to find differential expressed genes (twofold up or down compared to control) that are common in at least 9 of 10 sample group I used LIMMA in R.

My code

samples <- as.factor(samples)
design <- model.matrix(~0 + samples)

fit <- lmFit(exprSet, design)

contrast.matrix <- makeContrasts(RF_control= control-RF, LNIT_control= control-LNIT, REC_control= control-REC, LIP_control=control-LIP, BUR_control= control-BUR,BRI_control=control-BRI, UL_control=control-UL, FF_control=control-FF, LT_control=control-LT, LTMAS_control= control-LTMAS, levels=design  )

fits <- contrasts.fit(fit, contrast.matrix)
eFit <- eBayes(fits)

topTable(eFit, number=10, coef=1)
nrow(topTable(eFit, coef=1, number=10000, lfc=2))

probeset.list <- topTable(eFit, coef=1, number=10000, lfc=2)
gene.symbols <- getSYMBOL(rownames(probeset.list), "hgu133plus2")

results <- cbind(probeset.list, gene.symbols)
write.table(results, "results1.txt", sep="\t", quote=FALSE)

I compared the logFC generated by LIMMA of some genes with the logFC that i have calculated before(which are definitely right) and they are different. Is LIMMA used wrongly?

And how to I combine the different data from each coefficient, sind they result in different row numbers

Thank you very much!

limma LogFC • 2.8k views

ADD COMMENT • link updated 8.8 years ago by Aaron Lun ★ 28k • written 8.8 years ago by beginner • 0

score 1 · Answer 1 · 2016-05-08

1

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 10 minutes ago

The city by the bay

For starters, it seems that you haven't renamed the columns of your design matrix. If your samples vector contains levels like control, RF, etc. then your design matrix should have column names of samplescontrol, samplesRF, etc. In fact, I'm surprised that makeContrasts didn't report an error when you ran it, because those column names shouldn't be present in the design matrix as you've shown it.

If renaming the columns doesn't change anything, give us an example of a gene, its expression values across the 22 samples (plus the grouping of those samples), the log-fold change that limma reports, and the log-fold change that you think is "definitely right". This will allow us to immediately figure out what's going wrong. However, please don't put the information into a separate answer; respond as a comment to this post.

Finally, if you want to collate DE results from different comparisons (i.e., after dropping different coefficients), I would suggest setting sort.by="none" and n=Inf in topTable. This will return results for all genes in the same order as the rows in the original matrix of expression values. The order will be the same across comparisons, so you can directly compare them.

ADD COMMENT • link 8.8 years ago Aaron Lun ★ 28k

0

Entering edit mode

Thank you very much for helping! I found the error :R somehow confused the target, so samples have different targets, which means when i compared them to the control group, the comparison was done to another group

> samples <-targetcelfiles$Target
> samples
 [1] "control" "control" "RF"      "RF"      "LNIT"    "LNIT"   
 [7] "REC"     "REC"     "LIP"     "LIP"     "BUR"     "BUR"    
[13] "BRI"     "BRI"     "UL"      "UL"      "FF"      "FF"     
[19] "LT"      "LT"      "LTMAS"   "LTMAS"  
> samples <- as.factor(samples)
> samples
 [1] control control RF      RF      LNIT    LNIT    REC     REC    
 [9] LIP     LIP     BUR     BUR     BRI     BRI     UL      UL     
[17] FF      FF      LT      LT      LTMAS   LTMAS  
Levels: BRI BUR control FF LIP LNIT LT LTMAS REC RF UL

> design
   control RF LNIT REC LIP BUR BRI UL FF LT LTMAS
1        0  0    1   0   0   0   0  0  0  0     0
2        0  0    1   0   0   0   0  0  0  0     0
3        0  0    0   0   0   0   0  0  0  1     0
4        0  0    0   0   0   0   0  0  0  1     0
5        0  0    0   0   0   1   0  0  0  0     0
6        0  0    0   0   0   1   0  0  0  0     0
7        0  0    0   0   0   0   0  0  1  0     0
8        0  0    0   0   0   0   0  0  1  0     0
9        0  0    0   0   1   0   0  0  0  0     0
10       0  0    0   0   1   0   0  0  0  0     0
11       0  1    0   0   0   0   0  0  0  0     0
12       0  1    0   0   0   0   0  0  0  0     0
13       1  0    0   0   0   0   0  0  0  0     0
14       1  0    0   0   0   0   0  0  0  0     0
15       0  0    0   0   0   0   0  0  0  0     1
16       0  0    0   0   0   0   0  0  0  0     1
17       0  0    0   1   0   0   0  0  0  0     0
18       0  0    0   1   0   0   0  0  0  0     0
19       0  0    0   0   0   0   1  0  0  0     0
20       0  0    0   0   0   0   1  0  0  0     0
21       0  0    0   0   0   0   0  1  0  0     0
22       0  0    0   0   0   0   0  1  0  0     0
attr(,"assign")
 [1] 1 1 1 1 1 1 1 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$samples
[1] "contr.treatment"

all target are being changed in 'design'. I don't get it, because my phenodata looks like this:

FFName    FileName    Target
GSM455792.CEL    GSM455792.CEL    control
GSM455793.CEL    GSM455793.CEL    control
GSM455794.CEL    GSM455794.CEL    RF
GSM455795.CEL    GSM455795.CEL    RF
GSM455796.CEL    GSM455796.CEL    LNIT        
GSM455797.CEL    GSM455797.CEL    LNIT
GSM455798.CEL    GSM455798.CEL    REC
GSM455799.CEL    GSM455799.CEL    REC
GSM455800.CEL    GSM455800.CEL    LIP
GSM455801.CEL    GSM455801.CEL    LIP
GSM455802.CEL    GSM455802.CEL    BUR
GSM455803.CEL    GSM455803.CEL    BUR
GSM455804.CEL    GSM455804.CEL    BRI
GSM455805.CEL    GSM455805.CEL    BRI
GSM455806.CEL    GSM455806.CEL    UL
GSM455807.CEL    GSM455807.CEL    UL
GSM455808.CEL    GSM455808.CEL    FF
GSM455809.CEL    GSM455809.CEL    FF
GSM455810.CEL    GSM455810.CEL    LT
GSM455811.CEL    GSM455811.CEL    LT    
GSM455812.CEL    GSM455812.CEL    LTMAS
GSM455813.CEL    GSM455813.CEL    LTMAS

ADD REPLY • link 8.8 years ago beginner • 0

1

Entering edit mode

You seem to have incorrectly renamed your columns for design. I'm guessing that you named your columns based on unique values in samples, which is wrong. You should be doing:

design <- model.matrix(~0+samples)
colnames(design) <- levels(samples)

ADD REPLY • link 8.8 years ago Aaron Lun ★ 28k

0

Entering edit mode

Regarding collating DE results

did you mean it like this :

topTable(eFit, number=10, coef=1, sort.by="none", sort.by="none")

topTable(eFit, number=10, coef=2, sort.by="none", sort.by="none")

...

..

...

Is there way to tell R to go through all coefficient at once?

ADD REPLY • link 8.8 years ago beginner • 0

1

Entering edit mode

It's called a loop:

all.results <- list()
for (con in colnames(contrast.matrix)) {
    all.results[[con]] <- topTable(eFit, n=Inf, coef=con, sort.by="none")
}

ADD REPLY • link 8.8 years ago Aaron Lun ★ 28k

0

Entering edit mode

I'm so sorry for disturbing again ...

I used your for loop. It works as long as I don't define lfc, which I need to define, because I'm only interested intwofold changed genes, but since I get different row numbers I cannot combine them to one table. I tried to do it with this method

#  list.df <- list (all.results)
#                  
# max.rows <- max(unlist(lapply(list.df, nrow), use.names = F))
# 
# list.df <- lapply(list.df, function(x) {
#   na.count <- max.rows - nrow(x)
#   if (na.count > 0L) {
#     na.dm <- matrix(NA, na.count, ncol(x))
#     colnames(na.dm) <- colnames(x)
#     rbind(x, na.dm)
#   } else {
#     x
#   }
# })

But then it reorder the transcripts, which makes the whole table not useful anymore. Do you know a solution?

ADD REPLY • link 8.8 years ago beginner • 0

0

Entering edit mode

Well, for starters, don't use lfc, use treat instead. See the "Note" in ?topTable.

ADD REPLY • link 8.8 years ago Aaron Lun ★ 28k

0

Entering edit mode

I get this Error in topTable(eFit, number = 10, coef = 1, sort.by = "none", n = Inf) :
unused argument (n = Inf)

ADD REPLY • link 8.8 years ago beginner • 0

1

Entering edit mode

n and number refer to the same argument, which can't be specified twice. Look at ?topTable for more details.

ADD REPLY • link 8.8 years ago Aaron Lun ★ 28k