Question: LIMMA gives different logFC
0
3.0 years ago by
beginner0
beginner0 wrote:

Hello,

I'm facing serious problem trying to recalculate the results of a given Series from GEO. For better understanding I will just describe the experiment briefly

There are 22 samples, two samples are always a replicate. And the first two belong to the control group. I managed to calculate the mean value of each two replicates, logFC , the logratio in comparison to control and also the fold change. In order to find differential expressed genes (twofold up or down compared to control) that are common in at least 9 of 10 sample group I used LIMMA in R.

My code

samples <- as.factor(samples)
design <- model.matrix(~0 + samples)

fit <- lmFit(exprSet, design)

contrast.matrix <- makeContrasts(RF_control= control-RF, LNIT_control= control-LNIT, REC_control= control-REC, LIP_control=control-LIP, BUR_control= control-BUR,BRI_control=control-BRI, UL_control=control-UL, FF_control=control-FF, LT_control=control-LT, LTMAS_control= control-LTMAS, levels=design  )

fits <- contrasts.fit(fit, contrast.matrix)
eFit <- eBayes(fits)

topTable(eFit, number=10, coef=1)
nrow(topTable(eFit, coef=1, number=10000, lfc=2))

probeset.list <- topTable(eFit, coef=1, number=10000, lfc=2)
gene.symbols <- getSYMBOL(rownames(probeset.list), "hgu133plus2")
​
results <- cbind(probeset.list, gene.symbols)
write.table(results, "results1.txt", sep="\t", quote=FALSE)

I compared the  logFC  generated by LIMMA of some genes with the logFC that i have calculated before(which are definitely right) and they are different.  Is LIMMA used wrongly?

And how to I combine the different data from each coefficient, sind they result in different row numbers

Thank you very much!

limma logfc • 553 views
modified 3.0 years ago by Aaron Lun23k • written 3.0 years ago by beginner0
1
3.0 years ago by
Aaron Lun23k
Cambridge, United Kingdom
Aaron Lun23k wrote:

For starters, it seems that you haven't renamed the columns of your design matrix. If your samples vector contains levels like control, RF, etc. then your design matrix should have column names of samplescontrol, samplesRF, etc. In fact, I'm surprised that makeContrasts didn't report an error when you ran it, because those column names shouldn't be present in the design matrix as you've shown it.

If renaming the columns doesn't change anything, give us an example of a gene, its expression values across the 22 samples (plus the grouping of those samples), the log-fold change that limma reports, and the log-fold change that you think is "definitely right". This will allow us to immediately figure out what's going wrong. However, please don't put the information into a separate answer; respond as a comment to this post.

Finally, if you want to collate DE results from different comparisons (i.e., after dropping different coefficients), I would suggest setting sort.by="none" and n=Inf in topTable. This will return results for all genes in the same order as the rows in the original matrix of expression values. The order will be the same across comparisons, so you can directly compare them.

Thank you very much for helping! I found the error :R somehow confused the target, so samples have different targets, which means when i compared them to the control group, the comparison was done to another group

> samples <-targetcelfiles$Target > samples [1] "control" "control" "RF" "RF" "LNIT" "LNIT" [7] "REC" "REC" "LIP" "LIP" "BUR" "BUR" [13] "BRI" "BRI" "UL" "UL" "FF" "FF" [19] "LT" "LT" "LTMAS" "LTMAS" > samples <- as.factor(samples) > samples [1] control control RF RF LNIT LNIT REC REC [9] LIP LIP BUR BUR BRI BRI UL UL [17] FF FF LT LT LTMAS LTMAS Levels: BRI BUR control FF LIP LNIT LT LTMAS REC RF UL > design control RF LNIT REC LIP BUR BRI UL FF LT LTMAS 1 0 0 1 0 0 0 0 0 0 0 0 2 0 0 1 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 1 0 4 0 0 0 0 0 0 0 0 0 1 0 5 0 0 0 0 0 1 0 0 0 0 0 6 0 0 0 0 0 1 0 0 0 0 0 7 0 0 0 0 0 0 0 0 1 0 0 8 0 0 0 0 0 0 0 0 1 0 0 9 0 0 0 0 1 0 0 0 0 0 0 10 0 0 0 0 1 0 0 0 0 0 0 11 0 1 0 0 0 0 0 0 0 0 0 12 0 1 0 0 0 0 0 0 0 0 0 13 1 0 0 0 0 0 0 0 0 0 0 14 1 0 0 0 0 0 0 0 0 0 0 15 0 0 0 0 0 0 0 0 0 0 1 16 0 0 0 0 0 0 0 0 0 0 1 17 0 0 0 1 0 0 0 0 0 0 0 18 0 0 0 1 0 0 0 0 0 0 0 19 0 0 0 0 0 0 1 0 0 0 0 20 0 0 0 0 0 0 1 0 0 0 0 21 0 0 0 0 0 0 0 1 0 0 0 22 0 0 0 0 0 0 0 1 0 0 0 attr(,"assign") [1] 1 1 1 1 1 1 1 1 1 1 1 attr(,"contrasts") attr(,"contrasts")$samples
[1] "contr.treatment"

all target are being changed in 'design'. I don't get it, because my phenodata looks like this:

FFName    FileName    Target
GSM455792.CEL    GSM455792.CEL    control
GSM455793.CEL    GSM455793.CEL    control
GSM455794.CEL    GSM455794.CEL    RF
GSM455795.CEL    GSM455795.CEL    RF
GSM455796.CEL    GSM455796.CEL    LNIT
GSM455797.CEL    GSM455797.CEL    LNIT
GSM455798.CEL    GSM455798.CEL    REC
GSM455799.CEL    GSM455799.CEL    REC
GSM455800.CEL    GSM455800.CEL    LIP
GSM455801.CEL    GSM455801.CEL    LIP
GSM455802.CEL    GSM455802.CEL    BUR
GSM455803.CEL    GSM455803.CEL    BUR
GSM455804.CEL    GSM455804.CEL    BRI
GSM455805.CEL    GSM455805.CEL    BRI
GSM455806.CEL    GSM455806.CEL    UL
GSM455807.CEL    GSM455807.CEL    UL
GSM455808.CEL    GSM455808.CEL    FF
GSM455809.CEL    GSM455809.CEL    FF
GSM455810.CEL    GSM455810.CEL    LT
GSM455811.CEL    GSM455811.CEL    LT
GSM455812.CEL    GSM455812.CEL    LTMAS
GSM455813.CEL    GSM455813.CEL    LTMAS

1

You seem to have incorrectly renamed your columns for design. I'm guessing that you named your columns based on unique values in samples, which is wrong. You should be doing:

design <- model.matrix(~0+samples)
colnames(design) <- levels(samples)

Regarding collating DE results

did you mean it like this :

topTable(eFit, number=10, coef=1, sort.by="none", sort.by="none")

topTable(eFit, number=10, coef=2, sort.by="none", sort.by="none")

...

..

...

Is there way to tell R to go through all coefficient at once?

1

It's called a loop:

all.results <- list()
for (con in colnames(contrast.matrix)) {
all.results[[con]] <- topTable(eFit, n=Inf, coef=con, sort.by="none")
}

I'm so sorry for disturbing again ...

I used your for loop. It works as long as I don't define lfc, which I need to define, because I'm only interested intwofold changed genes, but since I get different row numbers I cannot combine them to one table. I tried to do it with this method

#  list.df <- list (all.results)
#
# max.rows <- max(unlist(lapply(list.df, nrow), use.names = F))
#
# list.df <- lapply(list.df, function(x) {
#   na.count <- max.rows - nrow(x)
#   if (na.count > 0L) {
#     na.dm <- matrix(NA, na.count, ncol(x))
#     colnames(na.dm) <- colnames(x)
#     rbind(x, na.dm)
#   } else {
#     x
#   }
# })

But then it reorder the transcripts, which makes the  whole table not useful anymore. Do you know a solution?

Well, for starters, don't use lfc, use treat instead. See the "Note" in ?topTable.

I get this Error in topTable(eFit, number = 10, coef = 1, sort.by = "none", n = Inf) :
unused argument (n = Inf)

1

n and number refer to the same argument, which can't be specified twice. Look at ?topTable for more details.