Search
Question: Strong upwards correlation between log fold change and average log expression in MD plot of Limma
0
7 months ago by
gil.hornung0 wrote:

Hi,

I am trying to analyse my proteomics data (intensity based label free quantification) using Limma.

When I look at the MD plot after limma fit, I see a strong correlation between log-fold-change and average-log-expression. What could cause this odd behaviour and how can I fix it?

Thanks,

Gil

The code I used is:

> imat <- read.delim("matrix.tsv", sep = "\t", row.names = 1) %>%
+   as.matrix()
X51      X56      X57      X61      X65      X67      X72      X74      X77      X83      X85      X88
7966  1670300  1057800  1261700  1319300  1273000  1371400   950630  1920300   656870   706280  1157600   860140
7580  2727900  1536000  2675700  2168800  2246800  2530700  7548400  3874900  1761800  2203100  2815500  1867300
5068 27261000 19086000 19027000 16980000 24905000 29381000 27716000 24395000 15292000 16111000 13618000 14989000
4178   124790   372080        0   240420        0        0   492620   495800  2050400   408840  5791100  4327800
1038  1076100   962340  1384800        0  1395300        0   910440        0   714210        0        0   465820
1117        0   154830   159230        0        0        0   221510        0        0   133880   243150   881710
> sample_info <- read.delim("sample_desc.tsv",sep = "\t",stringsAsFactors = T)
Sample Group
1    X51     A
2    X56     A
3    X57     A
4    X61     A
5    X65     A
6    X67     A
>
> design <- model.matrix(~0+sample_info$Group) > colnames(design) <- levels(sample_info$Group)
>
> limma_fit = lmFit(log2(imat+1), design = design)
> plotMD(limma_fit)

session info:

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.8 (Santiago)

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] ecoliLeucine_1.10.0  ecolicdf_2.18.0      affy_1.48.0          Biobase_2.30.0       BiocGenerics_0.16.1  BiocInstaller_1.20.3
[7] plotly_4.5.6         ggdendro_0.1-20      gridExtra_2.2.1      purrr_0.2.2          ggplot2_2.2.1        dplyr_0.5.0
[13] tidyr_0.6.0          tibble_1.2           limma_3.26.9         knitr_1.15.1

loaded via a namespace (and not attached):
[1] colorspace_1.3-2      amap_0.8-14           htmltools_0.3.5       stats4_3.2.1          viridisLite_0.1.3     yaml_2.1.14
[7] base64enc_0.1-3       DBI_0.6               bit64_0.9-5           RColorBrewer_1.1-2    affyio_1.40.0         matrixStats_0.51.0
[13] plyr_1.8.4            stringr_1.2.0         zlibbioc_1.16.0       munsell_0.4.3         gtable_0.2.0          htmlwidgets_0.8
[19] memoise_1.0.0         evaluate_0.10         labeling_0.3          IRanges_2.4.8         AnnotationDbi_1.32.3  preprocessCore_1.32.0
[25] highr_0.6             Rcpp_0.12.10          edgeR_3.12.1          scales_0.4.1          S4Vectors_0.8.11      jsonlite_1.0
[31] bit_1.1-12            digest_0.6.12         stringi_1.1.2         grid_3.2.1            tools_3.2.1           magrittr_1.5
[37] lazyeval_0.2.0        RSQLite_1.1-2         MASS_7.3-45           data.table_1.10.4     assertthat_0.1        httr_1.2.1
[43] R6_2.2.0
modified 7 months ago by Gordon Smyth34k • written 7 months ago by gil.hornung0
3
7 months ago by
United States
James W. MacDonald46k wrote:

From ?plotMD, specifically the Details section

     If 'object' is an 'MArrayLM' object, then the plot is an fitted
model MD-plot in which the estimated coefficient is on the y-axis
and the average A-value is on the x-axis.

Since you have fit a cell means model (e.g., your coefficients estimate the mean of each group), what you are plotting as the 'logFC' is actually just the mean of the group, rather than the fold change. You need instead to either define a contrasts matrix and use contrasts.fit to compute the contrasts, or use a design with an intercept so your coefficients are already inherently contrasts.

2
7 months ago by
Gordon Smyth34k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth34k wrote:

Just to reinforce James' answer, the problem is that you haven't actually computed any log-fold-changes yet. So you're just plotting one mean against another.

Why did you include the "0+" in your model? Presumably you got that from reading a suggested analysis somewhere. If so, then go back to to the analysis and read what the next step was. If you don't already have some advice to read, then try Section 9.2 of the limma User's Guide.

I used "1+" in my model and it worked fine.

Al the best ,

Gil