Apologies for this lengthy post...
A colleague asked me to analyze an Agilent 2-colour dataset that is available at GEO:
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE32210
I am using limma to do so. This dataset has a common reference design; on all arrays the different samples were always labelled with Cy3, and the reference pool always with Cy5 (so no dye-swaps).
I preprocess the data using standard limma recommendations (removal of positive control probes, bg-correction using normexp() with offset=50, normalizeWithinArrays() with method=loess, normalizeBetweenArrays() with method=Aquantile, whereafter all probes are removed with intensity <75% percentile of the neg control probes, and the neg control probes themselves). I end up with an object called "MA.q3"
> class(MA.q3) [1] "MAList" attr(,"package") [1] "limma" >
So far, so good!
However, I am confused how to properly setup the design for this experiment.
When I build the design using limma's function modelMatrix(), in which I set the reference pool as ref, I noticed all respective samples are assigned using a negative number 1 (-1). This is what confuses me.
> design <- modelMatrix(targets, ref="mouse_reference_RNA")
Found unique target names:
Adult_liver Adult_Pancreas BAT CCl4_Liver_day13 CCl4_Liver_day3 DM_Liver_organoid EM_Liver_organoid Hepatocytes Lgr5_sorted_cells Liver mouse_reference_RNA Muscle Sorted_duct WAT
> design
Adult_liver Adult_Pancreas BAT CCl4_Liver_day13 CCl4_Liver_day3
GSM1047592_US22502677_251486837777_S01_GE2_105_Dec08_1_1 0 0 0 -1 0
GSM1047593_US22502677_251486837777_S01_GE2_105_Dec08_1_2 0 0 0 -1 0
GSM1047594_US22502677_251486837777_S01_GE2_105_Dec08_1_3 0 0 0 0 -1
GSM1047595_US22502677_251486837777_S01_GE2_105_Dec08_1_4 0 0 0 0 -1
<<snip>>
GSM797991_US22502677_251486826593_S01_GE2_105_Dec08_1_4 0 0 -1 0 0
GSM797992_US22502677_251486825703_S01_GE2_105_Dec08_1_1 0 0 -1 0 0
GSM797993_US22502677_251486825703_S01_GE2_105_Dec08_1_4 0 0 -1 0 0
GSM797994_US22502677_251486830305_S01_GE2_105_Dec08_1_1 -1 0 0 0 0
GSM797995_US22502677_251486833702_S01_GE2_105_Dec08_1_1 -1 0 0 0 0
GSM797996_US22502677_251486830304_S01_GE2_105_Dec08_1_2 0 0 0 0 0
GSM797997_US22502677_251486834188_S01_GE2_105_Dec08_1_2 0 0 0 0 0
GSM797998_US22502677_251486834188_S01_GE2_105_Dec08_1_1 0 0 0 0 0
GSM797999_US22502677_251486834188_S01_GE2_105_Dec08_1_4 0 -1 0 0 0
GSM798000_US22502677_251486835936_S01_GE2_105_Dec08_1_4 0 -1 0 0 0
<<snip>>
I realize this happens because all samples are expressed relative to the reference, which is in the Cy5 channel.
But is such design (with "negative" group assignments) subsequently properly analyzed? Or should I first convert the design matrix to include only positive values?
For now I continued with the (negative) design matrix:
> colnames(design)
[1] "Adult_liver" "Adult_Pancreas" "BAT" "CCl4_Liver_day13" "CCl4_Liver_day3" "DM_Liver_organoid"
[7] "EM_Liver_organoid" "Hepatocytes" "Lgr5_sorted_cells" "Liver" "Muscle" "Sorted_duct"
[13] "WAT"
>
> cont.matrix <- makeContrasts(
+ EM_DM= (EM_Liver_organoid - DM_Liver_organoid),
+ AdultLiver_DM = (Adult_liver - DM_Liver_organoid),
+ Liver_DM = (Liver - DM_Liver_organoid),
+ Hepatocytes_DM = (Hepatocytes - DM_Liver_organoid),
+ levels=design)
> cont.matrix
Contrasts
Levels EM_DM AdultLiver_DM Liver_DM Hepatocytes_DM
Adult_liver 0 1 0 0
Adult_Pancreas 0 0 0 0
BAT 0 0 0 0
CCl4_Liver_day13 0 0 0 0
CCl4_Liver_day3 0 0 0 0
DM_Liver_organoid -1 -1 -1 -1
EM_Liver_organoid 1 0 0 0
Hepatocytes 0 0 0 1
Lgr5_sorted_cells 0 0 0 0
Liver 0 0 1 0
Muscle 0 0 0 0
Sorted_duct 0 0 0 0
WAT 0 0 0 0
>
>
> fit <- lmFit(MA.q3, design)
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit3 <- eBayes(fit2, trend=TRUE, robust=TRUE)
>
> topTable(fit3,coef=1)
Row Col ControlType ProbeName SystematicName logFC AveExpr t P.Value adj.P.Val B
31040 367 75 0 A_52_P410495 NM_026336 -7.000364 6.471554 -45.13807 1.646844e-21 6.182251e-17 34.82217
9652 115 7 0 A_51_P422751 NM_017474 -8.485838 6.758554 -37.27941 7.081291e-20 1.329158e-15 32.56920
31911 378 14 0 A_52_P670766 NM_027339 -5.972531 6.422511 -32.83067 8.527083e-19 1.067022e-14 30.88386
25110 297 63 0 A_52_P3070 NM_178669 -4.789018 6.468355 -31.16513 2.356391e-18 2.211473e-14 30.15357
8588 102 43 0 A_51_P229403 NM_027339 -6.958043 6.699792 -30.04494 4.809562e-18 3.611019e-14 29.62718
15296 181 72 0 A_51_P511000 NM_181323 -4.221551 6.577204 -26.42710 5.811238e-17 3.635898e-13 27.70479
18082 214 64 0 A_51_P490840 NM_146077 -5.140845 6.526875 -25.04015 1.647681e-16 8.836278e-13 26.86462
26826 318 5 0 A_51_P492047 NM_020622 -5.232215 6.649090 -23.36079 6.274612e-16 2.880247e-12 25.75803
32009 379 28 0 A_51_P134993 NM_207208 -4.540090 6.624202 -23.24466 6.905227e-16 2.880247e-12 25.67760
34718 411 30 0 A_51_P362292 NM_023529 -4.421324 6.754218 -23.03258 8.234346e-16 3.091173e-12 25.52937
>
By doing so, do I correctly interpret that in when comparing the EM samples vs the DM samples (1st contrast; EM_DM), the most significantly regulated gene is NM_026336, which is down-regulated in the EM versus DM samples (log2 FC= -7.00), and is NOT up-regulated because of the "negative" design matrix?
I am asking because this is not what I concluded when inspecting the M-values of the respective samples; there it seems that NM_0026336 (rownumber =26261) is actually higher expressed in the EM samples compared to the DM samples...
> colnames(MA.q3$A) <- targets$Cy3
> colnames(MA.q3$M) <- targets$Cy3
> MA.q3$M[26261,]
CCl4_Liver_day13 CCl4_Liver_day13 CCl4_Liver_day3 CCl4_Liver_day3 Sorted_duct Hepatocytes Liver
-0.05987034 0.14093043 -0.05372743 -0.16743075 -0.48689356 -0.03920677 -0.05760740
Lgr5_sorted_cells Sorted_duct Hepatocytes Liver Lgr5_sorted_cells Muscle Muscle
0.12492195 -0.41947946 0.07274417 0.09072236 0.17494198 -0.09277732 -0.32188229
Muscle WAT WAT WAT BAT BAT BAT
-0.25082762 -0.02929802 -0.01729277 0.03040873 0.02812201 0.03574133 0.01385537
Adult_liver Adult_liver EM_Liver_organoid EM_Liver_organoid DM_Liver_organoid Adult_Pancreas Adult_Pancreas
-0.13853968 0.14293178 -0.11902644 -0.38656530 -7.25316029 -0.07174871 0.07526561
> MA.q3$A[26261,]
CCl4_Liver_day13 CCl4_Liver_day13 CCl4_Liver_day3 CCl4_Liver_day3 Sorted_duct Hepatocytes Liver
6.256466 6.333689 6.342524 6.274528 7.544673 6.446924 6.256481
Lgr5_sorted_cells Sorted_duct Hepatocytes Liver Lgr5_sorted_cells Muscle Muscle
6.553766 6.574924 6.525871 6.288565 6.356142 6.228113 6.294111
Muscle WAT WAT WAT BAT BAT BAT
6.341889 6.299306 6.346015 6.234404 6.284631 6.315308 6.288100
Adult_liver Adult_liver EM_Liver_organoid EM_Liver_organoid DM_Liver_organoid Adult_Pancreas Adult_Pancreas
6.302537 6.342125 6.292963 6.203313 9.078806 6.293387 6.303954
>
Your insights on this are appreciated!
Thanks,
Guido

To be complete; here the relevant part of the targets file:
> targets[,c("Cy3", "Cy5")] Cy3 Cy5 GSM1047592_US22502677_251486837777_S01_GE2_105_Dec08_1_1 CCl4_Liver_day13 mouse_reference_RNA GSM1047593_US22502677_251486837777_S01_GE2_105_Dec08_1_2 CCl4_Liver_day13 mouse_reference_RNA GSM1047594_US22502677_251486837777_S01_GE2_105_Dec08_1_3 CCl4_Liver_day3 mouse_reference_RNA GSM1047595_US22502677_251486837777_S01_GE2_105_Dec08_1_4 CCl4_Liver_day3 mouse_reference_RNA GSM1047596_US22502677_251486836582_S01_GE2_105_Dec08_1_1 Sorted_duct mouse_reference_RNA GSM1047597_US22502677_251486836582_S01_GE2_105_Dec08_1_2 Hepatocytes mouse_reference_RNA GSM1047598_US22502677_251486836582_S01_GE2_105_Dec08_1_3 Liver mouse_reference_RNA GSM1047599_US22502677_251486836582_S01_GE2_105_Dec08_1_4 Lgr5_sorted_cells mouse_reference_RNA GSM1047600_US22502677_251486836583_S01_GE2_105_Dec08_1_1 Sorted_duct mouse_reference_RNA GSM1047601_US22502677_251486836583_S01_GE2_105_Dec08_1_2 Hepatocytes mouse_reference_RNA GSM1047602_US22502677_251486836583_S01_GE2_105_Dec08_1_3 Liver mouse_reference_RNA GSM1047603_US22502677_251486836583_S01_GE2_105_Dec08_1_4 Lgr5_sorted_cells mouse_reference_RNA GSM797985_US22502677_251486826589_S01_GE2_105_Dec08_1_2 Muscle mouse_reference_RNA GSM797986_US22502677_251486826589_S01_GE2_105_Dec08_1_3 Muscle mouse_reference_RNA GSM797987_US22502677_251486826590_S01_GE2_105_Dec08_1_2 Muscle mouse_reference_RNA GSM797988_US22502677_251486826590_S01_GE2_105_Dec08_1_4 WAT mouse_reference_RNA GSM797989_US22502677_251486826591_S01_GE2_105_Dec08_1_1 WAT mouse_reference_RNA GSM797990_US22502677_251486826591_S01_GE2_105_Dec08_1_4 WAT mouse_reference_RNA GSM797991_US22502677_251486826593_S01_GE2_105_Dec08_1_4 BAT mouse_reference_RNA GSM797992_US22502677_251486825703_S01_GE2_105_Dec08_1_1 BAT mouse_reference_RNA GSM797993_US22502677_251486825703_S01_GE2_105_Dec08_1_4 BAT mouse_reference_RNA GSM797994_US22502677_251486830305_S01_GE2_105_Dec08_1_1 Adult_liver mouse_reference_RNA GSM797995_US22502677_251486833702_S01_GE2_105_Dec08_1_1 Adult_liver mouse_reference_RNA GSM797996_US22502677_251486830304_S01_GE2_105_Dec08_1_2 EM_Liver_organoid mouse_reference_RNA GSM797997_US22502677_251486834188_S01_GE2_105_Dec08_1_2 EM_Liver_organoid mouse_reference_RNA GSM797998_US22502677_251486834188_S01_GE2_105_Dec08_1_1 DM_Liver_organoid mouse_reference_RNA GSM797999_US22502677_251486834188_S01_GE2_105_Dec08_1_4 Adult_Pancreas mouse_reference_RNA GSM798000_US22502677_251486835936_S01_GE2_105_Dec08_1_4 Adult_Pancreas mouse_reference_RNA >