Question

how to set design for 2-dye Agilent array experiment with common reference group?

0

Entering edit mode

Guido Hooiveld ★ 3.9k

@guido-hooiveld-2020

Last seen 18 hours ago

Wageningen University, Wageningen, the …

Apologies for this lengthy post...

A colleague asked me to analyze an Agilent 2-colour dataset that is available at GEO:

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE32210

I am using limma to do so. This dataset has a common reference design; on all arrays the different samples were always labelled with Cy3, and the reference pool always with Cy5 (so no dye-swaps).

I preprocess the data using standard limma recommendations (removal of positive control probes, bg-correction using normexp() with offset=50, normalizeWithinArrays() with method=loess, normalizeBetweenArrays() with method=Aquantile, whereafter all probes are removed with intensity <75% percentile of the neg control probes, and the neg control probes themselves). I end up with an object called "MA.q3"

> class(MA.q3)
[1] "MAList"
attr(,"package")
[1] "limma"
>

So far, so good!

However, I am confused how to properly setup the design for this experiment.

When I build the design using limma's function modelMatrix(), in which I set the reference pool as ref, I noticed all respective samples are assigned using a negative number 1 (-1). This is what confuses me.

> design <- modelMatrix(targets, ref="mouse_reference_RNA")
Found unique target names:
 Adult_liver Adult_Pancreas BAT CCl4_Liver_day13 CCl4_Liver_day3 DM_Liver_organoid EM_Liver_organoid Hepatocytes Lgr5_sorted_cells Liver mouse_reference_RNA Muscle Sorted_duct WAT

> design
                                                         Adult_liver Adult_Pancreas BAT CCl4_Liver_day13 CCl4_Liver_day3
GSM1047592_US22502677_251486837777_S01_GE2_105_Dec08_1_1           0              0   0               -1               0
GSM1047593_US22502677_251486837777_S01_GE2_105_Dec08_1_2           0              0   0               -1               0
GSM1047594_US22502677_251486837777_S01_GE2_105_Dec08_1_3           0              0   0                0              -1
GSM1047595_US22502677_251486837777_S01_GE2_105_Dec08_1_4           0              0   0                0              -1
<<snip>>
GSM797991_US22502677_251486826593_S01_GE2_105_Dec08_1_4            0              0  -1                0               0
GSM797992_US22502677_251486825703_S01_GE2_105_Dec08_1_1            0              0  -1                0               0
GSM797993_US22502677_251486825703_S01_GE2_105_Dec08_1_4            0              0  -1                0               0
GSM797994_US22502677_251486830305_S01_GE2_105_Dec08_1_1           -1              0   0                0               0
GSM797995_US22502677_251486833702_S01_GE2_105_Dec08_1_1           -1              0   0                0               0
GSM797996_US22502677_251486830304_S01_GE2_105_Dec08_1_2            0              0   0                0               0
GSM797997_US22502677_251486834188_S01_GE2_105_Dec08_1_2            0              0   0                0               0
GSM797998_US22502677_251486834188_S01_GE2_105_Dec08_1_1            0              0   0                0               0
GSM797999_US22502677_251486834188_S01_GE2_105_Dec08_1_4            0             -1   0                0               0
GSM798000_US22502677_251486835936_S01_GE2_105_Dec08_1_4            0             -1   0                0               0
<<snip>>

I realize this happens because all samples are expressed relative to the reference, which is in the Cy5 channel.

But is such design (with "negative" group assignments) subsequently properly analyzed? Or should I first convert the design matrix to include only positive values?

For now I continued with the (negative) design matrix:

> colnames(design)
 [1] "Adult_liver"       "Adult_Pancreas"    "BAT"               "CCl4_Liver_day13"  "CCl4_Liver_day3"   "DM_Liver_organoid"
 [7] "EM_Liver_organoid" "Hepatocytes"       "Lgr5_sorted_cells" "Liver"             "Muscle"            "Sorted_duct"      
[13] "WAT"              
>

> cont.matrix <- makeContrasts(
+ EM_DM= (EM_Liver_organoid - DM_Liver_organoid),
+ AdultLiver_DM = (Adult_liver - DM_Liver_organoid),
+ Liver_DM = (Liver - DM_Liver_organoid),
+ Hepatocytes_DM = (Hepatocytes - DM_Liver_organoid),
+ levels=design)
> cont.matrix
                   Contrasts
Levels              EM_DM AdultLiver_DM Liver_DM Hepatocytes_DM
  Adult_liver           0             1        0              0
  Adult_Pancreas        0             0        0              0
  BAT                   0             0        0              0
  CCl4_Liver_day13      0             0        0              0
  CCl4_Liver_day3       0             0        0              0
  DM_Liver_organoid    -1            -1       -1             -1
  EM_Liver_organoid     1             0        0              0
  Hepatocytes           0             0        0              1
  Lgr5_sorted_cells     0             0        0              0
  Liver                 0             0        1              0
  Muscle                0             0        0              0
  Sorted_duct           0             0        0              0
  WAT                   0             0        0              0
>

>
> fit <- lmFit(MA.q3, design)
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit3 <- eBayes(fit2, trend=TRUE, robust=TRUE)
>
> topTable(fit3,coef=1)
      Row Col ControlType    ProbeName SystematicName     logFC  AveExpr         t      P.Value    adj.P.Val        B
31040 367  75           0 A_52_P410495      NM_026336 -7.000364 6.471554 -45.13807 1.646844e-21 6.182251e-17 34.82217
9652  115   7           0 A_51_P422751      NM_017474 -8.485838 6.758554 -37.27941 7.081291e-20 1.329158e-15 32.56920
31911 378  14           0 A_52_P670766      NM_027339 -5.972531 6.422511 -32.83067 8.527083e-19 1.067022e-14 30.88386
25110 297  63           0   A_52_P3070      NM_178669 -4.789018 6.468355 -31.16513 2.356391e-18 2.211473e-14 30.15357
8588  102  43           0 A_51_P229403      NM_027339 -6.958043 6.699792 -30.04494 4.809562e-18 3.611019e-14 29.62718
15296 181  72           0 A_51_P511000      NM_181323 -4.221551 6.577204 -26.42710 5.811238e-17 3.635898e-13 27.70479
18082 214  64           0 A_51_P490840      NM_146077 -5.140845 6.526875 -25.04015 1.647681e-16 8.836278e-13 26.86462
26826 318   5           0 A_51_P492047      NM_020622 -5.232215 6.649090 -23.36079 6.274612e-16 2.880247e-12 25.75803
32009 379  28           0 A_51_P134993      NM_207208 -4.540090 6.624202 -23.24466 6.905227e-16 2.880247e-12 25.67760
34718 411  30           0 A_51_P362292      NM_023529 -4.421324 6.754218 -23.03258 8.234346e-16 3.091173e-12 25.52937
>

By doing so, do I correctly interpret that in when comparing the EM samples vs the DM samples (1st contrast; EM_DM), the most significantly regulated gene is NM_026336, which is down-regulated in the EM versus DM samples (log2 FC= -7.00), and is NOT up-regulated because of the "negative" design matrix?

I am asking because this is not what I concluded when inspecting the M-values of the respective samples; there it seems that NM_0026336 (rownumber =26261) is actually higher expressed in the EM samples compared to the DM samples...

> colnames(MA.q3$A) <- targets$Cy3
> colnames(MA.q3$M) <- targets$Cy3
> MA.q3$M[26261,]
 CCl4_Liver_day13  CCl4_Liver_day13   CCl4_Liver_day3   CCl4_Liver_day3       Sorted_duct       Hepatocytes             Liver
      -0.05987034        0.14093043       -0.05372743       -0.16743075       -0.48689356       -0.03920677       -0.05760740
Lgr5_sorted_cells       Sorted_duct       Hepatocytes             Liver Lgr5_sorted_cells            Muscle            Muscle
       0.12492195       -0.41947946        0.07274417        0.09072236        0.17494198       -0.09277732       -0.32188229
           Muscle               WAT               WAT               WAT               BAT               BAT               BAT
      -0.25082762       -0.02929802       -0.01729277        0.03040873        0.02812201        0.03574133        0.01385537
      Adult_liver       Adult_liver EM_Liver_organoid EM_Liver_organoid DM_Liver_organoid    Adult_Pancreas    Adult_Pancreas
      -0.13853968        0.14293178       -0.11902644       -0.38656530       -7.25316029       -0.07174871        0.07526561

> MA.q3$A[26261,]
 CCl4_Liver_day13  CCl4_Liver_day13   CCl4_Liver_day3   CCl4_Liver_day3       Sorted_duct       Hepatocytes             Liver
         6.256466          6.333689          6.342524          6.274528          7.544673          6.446924          6.256481
Lgr5_sorted_cells       Sorted_duct       Hepatocytes             Liver Lgr5_sorted_cells            Muscle            Muscle
         6.553766          6.574924          6.525871          6.288565          6.356142          6.228113          6.294111
           Muscle               WAT               WAT               WAT               BAT               BAT               BAT
         6.341889          6.299306          6.346015          6.234404          6.284631          6.315308          6.288100
      Adult_liver       Adult_liver EM_Liver_organoid EM_Liver_organoid DM_Liver_organoid    Adult_Pancreas    Adult_Pancreas
         6.302537          6.342125          6.292963          6.203313          9.078806          6.293387          6.303954
>

Your insights on this are appreciated!

Thanks,

Guido

limma 2-dye agilent common reference design • 1.3k views

ADD COMMENT • link updated 8.4 years ago by James W. MacDonald 65k • written 8.4 years ago by Guido Hooiveld ★ 3.9k

0

Entering edit mode

To be complete; here the relevant part of the targets file:

> targets[,c("Cy3", "Cy5")]
                                                                       Cy3                 Cy5
GSM1047592_US22502677_251486837777_S01_GE2_105_Dec08_1_1  CCl4_Liver_day13 mouse_reference_RNA
GSM1047593_US22502677_251486837777_S01_GE2_105_Dec08_1_2  CCl4_Liver_day13 mouse_reference_RNA
GSM1047594_US22502677_251486837777_S01_GE2_105_Dec08_1_3   CCl4_Liver_day3 mouse_reference_RNA
GSM1047595_US22502677_251486837777_S01_GE2_105_Dec08_1_4   CCl4_Liver_day3 mouse_reference_RNA
GSM1047596_US22502677_251486836582_S01_GE2_105_Dec08_1_1       Sorted_duct mouse_reference_RNA
GSM1047597_US22502677_251486836582_S01_GE2_105_Dec08_1_2       Hepatocytes mouse_reference_RNA
GSM1047598_US22502677_251486836582_S01_GE2_105_Dec08_1_3             Liver mouse_reference_RNA
GSM1047599_US22502677_251486836582_S01_GE2_105_Dec08_1_4 Lgr5_sorted_cells mouse_reference_RNA
GSM1047600_US22502677_251486836583_S01_GE2_105_Dec08_1_1       Sorted_duct mouse_reference_RNA
GSM1047601_US22502677_251486836583_S01_GE2_105_Dec08_1_2       Hepatocytes mouse_reference_RNA
GSM1047602_US22502677_251486836583_S01_GE2_105_Dec08_1_3             Liver mouse_reference_RNA
GSM1047603_US22502677_251486836583_S01_GE2_105_Dec08_1_4 Lgr5_sorted_cells mouse_reference_RNA
GSM797985_US22502677_251486826589_S01_GE2_105_Dec08_1_2             Muscle mouse_reference_RNA
GSM797986_US22502677_251486826589_S01_GE2_105_Dec08_1_3             Muscle mouse_reference_RNA
GSM797987_US22502677_251486826590_S01_GE2_105_Dec08_1_2             Muscle mouse_reference_RNA
GSM797988_US22502677_251486826590_S01_GE2_105_Dec08_1_4                WAT mouse_reference_RNA
GSM797989_US22502677_251486826591_S01_GE2_105_Dec08_1_1                WAT mouse_reference_RNA
GSM797990_US22502677_251486826591_S01_GE2_105_Dec08_1_4                WAT mouse_reference_RNA
GSM797991_US22502677_251486826593_S01_GE2_105_Dec08_1_4                BAT mouse_reference_RNA
GSM797992_US22502677_251486825703_S01_GE2_105_Dec08_1_1                BAT mouse_reference_RNA
GSM797993_US22502677_251486825703_S01_GE2_105_Dec08_1_4                BAT mouse_reference_RNA
GSM797994_US22502677_251486830305_S01_GE2_105_Dec08_1_1        Adult_liver mouse_reference_RNA
GSM797995_US22502677_251486833702_S01_GE2_105_Dec08_1_1        Adult_liver mouse_reference_RNA
GSM797996_US22502677_251486830304_S01_GE2_105_Dec08_1_2  EM_Liver_organoid mouse_reference_RNA
GSM797997_US22502677_251486834188_S01_GE2_105_Dec08_1_2  EM_Liver_organoid mouse_reference_RNA
GSM797998_US22502677_251486834188_S01_GE2_105_Dec08_1_1  DM_Liver_organoid mouse_reference_RNA
GSM797999_US22502677_251486834188_S01_GE2_105_Dec08_1_4     Adult_Pancreas mouse_reference_RNA
GSM798000_US22502677_251486835936_S01_GE2_105_Dec08_1_4     Adult_Pancreas mouse_reference_RNA
>

ADD REPLY • link 8.4 years ago Guido Hooiveld ★ 3.9k

score 2 · Answer 1 · 2015-12-01

2

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 13 hours ago

United States

I think you are getting confused by the sign The M-values for your genes are Cy5/Cy3, and since you have the reference in the numerator the signs need to be flipped. So the 'real' expression value for your gene is 7.25 in DM and about 0.25 or so in EM. And since you set your contrast to be EM - DM, that is about -7. Make sense?

ADD COMMENT • link 8.4 years ago James W. MacDonald 65k

0

Entering edit mode

Thanks James, your help is as always very much appreciated!

Thus, to summarize this for myselves (and for the archive): the model generated by limma's modelMatrix() results in a "negative" design matrix, but for the contrasts analyzed limma / topTable() (automatically) reports the correct direction (and log2FC) of regulation. No need to make the design matrix 'positive'. [Why did I ever doubt this...?]

The opposite logFC found when comparing (subtracting) the M-values is due to the fact that by definition the M-values are Red/Green (= Cy5/Cy3), and that since in this particular case Cy5 is the reference, the signs need to be flipped.

See e.g.: https://en.wikipedia.org/wiki/MA_plot or specifically for limma Smyth & Speed: http://www.ncbi.nlm.nih.gov/pubmed/14597310

ADD REPLY • link 8.4 years ago Guido Hooiveld ★ 3.9k