limma & design matrix: for 2-color dyes and 4-comparisons?
1
0
Entering edit mode
@mehmet-ilyas-cosacak-9020
Last seen 6.2 years ago
Germany/Dresden/ CRTD - DZNE

Hi Everybody,

I have a question regarding limma design matrix. I searched for "limma and design" but could not find an answer for my question. I will be happy if you could send me the link if a similar question has been answered previously!

I am re-analyzing a data set from GEO (GSE74615) to compare with some published microarray data and an RNA-Seq that we have.

I generated the target file as bellow:

FileName    Cy3    Cy5
US12302316_252665511011_S01_GE2-v5_95_Feb07_1_1.txt    microglia_WT    microglia_AD
US12302316_252665511011_S01_GE2-v5_95_Feb07_1_2.txt    microglia_AD    microglia_WT
US12302316_252665511011_S01_GE2-v5_95_Feb07_1_3.txt    microglia_WT    microglia_AD
US12302316_252665511011_S01_GE2-v5_95_Feb07_1_4.txt    microglia_AD    microglia_WT
US12302316_252665511015_S01_GE2-v5_95_Feb07_1_1.txt    microglia_WT    microglia_AD
US12302316_252665511015_S01_GE2-v5_95_Feb07_1_2.txt    microglia_AD    microglia_WT
US12302316_252665511015_S01_GE2-v5_95_Feb07_1_3.txt    microglia_AD    microglia_WT
US12302316_252665511016_S02_GE2-v5_95_Feb07_1_1.txt    astrocytes_AD    astrocytes_WT
US12302316_252665511016_S02_GE2-v5_95_Feb07_1_2.txt    astrocytes_WT    astrocytes_AD
US12302316_252665511016_S02_GE2-v5_95_Feb07_1_3.txt    astrocytes_WT    astrocytes_AD
US12302316_252665511016_S02_GE2-v5_95_Feb07_1_4.txt    astrocytes_AD    astrocytes_WT

What I want to do for  DE gene analysis is to compare the followings:

1) microglia_AD vs microglia_WT

2) astrocytes_AD vs astrocytes_WT

3) astrocytes_AD vs microglia_AD

4) astrocytes_WT vs microglia_WT

I did the first 2 comparisons with the script below

library("limma")
wtFun <- function(x) as.numeric(x$ControlType == 0)
targets <- readTargets()
microglia <- targets[!(targets$Cy3 %in% "astrocytes_AD") & !(targets$Cy5 %in% "astrocytes_AD"),]
astrocytes <- targets[!(targets$Cy3 %in% "microglia_AD") & !(targets$Cy5 %in% "microglia_AD"),]
#1) microglia_AD vs microglia_WT
RG <- read.maimages(microglia, source = "agilent", annotation = "ProbeName", wt.fun = wtFun)
RG <- backgroundCorrect(RG, method = "normexp", offset = 50)
MA <- normalizeBetweenArrays(RG, method = "quantile")
MA <- avereps(MA, ID = MA$genes$ProbeName)
design <- modelMatrix(microglia, ref = "microglia_WT")
fit <- lmFit(MA, design)
fit <- eBayes(fit, trend = TRUE)
allResults <- topTable(fit, n = Inf)

#2) astrocytes_AD vs astrocytes_WT
RG <- read.maimages(astrocytes, source = "agilent", annotation = "ProbeName", wt.fun = wtFun)
RG <- backgroundCorrect(RG, method = "normexp", offset = 50)
MA <- normalizeBetweenArrays(RG, method = "quantile")
MA <- avereps(MA, ID = MA$genes$ProbeName)
design <- modelMatrix(astrocytes, ref = "astrocytes_WT")
fit <- lmFit(MA, design)
fit <- eBayes(fit, trend = TRUE)
allResults <- topTable(fit, n = Inf)

is there an easier way to design a design matrix that can do all the comparisons?

thank you very much in advance,

ilyas.

limma microarray limma design matrix • 1.3k views
ADD COMMENT
1
Entering edit mode

Note first that there was a typo on the 10th line of your targets file, which had "astroyctes" instead of "astrocytes". I have edited your post to fix that, and you should obviously make sure this is correct in your analysis as well.

ADD REPLY
4
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia

The experimental design does not include any direct comparisons between astrocytes and microglia, so you cannot make comparisons 3 and 4 in a conventional two-color microarray analysis. To make the first two comparisons, I would do it like this:​

library("limma")
targets <- readTargets("targets.txt")
RG <- read.maimages(targets, source = "agilent")
keep <- RG$genes$ControlType==0
RG2 <- RG[keep,]
RG2 <- backgroundCorrect(RG2, method = "normexp", offset = 16)
MA <- normalizeWithinArrays(RG2, method = "loess")
dm <- modelMatrix(targets[1:7,], ref="microglia_WT")
da <- modelMatrix(targets[8:11,], ref="astrocytes_WT")
design <- cbind(Dye=1, blockDiag(dm,da))
keep <- rowMeans(MA$A) > 6
fit <- lmFit(MA[keep,], design)
fit <- eBayes(fit, trend=TRUE, robust=TRUE)
topTable(fit, coef="microglia_AD")
topTable(fit, coef="astrocytes_AD")

Note: I made a correction to the design matrix in the above code 14 hours after my original post.

To make all four comparisons that you want to do, it is necessary to undertake a separate channel analysis instead:

targetsC <- targetsA2C(targets)
group <- factor(targetsC$Target)
group <- relevel(group,ref="microglia_WT")
designsc <- model.matrix(~0+group)
colnames(designsc) <- levels(group)
MA <- normalizeWithinArrays(RG2, method = "loess")
MA <- normalizeBetweenArrays(MA, method="Aquantile")
keep <- rowMeans(MA$A) > 6
isc <- intraspotCorrelation(MA[keep,], designsc)
fitsc <- lmscFit(MA[keep,], designsc, isc$consensus)
cont.matrix <- makeContrasts(
   mAD.mWT = microglia_AD - microglia_WT,
   aAD.aWT = astrocytes_AD - astrocytes_WT,
   aAD.mAD = astrocytes_AD - microglia_AD,
   aWT.mWT = astrocytes_WT - microglia_WT,
   levels = designsc
)
fit2 <- contrasts.fit(fitsc, cont.matrix)
fit2 <- eBayes(fit2, trend=TRUE, robust=TRUE)
topTable(fit2, coef="mAD.mWT")
topTable(fit2, coef="aAD.aWT")
topTable(fit2, coef="aAD.mAD")
topTable(fit2, coef="aWT.mWT")

Note that the separate channel analysis is more powerful and will generally give smaller p-values than the conventional analysis for the same comparisons.

ADD COMMENT
0
Entering edit mode

Hi Gordon,

Thank you very much again and again for your help and answers!

It looks a bit complicated how to generate the design matrix. I will try to understand the logic behind.

However, after running the script the topTable(fit2, coef="mAD.mWT") and topTable(fit, coef="mAD") give different results! At least the top genes differ from each other! Do you think there is mistake here?

best,

ilyas.

 

topTable(fit2, coef="mAD.mWT")
      Row Col Start                                                     Sequence ProbeUID ControlType     ProbeName    GeneName SystematicName
5127  114  14     0 TGATATTGATGGAGTCTTACCTCTCTGAAACCTTAAGCCCAAATAAATCCTTCCTTCTAT     8485           0 A_55_P2005320      Tm7sf4      NM_029422
>
> topTable(fit, coef = "mAD")
      Row Col Start                                                     Sequence ProbeUID ControlType     ProbeName GeneName SystematicName
21411 257  69     0 ATGGATTCTACTCTTCCCTCTTTGCCTTGGTGAGGCAGCCCAGGTCGCTGCAGCATCTCT    19358           0 A_55_P2044710    Asb10      NM_080444
 

sessionInfo()
R version 3.2.5 (2016-04-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8.1 x64 (build 9600)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] limma_3.26.9

loaded via a namespace (and not attached):
[1] splines_3.2.5  statmod_1.4.24

 

ADD REPLY
1
Entering edit mode

Yes, you're right. The separate channel analysis was correct but I created the design matrix incorrectly for the conventional analysis. I have edited the code so that it now correct.

ADD REPLY
0
Entering edit mode

Thanks a lot Gordon! I really appreciate your help!

best, ilyas.

ADD REPLY

Login before adding your answer.

Traffic: 691 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6