Question: summarization for G4110B two colors agilent array: how to?
gravatar for Andrea Grilli
4.9 years ago by
Andrea Grilli240
Italy, Bologna, Rizzoli Orthopaedic Institute
Andrea Grilli240 wrote:

Dear BioC list,

I'm analyzing for the first time old data from Agilent two colors arrays (array version G4110B). As experimental design, I have 3 time points x 2 replicates at each time point in a treated vs control experiment. Arrays were hybridized at 2 different times, so I decided to reduce an evident batch effect using also the "NormalizeBetweenArrays" function after "NormalizeWithinArrays".

I analyzed data according to chapter 4 of Gentleman's et al book using marray package. Only at the ending point of detecting differentially expressed genes, I discovered that no summarization of identical probes was done; checking platform design, I've seen that only a minority of transcritps are detected by multiple and identical probes, instead majority are not.

I googled a lot but I cannot find an easy way to perform summarization within marray package or alternatives. I've seen that this step exists in "Agi4x44PreProcess" package (function "summarize.probe"), but I have some problem to make it working (function was not found after library is loaded).

So my questions:

- I'm puzzled that summarization does not exist in marray package: maybe I'm wrong in my googling and reading?

- If I can make "summarize.probe" working, is it correct to do it after NormalizeWithin?

Below there are script and sessionInfo. As general consideration, I did a very simple design matrix because I had some problem in setting a more complex design type.

I'm less familiar with gene expression Agilent arrays, so any comment is really appreciated.
Thank you in advance,

############### CODE ###############
files <- c("1.txt", "2.txt", "3.txt", "4.txt","5.txt", "6.txt")
# change array names
names <- c("30m_rep1", "30m_rep2", "60m_rep1", "60m_rep2", "120m_rep1", "120m_rep2")
RG <- read.maimages(files, source = "agilent", names = c(names))
RGb <- backgroundCorrect(RG, method = "subtract")

MA.median <- normalizeWithinArrays(RGb, method = "median")
    # NORMALIZE BETWEEN # <- normalizeBetweenArrays(MA.median, method = "quantile")
sel <- ($genes$ControlType)== 0
normdata <-[sel,]

# ---->>>> ADD SUMMARIZATION??? ###

 # ANALYSIS at single time point level #
array <- c(rep((1), 6))
time30 <- c(rep(c(0, 1, 1), each = 2)); time60 <- c(rep(c(1, 0, 1), each = 2)); time120 <- c(rep(c(1, 1, 0), each = 2))

design30 <- model.matrix(~0 + array + time30); design60 <- model.matrix(~0 + array + time60); design120 <- model.matrix(~0 + array + time120)

fit30 <- lmFit(normdata, design30); fit2.30 <- eBayes(fit30)
fit60 <- lmFit(normdata, design60); fit2.60 <- eBayes(fit60)
fit120 <- lmFit(normdata, design120); fit2.120 <- eBayes(fit120)

library(hgug4110b.db); library(annotate); library(annaffy)

fit30$genes$ID <- (getSYMBOL(fit30$genes$ProbeName, "hgug4110b.db"))
geneID <- cbind(ProbeID=c(fit30$genes$ProbeName), geneID=c(fit30$genes$ID), geneDescription=c(fit30$genes$Description))

fit2.30$genes$ID <- (getSYMBOL(fit2.30$genes$ProbeName, "hgug4110b.db")); fit2.60$genes$ID <- (getSYMBOL(fit2.60$genes$ProbeName, "hgug4110b.db"));

it2.120$genes$ID <- (getSYMBOL(fit2.120$genes$ProbeName, "hgug4110b.db"))

# set parameters
corr <- c("fdr"); p <- 0.05; len <- length(fit30$genes[,1]); log <- 0.5849625

# get significant genes
top_sign.30 <- toptable(fit2.30, adjust=corr, p.value=p, genelist=cbind(geneID), number=len, lfc=log)
top_sign.60 <- toptable(fit2.60, adjust=corr, p.value=p, genelist=cbind(geneID), number=len, lfc=log)
top_sign.120 <- toptable(fit2.120, adjust=corr, p.value=p, genelist=cbind(geneID), number=len, lfc=log)

# save d.e. lists
if (Save == "yes"){
    write.table(top_sign.30, file = "./EWS_6647.0662_vs_6647_MedQuantNorm_log2_Time30.txt", sep = "\t", col.names = NA)
    write.table(top_sign.60, file = "./EWS_6647.0662_vs_6647_MedQuantNorm_log2_Time60.txt", sep = "\t", col.names = NA)
    write.table(top_sign.120, file = "./EWS_6647.0662_vs_6647_MedQuantNorm_log2_Time120.txt", sep = "\t", col.names = NA)

############### SESSION INFO ###############
> sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: i386-pc-mingw32/i386 (32-bit)

[1] LC_COLLATE=Italian_Italy.1252  LC_CTYPE=Italian_Italy.1252    LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C                  
[5] LC_TIME=Italian_Italy.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Biobase_2.14.0      BiocInstaller_1.2.1 marray_1.32.0       limma_3.10.3       

loaded via a namespace (and not attached):
[1] BiocGenerics_0.4.0 DBI_0.2-5          parallel_2.14.2    RSQLite_0.11.2     stats4_2.14.2      tools_2.14.2       xtable_1.7-1
ADD COMMENTlink modified 4.9 years ago by Gordon Smyth39k • written 4.9 years ago by Andrea Grilli240
Answer: summarization for G4110B two colors agilent array: how to?
gravatar for Gordon Smyth
4.9 years ago by
Gordon Smyth39k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:

Actually you are not using the marray package at all. All the code you give uses the limma package, which is automatically loaded when you type library(marray).

You haven't followed the advice of Chapter 4 of the Bioconductor book in detail. That chapter recommends loess normalization, whereas you have used the far more primitive median normalization. Why? The marray and limma authors all recommend loess normalization for Agilent data. There is no need for a normalizeBetweenArrays() step. We recommend that you background correct and normalize like this:

RGb <- backgroundCorrect(RG, method="normexp")
MA <- normalizeBetweenArrays(RGb, method="loess")

As Axel has pointed out, you are using out-of-date software. Please update to the current version of limma and Bioconductor so that we can help you properly. Having a look through a recent version of the limma User's Guide will I think be more effective than doing google searches.

Probe summarization is readily available though the avereps() function of limma, but I question whether you really need to do this. Agilent arrays do not contain a lot of repeated probes.

You didn't ask about the DE analysis, but your DE analysis isn't correct and getting this right is more important than whether or not to average the duplicate probes. You shouldn't be creating a different design matrix for each comparison. From your description, the experimental design seems to be like this:

> targets
  FileName     Cy3  Cy5
1    1.txt Control  30m
2    2.txt Control  30m
3    3.txt Control  60m
4    4.txt Control  60m
5    5.txt Control 120m
6    6.txt Control 120m

You can test for DE between each of the three times vs control like this:

> design <- modelMatrix(targets,ref="Control")
Found unique target names:
 120m 30m 60m Control 
> fit <- lmFit(MA, design)
> fit <- eBayes(fit)
> topTable(fit, coef="30m")
> topTable(fit, coef="60m")
> topTable(fit, coef="120m")
ADD COMMENTlink modified 4.9 years ago • written 4.9 years ago by Gordon Smyth39k
Answer: summarization for G4110B two colors agilent array: how to?
gravatar for Axel Klenk
4.9 years ago by
Axel Klenk940
Axel Klenk940 wrote:

Dear Andrea,

I cannot comment on packages marray and Agi4x44PreProcess since I have never used them but package limma has a function avereps() that you could use. And yes, I would do the summarization after the normalization steps as you propose.

Some comments you have not asked for:

your annotation code is pretty old-fashioned -- you could/should use the select interface as described in the vignette of package AnnotationDbi. And, by the way, your R and Bioc versions are pretty old, too.

And you could/should address your batch effect by including hybridization batches in your limma model (or using function ComBat() from package sva) rather than additional normalization.


 - axel


ADD COMMENTlink modified 4.9 years ago by Gordon Smyth39k • written 4.9 years ago by Axel Klenk940
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 224 users visited in the last hour