Dear BioC list,
I'm analyzing for the first time old data from Agilent two colors arrays (array version G4110B). As experimental design, I have 3 time points x 2 replicates at each time point in a treated vs control experiment. Arrays were hybridized at 2 different times, so I decided to reduce an evident batch effect using also the "NormalizeBetweenArrays" function after "NormalizeWithinArrays".
I analyzed data according to chapter 4 of Gentleman's et al book using marray package. Only at the ending point of detecting differentially expressed genes, I discovered that no summarization of identical probes was done; checking platform design, I've seen that only a minority of transcritps are detected by multiple and identical probes, instead majority are not.
I googled a lot but I cannot find an easy way to perform summarization within marray package or alternatives. I've seen that this step exists in "Agi4x44PreProcess" package (function "summarize.probe"), but I have some problem to make it working (function was not found after library is loaded).
So my questions:
- I'm puzzled that summarization does not exist in marray package: maybe I'm wrong in my googling and reading?
- If I can make "summarize.probe" working, is it correct to do it after NormalizeWithin?
Below there are script and sessionInfo. As general consideration, I did a very simple design matrix because I had some problem in setting a more complex design type.
I'm less familiar with gene expression Agilent arrays, so any comment is really appreciated.
Thank you in advance,
Andrea
############### CODE ############### library("marray") files <- c("1.txt", "2.txt", "3.txt", "4.txt","5.txt", "6.txt") # change array names names <- c("30m_rep1", "30m_rep2", "60m_rep1", "60m_rep2", "120m_rep1", "120m_rep2") # LOAD DATA RG <- read.maimages(files, source = "agilent", names = c(names)) # BACKGROUND CORRECT RGb <- backgroundCorrect(RG, method = "subtract") # NORMALIZE WITHIN # MA.median <- normalizeWithinArrays(RGb, method = "median") # NORMALIZE BETWEEN # MA.select <- normalizeBetweenArrays(MA.median, method = "quantile") # TAKE OUT CONTROLS # sel <- (MA.select$genes$ControlType)== 0 normdata <- MA.select[sel,] # ---->>>> ADD SUMMARIZATION??? ### # ANALYSIS at single time point level # array <- c(rep((1), 6)) time30 <- c(rep(c(0, 1, 1), each = 2)); time60 <- c(rep(c(1, 0, 1), each = 2)); time120 <- c(rep(c(1, 1, 0), each = 2)) design30 <- model.matrix(~0 + array + time30); design60 <- model.matrix(~0 + array + time60); design120 <- model.matrix(~0 + array + time120) fit30 <- lmFit(normdata, design30); fit2.30 <- eBayes(fit30) fit60 <- lmFit(normdata, design60); fit2.60 <- eBayes(fit60) fit120 <- lmFit(normdata, design120); fit2.120 <- eBayes(fit120) # ADD ANNOTATIONS library(hgug4110b.db); library(annotate); library(annaffy) fit30$genes$ID <- (getSYMBOL(fit30$genes$ProbeName, "hgug4110b.db")) geneID <- cbind(ProbeID=c(fit30$genes$ProbeName), geneID=c(fit30$genes$ID), geneDescription=c(fit30$genes$Description)) # add GENE SYMBOL fit2.30$genes$ID <- (getSYMBOL(fit2.30$genes$ProbeName, "hgug4110b.db")); fit2.60$genes$ID <- (getSYMBOL(fit2.60$genes$ProbeName, "hgug4110b.db")); it2.120$genes$ID <- (getSYMBOL(fit2.120$genes$ProbeName, "hgug4110b.db")) # set parameters corr <- c("fdr"); p <- 0.05; len <- length(fit30$genes[,1]); log <- 0.5849625 # get significant genes top_sign.30 <- toptable(fit2.30, adjust=corr, p.value=p, genelist=cbind(geneID), number=len, lfc=log) top_sign.60 <- toptable(fit2.60, adjust=corr, p.value=p, genelist=cbind(geneID), number=len, lfc=log) top_sign.120 <- toptable(fit2.120, adjust=corr, p.value=p, genelist=cbind(geneID), number=len, lfc=log) # save d.e. lists if (Save == "yes"){ write.table(top_sign.30, file = "./EWS_6647.0662_vs_6647_MedQuantNorm_log2_Time30.txt", sep = "\t", col.names = NA) write.table(top_sign.60, file = "./EWS_6647.0662_vs_6647_MedQuantNorm_log2_Time60.txt", sep = "\t", col.names = NA) write.table(top_sign.120, file = "./EWS_6647.0662_vs_6647_MedQuantNorm_log2_Time120.txt", sep = "\t", col.names = NA) ############### SESSION INFO ############### > sessionInfo() R version 2.14.2 (2012-02-29) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252 LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C [5] LC_TIME=Italian_Italy.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] Biobase_2.14.0 BiocInstaller_1.2.1 marray_1.32.0 limma_3.10.3 loaded via a namespace (and not attached): [1] BiocGenerics_0.4.0 DBI_0.2-5 parallel_2.14.2 RSQLite_0.11.2 stats4_2.14.2 tools_2.14.2 xtable_1.7-1