Hello,
I was working with Illumina gene expression arrays (HT-12 v4.0 Human Expression Array Ambion) at the first time and I would like to ask your opinion.
I have 36 arrays in idat format (e.g. 101178090056_A_Grn.idat). At first 12 samples were processed on a chip denoted batch 1. Then later, 24 samples were processed on two chips at the same time denoted batch 2 and batch 3. I am interested in the diagnosis effect on gene expression intensities of 29 samples by controlling for confounding variables. After doing my homework I decided on the following pipeline.
##Processing 36 arrays (idat files) all together
set.seed (12345) library(limma) library(magrittr) library(illuminaio) library(dplyr)
#idat file processing
bgxfile = dir(path=" ", pattern="bgx") idatfiles = dir(path=" ", pattern="idat") data = read.idat(idatfiles, bgxfile) pe <- propexpr(data) #(average 0.52) y <- neqc(data) ##normalization
#I am interested in only 29 arrays
array_id <- as.matrix(colnames(y)) array_id_in <- array_id[-c(2,5,6,8,15,26,35)] y_29 <- y[,colnames(y) %in% array_id_in] colnames(y_29) ##target file target <- read.delim("target.txt") head(target) table(target$dx) #potential confounding variables age <- target$Age sex <- as.factor(target$Sex) dx <- relevel(factor(target$dx), ref="Norm") #Norm, MCI, Dem batch <- factor(target$batch) #(denoted as 1,2,3)
#######
all(target$array_ID==colnames(y_29)) #TRUE
###Analysis with limma
des <- model.matrix(~ dx + batch + sex + age) des
(Intercept) dxDem dxMCI batch2 batch3 sexMale age
1 1 0 1 0 0 0 80
2 1 0 0 0 0 1 83
3 1 0 0 0 0 0 67
4 1 1 0 0 0 1 75
etc.
fit <- eBayes(lmFit(y_29, des)) results <- decideTests(fit,method="global", adjust.method = "BH",p.value = 0.05) # method="global" – the contrasts are similar write.fit(fit,results, file="result.txt" ) sessionInfo() R version 3.2.3 (2015-12-10) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under: OS X 10.11.6 (El Capitan) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base other attached packages: [1] oligo_1.34.2 Biostrings_2.38.4 XVector_0.10.0 IRanges_2.4.8 [5] S4Vectors_0.8.11 Biobase_2.30.0 oligoClasses_1.32.0 BiocGenerics_0.16.1 [9] sva_3.18.0 genefilter_1.52.1 mgcv_1.8-13 nlme_3.1-128 [13] foreach_1.4.3 reshape_0.8.5 Hmisc_3.17-4 ggplot2_2.1.0 [17] Formula_1.2-1 survival_2.39-5 lattice_0.20-33 dynamicTreeCut_1.63-1 [21] cluster_2.0.4 dplyr_0.5.0 illuminaio_0.12.0 magrittr_1.5 [25] edgeR_3.12.1 limma_3.26.9
My questions:
1. Does my pipeline for illumina expression array look ok?
2. Since my “file read-in” is different from limma’s
x <- read.ilmn("probe profile.txt", ctrlfiles="control probe profile.txt", other.columns="Detection")
I cannot use function “rowSums(y$other$Detection < 0.05) >= 3
” for probe filtering. I was wondering if there was an other way to filter probes out from an EList object.
Thank you for help and time in advance.
Anita
Thank you for your help. I have made the suggested changes and it works.
Thanks once again.
Anita