Question

illumina gene expression array by limma

0

Entering edit mode

alakatos ▴ 130

@alakatos-6983

Last seen 4.6 years ago

United States

Hello,

I was working with Illumina gene expression arrays (HT-12 v4.0 Human Expression Array Ambion) at the first time and I would like to ask your opinion.

I have 36 arrays in idat format (e.g. 101178090056_A_Grn.idat). At first 12 samples were processed on a chip denoted batch 1. Then later, 24 samples were processed on two chips at the same time denoted batch 2 and batch 3. I am interested in the diagnosis effect on gene expression intensities of 29 samples by controlling for confounding variables. After doing my homework I decided on the following pipeline.

##Processing 36 arrays (idat files) all together

set.seed (12345)
library(limma)
library(magrittr)
library(illuminaio)
library(dplyr)

#idat file processing

bgxfile = dir(path=" ", pattern="bgx")
idatfiles = dir(path=" ", pattern="idat")
data = read.idat(idatfiles, bgxfile)
pe <- propexpr(data) #(average 0.52)
y <- neqc(data)  ##normalization

#I am interested in only 29 arrays

array_id <- as.matrix(colnames(y))
array_id_in <- array_id[-c(2,5,6,8,15,26,35)]
y_29 <- y[,colnames(y) %in% array_id_in]
colnames(y_29)
##target file

target <- read.delim("target.txt")
head(target)
table(target$dx)

#potential confounding variables

age <- target$Age
sex <- as.factor(target$Sex)
dx <- relevel(factor(target$dx), ref="Norm")  #Norm, MCI, Dem
batch <- factor(target$batch)  #(denoted as 1,2,3)

#######

all(target$array_ID==colnames(y_29)) #TRUE

###Analysis with limma

des <- model.matrix(~ dx + batch + sex + age)
des

(Intercept) dxDem dxMCI batch2 batch3 sexMale age

1 1 0 1 0 0 0 80

2 1 0 0 0 0 1 83

3 1 0 0 0 0 0 67

4 1 1 0 0 0 1 75

etc.

fit <- eBayes(lmFit(y_29, des))
results <- decideTests(fit,method="global", adjust.method = "BH",p.value = 0.05) # method="global" – the contrasts are similar 

write.fit(fit,results, file="result.txt" )

sessionInfo()

R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
 [1] oligo_1.34.2          Biostrings_2.38.4     XVector_0.10.0        IRanges_2.4.8       
 [5] S4Vectors_0.8.11      Biobase_2.30.0    oligoClasses_1.32.0   BiocGenerics_0.16.1 
 [9] sva_3.18.0            genefilter_1.52.1     mgcv_1.8-13           nlme_3.1-128        
[13] foreach_1.4.3         reshape_0.8.5         Hmisc_3.17-4          ggplot2_2.1.0       
[17] Formula_1.2-1         survival_2.39-5       lattice_0.20-33       dynamicTreeCut_1.63-1
[21] cluster_2.0.4         dplyr_0.5.0           illuminaio_0.12.0     magrittr_1.5        
[25] edgeR_3.12.1          limma_3.26.9

My questions:

1. Does my pipeline for illumina expression array look ok?

2. Since my “file read-in” is different from limma’s

x <- read.ilmn("probe profile.txt", ctrlfiles="control probe profile.txt",  other.columns="Detection")

I cannot use function “rowSums(y$other$Detection < 0.05) >= 3” for probe filtering. I was wondering if there was an other way to filter probes out from an EList object.

Thank you for help and time in advance.

Anita

illumina human ht-12 v4 limma pipeline • 1.8k views

ADD COMMENT • link 7.7 years ago alakatos ▴ 130

0

Entering edit mode

Thank you for your help. I have made the suggested changes and it works.

Thanks once again.

Anita

ADD REPLY • link updated 7.7 years ago by Gordon Smyth 50k • written 7.7 years ago by alakatos ▴ 130

score 3 · Answer 1 · 2016-08-23

Dear Anita,

The first thing to do is to upgrade to the current release of Bioconductor, because your current installation of R is a full Bioconductor release out of date. Then you will be able to read and process the idat files in limma to get the same sort of object that you would have got from read.ilmn(). Using the latest version of limma, you can use:

library(limma)
x <- read.idat(idatfiles, bgxfile)
x$other$Detection <- detectionPValues(x)
y <- neqc(x)

After this everything is the same as usual. For example, you could filter using detection p-values:

isexpr <- rowSums(y$other$Detection < 0.05) >= 3
y <- y[isexpr,]

and so on.

The current version of read.idat() can read extra annotation columns out of the bgx file as well.

I won't comment on the rest of the analysis except to say that I would not use method="global" here. Your predictor variables are all quite different in nature. Treating batch effects as comparable to genuine predictors doesn't seem appropriate.

Also, why not try trend=TRUE and robust=TRUE when you run eBayes()? They are likely to improve the results.