How do I normalize Illumina HT-12 v4 Expression BeadChip data and identify the differentially expressed genes?
Use limma and read the manual.
Thank you for your replies. I am trying to analyse https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE74629. Can limma handle this kind of data, where only the non-normalized txt file is provided? That file has columns only for probe ID, sample intensity and detection p-value. Which input files does limma need for the analysis?
Yes, limma will read and process those columns. See my answer below.
limma processes this sort of data easily. The key is to use the arguments of read.ilmn() to tell limma how the columns are named in the data file from GEO. Here is a quick limma analysis of GSE74629:
> library(limma)
> x <- read.ilmn("GSE74629_non-normalized.txt",expr="SAMPLE ",probeid="ID_REF")
Reading file GSE74629_non-normalized.txt ... ...
> y <- neqc(x)
Note: inferring mean and variance of negative control probe intensities from the detection p-values.
> Group <- rep(c("PDAC","Healthy"),c(36,14))
> Group <- factor(Group)
> design <- model.matrix(~Group)
> keep <- rowSums(y$E>5) >= 14
> y2 <- y[keep,]
> fit <- lmFit(y2,design)
> fit <- eBayes(fit,trend=TRUE,robust=TRUE)
> topTable(fit, coef=2)
              logFC AveExpr     t  P.Value adj.P.Val    B
ILMN_2079655 -1.504    8.54 -8.28 3.50e-11  4.79e-07 15.2
ILMN_1697268  0.879    8.31  7.92 1.09e-10  5.00e-07 14.1
ILMN_1784884  0.886   11.74  7.86 1.40e-10  5.00e-07 13.9
ILMN_1705892 -1.049    6.85 -7.85 1.46e-10  5.00e-07 13.9
ILMN_1804738  0.795    7.26  7.56 4.31e-10  1.11e-06 12.8
ILMN_3201663 -0.815    4.93 -7.53 4.87e-10  1.11e-06 12.7
ILMN_1652073 -0.939   11.04 -7.43 7.08e-10  1.33e-06 12.4
ILMN_3226875 -1.219    9.83 -7.41 8.33e-10  1.33e-06 12.2
ILMN_1811702  0.927    9.09  7.37 8.76e-10  1.33e-06 12.2
ILMN_1797522  0.807    9.04  7.27 1.32e-09  1.58e-06 11.8
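The filter rowSums(y$E>5) >= 14 in the analysis above keeps a probe only if its normalized log2 expression exceeds 5 on at least 14 arrays; 14 matches the size of the smaller group (the 14 Healthy samples). Here is a minimal sketch of the same idea on simulated data. The matrix E and its values are made up for illustration, and only base R is used:

```r
set.seed(1)

# Simulated log2 expression matrix: 100 probes x 50 arrays (hypothetical values)
E <- matrix(rnorm(100 * 50, mean = 6, sd = 2), nrow = 100,
            dimnames = list(paste0("probe", 1:100), paste0("SAMPLE ", 1:50)))

# For each probe, count the arrays on which log2 expression exceeds 5
n_expressed <- rowSums(E > 5)

# Keep probes that pass the threshold on at least 14 arrays
keep <- n_expressed >= 14
E_filtered <- E[keep, , drop = FALSE]

# Number of probes kept versus dropped
table(keep)
```

The threshold of 5 and the count of 14 are the values used in this thread; for another dataset both would likely need adjusting to the expression scale and the group sizes.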
The neqc normalization method used above is described here: https://doi.org/10.1093/nar/gkq871
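If you are not sure which values to pass as expr and probeid, one option is to look at the header line of the file before calling read.ilmn(). The sketch below does this with base R on a tiny mock file. The mock file, its probe IDs and the "Detection Pval" header are assumptions made up for illustration; check the real header of your own download:

```r
# Hypothetical miniature file imitating the layout described in this thread
mock <- tempfile(fileext = ".txt")
writeLines(c(
  "ID_REF\tSAMPLE 1\tDetection Pval\tSAMPLE 2\tDetection Pval",
  "ILMN_0000001\t120.5\t0.001\t98.2\t0.004",
  "ILMN_0000002\t85.1\t0.350\t90.7\t0.210"
), mock)

# Read only the first line and split it into column names
header <- strsplit(readLines(mock, n = 1), "\t", fixed = TRUE)[[1]]

probe_col <- header[1]                              # candidate for probeid=
expr_cols <- grep("^SAMPLE ", header, value = TRUE) # columns matched by expr="SAMPLE "
det_cols  <- grep("Detection", header, value = TRUE) # detection p-value columns

print(header)
```

With a header like this, expr="SAMPLE " picks up the intensity columns and probeid="ID_REF" picks up the probe identifiers, as in the read.ilmn() call above.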
Thank you for the reply. This was really helpful :) Was struggling with it for a long time.
Dear Gordon Smyth,
I'm facing the same problem: I can't build the design matrix, because I do not understand how to tell R which samples are the controls and which are the patients.
I also asked for help on Biostars: https://www.biostars.org/p/312770/#312921
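On telling R which samples are controls and which are patients: the usual approach is a factor with one label per array, in the same order as the expression columns, passed to model.matrix(). A minimal sketch, assuming the 36 PDAC / 14 Healthy layout of the GSE74629 example in this thread (your own sample order and labels will differ):

```r
# One label per array, in the same order as the columns of the expression data.
# The 36/14 split is taken from the GSE74629 example; adapt it to your samples.
Group <- factor(rep(c("PDAC", "Healthy"), times = c(36, 14)),
                levels = c("Healthy", "PDAC"))  # first level = reference (controls)

# Column 1 is the intercept (Healthy mean); column 2 is PDAC minus Healthy
design <- model.matrix(~ Group)

colnames(design)
table(Group)
```

Setting the levels explicitly makes the controls the reference group, so the second coefficient is the patient-versus-control log fold change. If the samples are not in contiguous blocks, write the labels out one by one instead of using rep().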
Just to extend Chris's answer: the R package BeadArrayUseCases (http://bioconductor.org/packages/release/data/experiment/html/BeadArrayUseCases.html) has a very extensive vignette written specifically for Illumina platforms.