Question

Microarray data preprocessing and differentially expressed genes

0

Entering edit mode

AAAbdul • 0

@aaabdul-10719

Last seen 7.2 years ago

I am using the same database in this article

https://www.dropbox.com/s/qpibj0rqeu1w4zm/Differential%2520expression%2520network%2520analysis%2520for%2520diabetes%2520mellitus%2520type%25202%2520basedon%2520expressed%2520level%2520of%2520islet%2520cells.pdf?dl=0

they are using GSE38642 database,that includes 54 non-diabetic and 9 diabetic islets samples, platform ( Affymetrix Human Gene 1.0 ST Array)

section 2.2 explain Data preprocessing by Affy package.

I tried to follow their process but I stick in limma

my code is that

rm(list = ls())

library(GEOquery)

library(affy)

library(gcrma)

library(hugene10stv1cdf)

library(hugene10stv1probe)

library(hugene10stprobeset.db)

library(hugene10sttranscriptcluster.db)

library(oligo)

library(affydata)

library(preprocessCore)

getGEOSuppFiles("GSE38642")

untar("GSE38642/GSE38642_RAW.tar", exdir="data")

cels <- list.files("data/", pattern = "[gz]")

sapply(paste("data", cels, sep="/"), gunzip)

cels = list.files("data/", pattern = "CEL")

setwd("/Users/ansamabdulrahman/data")

celfiles=read.celfiles(cels);

pms = pm(celfiles)

rmadata=bg.adjust (pms)

summary(rmadata)

celfiles.rma=normalize.quantiles(log2(rmadata))

eset=assayDataElement(celfiles,'exprs')

samples <- c("healthy","healthy","healthy","healthy","healthy","healthy", "healthy", "healthy","healthy","healthy","T2D","T2D", "healthy","healthy","healthy", "healthy","healthy","healthy","healthy","healthy", "healthy","healthy","healthy","healthy","healthy","healthy","healthy", "healthy","healthy", "healthy","healthy","healthy","healthy","healthy", "healthy","healthy", "healthy","healthy","T2D","T2D","healthy", "healthy","healthy","healthy", "healthy","T2D", "T2D","T2D","healthy","healthy", "healthy","healthy", "healthy","healthy","healthy","healthy","healthy", "healthy","healthy", "healthy", "T2D","T2D","healthy")

library(convert)

design <- model.matrix(~samples)

design

Could anyone help me to complete the process?

microarray affy limma • 1.6k views

ADD COMMENT • link updated 7.9 years ago by Bernd Klaus ▴ 610 • written 7.9 years ago by AAAbdul • 0

score 1 · Answer 1 · 2016-06-15

1

Entering edit mode

Bernd Klaus ▴ 610

@bernd-klaus-6281

Last seen 5.5 years ago

Germany

Hi,

I have just published a microarray workflow that analyses HuGene 1.0 ST data:

http://f1000research.com/articles/5-1384/v1

This should help you to get started and the .Rmd will be in the Bioc workflow section

hopefully very soon.

Best wishes,

Bernd

ADD COMMENT • link 7.9 years ago Bernd Klaus ▴ 610

0

Entering edit mode

Thanks a lot. That is kind of you

ADD REPLY • link 7.9 years ago AAAbdul • 0

0

Entering edit mode

Dear Bernrd,

You are using ram normalisation in your article.

I want to use background adjustment and quantile normalization

Do you have a clue how to use transformed expression matrix with probe level to matrix with gene level then use limma
my code is that:

rm(list = ls())

library(GEOquery)

library(affy)

library(gcrma)

library(hugene10stv1cdf)

library(hugene10stv1probe)

library(hugene10stprobeset.db)

library(hugene10sttranscriptcluster.db)

library(oligo)

library(affydata)

library(preprocessCore)

getGEOSuppFiles("GSE38642")

untar("GSE38642/GSE38642_RAW.tar", exdir="data")

cels <- list.files("data/", pattern = "[gz]")

sapply(paste("data", cels, sep="/"), gunzip)

cels = list.files("data/", pattern = "CEL")

setwd("/Users/ansamabdulrahman/data")

celfiles=read.celfiles(cels);

pms = pm(celfiles)

rma_data=bg.adjust (pms)

summary(rma_data)

celfiles_normalize=normalize.quantiles(log2(rmadata))

object<-new("ExpressionSet", exprs=as.matrix(celfiles_normalize))

eset=assayDataElement(object,'exprs')

rma=format(eset, digits=5)

I would be grateful to you.

ADD REPLY • link 7.9 years ago AAAbdul • 0

1

Entering edit mode

Why are you doing these things? The authors state that they just did RMA using Affy's Expression Console, so what you are doing isn't even remotely similar to what they did. Do note that there might be some subtle differences between what Expression Console does and what oligo::rma does, but they won't be big enough to matter.

Also note that there is no profit whatsoever in doing non-standard things, especially as a new R user. The RMA algorithm has been the de facto standard for summarizing Affy data for well over a decade now, and any deviation from that method will require a pretty good rationale. And to provide that rationale you will have to explain what you did, and why you did it, both of which I think will be difficult for you to do.

If you just do

eset <- getGEO("GSE38642")[[1]]

then you will have the normalized, summarized data that the original authors submitted. For the vast majority of users, this is completely acceptable. If you insist on getting the raw data, you can just do

getGEOSuppFiles("GSE38642")
untar("GSE38642/GSE38642_RAW.tar", exdir="data")
cels <- list.files("data/", pattern = "[gz]")
setwd("/Users/ansamabdulrahman/data")
celfiles=read.celfiles(cels)
eset <- oligo::rma(celfiles)

Note that you don't have to unzip things, nor do you need to do all that extra stuff. In either case you now have an ExpressionSet that you can feed directly to limma to make comparisons. Note that limma knows what to do with an ExpressonSet, so you don't need a matrix.

ADD REPLY • link 7.9 years ago James W. MacDonald 65k

0

Entering edit mode

Dear James,

The authors state (The raw data was preprocessed by Affy package (R/Bioconductor) of R language following the three steps: background adjustment, quantile normalization and finally summarization, and logarithmic transformation )

I am using the same dataset in my project. I am interested in up and down regulated genes. In this article they said they got 147 DEGs (threshold of DEGs was P < 0.001)

I used the same process but I got more than 8000 DEGs

Here is my code:

rm(list = ls())

library(GEOquery)

library(affy)

library(gcrma)

library(hugene10stv1cdf)

library(hugene10stv1probe)

library(hugene10stprobeset.db)

library(hugene10sttranscriptcluster.db)

getGEOSuppFiles("GSE38642")

untar("GSE38642/GSE38642_RAW.tar", exdir="data")

cels <- list.files("data/", pattern = "[gz]")

sapply(paste("data", cels, sep="/"), gunzip)

cels = list.files("data/", pattern = "CEL")

setwd("/Users/ansamabdulrahman/data")

library(simpleaffy)

celfiles=ReadAffy(verbose=TRUE, filenames=cels, cdfname="hugene10stv1")

celfiles

celfiles.rma =rma(celfiles)

ADD REPLY • link 7.9 years ago AAAbdul • 0

2

Entering edit mode

You seem not to want to listen, but instead to want to, like, copy and paste stuff. That is really tedious - I don't really care what you have done, as it doesn't make sense, and you obviously don't know what you are doing - not to mention vaguely insulting. So here you go, all wrapped up in a ribbon.

And yes, I understand that 127 != 147. The authors you cite say they 'fit a linear model' which is as uninformative as one could possibly get, so it's entirely possible that they included other confounders in their model that we are not including, or maybe they fit a slightly different model. For example, fitting a weighted model gets us up to 157 genes.

> library(GEOquery)
> library(limma)
> library(affycoretools)
> library(hugene10sttranscriptcluster.db)
> eset <- getGEO("GSE38642")[[1]]
> eset <- annotateEset(eset, hugene10sttranscriptcluster.db)
> design <- model.matrix(~0+pData(phenoData(eset))[,15])
> colnames(design) <- c("normal","diabetic")
> contrast <- matrix(c(-1,1), ncol=1, dimnames = list(colnames(design), "diabetic - normal"))
> fit <- lmFit(eset, design)
> fit2 <- eBayes(contrasts.fit(fit, contrast))
> nrow(topTable(fit2,1,Inf, p.value = 0.001, adjust = "none"))
[1] 127

ADD REPLY • link 7.9 years ago James W. MacDonald 65k

0

Entering edit mode

May I misunderstand you in the first comment. You said they used RMA. I used RMA as well in the above code but I did not get the same number of DEGs

Thanks for your help and clarifying, I am a new user in R

ADD REPLY • link 7.9 years ago AAAbdul • 0