Question

Questions about analysis of Human Gene 2.0 ST Array

0

Entering edit mode

Biologist ▴ 110

@biologist-9801

Last seen 4.1 years ago

Hi All,

I have tried to process data from Affymetrix Human Gene 2.0 ST Array. This is my first time investigating data from this array. I have a couple of questions. I would appreciate any advice in advance.

1) I have realized "affy" package does not work for data from this array, and found "oligo" package does work. My normalizing processes are the followings:

celfiles = list.files(path = ".", pattern = ".CEL$", all.files = FALSE,
full.names = FALSE, recursive = FALSE, ignore.case = FALSE)
Data <- read.celfiles(celfiles)
celfiles.rma <- rma(Data, target="core")

celfiles.rma

ExpressionSet (storageMode: lockedEnvironment)
assayData: 53617 features, 6 samples
element names: exprs
protocolData
rowNames: #8_(HuGene-2_0-st).CEL E_(HuGene-2_0-st).CEL ... T_(HuGene-2_0-st).CEL (6 total)
varLabels: exprs dates
varMetadata: labelDescription channel
phenoData
rowNames: #8_(HuGene-2_0-st).CEL E_(HuGene-2_0-st).CEL ... T_(HuGene-2_0-st).CEL (6 total)
varLabels: index
varMetadata: labelDescription channel
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.hugene.2.0.st

Now, How can I to remove control probesets from ST arrays? Thank you

@ James W. MacDonald Could you please help me in this?

oligo • 2.7k views

ADD COMMENT • link updated 8.1 years ago by James W. MacDonald 65k • written 8.1 years ago by Biologist ▴ 110

score 0 · Answer 1 · 2016-03-28

0

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 20 hours ago

United States

library(affycoretools)
celfiles.rma <- getMainProbes(celfiles.rma)

ADD COMMENT • link 8.1 years ago James W. MacDonald 65k

0

Entering edit mode

Hi James,

Im working on Affymetrix Human Gene 2.0 ST Array.

library(oligo)
#Read cel files
celfiles = list.files(path = ".", pattern = ".CEL$", all.files = FALSE,
full.names = FALSE, recursive = FALSE, ignore.case = FALSE)

celfiles
[1] "#8_(HuGene-2_0-st).CEL" "E_(HuGene-2_0-st).CEL" "GG7_(HuGene-2_0-st).CEL"
[4] "J_(HuGene-2_0-st).CEL" "O_(HuGene-2_0-st).CEL" "T_(HuGene-2_0-st).CEL"
Data <- read.celfiles(celfiles)
#RMA
celfiles.rma <- rma(Data, target="core")

library(limma)

design <- model.matrix(~ 0+factor(c(1,1,1,2,2,2)))

colnames(design) <- c("group1", "group2")
fit <- lmFit(celfiles.rma, design)

contrast.matrix <- makeContrasts(group2-group1, group1-group2, levels=design)

fit2 <- contrasts.fit(fit, contrast.matrix)

fit2 <- eBayes(fit2)

tab <- topTable(fit2, coef=2, adjust="fdr", sort.by="none", number=Inf)

head(tab)
logFC AveExpr t P.Value adj.P.Val B
16650001 -0.67713091 0.7013795 -2.11246203 0.07463104 0.8414740 -3.923065
16650003 0.01775323 0.8173575 0.05752413 0.95581904 0.9922585 -5.247542
16650005 0.20727356 1.2926154 0.28654513 0.78318782 0.9636612 -5.214740
16650007 0.13198594 0.8028477 0.37395467 0.72008320 0.9502862 -5.191016
16650009 -0.31117759 0.7187522 -0.92432176 0.38765340 0.8815813 -4.917247
16650011 -0.18142907 0.7450577 -0.46483921 0.65689248 0.9382330 -5.160082

write.table(tab, file="DEG.xls",row.names=F, sep="\t")

results <- tab[which(tab$logFC >= 1.5 & tab$P.Value <= 0.05),]
write.table(results, file="DEGp.xls", row.names=T, sep="\t")

idx = which(tab$P.Value < 0.05 & tab$logFC > 1.5)

heatmap(exprs(celfiles.rma[idx,]),trace='none',scale='row')

Could you please tell me how can I add the annotation for the genes?

Thank you

ADD REPLY • link 8.1 years ago Biologist ▴ 110

score 0 · Answer 2 · 2016-03-29

I have added a function called annotateEset in the devel version of affycoretools, which will make this dead simple. The only slight hiccup is that you need to use R-devel and the devel version of Bioconductor. But we are getting really close to release, so the devel versions are likely to be OK.

After installing R-devel and Bioc-devel (including affycoretools, obvs), you would just do

celfiles.rma <- annotateEset(celfiles.rma, hugene20sttranscriptcluster.db)

OR

celfiles.rma <- annotateEset(celfiles.rma, pd.hugene.2.0.st)

which will annotate your ExpressionSet, and those annotations will then be propagated through to your MArrayLM object, and to your topTable output. The annotation will be in the 'genes' list item of the MArrayLM object, and you can get that directly using the $ operator, if you need the annotation for anything else.

The difference in the second argument has to do with the source of the annotation. The usual way to annotate is to use the ChipDb package (e.g., hugene20sttranscriptcluster.db), in which case you can use an additional argument of the columns you want to annotate with.

The Gene ST arrays all come with a parsed version of the Affy annotation file that you can also use, but in that case you will only get what I have already pre-specified as being the interesting columns. I consider this a fall-back annotation method, and primarily intended for use with the non-model organism arrays for which there are not Bioc ChipDb packages.