Questions about analysis of Human Gene 2.0 ST Array
2
0
Entering edit mode
Biologist ▴ 110
@biologist-9801
Last seen 4.1 years ago

Hi All,

I have tried to process data from Affymetrix Human Gene 2.0 ST Array. This is my first time investigating data from this array. I have a couple of questions. I would appreciate any advice in advance.

1) I have realized "affy" package does not work for data from this array, and found "oligo" package does work. My normalizing processes are the followings:

celfiles = list.files(path = ".", pattern = ".CEL$", all.files = FALSE,
                      full.names = FALSE, recursive = FALSE, ignore.case = FALSE)
Data <- read.celfiles(celfiles)
celfiles.rma <- rma(Data, target="core")

celfiles.rma

ExpressionSet (storageMode: lockedEnvironment)
assayData: 53617 features, 6 samples 
  element names: exprs 
protocolData
  rowNames: #8_(HuGene-2_0-st).CEL E_(HuGene-2_0-st).CEL ... T_(HuGene-2_0-st).CEL (6 total)
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: #8_(HuGene-2_0-st).CEL E_(HuGene-2_0-st).CEL ... T_(HuGene-2_0-st).CEL (6 total)
  varLabels: index
  varMetadata: labelDescription channel
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.hugene.2.0.st

Now, How can I to remove control probesets from ST arrays? Thank you

James W. MacDonald  Could you please help me in this?

oligo • 2.7k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 20 hours ago
United States
library(affycoretools)
celfiles.rma <- getMainProbes(celfiles.rma)

 

ADD COMMENT
0
Entering edit mode

Hi James,

Im working on Affymetrix Human Gene 2.0 ST Array.

library(oligo)
#Read cel files
celfiles = list.files(path = ".", pattern = ".CEL$", all.files = FALSE,
                      full.names = FALSE, recursive = FALSE, ignore.case = FALSE)

celfiles
[1] "#8_(HuGene-2_0-st).CEL"  "E_(HuGene-2_0-st).CEL"   "GG7_(HuGene-2_0-st).CEL"
[4] "J_(HuGene-2_0-st).CEL"   "O_(HuGene-2_0-st).CEL"   "T_(HuGene-2_0-st).CEL"                   
Data <- read.celfiles(celfiles)
#RMA
celfiles.rma <- rma(Data, target="core")

library(limma)

design <- model.matrix(~ 0+factor(c(1,1,1,2,2,2)))

colnames(design) <- c("group1", "group2")
fit <- lmFit(celfiles.rma, design)

contrast.matrix <- makeContrasts(group2-group1, group1-group2, levels=design)

fit2 <- contrasts.fit(fit, contrast.matrix)

fit2 <- eBayes(fit2)

tab <- topTable(fit2, coef=2, adjust="fdr", sort.by="none", number=Inf)

head(tab)
               logFC   AveExpr           t    P.Value adj.P.Val         B
16650001 -0.67713091 0.7013795 -2.11246203 0.07463104 0.8414740 -3.923065
16650003  0.01775323 0.8173575  0.05752413 0.95581904 0.9922585 -5.247542
16650005  0.20727356 1.2926154  0.28654513 0.78318782 0.9636612 -5.214740
16650007  0.13198594 0.8028477  0.37395467 0.72008320 0.9502862 -5.191016
16650009 -0.31117759 0.7187522 -0.92432176 0.38765340 0.8815813 -4.917247
16650011 -0.18142907 0.7450577 -0.46483921 0.65689248 0.9382330 -5.160082

write.table(tab, file="DEG.xls",row.names=F, sep="\t")

results <- tab[which(tab$logFC >= 1.5 & tab$P.Value <= 0.05),]
write.table(results, file="DEGp.xls", row.names=T, sep="\t")

idx = which(tab$P.Value < 0.05 & tab$logFC > 1.5)

heatmap(exprs(celfiles.rma[idx,]),trace='none',scale='row')

Could you please tell me how can I add the annotation for the genes?

Thank you

ADD REPLY
0
Entering edit mode
@james-w-macdonald-5106
Last seen 20 hours ago
United States

I have added a function called annotateEset in the devel version of affycoretools, which will make this dead simple. The only slight hiccup is that you need to use R-devel and the devel version of Bioconductor. But we are getting really close to release, so the devel versions are likely to be OK.

After installing R-devel and Bioc-devel (including affycoretools, obvs), you would just do

celfiles.rma <- annotateEset(celfiles.rma, hugene20sttranscriptcluster.db)

OR

celfiles.rma <- annotateEset(celfiles.rma, pd.hugene.2.0.st)

which will annotate your ExpressionSet, and those annotations will then be propagated through to your MArrayLM object, and to your topTable output. The annotation will be in the 'genes' list item of the MArrayLM object, and you can get that directly using the $ operator, if you need the annotation for anything else.

The difference in the second argument has to do with the source of the annotation. The usual way to annotate is to use the ChipDb package (e.g., hugene20sttranscriptcluster.db), in which case you can use an additional argument of the columns you want to annotate with.

The Gene ST arrays all come with a parsed version of the Affy annotation file that you can also use, but in that case you will only get what I have already pre-specified as being the interesting columns. I consider this a fall-back annotation method, and primarily intended for use with the non-model organism arrays for which there are not Bioc ChipDb packages.

ADD COMMENT

Login before adding your answer.

Traffic: 493 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6