Question

HUgene 2.0 analyses with CEL files, oligo package

kanacska ▴ 10

@kanacska-7375

Hungary

Hi!

I want to make a Venn diagram for 2 samples (3 replicates each) and a list of differentially expressed genes, starting from HuGene 2.0 CEL files and using the oligo package.

Does anyone have a pipeline? Or can someone help with the following:

celdir <- "path of my CEL file directory"  # placeholder: directory containing the CEL files
setwd(celdir)  # set working directory

# create sampleNames from the list of CEL files - substitute file names with sample names
raw <- list.celfiles('path of my cell files', full.names = TRUE)
filename2sample <- 'path of the txt'
fn2sp <- read.table(filename2sample, header = TRUE)
# convertFilename2Sample is my own helper (not shown); it must be applied to the file list stored in 'raw'
sNames <- unlist(lapply(raw, convertFilename2Sample, tabledict = fn2sp)) # co

# I found this for reading the CEL files

affyGeneFS <- read.celfiles(raw)
norm <- rma(affyGeneFS, target = "probeset")

Background correcting
Normalizing
Calculating Expression
Warning message:
'isIdCurrent' is deprecated.
Use 'dbIsValid' instead.
See help("Deprecated")

What do I have to do about this message? Or can I simply ignore it and continue working?

expr<- exprs(norm)

After this step, can I use limma in the same way as with the Affy U133 Plus arrays?

And for U133 there was a function for annotation:

library(hgu133plus2.db)

transcriptclusterid2symbol_many <- select(hgu133plus2.db,
keys = keys(hgu133plus2.db),
columns=c("PROBEID","ENTREZID","SYMBOL","MAP"),
keytype="PROBEID")

Is there an equivalent for HuGene 2.0?

Cheers,

Anna

ps.

> sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=Hungarian_Hungary.1250 LC_CTYPE=Hungarian_Hungary.1250
[3] LC_MONETARY=Hungarian_Hungary.1250 LC_NUMERIC=C
[5] LC_TIME=Hungarian_Hungary.1250

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] pd.hugene.2.0.st_3.10.0 RSQLite_1.0.0 DBI_0.3.1
[4] oligo_1.30.0 Biostrings_2.34.1 XVector_0.6.0
[7] IRanges_2.0.1 S4Vectors_0.4.0 oligoClasses_1.28.0
[10] pheatmap_1.0.2 limma_3.22.7 optparse_1.3.0
[13] hgu133plus2cdf_2.15.0 affy_1.44.0 Biobase_2.26.0
[16] BiocGenerics_0.12.1 BiocInstaller_1.16.2

loaded via a namespace (and not attached):
[1] affxparser_1.38.0 affyio_1.34.0 AnnotationDbi_1.28.2 bit_1.1-12
[5] codetools_0.2-11 colorspace_1.2-6 ff_2.2-13 foreach_1.4.2
[9] GenomeInfoDb_1.2.4 GenomicRanges_1.18.4 getopt_1.20.0 grid_3.1.3
[13] gtable_0.1.2 iterators_1.0.7 munsell_0.4.2 plyr_1.8.1
[17] preprocessCore_1.28.0 RColorBrewer_1.1-2 Rcpp_0.11.5 scales_0.2.4
[21] splines_3.1.3 tools_3.1.3 zlibbioc_1.12.0

microarray annotation hgu133plus2 hugene20 • 2.9k views

score 0 · Answer 1 · 2015-03-23

Three things:

1.) You probably don't want to summarize a Gene ST array at the probeset level, so just use

norm <- rma(affyGeneFS)

And you don't have to worry about the warning messages.

2.) There is no need to extract the expression values from your 'norm' ExpressionSet. The limma package will do that for you. And you can use limma just as you would for any other Affy array.

3.) The annotation for this array is supplied in the hugene20sttranscriptcluster.db package (if you summarize at the 'core' level, which I highly recommend). If you have used limma, then you will have an MArrayLM object that you can annotate.

library(limma)
library(hugene20sttranscriptcluster.db)

design <- model.matrix(<args go here>)
contrast <- makeContrasts(<args go here>)

fit <- lmFit(norm, design)
fit2 <- contrasts.fit(fit, contrast)
fit2 <- eBayes(fit2)

## map the transcript cluster IDs (the row names) to gene-level annotation
gns <- select(hugene20sttranscriptcluster.db, row.names(fit2$coef), c("ENTREZID","SYMBOL","MAP"))

## here you need to account for the one-to-many mappings that result. My usual solution is
## to keep only the first annotation row for each ID
gns <- gns[!duplicated(gns[,1]),]

fit2$genes <- gns

## sanity check: the annotation rows must be in the same order as the fitted coefficients
if(!isTRUE(all.equal(row.names(fit2$coef), fit2$genes$PROBEID)))
    stop("There has been an error when annotating these data, check the gns object!\n")

Then topTable() will output tables with the annotation appended.
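To make the steps above concrete for a two-group design with three replicates each, here is a minimal sketch. It runs on a simulated expression matrix, and the group labels "control" and "treated", the probeset names and the spiked-in effect are all assumptions made up for illustration. With real data you would pass the 'norm' object to lmFit() instead of 'expr'. The last two lines use limma's decideTests() and vennDiagram(), which is one possible route to the Venn diagram asked about in the question.

```r
library(limma)

set.seed(1)
# Simulated log2 expression matrix: 100 hypothetical probesets x 6 arrays
expr <- matrix(rnorm(600, mean = 8), nrow = 100,
               dimnames = list(paste0("ps", 1:100), paste0("array", 1:6)))
# Spike in a difference for the first 10 probesets in arrays 4-6
expr[1:10, 4:6] <- expr[1:10, 4:6] + 3

# Two groups with three replicates each (hypothetical labels)
group <- factor(rep(c("control", "treated"), each = 3))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# One contrast: treated versus control
contrast <- makeContrasts(treated - control, levels = design)

fit <- lmFit(expr, design)
fit2 <- eBayes(contrasts.fit(fit, contrast))

# Full table for the contrast, all probesets
tab <- topTable(fit2, coef = 1, number = Inf)

# Classify each probeset as up (1), down (-1) or not significant (0),
# then draw a Venn diagram with separate counts for up and down
results <- decideTests(fit2)
vennDiagram(results, include = c("up", "down"))
```

With a single contrast the diagram has only one circle; it becomes more informative once the contrast matrix has two or three columns.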

score 0 · Answer 2 · 2015-03-24

Dear James,

Thank you for your quick help!

Unfortunately I have another question. When I order my topTable with sort.by = 't', I get:

             logFC   AveExpr         t      P.Value   adj.P.Val        B
17047868 -3.528982  7.859722 -23.23726 3.602484e-08 0.001603732 4.176879
17059383 -3.599709  8.576114 -21.67394 5.982178e-08 0.001603732 4.093493
17056791  1.562905  7.379065  17.19374 3.204670e-07 0.005727493 3.739025
17059491 -3.694398 10.375221 -14.72316 9.775825e-07 0.013103760 3.423587
16996017 -1.823099  7.487446 -13.84718 1.515446e-06 0.016250736 3.279348
16675045  1.534715  6.423220  12.85058 2.578229e-06 0.023039484 3.087940

..........................

R seems to treat plus and minus the same way. What do I have to do so that the list is in the right order, from minus to plus?

Also: what do I have to do if some entries in the gene symbol list are NA? How can I get the gene symbol for all of the probe IDs in R?

Thank you

Anna
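
A possible way to handle both points, offered as a sketch and not as a definitive answer. As far as I know, topTable() with sort.by = 't' ranks by the absolute value of the t-statistic, which is why negative and positive values are interleaved in the table above. Re-ordering the table on the signed t column with order() gives a list that runs from minus to plus. The example below uses the six rows posted above. The SYMBOL column in it is hypothetical (placeholder names only) and is there just to show how rows with NA could be filtered out.

```r
# logFC and t values copied from the table posted above
tt <- data.frame(
  logFC = c(-3.528982, -3.599709, 1.562905, -3.694398, -1.823099, 1.534715),
  t     = c(-23.23726, -21.67394, 17.19374, -14.72316, -13.84718, 12.85058),
  row.names = c("17047868", "17059383", "17056791",
                "17059491", "16996017", "16675045")
)

# Re-order by the signed t-statistic, most negative first
tt_signed <- tt[order(tt$t), ]

# Hypothetical SYMBOL column (placeholder names, not real annotation),
# with NA where no gene symbol is mapped
tt_signed$SYMBOL <- c("GENE_A", NA, "GENE_B", "GENE_C", NA, "GENE_D")

# Keep only the rows that have a gene symbol
tt_annotated <- tt_signed[!is.na(tt_signed$SYMBOL), ]
```

On a real fit the same idea would be to take the unsorted table, for example tab <- topTable(fit2, coef = 1, number = Inf, sort.by = "none"), and then use tab[order(tab$t), ]. As for the NA entries: to my knowledge a number of transcript clusters on this array (control probesets and transcripts without a known gene) simply have no gene symbol in the annotation package, so it is probably not possible to get a symbol for every ID, and filtering or keeping the NA rows is a choice you have to make.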