How to annotate Data with oligo and Affymetrix HuGene-2_0-st package?
sebast1an • 0

Dear everyone,

I'm a PhD student with zero background in bioinformatics and R, but I have to analyze the expression of my protein of interest in a publicly available Affymetrix array dataset. So what comes next is completely self-taught: I downloaded the raw files (.CEL) and used this code in R:

> library(oligo)
> celFiles = list.celfiles()
> affyRaw <- read.celfiles(celFiles)
> eset <- rma(affyRaw)

By using:


I get a text file with my "normalized" data, right? But now I have no idea how to annotate my data. I loaded the package: but I can't figure out how to make this work...

Output of sessionInfo():

> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.0.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

[1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pd.hugene.2.0.st_3.14.1 DBI_1.1.3               RSQLite_2.2.17          oligo_1.60.0            Biostrings_2.64.1       GenomeInfoDb_1.32.4    
 [7] XVector_0.36.0          IRanges_2.30.1          S4Vectors_0.34.0        Biobase_2.56.0          oligoClasses_1.58.0     BiocGenerics_0.42.0    

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9                  compiler_4.2.1              BiocManager_1.30.18         MatrixGenerics_1.8.1        bitops_1.0-7               
 [6] iterators_1.0.14            tools_4.2.1                 zlibbioc_1.42.0             bit_4.0.4                   memoise_2.0.1              
[11] preprocessCore_1.58.0       lattice_0.20-45             ff_4.0.7                    pkgconfig_2.0.3             rlang_1.0.5                
[16] Matrix_1.5-1                foreach_1.5.2               cli_3.4.0                   DelayedArray_0.22.0         fastmap_1.1.0              
[21] GenomeInfoDbData_1.2.8      affxparser_1.68.1           vctrs_0.4.1                 bit64_4.0.5                 grid_4.2.1                 
[26] blob_1.2.3                  codetools_0.2-18            matrixStats_0.62.0          GenomicRanges_1.48.0        splines_4.2.1              
[31] SummarizedExperiment_1.26.1 RCurl_1.98-1.8              cachem_1.0.6                crayon_1.5.1                affyio_1.66.0

Thank you so much!

oligo • 419 views

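The simple solution is to use annotateEset from the affycoretools package, with the hugene20sttranscriptcluster.db annotation package loaded: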

eset <- annotateEset(eset, hugene20sttranscriptcluster.db)

And then after you fit the model using limma your topTable output will be automatically annotated with the Gene ID, symbol, and gene name. You can add other things in as well, but those are the defaults.

I would normally not do something like write.exprs because what are you going to do with those data that is more sophisticated than fitting models using limma? You can always fit the model and then output the entire topTable to Excel if that's your jam, but usually I would generate an HTML document and use Glimma to make an interactive plot of the data which people seem to like bigly.
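For readers who have not used limma, the workflow described above might look roughly like the following. This is a minimal sketch on simulated data, not the poster's own code: the matrix `expr`, the two-level `grade` factor, and the sample names are made-up stand-ins. With real data you would pass the annotated `eset` to `lmFit` instead of `expr`, in which case the annotation columns are carried along into the `topTable` output.

```r
library(limma)

# Simulated log2 expression values: 100 probesets x 6 samples (illustration only)
set.seed(1)
expr <- matrix(rnorm(100 * 6, mean = 5), nrow = 100,
               dimnames = list(paste0("probe", 1:100), paste0("Sample", 1:6)))

# Hypothetical tumor grades, one per sample, in the same order as the columns
grade <- factor(c("low", "low", "low", "high", "high", "high"),
                levels = c("low", "high"))

# Linear model with an intercept (low grade) and a high-vs-low coefficient
design <- model.matrix(~ grade)
fit <- eBayes(lmFit(expr, design))

# Full table of results for the high-vs-low comparison
tt <- topTable(fit, coef = "gradehigh", number = Inf)
head(tt)
```

The resulting `tt` is an ordinary data frame, so it can be written to a file in the same way as any other table.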

sebast1an • 0

Thank you so much for taking the time!

I just want to look at one specific gene and see how it is expressed in different tumor grades. So I wanted to avoid fitting a model and using limma...

By using your solution:

eset <- annotateEset(eset, hugene20sttranscriptcluster.db)

I get a perfect table with the SPOT_ID and the expression value. How do I get the annotation with the Gene_ID, symbol etc. without using limma?

I was thinking of a table like this:

> k <- head(keys(hugene20sttranscriptcluster.db, keytype="PROBEID"))
> select(hugene20sttranscriptcluster.db, keys=k, columns=c("SYMBOL","GENENAME"), keytype="PROBEID")

And how do I extract this as a text file?

Thank you so much!
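One way to turn a select()-style annotation table into a text file alongside the expression values is to join the two on the probeset ID. The sketch below uses only base R and small mock objects: `anno` stands in for the output of select(), `expr` stands in for exprs(eset), and the numbers and the output file name are made up for illustration. Note that a real select() call can return more than one row per probeset, which would duplicate rows after the merge.

```r
# Mock annotation table, shaped like the output of select() (illustration only)
anno <- data.frame(
  PROBEID  = c("16773840", "16845349"),
  SYMBOL   = c("BRCA2", "BRCA1"),
  GENENAME = c("BRCA2 DNA repair associated", "BRCA1 DNA repair associated"),
  stringsAsFactors = FALSE
)

# Mock expression matrix, shaped like exprs(eset): probesets in rows, samples in columns
expr <- matrix(c(4.76, 5.46, 4.90, 4.99), nrow = 2,
               dimnames = list(c("16773840", "16845349"), c("Sample1", "Sample2")))

# Move the row names into a PROBEID column so the two tables share a key
expr_df <- data.frame(PROBEID = rownames(expr), expr, stringsAsFactors = FALSE)

# Join annotation and expression values on the probeset ID
annotated <- merge(anno, expr_df, by = "PROBEID")

# Write a tab-delimited text file (written to a temporary directory here)
outfile <- file.path(tempdir(), "annotated_expression.txt")
write.table(annotated, outfile, sep = "\t", quote = FALSE, row.names = FALSE)
```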


Unless you are providing an answer, use the ADD COMMENT button.

If you are set on doing an eyeballometric analysis, then something like

out <- data.frame(pData(eset), exprs(eset))
write.table(out, "My gene data.txt", sep = "\t", quote = FALSE, row.names = FALSE)

will do what you want.


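And if you are planning on opening that text file in Excel, don't. Excel will convert any number of gene symbols to date format, and you will not be able to revert them. Instead, write an .xlsx file directly (write.xlsx is provided by, for example, the openxlsx package):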

write.xlsx(out, "my gene data.xlsx")

When I do it like this, I get:

index   GPX_LY046_HuGene2.CEL
1   2.57320452441004
1   3.3205696954071
1   3.76983715048658
1   3.12194407738683

So I am losing the annotation?

I think my biggest issue right now is that I still have the SPOT_ID and not the Gene ID annotated.


Oh, my bad. It should be

out <- data.frame(fData(eset), exprs(eset))

As an example

> out <- data.frame(fData(eset), exprs(eset))
> subset(out, SYMBOL %in% c("BRCA1","BRCA2"))
          PROBEID ENTREZID SYMBOL                    GENENAME  Sample1  Sample2
16773840 16773840      675  BRCA2 BRCA2 DNA repair associated 4.763485 4.901088
16845349 16845349      672  BRCA1 BRCA1 DNA repair associated 5.458340 4.986312
          Sample3  Sample4  Sample5  Sample6  Sample7  Sample8  Sample9
16773840 4.677338 4.508517 5.015828 5.131274 4.173618 3.839428 4.711530
16845349 5.283993 5.078028 5.349303 5.449031 5.342959 5.209156 5.351528
         Sample10
16773840 3.953347
16845349 5.331211
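To go from that table to a look at one gene across tumor grades without limma, something like the following base R sketch could work. It is only an illustration: `out` is a small mock of the annotated table with rounded values, and the `grade` vector (with the labels G1 and G3) is a hypothetical sample-to-grade mapping that you would have to supply from the study's sample information.

```r
# Mock of the table from data.frame(fData(eset), exprs(eset)); values are illustrative
out <- data.frame(
  PROBEID  = c("16773840", "16845349"),
  ENTREZID = c("675", "672"),
  SYMBOL   = c("BRCA2", "BRCA1"),
  GENENAME = c("BRCA2 DNA repair associated", "BRCA1 DNA repair associated"),
  Sample1 = c(4.76, 5.46), Sample2 = c(4.90, 4.99),
  Sample3 = c(4.68, 5.28), Sample4 = c(4.51, 5.08),
  stringsAsFactors = FALSE
)

# Hypothetical tumor grade for each sample column (an assumption, not from the thread)
grade <- c(Sample1 = "G1", Sample2 = "G1", Sample3 = "G3", Sample4 = "G3")

# Pull out the gene of interest; if several probesets match, only the first is used here
gene <- subset(out, SYMBOL == "BRCA1")
sample_cols <- names(grade)

# One row per sample: sample name, grade, and log2 expression value
long <- data.frame(
  sample     = sample_cols,
  grade      = unname(grade[sample_cols]),
  expression = as.numeric(unlist(gene[1, sample_cols]))
)

# Mean expression per grade, as a first look
means <- aggregate(expression ~ grade, data = long, FUN = mean)
print(means)
```

The `long` data frame is also a convenient shape for a quick `boxplot(expression ~ grade, data = long)`.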
