Question

How to annotate Data with oligo and Affymetrix HuGene-2_0-st package?

sebast1an • 0

@ab9fa7c8

Last seen 19 months ago

Germany

Dear everyone,

I'm a PhD student with zero background in bioinformatics and R, but I have to analyze the expression of my protein of interest in a publicly available Affymetrix array dataset. So everything that follows is self-taught: I downloaded the raw files (.CEL) and ran this code in R:


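library(oligo)
celFiles <- list.celfiles()           # all CEL files in the working directory
affyRaw <- read.celfiles(celFiles)    # also loads pd.hugene.2.0.st
eset <- rma(affyRaw)                  # RMA: log2 values, one row per transcript cluster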

By using:

write.exprs(eset, file = "data.txt")

I get a text file with my normalized data, right? But now I have no idea how to annotate my data. I loaded the package pd.hugene.2.0.st, but I can't figure out how to get from there to gene annotations...
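The file from write.exprs() holds only the expression matrix: the probeset IDs as row names and one column per CEL file, with no gene symbols. The base-R sketch below shows what ends up on disk. It assumes (as I understand Biobase) that write.exprs() is a thin wrapper around write.table() with tab separators, no quoting and col.names = NA, and it uses a small hypothetical matrix in place of exprs(eset):

```r
# Hypothetical stand-in for exprs(eset): rows are probeset IDs, columns are samples
expr <- matrix(c(5.1, 4.2, 3.3, 6.0,
                 5.0, 4.4, 3.1, 6.2),
               nrow = 4,
               dimnames = list(c("probe_1", "probe_2", "probe_3", "probe_4"),
                               c("Sample1.CEL", "Sample2.CEL")))

# Roughly what write.exprs() does: tab-delimited, unquoted, row names kept.
# col.names = NA adds an empty header cell above the row names so the
# header line lines up with the data columns.
out_file <- tempfile(fileext = ".txt")
write.table(expr, file = out_file, quote = FALSE, sep = "\t", col.names = NA)

# Reading it back shows only IDs and numbers: no SYMBOL or GENENAME columns
back <- read.delim(out_file, row.names = 1, check.names = FALSE)
print(colnames(back))
print(head(back))
```

So the file is "normalized" in the RMA sense, but the annotation has to be joined on separately.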

Output of sessionInfo():

> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.0.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pd.hugene.2.0.st_3.14.1 DBI_1.1.3               RSQLite_2.2.17          oligo_1.60.0            Biostrings_2.64.1       GenomeInfoDb_1.32.4    
 [7] XVector_0.36.0          IRanges_2.30.1          S4Vectors_0.34.0        Biobase_2.56.0          oligoClasses_1.58.0     BiocGenerics_0.42.0    

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9                  compiler_4.2.1              BiocManager_1.30.18         MatrixGenerics_1.8.1        bitops_1.0-7               
 [6] iterators_1.0.14            tools_4.2.1                 zlibbioc_1.42.0             bit_4.0.4                   memoise_2.0.1              
[11] preprocessCore_1.58.0       lattice_0.20-45             ff_4.0.7                    pkgconfig_2.0.3             rlang_1.0.5                
[16] Matrix_1.5-1                foreach_1.5.2               cli_3.4.0                   DelayedArray_0.22.0         fastmap_1.1.0              
[21] GenomeInfoDbData_1.2.8      affxparser_1.68.1           vctrs_0.4.1                 bit64_4.0.5                 grid_4.2.1                 
[26] blob_1.2.3                  codetools_0.2-18            matrixStats_0.62.0          GenomicRanges_1.48.0        splines_4.2.1              
[31] SummarizedExperiment_1.26.1 RCurl_1.98-1.8              cachem_1.0.6                crayon_1.5.1                affyio_1.66.0

Thank you so much!

Updated 19 months ago by James W. MacDonald • written 19 months ago by sebast1an

score 1 · Answer 1 · 2022-09-16

The simple solution is

library(affycoretools)
library(hugene20sttranscriptcluster.db)
eset <- annotateEset(eset, hugene20sttranscriptcluster.db)

Then, after you fit the model using limma, your topTable output will automatically be annotated with the Entrez Gene ID, symbol, and gene name. You can add other annotations as well, but those are the defaults.

I would normally not do something like write.exprs because what are you going to do with those data that is more sophisticated than fitting models using limma? You can always fit the model and then output the entire topTable to Excel if that's your jam, but usually I would generate an HTML document and use Glimma to make an interactive plot of the data which people seem to like bigly.

score 0 · Answer 2 · 2022-09-16

sebast1an • 0

Thank you so much for taking the time!

I just want to look at one specific gene and see how it is expressed in different tumor grades, so I wanted to avoid fitting a model and using limma...

Using your solution:

library(affycoretools)
library(hugene20sttranscriptcluster.db)
eset <- annotateEset(eset, hugene20sttranscriptcluster.db)

I get a perfect table with the SPOT_ID and the expression value. How do I get the annotation with the Gene_ID, symbol etc. without using limma?

I was thinking of a table like this:

k <- head(keys(hugene20sttranscriptcluster.db, keytype = "PROBEID"))
select(hugene20sttranscriptcluster.db, keys = k, columns = c("SYMBOL", "GENENAME"), keytype = "PROBEID")

And how do I export this as a text file?
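One thing to watch when building such a table by hand: select() returns one row per probeset-to-gene mapping, so a probeset that maps to several genes shows up on several rows, and the result no longer lines up with the rows of exprs(eset). To my understanding annotateEset() gives you one annotation row per probeset, and AnnotationDbi::mapIds(..., multiVals = "first") is the packaged way to collapse a single column. The base-R sketch below does the collapse-and-join manually; the IDs, symbols and values are hypothetical stand-ins for the output of select() and exprs(eset):

```r
# Hypothetical stand-in for
# select(hugene20sttranscriptcluster.db, keys = rownames(eset),
#        columns = c("SYMBOL", "GENENAME"), keytype = "PROBEID")
# probe_2 maps to two genes, so it appears twice.
annot <- data.frame(
  PROBEID  = c("probe_1", "probe_2", "probe_2", "probe_3"),
  SYMBOL   = c("GENE_A", "GENE_B", "GENE_C", NA),
  GENENAME = c("gene A", "gene B", "gene C", NA),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for exprs(eset)
expr <- matrix(c(5.1, 4.2, 3.3, 5.0, 4.4, 3.1),
               nrow = 3,
               dimnames = list(c("probe_1", "probe_2", "probe_3"),
                               c("Sample1", "Sample2")))

# Keep the first mapping per probeset (one simple way to collapse 1:many hits)
annot1 <- annot[!duplicated(annot$PROBEID), ]

# Line the annotation up with the matrix rows by ID, not by position
annot1 <- annot1[match(rownames(expr), annot1$PROBEID), ]
out <- data.frame(annot1, expr, row.names = NULL)

# Tab-delimited text file, one row per probeset
write.table(out, file.path(tempdir(), "annotated.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)
print(out)
```

Keeping only the first mapping is a choice, not the only option; it simply guarantees one row per probeset.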

Thank you so much!

Comment by sebast1an • 19 months ago

Unless you are providing an answer, use the ADD COMMENT button.

If you are set on doing an eyeballometric analysis, then something like

out <- data.frame(pData(eset), exprs(eset))
write.table(out, "My gene data.txt", sep = "\t", quote = FALSE, row.names = FALSE)

Will do what you want.

Reply by James W. MacDonald • 19 months ago

And if you are planning on opening that file in Excel, don't. Excel will silently convert a number of gene symbols into dates, and you will not be able to revert them. Instead do

library(openxlsx)
write.xlsx(out, "my gene data.xlsx")

Reply by James W. MacDonald • 19 months ago

When I do it like this, I get:

index   GPX_LY046_HuGene2.CEL
1   2.57320452441004
1   3.3205696954071
1   3.76983715048658
1   3.12194407738683
...

So am I losing the annotation?

I think my biggest issue right now is that I still have the SPOT_ID and not the gene ID annotated.
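The column of 1s under "index" comes from mixing up the two annotation tables of an ExpressionSet. pData() has one row per sample (the columns of exprs()), while fData() has one row per probeset (the rows of exprs()), so only fData() lines up row-for-row with the expression values. With a single CEL file, pData() is a one-row table, and data.frame() recycles it down every probeset row. A base-R sketch with hypothetical stand-ins for the three pieces:

```r
# Hypothetical one-sample expression matrix: rows are probesets
expr <- matrix(c(2.6, 3.3, 3.8, 3.1), ncol = 1,
               dimnames = list(c("probe_1", "probe_2", "probe_3", "probe_4"),
                               "Sample1.CEL"))
# pData()-like table: one row per sample (a real one has sample names as row names)
pd <- data.frame(index = 1L)
# fData()-like table: one row per probeset, same order as rownames(expr)
fd <- data.frame(PROBEID = rownames(expr),
                 SYMBOL  = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"))

# Wrong pairing: the 1-row sample table is recycled down all 4 probeset rows
wrong <- data.frame(pd, expr)
print(wrong$index)   # the same sample-level value on every row, no gene info

# Right pairing: feature annotation and expression values share the same rows
stopifnot(identical(fd$PROBEID, rownames(expr)))
right <- data.frame(fd, expr)
print(right)
```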

Reply by sebast1an • 19 months ago

Oh, my bad. It should be

out <- data.frame(fData(eset), exprs(eset))

As an example

> out <- data.frame(fData(eset), exprs(eset))
> subset(out, SYMBOL %in% c("BRCA1","BRCA2"))
          PROBEID ENTREZID SYMBOL                    GENENAME  Sample1  Sample2
16773840 16773840      675  BRCA2 BRCA2 DNA repair associated 4.763485 4.901088
16845349 16845349      672  BRCA1 BRCA1 DNA repair associated 5.458340 4.986312
          Sample3  Sample4  Sample5  Sample6  Sample7  Sample8  Sample9
16773840 4.677338 4.508517 5.015828 5.131274 4.173618 3.839428 4.711530
16845349 5.283993 5.078028 5.349303 5.449031 5.342959 5.209156 5.351528
         Sample10
16773840 3.953347
16845349 5.331211
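Since the goal is one gene across tumor grades, the step after subset() is to pair that single row of values with each sample's grade. The grade labels are not part of this thread. The vector below is hypothetical and would in practice come from the dataset's sample annotation, for instance a column you add to pData(eset). A base-R sketch with made-up values:

```r
# Hypothetical stand-in for data.frame(fData(eset), exprs(eset))
out <- data.frame(
  PROBEID = c("p1", "p2"),
  SYMBOL  = c("GENE_A", "GENE_B"),
  Sample1 = c(4.8, 5.5), Sample2 = c(4.9, 5.0), Sample3 = c(4.7, 5.3),
  Sample4 = c(4.5, 5.1), Sample5 = c(5.0, 5.3), Sample6 = c(5.1, 5.4)
)
# Hypothetical grade label for each sample column, in the same order
grade <- factor(c("G1", "G1", "G2", "G2", "G3", "G3"))

sample_cols <- grep("^Sample", colnames(out), value = TRUE)
stopifnot(length(sample_cols) == length(grade))

gene_row <- subset(out, SYMBOL == "GENE_A")

# One value per sample, tied to its grade
gene_df <- data.frame(sample = sample_cols,
                      grade  = grade,
                      expr   = unlist(gene_row[, sample_cols], use.names = FALSE))

# Per-grade median of the log2 (RMA) expression values
med <- tapply(gene_df$expr, gene_df$grade, median)
print(med)
```

boxplot(expr ~ grade, data = gene_df) draws the same comparison. With a handful of samples per grade this stays descriptive; for a formal comparison, the limma route from the first answer is the one to use.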

Reply by James W. MacDonald • 19 months ago