Seeking Advice on HTA 2.0 Microarray Data Processing with Oligo Package
Yao Lipu • 0
United States

Dear Bioconductor Community,

I am currently working with HTA 2.0 microarray data and using the oligo package for RMA normalization. I have a few questions and would appreciate any guidance. The platform I use is GPL17586.

(1) After performing RMA normalization with oligo::rma(), I noticed that the extracted matrix contains multiple types of probe IDs. I would like to conduct differential expression analysis using gene symbols. Is it appropriate to directly convert the probe IDs to gene symbols, or is there a risk of introducing errors in the process?


There are two kinds of probe ID, for example 2924323_st and TC10002874.hg.1.

(2) The HTA 2.0 microarray includes both "gene" and "exon" level data, but I am only interested in gene-level expression. How can I properly distinguish and extract gene-level information while ensuring the integrity of my analysis?

hta20transcriptcluster.db org.Sc.sgd.db HTA2.0 MicroRNAArrayData • 299 views

As an example:

> library(GEOquery)
> library(oligo)
> library(limma)
> library(affycoretools)
## you need this package to annotate
> library(hta20transcriptcluster.db)
## some example data
> getGEOSuppFiles("GSE54143")
> setwd("GSE54143/")
> untar("GSE54143_RAW.tar")
> dat <- read.celfiles(dir(".", "gz$"))
> eset <- rma(dat)
## this function is in my affycoretools package
> eset <- annotateEset(eset, hta20transcriptcluster.db)
> head(fData(eset))
              PROBEID ENTREZID
2824546_st 2824546_st     <NA>
2824549_st 2824549_st     <NA>
2824551_st 2824551_st     <NA>
2824554_st 2824554_st     <NA>
2827992_st 2827992_st     <NA>
2827995_st 2827995_st     <NA>
           SYMBOL GENENAME
2824546_st   <NA>     <NA>
2824549_st   <NA>     <NA>
2824551_st   <NA>     <NA>
2824554_st   <NA>     <NA>
2827992_st   <NA>     <NA>
2827995_st   <NA>     <NA>
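
The six `_st` rows printed above are all unannotated, so before mapping to symbols it can help to see how annotation status splits across the two ID styles. The following is only a rough sketch in base R: `fd` is a hypothetical stand-in for `fData(eset)`, and the IDs and symbols in it are made up for illustration, not taken from the array.

```r
## Toy stand-in for fData(eset); IDs and symbols are invented for illustration.
fd <- data.frame(
  PROBEID = c("2824546_st", "2824549_st", "TC01000001.hg.1", "TC01000002.hg.1"),
  SYMBOL  = c(NA, NA, "GENEA", "GENEB"),
  stringsAsFactors = FALSE
)

## Classify each ID by its naming pattern.
id_type <- ifelse(grepl("^TC.*\\.hg\\.1$", fd$PROBEID), "TC",
           ifelse(grepl("_st$", fd$PROBEID), "st", "other"))

## Cross-tabulate ID style against whether a gene symbol was found.
tab_type <- table(id_type, annotated = !is.na(fd$SYMBOL))
print(tab_type)
```

On real data the same two lines could be run with `fd <- fData(eset)` after annotation; the counts will depend on the annotation package version.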
> tab <- table(fData(eset)$SYMBOL)
> table(tab)
    1     2     3     4     5     6 
19246  4122   284    47    27    33 
    7     8     9    10    11    12 
   62    42     7     5     3     9 
   13    14    15    16    18    22 
   10     5     5     4     1     1 
> tab[tab > 14L]

       BTNL2        DHX16 
          16           16 
      DUX4L2        HCG23 
          16           15 
LOC100507547         LST1 
          15           15 
         LTB    POLR1HASP 
          22           15 
   PSMB8-AS1          TNF 
          15           16 
## remove unannotated stuff
> eset2 <- eset[!is.na(fData(eset)$SYMBOL),]
## average duplicates
> avg <- avereps(exprs(eset2), ID = fData(eset2)$SYMBOL)
> head(avg[,1])
DDX11L1                 5.220588
OR4F5                   1.977769
LINC01001               8.980209
LINC01061               9.255836
OR4F29                  1.773047
LOC101928626            2.862830

> any(duplicated(rownames(avg)))
[1] FALSE
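
To make the "average duplicates" step concrete, here is a small base R sketch of the same idea on a toy matrix. It is not limma's implementation of `avereps()`, only an illustration of what averaging rows that share a symbol does; the matrix values, the probeset names `p1` to `p4`, and the symbols `GENEA` to `GENEC` are all made up.

```r
## Toy expression matrix: 4 probesets x 2 samples (values are invented).
x <- matrix(c(2, 4, 6, 8,
              1, 3, 5, 7),
            nrow = 4,
            dimnames = list(c("p1", "p2", "p3", "p4"), c("s1", "s2")))

## Hypothetical gene symbols; p1 and p3 map to the same gene.
sym <- c("GENEA", "GENEB", "GENEA", "GENEC")

## Sum rows sharing a symbol, then divide by group size to get the mean.
## reorder = FALSE keeps symbols in order of first appearance.
sums <- rowsum(x, group = sym, reorder = FALSE)
n <- as.vector(table(factor(sym, levels = rownames(sums))))
avg_toy <- sums / n

print(avg_toy)
## One row per symbol, so no duplicated row names remain.
print(any(duplicated(rownames(avg_toy))))
```

The result has one row per symbol, which is the shape a gene-level differential expression analysis expects.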

Thank you for your help! Your explanation was very clear and really helped me solve my issue. I truly appreciate your time and effort!

