read.idat and empty Symbols
h.mon • 0
@hmon-8976
Brazil

I am reading Illumina HumanHT-12 v4 Expression BeadChip data with read.idat from the limma package. Reading appears to work without problems, but the resulting object has many (3270, to be exact) empty strings in genes$Symbol.

What may be causing this?

> idatfiles <- list.files( path = "../array", pattern = "\\.idat$", full.names = TRUE )
> bgxfile <- list.files( path = "../array", pattern = "\\.bgx$", full.names = TRUE )
> x <- read.idat( idatfiles, bgxfile, dateinfo = TRUE )

> length( which( x$genes$Symbol == "" ) )
[1] 3270
> x$genes[8446,]
         Probe_Id Array_Address_Id Symbol
8446 ILMN_1906423          5310327    

And here is one example of the corresponding annotation from the bgx file:

Homo sapiens    Unigene    Hs.390407    ILMN_89369    HS.390407    Hs.390407        Hs.390407        27828963    BX097705            ILMN_1906423    0005310327    S    640    GAGAGGCAGGGTGAAGAGGTCGAAGGAGCCTGAGTTAGCAGGGATGAGCA    2    -    87520225-87520274        BX097705 NCI_CGAP_Kid5 Homo sapiens cDNA clone IMAGp998E053890, mRNA sequence                    

 

> sessionInfo()

R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux stretch/sid

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] GO.db_3.3.0                SummarizedExperiment_1.2.3 GenomicRanges_1.24.3      
 [4] GenomeInfoDb_1.8.7         RColorBrewer_1.1-2         pheatmap_1.0.8            
 [7] ggplot2_2.2.0              pathview_1.12.0            gage_2.22.0               
[10] org.Hs.eg.db_3.3.0         AnnotationDbi_1.34.4       IRanges_2.6.1             
[13] S4Vectors_0.10.3           Biobase_2.32.0             BiocGenerics_0.18.0       
[16] illuminaio_0.14.0          limma_3.28.21             

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8        plyr_1.8.4         XVector_0.12.1     tools_3.3.2       
 [5] zlibbioc_1.18.0    digest_0.6.10      base64_2.0         RSQLite_1.1       
 [9] memoise_1.0.0      tibble_1.2         gtable_0.2.0       png_0.1-7         
[13] KEGGgraph_1.30.0   graph_1.50.0       DBI_0.5-1          Rgraphviz_2.16.0  
[17] curl_2.3           httr_1.2.1         Biostrings_2.40.2  grid_3.3.2        
[21] R6_2.2.0           XML_3.98-1.5       org.Bt.eg.db_3.3.0 scales_0.4.1      
[25] KEGGREST_1.12.3    assertthat_0.1     colorspace_1.3-1   openssl_0.9.5     
[29] lazyeval_0.2.0     munsell_0.4.3    

@james-w-macdonald-5106
United States

Not everything has a HUGO symbol: symbols are reserved for transcripts that are considered to be real genes. The example you give is an IMAGE clone, which is at present a hypothetical transcript that may or may not be transcribed in humans. This particular clone was uploaded to GenBank in 2003 and hasn't really been updated since, so I would bet nobody has ever detected it in the wild, and it simply persists as a hypothetical entry in NCBI's databases.
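
One way to check whether the probes without symbols are mostly clone or EST entries like this one is to tabulate them by annotation source. The following is a minimal sketch in base R, using a made-up stand-in for the genes table: the column names follow the BGX manifest, but every probe except ILMN_1906423, and the symbols GENE_A and GENE_B, are hypothetical.

```r
# Toy stand-in for x$genes. Only ILMN_1906423 comes from the question;
# the other probe IDs, sources and symbols are invented for illustration.
genes <- data.frame(
  Probe_Id = c("ILMN_1906423", "ILMN_0000001", "ILMN_0000002", "ILMN_0000003"),
  Source   = c("Unigene", "RefSeq", "RefSeq", "Unigene"),
  Symbol   = c("", "GENE_A", "GENE_B", ""),
  stringsAsFactors = FALSE
)

# Flag probes whose symbol is missing or an empty string.
no_symbol <- is.na(genes$Symbol) | genes$Symbol == ""

# Count the probes without a symbol.
n_missing <- sum(no_symbol)

# Cross-tabulate annotation source against symbol availability.
by_source <- table(Source = genes$Source, HasSymbol = !no_symbol)

print(n_missing)
print(by_source)
```

With real data, the Source column would have to be read from the manifest first. Assuming the installed limma version has the annotation argument to read.idat, something like annotation = c("Symbol", "Source", "Definition") should bring those columns into x$genes.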

Thanks. I will investigate some other probes, but now I am less anxious about the subject.
