Question: AnnotationHub: An RPKM data.frame from the Epigenomics RoadMap Project seems strange
15 months ago by
wcstcyx30 wrote:

Dear All,

I found an RPKM data.frame that looks wrong. It is obtained from AnnotationHub, and its source is the Epigenomics RoadMap Project. The code below reproduces the problem.

library(AnnotationHub)
ah <- AnnotationHub()
epiFiles <- query(ah, "EpigenomeRoadMap")
dfs <- subset(epiFiles, rdataclass == "data.frame")
# View(data.frame(dfs$title, dfs$description, dfs$sourceurl))
rpkm <- dfs[[8]]
# View(rpkm) # the header does not look right, and the last column is all NA
# Download the source file myself and read it directly
url <- dfs$sourceurl[8]
filename <- basename(url)
download.file(url, destfile = filename)
if (file.exists(filename))
  myrpkm <- read.table(filename, header = TRUE, row.names = 1)
# View(myrpkm) # this one looks right
# See
# =========================
# =========================
# in

My sessionInfo() output is:

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936   
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C                                                   
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936    

attached base packages:
[1] parallel  stats    
[3] graphics  grDevices
[5] utils     datasets 
[7] methods   base     

other attached packages:
[1] AnnotationHub_2.5.12
[2] BiocGenerics_0.19.2 

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.7                  
 [2] IRanges_2.7.17               
 [3] digest_0.6.10                
 [4] mime_0.5                     
 [5] R6_2.2.0                     
 [6] xtable_1.8-2                 
 [7] DBI_0.5-1                    
 [8] stats4_3.3.1                 
 [9] RSQLite_1.0.0                
[10] BiocInstaller_1.23.9         
[11] httr_1.2.1                   
[12] curl_2.1                     
[13] S4Vectors_0.11.18            
[14] tools_3.3.1                  
[15] Biobase_2.33.4               
[16] shiny_0.14.1                 
[17] httpuv_1.3.3                 
[18] AnnotationDbi_1.35.4         
[19] htmltools_0.3.5              
[20] interactiveDisplayBase_1.11.3

After I read 



I think the first column should be the gene ID and the first numeric column should be the expression value for sample E000, so I loaded the file with read.table(filename, header = TRUE, row.names = 1).
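To illustrate what row.names = 1 changes, here is a minimal sketch on a made-up, tab-delimited table. The gene names and values are invented for illustration and are not taken from the Roadmap file; only the gene_id and E000-style column names mirror the description above.

```r
# Toy tab-delimited text standing in for an RPKM file (values are made up).
txt <- "gene_id\tE000\tE003\ngeneA\t1.5\t0.0\ngeneB\t2.0\t3.25\n"

# Without row.names: gene_id is kept as an ordinary first column.
plain <- read.table(text = txt, header = TRUE, sep = "\t")

# With row.names = 1: the first column becomes the row names, so the
# first remaining column is the first sample (E000).
keyed <- read.table(text = txt, header = TRUE, sep = "\t", row.names = 1)

colnames(plain)          # includes "gene_id"
colnames(keyed)          # sample columns only
rownames(keyed)          # gene IDs
keyed["geneB", "E003"]   # look up a value by gene ID and sample
```

With the gene IDs as row names, every remaining column is numeric and rows can be indexed directly by gene ID.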

I found that more than one data.frame has this problem. I hope this kind of data can be loaded correctly by AnnotationHub.

Thanks in advance,
Can Wang

modified 14 months ago • written 15 months ago by wcstcyx30
15 months ago by
Valerie Obenchain ♦♦ 6.4k wrote:

Hi Can,

Thanks for reporting this bug. As you described, the problem was in how the data were read in: the gene_id column was not being used as the row names. This has been fixed in AnnotationHub 2.5.13 (devel) and 2.4.3 (release). Both should be available via biocLite() on Thursday, Oct. 13, after noon PST, or from svn immediately.
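As a rough way to check whether a given version string is at or past the fixed versions, something like the following could be used. has_rowname_fix is a hypothetical helper written for this sketch, not part of AnnotationHub; it only compares version numbers and assumes that later versions keep the fix.

```r
# Hypothetical helper: TRUE if version `v` is at or past the fixed versions
# named above (2.4.3 on the release branch, 2.5.13 on devel).
has_rowname_fix <- function(v) {
  v <- package_version(v)
  on_release <- v >= "2.4.3" && v < "2.5.0"
  on_release || v >= "2.5.13"
}

has_rowname_fix("2.5.12")  # the version in the sessionInfo above: FALSE
has_rowname_fix("2.4.3")   # fixed release version: TRUE

# With the package installed, the installed version can be checked with:
# has_rowname_fix(packageVersion("AnnotationHub"))
```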


written 15 months ago by Valerie Obenchain ♦♦ 6.4k

Thank you! I have checked that. It's OK now.


written 15 months ago by wcstcyx30
