Question: AnnotationHub: An RPKM data.frame of the Epigenomics RoadMap Project seems strange
15 months ago by
wcstcyx30 wrote:

Dear All,

I found that an RPKM data.frame looks wrong. It was obtained from AnnotationHub, and its source is the Epigenomics RoadMap Project. The code below reproduces the problem.

library(AnnotationHub)

ah <- AnnotationHub()
epiFiles <- query(ah, "EpigenomeRoadMap")
dfs <- subset(epiFiles, rdataclass == "data.frame")
# View(data.frame(dfs$title, dfs$description, dfs$sourceurl))
rpkm <- dfs[[8]]
# View(rpkm) # the column titles seem wrong, and the last column is all NA
# Download the source file myself for comparison
url <- dfs$sourceurl[8]
filename <- basename(url)
download.file(url, destfile = filename)
if (file.exists(filename))
  myrpkm <- read.table(filename, header = TRUE, row.names = 1)
# View(myrpkm) # this looks right
# See
# =========================
# =========================
# in
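
The symptom noted in the comments above (a last column that is entirely NA) can also be checked without View(). The following is only a minimal sketch on a toy data frame: the object name `toy`, its column names, and its values are made up for illustration and are not taken from the real AnnotationHub record.

```r
# Toy data frame that mimics the reported symptom (all values invented):
# the labels are shifted and the last column holds only NA.
toy <- data.frame(gene_id = c(1.5, 3.2),
                  E000    = c(0.0, 7.1),
                  E003    = c(NA, NA))

# For each column, test whether every value is NA.
all_na <- vapply(toy, function(col) all(is.na(col)), logical(1))

# Names of the columns that contain nothing but NA.
na_cols <- names(toy)[all_na]
print(na_cols)
```

Running the same `vapply()` call on the real `rpkm` object should flag the suspicious column, assuming it shows the behaviour described above.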

My sessionInfo is

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936   
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C                                                   
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936    

attached base packages:
[1] parallel  stats    
[3] graphics  grDevices
[5] utils     datasets 
[7] methods   base     

other attached packages:
[1] AnnotationHub_2.5.12
[2] BiocGenerics_0.19.2 

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.7                  
 [2] IRanges_2.7.17               
 [3] digest_0.6.10                
 [4] mime_0.5                     
 [5] R6_2.2.0                     
 [6] xtable_1.8-2                 
 [7] DBI_0.5-1                    
 [8] stats4_3.3.1                 
 [9] RSQLite_1.0.0                
[10] BiocInstaller_1.23.9         
[11] httr_1.2.1                   
[12] curl_2.1                     
[13] S4Vectors_0.11.18            
[14] tools_3.3.1                  
[15] Biobase_2.33.4               
[16] shiny_0.14.1                 
[17] httpuv_1.3.3                 
[18] AnnotationDbi_1.35.4         
[19] htmltools_0.3.5              
[20] interactiveDisplayBase_1.11.3

After I read 



I think the first column should be the gene ID and the first numeric column should be the expression value of sample E000. So I loaded the file with read.table(filename, header = TRUE, row.names = 1).
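
To illustrate what row.names = 1 changes, here is a small sketch on an in-memory toy table. The gene IDs and values are invented, and the tab-separated layout is an assumption for the example rather than a description of the real source file.

```r
# Toy tab-separated table: a gene_id column followed by two sample columns.
# All IDs and values are made up for illustration.
lines <- c("gene_id\tE000\tE003",
           "GENE_A\t1.5\t0.0",
           "GENE_B\t3.2\t7.1")
txt <- paste(lines, collapse = "\n")

# Without row.names, gene_id is read as an ordinary data column.
plain <- read.table(text = txt, header = TRUE, sep = "\t")

# With row.names = 1, the first column becomes the row names instead.
named <- read.table(text = txt, header = TRUE, sep = "\t", row.names = 1)

print(colnames(plain))
print(rownames(named))
print(named["GENE_B", "E003"])
```

With the second form, values can be looked up by gene ID and sample name directly, as in the last line.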

I found that more than one data.frame has this problem. I hope this kind of data can be loaded correctly by AnnotationHub.

Thanks in advance,
Can Wang

15 months ago by
Valerie Obenchain ♦♦ 6.4k (United States) wrote:

Hi Can,

Thanks for reporting this bug. As you described, the problem was in how the data were read in: the gene_id column was not being used as the row names. This has been fixed in AnnotationHub 2.5.13 (devel) and 2.4.3 (release). Both should be available via biocLite() Thursday Oct. 13 after noon PST or from svn immediately.
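
As a rough way to tell whether a given version string already includes the fix, the two version numbers above can be compared with base R's package_version(). This is only a sketch: `has_fix` is a hypothetical helper written for this example, not part of AnnotationHub, and it assumes that 2.4.x is the release line and that 2.5.x and later are devel or newer.

```r
# Hypothetical helper (not an AnnotationHub function): TRUE if version `v`
# is at or beyond the fixed versions 2.4.3 (release) or 2.5.13 (devel).
has_fix <- function(v) {
  v <- package_version(v)
  if (v >= "2.5.0") {
    v >= "2.5.13"   # devel line, or any later version
  } else {
    v >= "2.4.3"    # release line
  }
}

print(has_fix("2.5.12"))  # the version shown in the sessionInfo above
print(has_fix("2.5.13"))
print(has_fix("2.4.3"))
```

In a live session the argument could be as.character(packageVersion("AnnotationHub")), provided the package is installed.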



Thank you! I have checked that. It's OK now.


Reply written 15 months ago by wcstcyx30