Question: AnnotationHub: An RPKM data.frame from the Epigenomics RoadMap Project seems strange
Asked 13 months ago by wcstcyx (China/Beijing/AMSS,CAS):

Dear All,

I found that an RPKM data.frame looks strange. It was obtained from AnnotationHub, and its source is the Epigenomics RoadMap Project. The code below reproduces the problem.

library("AnnotationHub")
ah <- AnnotationHub()
epiFiles <- query(ah, "EpigenomeRoadMap")
dfs <- subset(epiFiles, rdataclass == "data.frame")
# View(data.frame(dfs$title, dfs$description, dfs$sourceurl))
rpkm <- dfs[[8]]  # 8th data.frame record (the index may differ in other hub snapshots)
# View(rpkm)  # the column titles look wrong, and the last column is all NAs
# Download the same source file myself for comparison
url <- dfs$sourceurl[8]
filename <- basename(url)
download.file(url, destfile = filename, mode = "wb")  # binary mode is the safe choice on Windows
if (file.exists(filename))
  myrpkm <- read.table(filename, header = TRUE, row.names = 1)
# View(myrpkm)  # this looks right
# See the "EXPRESSION QUANTIFICATION" section of
# http://egg2.wustl.edu/roadmap/data/byDataType/rna/README

My sessionInfo() output:

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936   
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C                                                   
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936    

attached base packages:
[1] parallel  stats    
[3] graphics  grDevices
[5] utils     datasets 
[7] methods   base     

other attached packages:
[1] AnnotationHub_2.5.12
[2] BiocGenerics_0.19.2 

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.7                  
 [2] IRanges_2.7.17               
 [3] digest_0.6.10                
 [4] mime_0.5                     
 [5] R6_2.2.0                     
 [6] xtable_1.8-2                 
 [7] DBI_0.5-1                    
 [8] stats4_3.3.1                 
 [9] RSQLite_1.0.0                
[10] BiocInstaller_1.23.9         
[11] httr_1.2.1                   
[12] curl_2.1                     
[13] S4Vectors_0.11.18            
[14] tools_3.3.1                  
[15] Biobase_2.33.4               
[16] shiny_0.14.1                 
[17] httpuv_1.3.3                 
[18] AnnotationDbi_1.35.4         
[19] htmltools_0.3.5              
[20] interactiveDisplayBase_1.11.3

After reading the EXPRESSION QUANTIFICATION section of http://egg2.wustl.edu/roadmap/data/byDataType/rna/README, I think the first column should be the gene ID and the first numeric column should be the expression value for sample E000. So I loaded the file with read.table(filename, header = TRUE, row.names = 1).
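A minimal sketch of what row.names = 1 does, using a tiny table laid out the way the README describes (a gene_id column followed by one RPKM column per epigenome). The gene names and values are invented for illustration only; the real file is much larger.

```r
# Toy excerpt (invented gene names and values) in the layout the README
# describes: a gene_id column followed by one RPKM column per epigenome.
txt <- "gene_id E000 E003 E004
geneA 1.5 2.0 3.1
geneB 0.0 0.1 0.2"

# Without row.names, gene_id is read as just another (non-numeric) column.
plain <- read.table(text = txt, header = TRUE)
print(names(plain))

# row.names = 1 moves gene_id into the row names, so the first
# numeric column is E000.
rpkm <- read.table(text = txt, header = TRUE, row.names = 1)
print(rpkm)
```

With row.names = 1 every remaining column is numeric and is named after an epigenome, which is the layout expected here.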

More than one data.frame has this problem. I hope AnnotationHub can reload this kind of data correctly.
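To look for the same symptom in the other records (a trailing column that is entirely NA), a small helper could flag such columns. all_na_columns() is a hypothetical name for illustration, not an AnnotationHub function.

```r
# Hypothetical helper (not part of AnnotationHub): returns the names of
# the columns in a data.frame whose values are all NA.
all_na_columns <- function(df) {
  is_all_na <- vapply(df, function(col) all(is.na(col)), logical(1))
  names(df)[is_all_na]
}

# Toy example: column "b" is entirely NA.
toy <- data.frame(a = c(1, 2), b = c(NA, NA))
print(all_na_columns(toy))
```

On an affected version, calling it on a record such as dfs[[8]] should list at least the trailing column; character(0) means no column is entirely NA.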

Thanks in advance,
Can Wang

Answered 13 months ago by Valerie Obenchain (United States):

Hi Can,

Thanks for reporting this bug. As you described, the problem was in how the data were read in: the gene_id column was not being used as the row names. This has been fixed in AnnotationHub 2.5.13 (devel) and 2.4.3 (release). Both should be available via biocLite() on Thursday, Oct. 13, after noon PST, or from svn immediately.
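A minimal sketch for checking whether an installed AnnotationHub already includes this fix, based only on the two version numbers above. has_rpkm_fix() is a hypothetical helper; treating versions before 2.4.0 as unfixed is an assumption.

```r
# Hypothetical helper: does a given AnnotationHub version include the
# row-name fix? Per the answer, it landed in 2.4.3 (release) and 2.5.13
# (devel). Assumption: anything older than 2.4.0 is treated as unfixed.
has_rpkm_fix <- function(v) {
  v <- package_version(v)
  if (v >= "2.5.0") {
    v >= "2.5.13"  # devel series at the time of the fix, and all later versions
  } else if (v >= "2.4.0") {
    v >= "2.4.3"   # release series
  } else {
    FALSE
  }
}

print(has_rpkm_fix("2.5.12"))
print(has_rpkm_fix("2.4.3"))
```

In a live session, has_rpkm_fix(packageVersion("AnnotationHub")) would check the installed copy.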

Valerie


Thank you! I have checked it, and it works now.

Can
