Dear All,
I found that an RPKM data.frame looks wrong. It is obtained from AnnotationHub, and its source is the Roadmap Epigenomics Project. The code below reproduces the problem.
library("AnnotationHub")
ah <- AnnotationHub()
epiFiles <- query(ah, "EpigenomeRoadMap")
dfs <- subset(epiFiles, rdataclass == "data.frame")
# View(data.frame(dfs$title, dfs$description, dfs$sourceurl))
rpkm <- dfs[[8]]
# View(rpkm)
# the title seems not right, and the last column are all NAs

# download it by myself
url <- dfs$sourceurl[8]
filename <- basename(url)
download.file(url, destfile = filename)
if (file.exists(filename))
    myrpkm <- read.table(filename, header = TRUE, row.names = 1)
# View(myrpkm)
# it seems right

# See
# =========================
# EXPRESSION QUANTIFICATION
# =========================
# in http://egg2.wustl.edu/roadmap/data/byDataType/rna/README
My sessionInfo is
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936

attached base packages:
[1] parallel  stats
[3] graphics  grDevices
[5] utils     datasets
[7] methods   base

other attached packages:
[1] AnnotationHub_2.5.12
[2] BiocGenerics_0.19.2

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.7
 [2] IRanges_2.7.17
 [3] digest_0.6.10
 [4] mime_0.5
 [5] R6_2.2.0
 [6] xtable_1.8-2
 [7] DBI_0.5-1
 [8] stats4_3.3.1
 [9] RSQLite_1.0.0
[10] BiocInstaller_1.23.9
[11] httr_1.2.1
[12] curl_2.1
[13] S4Vectors_0.11.18
[14] tools_3.3.1
[15] Biobase_2.33.4
[16] shiny_0.14.1
[17] httpuv_1.3.3
[18] AnnotationDbi_1.35.4
[19] htmltools_0.3.5
[20] interactiveDisplayBase_1.11.3
After reading the EXPRESSION QUANTIFICATION section of http://egg2.wustl.edu/roadmap/data/byDataType/rna/README, I think the first column should be the gene ID and the first numeric column should hold the expression values for sample E000. So I loaded the file with read.table(filename, header = TRUE, row.names = 1).
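One way to check whether a header lines up with the data rows is to count the fields on each line before reading the file. The sketch below uses a toy tab-separated file whose layout and values are invented for illustration; it is an assumption, not the actual layout of the Roadmap RPKM file. It relies only on base R (count.fields and read.table).

```r
# Toy tab-separated file; layout and values are made up for illustration
# and are NOT taken from the Roadmap RPKM file.
tmp <- tempfile(fileext = ".tsv")
writeLines(c(
  "E000\tE003",
  "ENSG00000000003\t1.2\t3.4",
  "ENSG00000000005\t0.0\t0.5"
), tmp)

# Number of tab-separated fields on each line (header line first).
n_fields <- count.fields(tmp, sep = "\t")
header_fields <- n_fields[1]
data_fields <- unique(n_fields[-1])

# When the header has exactly one field fewer than the data rows,
# read.table() uses the first column as row names.
toy <- read.table(tmp, header = TRUE, sep = "\t")
colnames(toy)
rownames(toy)
```

If header_fields and data_fields differ on a real file, that is a hint that a generic loader may shift the column names relative to the values.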
I found that more than one data.frame has this problem. I hope this kind of data can be loaded correctly by AnnotationHub.
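To see which data.frames show the symptom described above (a last column that is entirely NA), a small check could be applied to each loaded object. This is only a rough sketch: last_column_all_na is a hypothetical helper, and the list below holds toy stand-ins with made-up names and values rather than real AnnotationHub resources.

```r
# Hypothetical helper: TRUE when the last column of a data.frame is entirely NA.
last_column_all_na <- function(df) {
  ncol(df) > 0 && all(is.na(df[[ncol(df)]]))
}

# Toy stand-ins for loaded resources; names and values are invented.
loaded <- list(
  resource_a = data.frame(E000 = c(1.2, 0.0), E003 = c(NA, NA)),
  resource_b = data.frame(E000 = c(1.2, 0.0), E003 = c(3.4, 0.5))
)

# Names of the entries whose last column contains only NA.
suspect <- names(loaded)[vapply(loaded, last_column_all_na, logical(1))]
suspect
```

With the real hub, the list would instead be filled by loading each resource, for example with dfs[[i]] as in the code earlier in this post.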
Thanks in advance,
Can Wang
Thank you! I have checked that. It's OK now.
Can