I have noticed that there are two GRanges in AnnotationHub from the same url of ENCODE. However, they are different in the start locations.
library(AnnotationHub)
ahub <- AnnotationHub()
## snapshotDate(): 2016-10-11
qhs <- query(ahub, c("H1", "H3K27me3"))
qhs$title[1:15]
## [1] "wgEncodeBroadHistoneH1hescH3k27me3StdPk"
## [2] "wgEncodeBroadHistoneH1hescH3k27me3StdPk.broadPeak.gz"
## [3] "E003-H3K27me3.broadPeak.gz"
## [4] "E004-H3K27me3.broadPeak.gz"
## [5] "E005-H3K27me3.broadPeak.gz"
## [6] "E006-H3K27me3.broadPeak.gz"
## [7] "E007-H3K27me3.broadPeak.gz"
## [8] "E003-H3K27me3.narrowPeak.gz"
## [9] "E004-H3K27me3.narrowPeak.gz"
## [10] "E005-H3K27me3.narrowPeak.gz"
## [11] "E006-H3K27me3.narrowPeak.gz"
## [12] "E007-H3K27me3.narrowPeak.gz"
## [13] "E003-H3K27me3.gappedPeak.gz"
## [14] "E004-H3K27me3.gappedPeak.gz"
## [15] "E005-H3K27me3.gappedPeak.gz"
qhs[1] ## tags are different from qhs[2]
## AnnotationHub with 1 record
## # snapshotDate(): 2016-10-11
## # names(): AH761
## # $dataprovider: EncodeDCC
## # $species: Homo sapiens
## # $rdataclass: GRanges
## # $title: wgEncodeBroadHistoneH1hescH3k27me3StdPk
## # $description: wgEncodeBroadHistoneH1hescH3k27me3StdPk
## # $taxonomyid: 9606
## # $genome: hg19
## # $sourcetype: BED
## # $sourceurl: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/...
## # $sourcelastmodifieddate: NA
## # $sourcesize: NA
## # $tags: c("H3K27me3", "H1-hESC", "wgEncodeBroadHistone", "std",
## # "wgEncodeEH000088", "ChipSeq", "ENCODE Jan 2011 Freeze",
## # "2010-11-05", "2011-08-05", "wgEncodeEH000074", "GSM733748",
## # "Bernstein", "Broad", "e87748c5982ce0c3ba1a17562af463b2",
## # "hg18", "wgEncode", "exp", "426000", "ScriptureVPaperR3",
## # "2812", "wgEncodeBroadHistoneH1hescH3k27me3StdPk", "None",
## # "broadPeak", "Peaks")
## # retrieve record with 'object[["AH761"]]'
qhs[2]
## AnnotationHub with 1 record
## # snapshotDate(): 2016-10-11
## # names(): AH23276
## # $dataprovider: UCSC
## # $species: Homo sapiens
## # $rdataclass: GRanges
## # $title: wgEncodeBroadHistoneH1hescH3k27me3StdPk.broadPeak.gz
## # $description: broadPeak file from ENCODE
## # $taxonomyid: 9606
## # $genome: hg19
## # $sourcetype: BED
## # $sourceurl: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/...
## # $sourcelastmodifieddate: 2010-11-05
## # $sourcesize: 436199
## # $tags: c("wgEncode", "ChipSeq", "broadPeak", "H1-hESC cell",
## # "Bernstein grant")
## # retrieve record with 'object[["AH23276"]]'
## sourceurls are the same
stopifnot(qhs$sourceurl[1] == qhs$sourceurl[2])
## download it from source url
url <- qhs$sourceurl[1]
filename <- basename(url)
download.file(url, destfile = filename)
if (file.exists(filename))
sourcetable <- read.table(filename)
## sourcetable is 0-based ranges, because it is a bed
head(sourcetable)
## V1 V2 V3 V4 V5 V6 V7 V8 V9
## 1 chr22 16847787 16865007 . 319 . 2.615498 13.7 -1
## 2 chr22 16855946 16856477 . 554 . 7.896940 8.7 -1
## 3 chr22 16860583 16860804 . 609 . 9.135676 2.0 -1
## 4 chr22 17411031 17416061 . 294 . 2.068695 1.9 -1
## 5 chr22 17430525 17439251 . 291 . 1.993391 5.2 -1
## 6 chr22 17487493 17489498 . 368 . 3.718061 7.4 -1
## the below is 0-based ranges, which should be converted to 1-based
head(table_1 <- qhs[[1]]) # qhs[["AH761"]]
## GRanges object with 6 ranges and 5 metadata columns:
## seqnames ranges strand | name score
## <Rle> <IRanges> <Rle> | <character> <integer>
## [1] chr22 [16847787, 16865007] * | . 319
## [2] chr22 [16855946, 16856477] * | . 554
## [3] chr22 [16860583, 16860804] * | . 609
## [4] chr22 [17411031, 17416061] * | . 294
## [5] chr22 [17430525, 17439251] * | . 291
## [6] chr22 [17487493, 17489498] * | . 368
## signalValue pValue qValue
## <numeric> <numeric> <numeric>
## [1] 2.615498 13.7 -1
## [2] 7.896940 8.7 -1
## [3] 9.135676 2.0 -1
## [4] 2.068695 1.9 -1
## [5] 1.993391 5.2 -1
## [6] 3.718061 7.4 -1
## -------
## seqinfo: 24 sequences from hg19 genome
## the below is 1-based ranges, which is right
head(table_2 <- qhs[[2]]) # qhs[["AH23276"]]
## GRanges object with 6 ranges and 5 metadata columns:
## seqnames ranges strand | name score
## <Rle> <IRanges> <Rle> | <character> <numeric>
## [1] chr22 [16847788, 16865007] * | <NA> 319
## [2] chr22 [16855947, 16856477] * | <NA> 554
## [3] chr22 [16860584, 16860804] * | <NA> 609
## [4] chr22 [17411032, 17416061] * | <NA> 294
## [5] chr22 [17430526, 17439251] * | <NA> 291
## [6] chr22 [17487494, 17489498] * | <NA> 368
## signalValue pValue qValue
## <numeric> <numeric> <numeric>
## [1] 2.615498 13.7 -1
## [2] 7.896940 8.7 -1
## [3] 9.135676 2.0 -1
## [4] 2.068695 1.9 -1
## [5] 1.993391 5.2 -1
## [6] 3.718061 7.4 -1
## -------
## seqinfo: 93 sequences (1 circular) from hg19 genome
## I correct table_1 as below
table_1_corrected <- table_1
start(table_1_corrected) <- start(table_1_corrected) + 1L
## the below has been corrected now
head(table_1_corrected)
## GRanges object with 6 ranges and 5 metadata columns:
## seqnames ranges strand | name score
## <Rle> <IRanges> <Rle> | <character> <integer>
## [1] chr22 [16847788, 16865007] * | . 319
## [2] chr22 [16855947, 16856477] * | . 554
## [3] chr22 [16860584, 16860804] * | . 609
## [4] chr22 [17411032, 17416061] * | . 294
## [5] chr22 [17430526, 17439251] * | . 291
## [6] chr22 [17487494, 17489498] * | . 368
## signalValue pValue qValue
## <numeric> <numeric> <numeric>
## [1] 2.615498 13.7 -1
## [2] 7.896940 8.7 -1
## [3] 9.135676 2.0 -1
## [4] 2.068695 1.9 -1
## [5] 1.993391 5.2 -1
## [6] 3.718061 7.4 -1
## -------
## seqinfo: 24 sequences from hg19 genome
sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936
attached base packages:
[1] stats4 parallel stats graphics grDevices
[6] utils datasets methods base
other attached packages:
[1] BSgenome.Hsapiens.UCSC.hg19_1.4.0
[2] BSgenome_1.42.0
[3] Biostrings_2.42.1
[4] XVector_0.14.0
[5] rtracklayer_1.34.1
[6] GenomicRanges_1.26.1
[7] GenomeInfoDb_1.10.1
[8] IRanges_2.8.1
[9] S4Vectors_0.12.1
[10] AnnotationHub_2.6.4
[11] BiocGenerics_0.20.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.8
[2] BiocInstaller_1.24.0
[3] bitops_1.0-6
[4] tools_3.3.2
[5] zlibbioc_1.20.0
[6] digest_0.6.10
[7] lattice_0.20-34
[8] RSQLite_1.1
[9] memoise_1.0.0
[10] tibble_1.2
[11] Matrix_1.2-7.1
[12] shiny_0.14.2
[13] DBI_0.5-1
[14] curl_2.3
[15] yaml_2.1.14
[16] httr_1.2.1
[17] grid_3.3.2
[18] Biobase_2.34.0
[19] R6_2.2.0
[20] AnnotationDbi_1.36.0
[21] XML_3.98-1.5
[22] BiocParallel_1.8.1
[23] Rsamtools_1.26.1
[24] htmltools_0.3.5
[25] GenomicAlignments_1.10.0
[26] assertthat_0.1
[27] SummarizedExperiment_1.4.0
[28] mime_0.5
[29] interactiveDisplayBase_1.12.0
[30] xtable_1.8-2
[31] httpuv_1.3.3
[32] RCurl_1.95-4.8
Indeed, qhs[1] and qhs[2] are from the same track, so should one of them be deleted from the AnnotationHub? Or should the two be combined into one?
Are there any other wrong GRanges in AnnotationHub like what I have found here that should be corrected?
Thanks in advance,
Can Wang

Hi Valerie,
Thanks very much for your timely response and efficient work! I have checked and the old resources have been removed as you mentioned so users won't be confused about that problem. :)
Can