I'm getting the opposite answer than I would expect. This has nothing to do with getting opposite behaviour between '+' and '-' strand. My problem here is all on the '+' strand.
# Subject ref <- GenomicRanges::makeGRangesFromDataFrame( structure(list(seqnames = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), class = "factor", .Label = c("Chr1", "Chr2", "Chr3", "Chr4", "Chr5", "mitochondria", "chloroplast")), start = c(3631L, 5928L, 11649L, 23146L, 28500L, 31170L, 33379L, 38752L, 44677L, 45296L), end = c(5899L, 8737L, 13714L, 31227L, 28706L, 33153L, 37871L, 40944L, 44787L, 47019L), width = c(2269L, 2810L, 2066L, 8082L, 207L, 1984L, 4493L, 2193L, 111L, 1724L), strand = structure(c(1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L), class = "factor", .Label = c("+", "-", "*")), gene_id = c("AT1G01010", "AT1G01020", "AT1G01030", "AT1G01040", "AT1G01046", "AT1G01050", "AT1G01060", "AT1G01070", "AT1G01073", "AT1G01080")), .Names = c("seqnames", "start", "end", "width", "strand", "gene_id"), row.names = c("AT1G01010", "AT1G01020", "AT1G01030", "AT1G01040", "AT1G01046", "AT1G01050", "AT1G01060", "AT1G01070", "AT1G01073", "AT1G01080"), class = "data.frame") ) # Query qry <- GenomicRanges::makeGRangesFromDataFrame( structure(list(seqnames = structure(1L, class = "factor", .Label = "Chr1"), start = 31137L, end = 31183L, width = 47L, strand = structure(1L, class = "factor", .Label = c("+", "-", "*")), value = 9.90299791781146, area = 465.440902137139, indexStart = 39L, indexEnd = 85L, cluster = 2L, clusterL = 47), .Names = c("seqnames", "start", "end", "width", "strand", "value", "area", "indexStart", "indexEnd", "cluster", "clusterL"), row.names = "2", class = "data.frame") ) # Overlap looks as expected. # The 4th reference directly overlaps and is on the correct strand. # The 6th would also overlap but is on the wrong strand. GenomicRanges::findOverlaps(qry, ref, ignore.strand=FALSE) # Getting the upstream gene does not produce the expected result! # Region 9 is to the RIGHT, DOWNstream of my query. GenomicRanges::precede(qry, ref, ignore.strand=FALSE) # Getting the downstream gene does not produce the expected result either! # Region 5 is to the LEFT, UPstream of my query! GenomicRanges::follow(qry, ref, ignore.strand=FALSE)
Either follow/precede don't do what I think they do, or they have a very convoluted way of defining which way is up and down.
My session (Issue also reproducible on older versions of Bioconductor):
R version 3.4.1 (2017-06-30) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS Sierra 10.12.6 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib locale: [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] derfinder_1.10.6 loaded via a namespace (and not attached): [1] Biobase_2.36.2 bit64_0.9-7 splines_3.4.1 foreach_1.4.3 GenomicFiles_1.12.0 [6] Formula_1.2-2 bumphunter_1.16.0 stats4_3.4.1 latticeExtra_0.6-28 doRNG_1.6.6 [11] blob_1.1.0 BSgenome_1.44.1 GenomeInfoDbData_0.99.0 Rsamtools_1.28.0 RSQLite_2.0 [16] backports_1.1.0 lattice_0.20-35 digest_0.6.12 GenomicRanges_1.28.4 RColorBrewer_1.1-2 [21] XVector_0.16.0 checkmate_1.8.3 qvalue_2.8.0 colorspace_1.3-2 htmltools_0.3.6 [26] Matrix_1.2-11 plyr_1.8.4 XML_3.98-1.9 biomaRt_2.32.1 zlibbioc_1.22.0 [31] xtable_1.8-2 scales_0.5.0 BiocParallel_1.10.1 htmlTable_1.9 tibble_1.3.4 [36] pkgmaker_0.22 IRanges_2.10.3 ggplot2_2.2.1 SummarizedExperiment_1.6.3 GenomicFeatures_1.28.4 [41] nnet_7.3-12 BiocGenerics_0.22.0 lazyeval_0.2.0 survival_2.41-3 magrittr_1.5 [46] memoise_1.1.0 foreign_0.8-69 tools_3.4.1 registry_0.3 data.table_1.10.4 [51] matrixStats_0.52.2 stringr_1.2.0 S4Vectors_0.14.3 locfit_1.5-9.1 munsell_0.4.3 [56] rngtools_1.2.4 cluster_2.0.6 DelayedArray_0.2.7 AnnotationDbi_1.38.2 Biostrings_2.44.2 [61] compiler_3.4.1 GenomeInfoDb_1.12.2 rlang_0.1.2 grid_3.4.1 RCurl_1.95-4.8 [66] iterators_1.0.8 VariantAnnotation_1.22.3 htmlwidgets_0.9 bitops_1.0-6 base64enc_0.1-3 [71] derfinderHelper_1.10.0 gtable_0.2.0 codetools_0.2-15 DBI_0.7 reshape2_1.4.2 [76] GenomicAlignments_1.12.2 gridExtra_2.2.1 knitr_1.17 rtracklayer_1.36.4 bit_1.1-12 [81] Hmisc_4.0-3 stringi_1.1.5 parallel_3.4.1 Rcpp_0.12.12 rpart_4.1-11 [86] acepack_1.4.1
I did look at the docs repeatedly, though not carefully enough it seems. I focused on the strand orientation bit, which can also have the effect of giving the seemingly opposite answer. Friday evening overtime probably not the best time to be learning new things.
So the query precedes or follows the subject. That is an equally valid approach, but in that case, the term �subject� is misleading, as it is not the actual subject of the action implied by the function name but rather the object of it.
So, thanks for the clarification! I�m relatively new to Bioconductor and the learning curve is steep. Sent from my Windows 10 phone