Hi, I am Hao. I have an issue with the "getPromoterSeq" function from the "GenomicFeatures" package.
I was trying to obtain potential promoter sequences for all genes on maize chromosome 1. I have the sequence file "Zea_mays.AGPv4.dna.chromosome.1.fa" and the whole-genome annotation file "Zea_mays.AGPv4.40.gff3".
I arbitrarily defined the promoter region as 2,000 bp upstream and 500 bp downstream of the TSS. The code I wrote is as follows.
    library("Rsamtools")
    FaFile=FaFile("Zea_mays.AGPv4.dna.chromosome.1.fa") #subject
    library("GenomicRanges")
    library("ape")
    gffRangedData=read.gff("Zea_mays.AGPv4.40.gff3", na.strings = c(".", "?"))
    myGranges<-as(gffRangedData, "GRanges") #query
    library("GenomicFeatures")
    Promoter=getPromoterSeq(myGranges,FaFile,upstream=2000, downstream=500)
However, calling "getPromoterSeq" returned an error (see below).
Error in value[[3L]](cond) : record 1 (1:-1999-500) was truncated
file: Zea_mays.AGPv4.dna.chromosome.1.fa
In addition: Warning message:
In .local(x, upstream, downstream, ...) : '*' ranges were treated as '+'
Below is the sessionInfo() output:

    > sessionInfo()
    R version 3.4.4 (2018-03-15)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows >= 8 x64 (build 9200)

    Matrix products: default

    locale:
    [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
    [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
    [5] LC_TIME=English_United States.1252

    attached base packages:
    [1] stats4 parallel stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] GenomicFeatures_1.30.3 AnnotationDbi_1.40.0 Biobase_2.38.0 ape_5.1
    [5] Rsamtools_1.30.0 Biostrings_2.46.0 XVector_0.18.0 GenomicRanges_1.30.3
    [9] GenomeInfoDb_1.14.0 IRanges_2.12.0 S4Vectors_0.16.0 BiocGenerics_0.24.0
    [13] BiocInstaller_1.28.0

    loaded via a namespace (and not attached):
    [1] Rcpp_0.12.18 compiler_3.4.4 prettyunits_1.0.2
    [4] bitops_1.0-6 tools_3.4.4 zlibbioc_1.24.0
    [7] progress_1.2.0 biomaRt_2.34.2 digest_0.6.16
    [10] bit_1.1-14 RSQLite_2.1.1 memoise_1.1.0
    [13] nlme_3.1-131.1 lattice_0.20-35 pkgconfig_2.0.2
    [16] rlang_0.2.2 Matrix_1.2-12 DelayedArray_0.4.1
    [19] DBI_1.0.0 rstudioapi_0.7 GenomeInfoDbData_1.0.0
    [22] rtracklayer_1.38.3 httr_1.3.1 stringr_1.3.1
    [25] hms_0.4.2 bit64_0.9-7 grid_3.4.4
    [28] R6_2.2.2 XML_3.98-1.16 RMySQL_0.10.15
    [31] BiocParallel_1.12.0 magrittr_1.5 blob_1.1.1
    [34] matrixStats_0.54.0 GenomicAlignments_1.14.2 SummarizedExperiment_1.8.1
    [37] assertthat_0.2.0 stringi_1.1.7 RCurl_1.95-4.11
    [40] crayon_1.3.4
I was wondering if anyone could help me out? Many thanks!
Please update to a current version of R/Bioconductor (release is R 3.5 / Bioc 3.7). See this page for help:
https://www.bioconductor.org/install/
If you still see the error after updating, post back and show your sessionInfo() again. Likely the error is triggered by specific ranges in myGranges. It would be good to identify them if possible.

Valerie
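
One possible way to look for such ranges, offered as a rough sketch rather than a confirmed diagnosis: the error text "1:-1999-500" reads like a request on sequence "1" from position -1999 to 500, which is the window that 2,000 bp upstream and 500 bp downstream would give for a feature starting at position 1 on the + strand. The base-R snippet below reproduces that window arithmetic on made-up coordinates and flags windows that fall outside the sequence. The data frame, the length `chr_len`, and the helper `promoter_window` are all hypothetical and not part of any package.

```r
# Toy features standing in for rows of the GFF3 file (all values are made up).
features <- data.frame(
  type   = c("chromosome", "gene", "gene", "gene"),
  start  = c(1, 44289, 306000, 999000),
  end    = c(1000000, 49837, 306999, 999500),
  strand = c("*", "+", "-", "-"),
  stringsAsFactors = FALSE
)
chr_len <- 1000000  # assumed sequence length for this toy example

# Hypothetical helper: promoter window around the TSS.
# '*' is handled like '+', matching the warning shown in the question.
promoter_window <- function(start, end, strand, upstream = 2000, downstream = 500) {
  minus <- strand == "-"
  tss <- ifelse(minus, end, start)
  data.frame(
    win_start = ifelse(minus, tss - downstream + 1, tss - upstream),
    win_end   = ifelse(minus, tss + upstream, tss + downstream - 1)
  )
}

win <- promoter_window(features$start, features$end, features$strand)

# A window is unusable if it starts before position 1 or runs past the end.
out_of_bounds <- win$win_start < 1 | win$win_end > chr_len
print(cbind(features, win)[out_of_bounds, ])
```

In this toy data the flagged rows are the "chromosome" feature starting at position 1 and the minus-strand gene near the end of the sequence. With a real GRanges object, `promoters()` from GenomicRanges computes comparable windows, so checking `start()` of its result for values below 1 would be a similar test. Restricting the annotation to gene features on the chromosome of interest before extracting sequences may also be worth trying.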