I am working with a non-model organism for which there are two versions of the genome. I would like to use the newer version because it is more complete and better resolved. There is no Ensembl gene model for this version of the genome, so I have instead used the Gnomon gene model included with the genome version deposited at NCBI. I was able to perform the DESeq2 analysis. My question concerns extracting FASTA sequences from the original genome based on the results.
The results obtained from DESeq2 use the following gene ID format:
gene-NC_027757.2:10030825..10032141
gene-NC_027757.2:10036650..10043774
gene-NC_027757.2:10047118..10049010
gene-NC_027757.2:10076606..10077880
gene-NC_027757.2:10099173..10101111
The GFF file associated with the Gnomon gene model has the following entry format (per line):
NC_027757.2 Gnomon gene 45762 45986 . - . ID=geneNC_027757.2:45762..45986;description=gene.37187;gbkey=Gene;gene_biotype=protein_coding
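To make the column layout concrete, here is a minimal sketch in plain Python (standard library only, not a Bioconductor solution) that splits one GFF line into its nine tab-separated columns and turns the 9th column into key/value attributes. The function name `parse_gff_line` is hypothetical, and the example line assumes the fields are tab-separated, as in a standard GFF3 file.

```python
def parse_gff_line(line):
    """Split one GFF3 feature line and parse column 9 into a dict of attributes."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    # Column 9 is a semicolon-separated list of key=value pairs.
    attributes = dict(
        pair.split("=", 1) for pair in attrs.split(";") if "=" in pair
    )
    return {
        "seqid": seqid,
        "type": ftype,
        "start": int(start),  # GFF coordinates are 1-based and inclusive
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


# The entry shown above, rebuilt with explicit tabs (assumed separator).
example = "\t".join([
    "NC_027757.2", "Gnomon", "gene", "45762", "45986", ".", "-", ".",
    "ID=geneNC_027757.2:45762..45986;description=gene.37187;"
    "gbkey=Gene;gene_biotype=protein_coding",
])
record = parse_gff_line(example)
print(record["attributes"]["ID"], record["seqid"],
      record["start"], record["end"], record["strand"])
```

Once the ID attribute is pulled out like this, it can serve as a lookup key from a gene ID in the results to the coordinates and strand in columns 1, 4, 5 and 7.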
The problem is that the gene ID reported in the results corresponds to the ID attribute in the 9th column of the GFF file, where it is embedded alongside other information. I am wondering whether there is a package, such as Biostrings, that would allow me to retrieve the DNA sequences from the genome based on the Gnomon gene ID format. I have attempted to do this in Biostrings; however, as far as I understand, unless "gene-NC_027757.2:10030825..10032141" is used as the gene name rather than sitting in the 9th column of the GFF file, Biostrings cannot retrieve the corresponding sequences.
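To illustrate what I am after, here is a rough sketch in plain Python (standard library only, so not the Bioconductor solution I am hoping for). It assumes the gene ID itself encodes the sequence name and 1-based inclusive coordinates in the form `gene-<seqid>:<start>..<end>`, that the whole FASTA fits in memory, and that the strand is supplied separately (for example from column 7 of the GFF), since the ID does not carry it. The helper names `parse_gene_id`, `read_fasta` and `gene_sequence` are hypothetical, and the toy contig is made up for the demonstration.

```python
import io
import re

# Assumed ID layout: "gene-<seqid>:<start>..<end>" (the hyphen is optional here).
GENE_ID = re.compile(r"^gene-?(?P<seqid>[^:]+):(?P<start>\d+)\.\.(?P<end>\d+)$")
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_gene_id(gene_id):
    """Return (seqid, start, end) decoded from a Gnomon-style gene ID."""
    m = GENE_ID.match(gene_id)
    if m is None:
        raise ValueError(f"unrecognised gene ID: {gene_id!r}")
    return m["seqid"], int(m["start"]), int(m["end"])


def read_fasta(handle):
    """Read a FASTA file into {first word of header: sequence}."""
    genome, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        genome[name] = "".join(chunks)
    return genome


def gene_sequence(genome, gene_id, strand="+"):
    """Slice the gene out of the genome; reverse-complement on the minus strand."""
    seqid, start, end = parse_gene_id(gene_id)
    seq = genome[seqid][start - 1:end]  # 1-based inclusive -> Python slice
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


# Toy data standing in for the real genome FASTA.
toy = io.StringIO(">chrToy.1 toy contig\nAACCGGTTAC\nGTACGT\n")
genome = read_fasta(toy)
print(gene_sequence(genome, "gene-chrToy.1:3..8"))              # CCGGTT
print(gene_sequence(genome, "gene-chrToy.1:3..8", strand="-"))  # AACCGG
```

With the real data, `read_fasta` would be given an open handle to the genome FASTA and the gene IDs would come from the row names of the results table.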
Are there any packages or code chunks you could suggest that would allow me to use the results from DESeq2 to retrieve the corresponding genomic sequences within Bioconductor?
R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
stringr_1.4.0 genefilter_1.72.0 ggplot2_3.3.2 PoiClaClu_18.104.22.168
RColorBrewer_1.1-2 pheatmap_1.0.12 DESeq2_1.30.0 BiocParallel_1.24.1
GenomicAlignments_1.26.0 SummarizedExperiment_1.20.0 MatrixGenerics_1.2.0 matrixStats_0.57.0
GenomicFeatures_1.42.1 AnnotationDbi_1.52.0 Biobase_2.50.0 Rsamtools_2.6.0
Biostrings_2.58.0 XVector_0.30.0 GenomicRanges_1.42.0 GenomeInfoDb_1.26.1
IRanges_2.24.0 S4Vectors_0.28.0 BiocGenerics_0.36.0

loaded via a namespace (and not attached):
httr_1.4.2 bit64_4.0.5 splines_4.0.4 assertthat_0.2.1
askpass_1.1 BiocFileCache_1.14.0 blob_1.2.1 GenomeInfoDbData_1.2.4
yaml_2.2.1 progress_1.2.2 pillar_1.4.7 RSQLite_2.2.1
lattice_0.20-41 glue_1.4.2 digest_0.6.27 colorspace_2.0-0
Matrix_1.3-2 XML_3.99-0.5 pkgconfig_2.0.3 biomaRt_2.46.0
zlibbioc_1.36.0 purrr_0.3.4 xtable_1.8-4 scales_1.1.1
tibble_3.0.4 openssl_1.4.3 annotate_1.68.0 generics_0.1.0
ellipsis_0.3.1 withr_2.3.0 survival_3.2-7 magrittr_2.0.1
crayon_1.3.4 memoise_1.1.0 xml2_1.3.2 tools_4.0.4
prettyunits_1.1.1 hms_0.5.3 lifecycle_0.2.0 locfit_1.5-9.4
munsell_0.5.0 DelayedArray_0.16.0 compiler_4.0.4 tinytex_0.27
rlang_0.4.8 grid_4.0.4 RCurl_1.98-1.2 rstudioapi_0.13
rappdirs_0.3.1 bitops_1.0-6 gtable_0.3.0 DBI_1.1.0
curl_4.3 R6_2.5.0 dplyr_1.0.2 rtracklayer_1.50.0
bit_4.0.4 stringi_1.5.3 Rcpp_1.0.5 vctrs_0.3.5
geneplotter_1.68.0 dbplyr_2.0.0 tidyselect_1.1.0