MafH5.gnomAD.v3.1.2.GRCh38 omitted indels
1
0
Entering edit mode
Robert • 0
@b14a6f0d
Last seen 2.4 years ago
United States

I believe MafH5.gnomAD.v3.1.2.GRCh38 is supposed to include indels. The example in the manual gscores(mafh5, GRanges("3:46373452-46373484"), type="nonsnrs") does output rs333. However, I've got an example of indels not reported and I can't figure why.

The region I'm looking at

and the missed indels : 17-80112300-CTTGG-C and 17-80112305-AGAGGAAGGACCCTGGATGC-A

MRE

library("BiocManager")
library("GenomeInfoDb")
library("MafH5.gnomAD.v3.1.2.GRCh38")
test_score <- GRanges(seqnames=17, IRanges(start=80112300:80112310, width=1))
test_result <- gscores(mafdb_hg38,test_score)
as.data.frame(test_result)

output

   seqnames    start      end width strand    AF
1     chr17 80112300 80112300     1      *    NA
2     chr17 80112301 80112301     1      *    NA
3     chr17 80112302 80112302     1      *    NA
4     chr17 80112303 80112303     1      *    NA
5     chr17 80112304 80112304     1      *    NA
6     chr17 80112305 80112305     1      *    NA
7     chr17 80112306 80112306     1      *    NA
8     chr17 80112307 80112307     1      *    NA
9     chr17 80112308 80112308     1      * 7e-06
10    chr17 80112309 80112309     1      *    NA
11    chr17 80112310 80112310     1      *    NA

I found type="nonsnrs" but that didn't help.

test_result <- gscores(mafdb_hg38,test_score,type="nonsnrs")
as.data.frame(test_result)

output

seqnames    start      end width strand AF
1     chr17 80112300 80112300     1      * NA
2     chr17 80112301 80112301     1      * NA
3     chr17 80112302 80112302     1      * NA
4     chr17 80112303 80112303     1      * NA
5     chr17 80112304 80112304     1      * NA
6     chr17 80112305 80112305     1      * NA
7     chr17 80112306 80112306     1      * NA
8     chr17 80112307 80112307     1      * NA
9     chr17 80112308 80112308     1      * NA
10    chr17 80112309 80112309     1      * NA
11    chr17 80112310 80112310     1      * NA

please also include the results of running the following in an R session

sessionInfo( ) sessionInfo( ) R version 4.2.0 (2022-04-22 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale: 1 LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages: 1 stats4 stats graphics grDevices utils datasets [7] methods base

other attached packages: 1 BSgenome.Hsapiens.UCSC.hg38_1.4.4 [2] BSgenome.Hsapiens.UCSC.hg19_1.4.3 [3] MafH5.gnomAD.v3.1.2.GRCh38_3.15.0 [4] MafDb.gnomAD.r2.1.hs37d5_3.10.0
[5] GenomicScores_2.8.2
[6] BSgenome_1.64.0
[7] rtracklayer_1.56.1
[8] Biostrings_2.64.0
[9] XVector_0.36.0
[10] GenomicRanges_1.48.0
[11] GenomeInfoDb_1.32.2
[12] IRanges_2.30.0
[13] S4Vectors_0.34.0
[14] BiocGenerics_0.42.0
[15] BiocManager_1.30.18

loaded via a namespace (and not attached): 1 MatrixGenerics_1.8.1 Biobase_2.56.0
[3] httr_1.4.3 bit64_4.0.5
[5] AnnotationHub_3.4.0 shiny_1.7.1
[7] assertthat_0.2.1 interactiveDisplayBase_1.34.0 [9] BiocFileCache_2.4.0 blob_1.2.3
[11] GenomeInfoDbData_1.2.8 Rsamtools_2.12.0
[13] yaml_2.3.5 BiocVersion_3.15.2
[15] pillar_1.7.0 RSQLite_2.2.14
[17] lattice_0.20-45 glue_1.6.2
[19] digest_0.6.29 promises_1.2.0.1
[21] htmltools_0.5.2 httpuv_1.6.5
[23] Matrix_1.4-1 XML_3.99-0.10
[25] pkgconfig_2.0.3 zlibbioc_1.42.0
[27] purrr_0.3.4 xtable_1.8-4
[29] HDF5Array_1.24.1 later_1.3.0
[31] BiocParallel_1.30.3 tibble_3.1.7
[33] KEGGREST_1.36.2 generics_0.1.3
[35] ellipsis_0.3.2 cachem_1.0.6
[37] SummarizedExperiment_1.26.1 cli_3.3.0
[39] magrittr_2.0.3 crayon_1.5.1
[41] mime_0.12 memoise_2.0.1
[43] fansi_1.0.3 tools_4.2.0
[45] BiocIO_1.6.0 lifecycle_1.0.1
[47] matrixStats_0.62.0 Rhdf5lib_1.18.2
[49] DelayedArray_0.22.0 AnnotationDbi_1.58.0
[51] compiler_4.2.0 rlang_1.0.2
[53] rhdf5_2.40.0 grid_4.2.0
[55] RCurl_1.98-1.7 rhdf5filters_1.8.0
[57] rstudioapi_0.13 rjson_0.2.21
[59] rappdirs_0.3.3 bitops_1.0-7
[61] restfulr_0.0.15 codetools_0.2-18
[63] DBI_1.1.3 curl_4.3.2
[65] R6_2.5.1 GenomicAlignments_1.32.0
[67] dplyr_1.0.9 fastmap_1.1.0
[69] bit_4.0.4 utf8_1.2.2
[71] filelock_1.0.2 parallel_4.2.0
[73] Rcpp_1.0.8.3 png_0.1-7
[75] vctrs_0.4.1 dbplyr_2.2.1
[77] tidyselect_1.1.2
```

GenomicScores • 1.5k views
ADD COMMENT
2
Entering edit mode
Robert Castelo ★ 3.4k
@rcastelo
Last seen 17 hours ago
Barcelona/Universitat Pompeu Fabra

hi,

in the case of indels, for one given indel, the input GRanges should have one range spanning the genomic positions of the indel, e.g., for 17-80112300-CTTGG-C:

test_score <- GRanges(seqnames=17, IRanges(start=80112300, width=5))
gscores(mafdb_hg38, test_score, type="nonsnrs")
GRanges object with 1 range and 1 metadata column:
      seqnames            ranges strand |        AF
         <Rle>         <IRanges>  <Rle> | <numeric>
  [1]    chr17 80112300-80112304      * |     7e-06
  -------
  seqinfo: 25 sequences (1 circular) from hg38 genome
ADD COMMENT
0
Entering edit mode

ah... that's why the example in the manual worked! Is there a way to get all scores (in this case indels) that span a specific range (or single bp)? I tried minoverlap=0 in gscores but I'm still not picking the indel up. thanks!

ADD REPLY
0
Entering edit mode

The gscores() function is designed to return the scores in the form of a base numeric vector (by setting the argument scores.only=TRUE) or a GRanges vector (default) that is _parallel_ to the input GRanges vector. This is because the original motivation for developing GenomicScores was to be able to query genomic scores for a given set of positions, such as when you have a variant set in a VCF file, but not to query what positions have genomic scores.

The behavior you are asking would imply to alter that design decision of the function gscores(), so I think we need another function for that purpose that you would call first, and then use the result to call gscores(). Could you please open an issue at the GenomicScores GitHub repo and I'll see what I can do.

ADD REPLY
0
Entering edit mode

Ok, that makes sense. I'll open an issue there. Thanks for your help.

ADD REPLY
0
Entering edit mode

For other users interested in this thread, the link to the issue with this feature request is here, which has been recently updated with the addition of this feature in devel version 2.9.5 of GenomicScores through a new function called wgscores(). Anyone with comments or suggestions on this new feature, please add them to the conversation in the GitHub issue.

ADD REPLY

Login before adding your answer.

Traffic: 758 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6