I'd like to run the ASpli package, but when I run the following code:
    library(GenomicFeatures)
    library(ASpli)
    genomeTxDb <- makeTxDbFromGFF(par.l$GTF)
    features <- binGenome(genomeTxDb)
I get back this error message:
    181 genes were dropped because they have exons located on both strands of the same reference sequence or on more than one reference sequence, so cannot be represented by a single genomic range.
    Use 'single.strand.genes.only=FALSE' to get all the genes in a GRangesList object, or use suppressMessages() to suppress this message.
    Error in .Call2("Rle_constructor", values, lengths, PACKAGE = "S4Vectors") :
      Rle of type 'list' is not supported
Here is the sessionInfo and traceback:
    > sessionInfo()
    R version 4.2.0 (2022-04-22)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: CentOS Linux 7 (Core)

    Matrix products: default
    BLAS/LAPACK: /usr/lib64/libopenblas-r0.3.3.so

    locale:
    LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
    LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
    LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    stats graphics grDevices utils datasets methods base

    loaded via a namespace (and not attached):
    Rcpp_1.0.10 lattice_0.20-45 prettyunits_1.1.1 png_0.1-8
    Rsamtools_2.14.0 Biostrings_2.66.0 assertthat_0.2.1 digest_0.6.31
    utf8_1.2.2 BiocFileCache_2.6.0 R6_2.5.1 GenomeInfoDb_1.34.7
    stats4_4.2.0 RSQLite_2.2.20 httr_1.4.4 pillar_1.8.1
    zlibbioc_1.44.0 rlang_1.0.6 GenomicFeatures_1.50.4 progress_1.2.2
    curl_5.0.0 rstudioapi_0.14 blob_1.2.3 S4Vectors_0.36.1
    Matrix_1.5-3 BiocParallel_1.32.5 stringr_1.5.0 RCurl_1.98-1.9
    bit_4.0.5 biomaRt_2.54.0 DelayedArray_0.24.0 compiler_4.2.0
    rtracklayer_1.58.0 pkgconfig_2.0.3 BiocGenerics_0.44.0 tidyselect_1.2.0
    KEGGREST_1.38.0 SummarizedExperiment_1.28.0 tibble_3.1.8 GenomeInfoDbData_1.2.9
    matrixStats_0.63.0 IRanges_2.32.0 codetools_0.2-18 XML_3.99-0.13
    fansi_1.0.4 crayon_1.5.2 dplyr_1.0.10 dbplyr_2.3.0
    GenomicAlignments_1.34.0 bitops_1.0-7 rappdirs_0.3.3 grid_4.2.0
    lifecycle_1.0.3 DBI_1.1.3 magrittr_2.0.3 cli_3.6.0
    stringi_1.7.12 cachem_1.0.6 XVector_0.38.0 xml2_1.3.3
    ellipsis_0.3.2 filelock_1.0.2 generics_0.1.3 vctrs_0.5.2
    rjson_0.2.21 restfulr_0.0.15 tools_4.2.0 bit64_4.0.5
    Biobase_2.58.0 glue_1.6.2 MatrixGenerics_1.10.0 hms_1.1.2
    parallel_4.2.0 fastmap_1.1.0 yaml_2.3.7 AnnotationDbi_1.60.0
    BiocManager_1.30.19 GenomicRanges_1.50.2 memoise_2.0.1 BiocIO_1.8.0
    > traceback()
    13: .Call2("Rle_constructor", values, lengths, PACKAGE = "S4Vectors")
    12: new_Rle(values, lengths)
    11: Rle(seqnames)
    10: Rle(seqnames)
    9: .normarg_seqnames1(seqnames)
    8: new_GRanges("GRanges", seqnames = seqnames, ranges = ranges, strand = strand, mcols = mcols, seqinfo = seqinfo)
    7: GRanges(ans_seqnames, ans_ranges, strand = ans_strand, ans_mcols, seqinfo = ans_seqinfo)
    6: makeGRangesFromDataFrame(df, ...)
    5: makeGRangesListFromDataFrame(rangos, names.field = "group_name")
    4: .createGRangesGenes.getLocusOverlap(exons.by.gene.disjoint)
    3: .createGRangesGenes(genome, geneSymbols)
    2: binGenome(genomeTxDb)
I have no idea how to work around this error. Can anyone help me debug this?
I had a similar problem with the hg19 genome from UCSC. I tried several ways of obtaining the GTF and the TxDb object (downloading the GTF file directly from the UCSC website, using R packages to load the TxDb directly, etc.). Apparently some lines in the file are formatted in a way that is not compatible with ASpli. If you have a relatively small genome, you can troubleshoot this in sections, building up your GTF file line by line until binGenome fails on the resulting TxDb object. For large genomes, however, this is far too time-consuming. In the end, what worked for me was using the hg19 Ensembl GTF instead of the hg19 knownGene GTF file. If there are different versions of your GTF file, I would try them and see if any of them work.
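If it helps, here is a rough sketch of how the "troubleshoot in sections" idea could be automated. It is only an illustration, not something taken from ASpli itself: the helper names (split_gtf_by_seqname, find_failing_chunks, check_with_aspli) are made up for this example, and splitting by chromosome is an assumption about where the incompatible lines cluster. It may not isolate them if the problem spans several sequences.

```r
# Group GTF lines by reference sequence (column 1), skipping '#' header lines.
split_gtf_by_seqname <- function(gtf_lines) {
  body <- gtf_lines[!grepl("^#", gtf_lines) & nzchar(gtf_lines)]
  seqname <- sub("\t.*$", "", body)
  split(body, seqname)
}

# Run `check` on every chunk and return the names of the chunks that error.
find_failing_chunks <- function(chunks, check) {
  failed <- vapply(chunks, function(lines) {
    tryCatch({
      check(lines)
      FALSE
    }, error = function(e) TRUE)
  }, logical(1))
  names(chunks)[failed]
}

# Hypothetical check for this thread's problem: write the chunk to a
# temporary GTF, build a TxDb from it, and run binGenome on the result.
# Needs GenomicFeatures and ASpli installed; it is only defined here.
check_with_aspli <- function(lines) {
  tmp <- tempfile(fileext = ".gtf")
  on.exit(unlink(tmp))
  writeLines(lines, tmp)
  txdb <- GenomicFeatures::makeTxDbFromGFF(tmp, format = "gtf")
  ASpli::binGenome(txdb)
  invisible(TRUE)
}

# Example call ("annotation.gtf" is a placeholder path):
# chunks <- split_gtf_by_seqname(readLines("annotation.gtf"))
# find_failing_chunks(chunks, check_with_aspli)
```

Any chromosome names it returns are where I would start looking for the problematic records. The same check_with_aspli function can also be pointed at the lines of each alternative GTF version to see quickly which one binGenome accepts.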