filipeje
Hi,
I am trying to create a TxDb object from a GTF file, using the Bioconductor workflow (here: https://www.bioconductor.org/help/workflows/rnaseqGene/) as a reference.
However, I am having issues with the sequence names in the GTF file. This is the code I am running:
library(Rsubread)
library("Rsamtools")
library("GenomicFeatures")

cell <- factor(c("CAT.S", "CAT.S", "IL13NG", "IL13NG", "LOX.1", "LOX.1"))
file <- c("A_ACAGTG/sorted.bam", "B_GCCAAT/sorted.bam", "C_CAGATC/sorted.bam",
          "D_CTTGTA/sorted.bam", "E_AGTCAA/sorted.bam", "F_GTGAAA/sorted.bam")
data <- data.frame(cell, file)

bamfiles <- BamFileList(file)
seqinfo(bamfiles[1])

gtffile <- "CriGri_1.0.gtf"
txdb <- makeTxDbFromGFF(gtffile, format = "gtf", dbxrefTag = "GeneID")
head(gtffile)
This is the output I am receiving:
> txdb <- makeTxDbFromGFF(gtffile, format = "gtf", dbxrefTag = "GeneID")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... Error in .merge_transcript_parts(transcripts) :
  The following transcripts have multiple parts that cannot be merged because of incompatible seqnames:
  NM_001243977.1, NM_001244004.1, NM_001244022.1, NM_001244033.1, NM_001244041.1, NM_001244052.1,
  NM_001244378.1, NM_001246707.1, NM_001246723.1, NM_001246741.1, NM_001246788.1, NM_001246827.1,
  NM_001246828.1, NR_045132.1
This is my session info:
sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.9 (Santiago)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] GenomicFeatures_1.26.4 AnnotationDbi_1.36.2   Biobase_2.34.0         BiocInstaller_1.24.0   Rsamtools_1.26.2       Biostrings_2.42.1
 [7] XVector_0.14.1         GenomicRanges_1.26.4   GenomeInfoDb_1.10.3    IRanges_2.8.2          S4Vectors_0.12.2       BiocGenerics_0.20.0
[13] bindrcpp_0.2           Rsubread_1.24.2        dplyr_0.7.1            tidyr_0.6.3            edgeR_3.16.5           limma_3.30.13

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11               bindr_0.1                  bitops_1.0-6               tools_3.3.2                zlibbioc_1.20.0
 [6] biomaRt_2.30.0             digest_0.6.9               bit_1.1-12                 RSQLite_2.0                memoise_1.0.0
[11] tibble_1.3.3               lattice_0.20-34            pkgconfig_2.0.1            rlang_0.1.1                Matrix_1.2-7.1
[16] DBI_0.7                    rtracklayer_1.34.2         locfit_1.5-9.1             bit64_0.9-7                grid_3.3.2
[21] glue_1.1.1                 R6_2.1.2                   XML_3.98-1.9               BiocParallel_1.8.2         blob_1.1.0
[26] magrittr_1.5               GenomicAlignments_1.10.1   SummarizedExperiment_1.4.0 assertthat_0.2.0           RCurl_1.95-4.8
makeTxDbFromGFF is a function in the GenomicFeatures package, so I added this to the tags of the post. It's a good idea to tag the relevant package so you can get advice from the package maintainers.
Hi,
Something looks wrong with your GTF file. Do you think you can make it available somewhere so we can look at it? Thanks,
H.
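
In the meantime, one way to inspect the file locally is to list the transcripts whose features sit on more than one sequence name, since that is what the error message complains about. The following is only a minimal sketch: find_multi_seqname_transcripts() is a hypothetical helper (not part of GenomicFeatures), and it assumes the GTF carries a transcript_id attribute that rtracklayer's import() exposes as a metadata column.

```r
# Hypothetical helper (not part of any package): given one sequence name and
# one transcript ID per GTF feature, return the transcript IDs that appear on
# more than one sequence name.
find_multi_seqname_transcripts <- function(seqname, transcript_id) {
  seqname <- as.character(seqname)
  transcript_id <- as.character(transcript_id)
  keep <- !is.na(transcript_id)            # gene-level rows may lack a transcript_id
  n_seq <- tapply(seqname[keep], transcript_id[keep],
                  function(x) length(unique(x)))
  names(n_seq)[as.vector(n_seq > 1)]
}

gtffile <- "CriGri_1.0.gtf"                # file name taken from the question
if (file.exists(gtffile) && requireNamespace("rtracklayer", quietly = TRUE)) {
  library(rtracklayer)                     # also attaches GenomicRanges
  gr <- import(gtffile)                    # GRanges with the GTF attributes as mcols
  bad <- find_multi_seqname_transcripts(seqnames(gr), mcols(gr)$transcript_id)
  print(bad)
  # Show the sequence names and coordinates of the offending features
  print(gr[mcols(gr)$transcript_id %in% bad])
}
```

If the transcripts printed match the ones in the error, the rows shown by the last line should indicate which sequence names are involved, which is the part of the file worth sharing.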