Question: Error in .merge_transcript_parts(transcripts)
csijcs wrote (12 months ago):

Hello,

I am trying to make a TxDb from a .gff of sncRNA annotations obtained from the DASHR database (DASHR v2.0 hg38 sncRNA annotation [GFF]). I had to reformat the file slightly to remove the 10th column from some lines, but once that was done I tried importing it and making a TxDb with makeTxDbFromGFF, and received the following error:

> TxDb <- makeTxDbFromGFF(file = "/data2/csijcs/hg38/dashr.v2.sncRNA.annotation.hg38.edited.gff", format="auto")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... Error in .merge_transcript_parts(transcripts) : 
  The following transcripts have multiple parts that cannot be merged
  because of incompatible type: U13, U3, U8

I tried removing those lines, but got even more errors:

> TxDb <- makeTxDbFromGFF(file = "/data2/csijcs/hg38/dashr.v2.sncRNA.annotation.hg38.edited.noU13U3U6U8.gff", format="auto")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... Error in .merge_transcript_parts(transcripts) : 
  The following transcripts have multiple parts that cannot be merged
  because of incompatible seqnames: 5S, LSU-rRNA_Hsa, SSU-rRNA_Hsa, U1,
  U14, U17, U2, U4, U5, U6, U7

Is it possible to make a TxDb for this annotation file? I am trying to perform a differential expression analysis with DESeq2.

Here is my sessionInfo:

> sessionInfo() 
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /home/csijcs/anaconda2/lib/R/lib/libRblas.so
LAPACK: /home/csijcs/anaconda2/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] rtracklayer_1.40.6                     
 [2] TxDb.Hsapiens.UCSC.hg38.knownGene_3.4.0
 [3] apeglm_1.2.1                           
 [4] tximportData_1.8.0                     
 [5] readr_1.1.1                            
 [6] tximport_1.8.0                         
 [7] RColorBrewer_1.1-2                     
 [8] ggplot2_3.1.0                          
 [9] DESeq2_1.20.0                          
[10] SummarizedExperiment_1.10.1            
[11] DelayedArray_0.6.6                     
[12] BiocParallel_1.14.2                    
[13] matrixStats_0.54.0                     
[14] GenomicFeatures_1.32.3                 
[15] AnnotationDbi_1.42.1                   
[16] Biobase_2.40.0                         
[17] GenomicRanges_1.32.7                   
[18] GenomeInfoDb_1.16.0                    
[19] IRanges_2.14.12                        
[20] S4Vectors_0.18.3                       
[21] BiocGenerics_0.26.0                    

loaded via a namespace (and not attached):
 [1] bitops_1.0-6             mirbase.db_1.2.0         bit64_0.9-7             
 [4] progress_1.2.0           httr_1.3.1               numDeriv_2016.8-1       
 [7] tools_3.5.0              backports_1.1.2          R6_2.3.0                
[10] rpart_4.1-13             Hmisc_4.1-1              DBI_1.0.0               
[13] lazyeval_0.2.1           colorspace_1.3-2         nnet_7.3-12             
[16] withr_2.1.2              tidyselect_0.2.5         gridExtra_2.3           
[19] prettyunits_1.0.2        bit_1.1-14               compiler_3.5.0          
[22] htmlTable_1.12           scales_1.0.0             checkmate_1.8.5         
[25] genefilter_1.62.0        stringr_1.3.1            digest_0.6.18           
[28] Rsamtools_1.32.3         foreign_0.8-71           XVector_0.20.0          
[31] base64enc_0.1-3          pkgconfig_2.0.2          htmltools_0.3.6         
[34] bbmle_1.0.20             htmlwidgets_1.3          rlang_0.3.0.1           
[37] rstudioapi_0.8           RSQLite_2.1.1            bindr_0.1.1             
[40] acepack_1.4.1            dplyr_0.7.8              RCurl_1.95-4.11         
[43] magrittr_1.5             GenomeInfoDbData_1.1.0   Formula_1.2-3           
[46] Matrix_1.2-15            Rcpp_1.0.0               munsell_0.5.0           
[49] stringi_1.2.4            MASS_7.3-51.1            zlibbioc_1.26.0         
[52] plyr_1.8.4               grid_3.5.0               blob_1.1.1              
[55] crayon_1.3.4             lattice_0.20-38          Biostrings_2.48.0       
[58] splines_3.5.0            annotate_1.58.0          hms_0.4.2               
[61] locfit_1.5-9.1           knitr_1.20               pillar_1.3.0            
[64] geneplotter_1.58.0       biomaRt_2.36.1           XML_3.98-1.16           
[67] glue_1.3.0               latticeExtra_0.6-28      data.table_1.11.8       
[70] BiocManager_1.30.4       gtable_0.2.0             purrr_0.2.5             
[73] assertthat_0.2.0         emdbook_1.3.10           xtable_1.8-3            
[76] coda_0.19-2              survival_2.43-1          tibble_1.4.2            
[79] GenomicAlignments_1.16.0 memoise_1.1.0            bindrcpp_0.2.2          
[82] cluster_2.0.7-1         
 


I've tested this with a modified file (10th column removed, since GFF files have 9 columns) and got the same error as you. The error is thrown by the .merge_transcript_parts() function in GenomicFeatures. In essence, it seems you are getting the error because the ID=<something> values in the 9th column of your file, from which the function derives the transcript IDs, are not unique (e.g. ID=U4 appears on more than one line). Lines that share an ID are treated as parts of one transcript, and the merge fails when those parts disagree in type or seqnames, as your two error messages report. To get makeTxDbFromGFF() to work, it seems you need to make all of the ID values in the 9th column of your GFF file unique, or remove the lines with non-unique IDs.
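For what it's worth, here is a minimal base-R sketch of that suggestion, assuming a tab-separated file with exactly 9 columns and an ID= attribute in the 9th. The helper names (read_gff, get_id, find_conflicting_ids, make_ids_unique) are made up for illustration and are not part of GenomicFeatures, and the rows are toy data, not taken from the DASHR file. It only rewrites ID= values; any Parent= attributes pointing at a renamed ID would need to be updated separately.

```r
# Hypothetical helpers (not from GenomicFeatures); base R only.

# Read a 9-column GFF into a data frame, skipping '#' header lines.
read_gff <- function(path) {
  read.delim(path, header = FALSE, sep = "\t", quote = "",
             comment.char = "#", stringsAsFactors = FALSE,
             colClasses = "character",
             col.names = c("seqid", "source", "type", "start", "end",
                           "score", "strand", "phase", "attributes"))
}

# Extract the ID attribute from column 9 (NA where there is none).
get_id <- function(attributes) {
  m <- regexpr("(^|;)[[:space:]]*ID=[^;]*", attributes)
  id <- rep(NA_character_, length(attributes))
  id[m > 0] <- sub("^;?[[:space:]]*ID=", "", regmatches(attributes, m))
  id
}

# IDs whose lines disagree in seqid or type, i.e. the kind of conflict
# the two error messages in the question report.
find_conflicting_ids <- function(gff) {
  id <- get_id(gff$attributes)
  by_id <- split(gff[c("seqid", "type")], id)
  bad <- vapply(by_id, function(d) nrow(unique(d)) > 1, logical(1))
  as.character(names(by_id)[bad])
}

# Make every ID unique by appending .1, .2, ... to repeated values.
make_ids_unique <- function(gff) {
  id <- get_id(gff$attributes)
  idx <- which(!is.na(id))
  new_id <- make.unique(id[idx])
  for (k in seq_along(idx)) {
    gff$attributes[idx[k]] <- sub("((^|;)[[:space:]]*ID=)[^;]*",
                                  paste0("\\1", new_id[k]),
                                  gff$attributes[idx[k]])
  }
  gff
}

# Toy input: U4 sits on two seqnames, U13 has two different types.
toy <- c("chr1\ttoy\tsnRNA\t100\t200\t.\t+\t.\tID=U4",
         "chr2\ttoy\tsnRNA\t300\t400\t.\t-\t.\tID=U4",
         "chr3\ttoy\tsnoRNA\t500\t600\t.\t+\t.\tID=U13",
         "chr3\ttoy\tscaRNA\t700\t800\t.\t+\t.\tID=U13",
         "chr5\ttoy\tmiRNA\t900\t950\t.\t+\t.\tID=mir-1")
infile <- tempfile(fileext = ".gff")
writeLines(toy, infile)

gff <- read_gff(infile)
conflicts <- find_conflicting_ids(gff)  # IDs that would fail to merge
fixed <- make_ids_unique(gff)

outfile <- tempfile(fileext = ".gff")
write.table(fixed, outfile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# The rewritten file can then be tried with GenomicFeatures:
# TxDb <- makeTxDbFromGFF(file = outfile, format = "auto")
```

Renaming keeps every locus as its own entry, whereas deleting the duplicated lines would drop those loci from the annotation, so which option is appropriate depends on whether you want counts per locus or per RNA family.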
 

Reply written 12 months ago by daniel.vantwisk