create a txdb using makeTxDbFromGFF
@naderaryamanesh-14634

Hi,

I am trying to make a TxDb for Arabidopsis lyrata. The annotation file can be downloaded here:

ftp://ftp.ensemblgenomes.org/pub/plants/release-30/gff3/arabidopsis_lyrata/Arabidopsis_lyrata.v.1.0.30.chr.gff3.gz

I am using the following command to create txdb:

txdb <- makeTxDbFromGFF(file="/PATH/rawdata/annotations/Arabidopsis_lyrata.v.1.0.30.chrb.gff3",
                        format=c("auto", "gff3", "gtf"),
                        dataSource="gtf file for Arabidopsis lyrata",
                        organism="Arabidopsis lyrata")

The above command creates the TxDb as shown below:

Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK

> txdb
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: gtf file for Arabidopsis lyrata
# Organism: Arabidopsis lyrata
# Taxonomy ID: 59689
# miRBase build ID: NA
# Genome: NA
# transcript_nrow: 31478
# exon_nrow: 170022
# cds_nrow: 154686
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2017-12-14 14:59:40 +0200 (Thu, 14 Dec 2017)
# GenomicFeatures version at creation time: 1.28.4
# RSQLite version at creation time: 2.0
# DBSCHEMAVERSION: 1.1

However, when I call seqinfo(txdb), the sequence lengths are missing:

> seqinfo(txdb)
Seqinfo object with 8 sequences from an unspecified genome; no seqlengths:
  seqnames seqlengths isCircular genome
  chr1             NA         NA   <NA>
  chr2             NA         NA   <NA>
  chr3             NA         NA   <NA>
  chr4             NA         NA   <NA>
  chr5             NA         NA   <NA>
  chr6             NA         NA   <NA>
  chr7             NA         NA   <NA>
  chr8             NA         NA   <NA>

It should instead look similar to:

> library("BSgenome.Alyrata.JGI.v1")
> seqinfo(Alyrata)
Seqinfo object with 8 sequences from Assembly V1.0 genome:
  seqnames seqlengths isCircular        genome
  chr1       33132539      FALSE Assembly V1.0
  chr2       19320864      FALSE Assembly V1.0
  chr3       24464547      FALSE Assembly V1.0
  chr4       23328337      FALSE Assembly V1.0
  chr5       21221946      FALSE Assembly V1.0
  chr6       25113588      FALSE Assembly V1.0
  chr7       24649197      FALSE Assembly V1.0
  chr8       22951293      FALSE Assembly V1.0

I would really appreciate it if you could pinpoint the problem or suggest a better way to make the TxDb.

Kind regards,

Nader

bioconductor txdb maketxdbfromgff


sessionInfo() please! Your version of Bioconductor seems outdated.

Note that makeTxDbFromGFF() uses rtracklayer::import.gff3() internally as a first step, to import the GFF3 file as a GRanges object. And even though the sequence lengths are present in the file, for some reason rtracklayer::import.gff3() fails to import them:

library(rtracklayer)
gr <- import.gff3("Arabidopsis_lyrata.v.1.0.30.chr.gff3.gz")
seqinfo(gr)
# Seqinfo object with 8 sequences from an unspecified genome; no seqlengths:
#   seqnames seqlengths isCircular genome
#   1                NA         NA   <NA>
#   2                NA         NA   <NA>
#   3                NA         NA   <NA>
#   4                NA         NA   <NA>
#   5                NA         NA   <NA>
#   6                NA         NA   <NA>
#   7                NA         NA   <NA>
#   8                NA         NA   <NA>
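
In the meantime, one possible workaround is to read the lengths from the file header yourself and hand them to makeTxDbFromGFF() through its chrominfo argument. The sketch below is only an illustration: it assumes the lengths are stored as standard GFF3 "##sequence-region seqid start end" directives and that each region starts at position 1, and read_sequence_regions() is a hypothetical helper name, not a function from rtracklayer or GenomicFeatures.

```r
# Hypothetical helper (not part of rtracklayer or GenomicFeatures):
# collect '##sequence-region <seqid> <start> <end>' directives from the
# lines of a GFF3 file and return a data frame with 'chrom' and 'length'
# columns. Assumes each region starts at 1, so <end> equals the length.
read_sequence_regions <- function(lines) {
  directives <- grep("^##sequence-region\\s", lines, value = TRUE)
  fields <- strsplit(trimws(directives), "\\s+")
  # Keep only well-formed directives with exactly four fields.
  fields <- fields[lengths(fields) == 4L]
  data.frame(
    chrom  = vapply(fields, `[`, character(1), 2L),
    length = as.integer(vapply(fields, `[`, character(1), 4L)),
    stringsAsFactors = FALSE
  )
}

# Small in-memory example; for a real file, pass readLines(gzfile(path)).
example_lines <- c(
  "##gff-version 3",
  "##sequence-region   1 1 33132539",
  "##sequence-region   2 1 19320864",
  "1\tsource\tgene\t100\t200\t.\t+\t.\tID=gene:example"
)
chrominfo <- read_sequence_regions(example_lines)
print(chrominfo)
```

The resulting data frame could then be supplied as makeTxDbFromGFF(file, format="gff3", chrominfo=chrominfo, ...). The values in the chrom column have to match the sequence names used in the file you import (1 to 8 in the Ensembl file above).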

You could either ask a new question on this site with tag rtracklayer and focus on the rtracklayer::import.gff3() issue, or open an issue on GitHub: https://github.com/lawremi/rtracklayer/issues

Thanks,

H.

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /home/hpages/R/R-3.4.3/lib/libRblas.so
LAPACK: /home/hpages/R/R-3.4.3/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base   

other attached packages:
[1] rtracklayer_1.38.2   GenomicRanges_1.30.0 GenomeInfoDb_1.14.0 
[4] IRanges_2.12.0       S4Vectors_0.16.0     BiocGenerics_0.24.0 

loaded via a namespace (and not attached):
 [1] lattice_0.20-35            matrixStats_0.52.2        
 [3] XML_3.98-1.9               Rsamtools_1.30.0          
 [5] Biostrings_2.46.0          GenomicAlignments_1.14.1  
 [7] bitops_1.0-6               grid_3.4.3                
 [9] zlibbioc_1.24.0            XVector_0.18.0            
[11] Matrix_1.2-12              BiocParallel_1.12.0       
[13] tools_3.4.3                Biobase_2.38.0            
[15] RCurl_1.95-4.8             DelayedArray_0.4.1        
[17] compiler_3.4.3             SummarizedExperiment_1.8.0
[19] GenomeInfoDbData_0.99.1   