The **goal** for my question is to follow the procedure in the 'C. Differential Expression' presentation, April 06-07, 2015, bioconductor workshop, section '5. Reduction, Setup, part c', to prepare a TxDb object from a gff3 file for the subsequent 'Counting' section (preparation of SummarizedExperiment) and for section '6. Analysis using DESeq2'.
The Problem: makeTxDbFromGFF is producing a TxDb file with a
GRangesList object of length 0:
when trying to list exons by gene.
for the TxDb file produced from the gff3 file at
ftp://ftp.ensemblgenomes.org/pub/plants/release-26/gff3/setaria_italica/
> gff3file <- file.path(path, "Setaria_italica.JGIv2.0.26.gff3")
> txdbSitalica <- makeTxDbFromGFF(gff3file, format="gff3", circ_seqs=character())
Prepare the 'metadata' data frame ... metadata: OK
> genesSitalica <- exonsBy(txdbSitalica, by="gene")
> genesSitalica
GRangesList object of length 0:
<0 elements>
-------
seqinfo: 76 sequences from an unspecified genome; no seqlengths
I note this gff3 file has 35,471 records/rows with 'gene' in column 3, of its gff3 format.
By way of comparison, I can display 'exons by transcripts' from the txdbSitalica variable class TxDb,
grlsitalica <- exonsBy(txdbSitalica, "tx", use.names=TRUE)
...
grlsitalica ## 'GRangesList object of length 40599:'
## So, in txdbSitalica, can find all the records/rows with 'transcript' in column 3 of the gff3 file.
## that is, 40599 records with 'transcript' in column 3 of gff3 file MATCHES the number in grlsitalica.
> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.3 (Yosemite)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.1.2
[2] GenomicFeatures_1.20.0
[3] AnnotationDbi_1.30.1
[4] Biobase_2.28.0
[5] GenomicRanges_1.20.3
[6] GenomeInfoDb_1.4.0
[7] IRanges_2.2.1
[8] S4Vectors_0.6.0
[9] BiocGenerics_0.14.0
loaded via a namespace (and not attached):
[1] XML_3.98-1.1 Rsamtools_1.20.1 Biostrings_2.36.0
[4] GenomicAlignments_1.4.1 bitops_1.0-6 futile.options_1.0.0
[7] DBI_0.3.1 RSQLite_1.0.0 zlibbioc_1.14.0
[10] XVector_0.8.0 futile.logger_1.4.1 lambda.r_1.1.7
[13] BiocParallel_1.2.1 tools_3.2.0 biomaRt_2.24.0
[16] RCurl_1.95-4.6 rtracklayer_1.28.2
>
Thanks for comments,
One more thing: you can also obtain the TxDb for Setaria italica from the "plants mart":
See
?makeTxDbFromBiomart
for more information.Unfortunately the above code fails with the current version of GenomicFeatures but today I committed a change to
makeTxDbFromBiomart()
to address the problem. This change is also available in the release and devel versions of the GenomicFeatures package that will become available tomorrow (versions 1.20.1 and 1.21.2 respectively).Note that the 3 methods (use of
makeTxDbFromGFF()
on Setaria_italica.JGIv2.0.26.gtf.gz or Setaria_italica.JGIv2.0.26.gff3.gz, or use ofmakeTxDbFromBiomart()
to get the sitalica_eg_gene dataset from the plants mart) produce the same TxDb objects (e.g. same set of transcripts and genes, and same transcript "geometries").H.