Question: makeTxDbFromGFF returns empty object
3.1 years ago, TimothéeFlutre (France) wrote:

I would like to make a TxDb package from a GFF file using GenomicFeatures, but can't get it to work. Below is a reproducible example on a small subset.

Retrieve the GFF file:

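# `url` holds the address of the GFF file (not shown in the post)
gff.file <- "Vitis_vinifera_annotation.gff.gz"
# "wget <url> <file>" treats <file> as a second URL; use "-O" to choose the output name
cmd <- paste0("wget -O ", gff.file, " ", url)
system(cmd)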

Extract a small subset:

gff.file.small <- "subset.gff"
# keep the first 100 lines that mention chr2 as a whole word
cmd <- paste0("zcat ", gff.file, " | grep -w 'chr2' | head -100 > ", gff.file.small)
system(cmd)

Make a TxDb object:

library(GenomicFeatures)
library(BSgenome.Vvinifera.URGI.IGGP12Xv0)
# chromosome names and lengths are taken from the matching BSgenome package
txdb <- makeTxDbFromGFF(file=gff.file.small, format="auto", dataSource=url,
                        organism="Vitis vinifera", taxonomyId=29760,
                        chrominfo=seqinfo(BSgenome.Vvinifera.URGI.IGGP12Xv0))
txdb  # shows transcript_nrow=0, exon_nrow=0, etc.
tmp <- transcripts(txdb)
length(tmp)  # 0

Is it because the initial GFF file is badly formatted?

Answer: makeTxDbFromGFF returns empty object
3.1 years ago, Michael Lawrence (United States) wrote:

It's a GFF2 file, whereas the TxDb tools only support GTF and GFF3. GFF2 has no standard way of expressing gene models. You could probably figure out a way to convert that file to GFF3.
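To check which dialect a file uses before calling makeTxDbFromGFF, one can look for a `##gff-version` pragma and at column 9: GFF3 writes `key=value` pairs, while GTF writes `gene_id "..."; transcript_id "...";`. The function `guess_gff_dialect()` below is a hypothetical helper (not part of GenomicFeatures or rtracklayer) sketching that heuristic in base R. It does not validate the file, and anything that looks neither like GFF3 nor like GTF is simply reported as "GFF2".

```r
# Hypothetical helper: rough guess at the GFF dialect of some annotation lines.
# Assumption: the column-9 attribute style is a usable proxy for the dialect.
guess_gff_dialect <- function(lines) {
  # An explicit "##gff-version 3" pragma settles it
  pragma <- grep("^##gff-version", lines, value = TRUE)
  if (length(pragma) > 0) {
    v <- sub("^##gff-version[[:space:]]+([0-9]+).*$", "\\1", pragma[1])
    if (v == "3") return("GFF3")
  }
  # Keep feature lines only (drop comments and blank lines)
  feat <- lines[!grepl("^#", lines) & nzchar(trimws(lines))]
  if (length(feat) == 0) return("unknown")
  # Column 9 holds the attributes; use "" when a line has fewer columns
  fields <- strsplit(feat, "\t", fixed = TRUE)
  attrs <- vapply(fields, function(f) if (length(f) >= 9) f[9] else "",
                  character(1))
  # GFF3 style: every attribute field starts with key=value
  if (all(grepl("^[^= ;]+=", attrs))) return("GFF3")
  # GTF style: every line carries quoted gene_id and transcript_id
  if (all(grepl("gene_id \"[^\"]*\"", attrs) &
          grepl("transcript_id \"[^\"]*\"", attrs))) return("GTF")
  # Anything else: free-form GFF2 grouping
  "GFF2"
}
```

Applied to the subset from the question, this would be called as `guess_gff_dialect(readLines(gff.file.small))`; for a .gz file, `readLines(gzfile(gff.file))` reads it directly.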

Johannes Rainer wrote:

Just jumping in: alternatively, you could try to get a GFF3 or GTF file from Ensembl Plants, e.g.

ftp://ftp.ensemblgenomes.org/pub/plants/current/gff3/vitis_vinifera

(for versions other than "current", just browse the FTP site)

By the way, if you're working with Ensembl annotations, you could also take a quick look at the ensembldb package. Its EnsDb objects provide functionality similar (almost identical) to that of TxDb objects. The ensDbFromGtf and ensDbFromGff functions create an EnsDb from a GTF or GFF3 file; ideally, use the current devel version of the package (to be released soon with Bioc 3.3).

cheers, jo

@MichaelLawrence Thanks, I will (try to) figure out a way to convert the GFF2 file into GFF3.

@Johannes Rainer I am aware that I can retrieve annotations from Ensembl, but I specifically want these ones, which may differ slightly from the Ensembl ones (something I should indeed check at some point).

It looks like there are GFF3 annotations under the "V1" heading; only the V0 annotations are GFF2.