Error when converting a GFF file to a TxDb object with GenomicFeatures::makeTxDbFromGFF()


I want to create a single TxDb object that contains the gene annotations for all RefSeq complete genomes. My current pipeline is:

  1. Download all RefSeq genome annotation files (GFF).
  2. Concatenate all GFF files into one "master" GFF file.
  3. txdb <- GenomicFeatures::makeTxDbFromGFF("path/to/master.gff")
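For reference, step 2 can be sketched in base R as below. This is a minimal sketch under stated assumptions, not necessarily the exact procedure used: it assumes uncompressed GFF3 inputs, writes the ##gff-version 3 pragma once, drops every other pragma or comment line (including per-file ##sequence-region lines), and discards any trailing ##FASTA section. The function name concat_gff3() is hypothetical.

```r
# Hypothetical helper: merge several GFF3 files into one "master" GFF3.
# Assumptions: inputs are uncompressed GFF3; a trailing ##FASTA section,
# if present, is dropped because it is not part of the feature table.
concat_gff3 <- function(gff_files, out_file) {
  out <- file(out_file, open = "w")
  on.exit(close(out))
  writeLines("##gff-version 3", out)            # version pragma written once
  for (f in gff_files) {
    lines <- readLines(f)
    fasta_start <- grep("^##FASTA", lines)
    if (length(fasta_start) > 0) {
      lines <- lines[seq_len(fasta_start[1] - 1)]  # drop embedded sequences
    }
    # drop per-file pragmas/comments (##gff-version, ##sequence-region, #!...)
    # and empty lines, keeping only feature lines
    lines <- lines[!grepl("^#", lines) & nzchar(lines)]
    writeLines(lines, out)
  }
  invisible(out_file)
}

# Usage (paths are placeholders):
# files <- list.files("path/to/gff_dir", pattern = "\\.gff$", full.names = TRUE)
# concat_gff3(files, "path/to/master.gff")
# txdb <- GenomicFeatures::makeTxDbFromGFF("path/to/master.gff")
```

Note that plain concatenation leaves every ID= value unchanged. GFF3 only requires IDs to be unique within a single file, so IDs that were unique in each input are not guaranteed to be unique in the merged file.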

Step 3 is where I get this error:

Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... Error in .merge_transcript_parts(transcripts) : 
  The following transcripts have multiple parts that cannot be merged
  because of incompatible Name: gene-OENI_0024, gene-OENI_0151,
  gene-OENI_0152, gene-OENI_0153, gene-OENI_0154, gene-OENI_0155,
  gene-OENI_0156, gene-OENI_0157, gene-OENI_0158, gene-OENI_0159,
  gene-OENI_0160, gene-OENI_0161, gene-OENI_0162, gene-OENI_0163,
  gene-OENI_0165, gene-OENI_0166, gene-OENI_0170, gene-OENI_0171,
  gene-OENI_0174, gene-OENI_0175, gene-OENI_0176, gene-OENI_0177,
  gene-OENI_0179, gene-OENI_0180, gene-OENI_0181, gene-OENI_0182,
  gene-OENI_0183, gene-OENI_0184, gene-OENI_0185, gene-OENI_0187,
  gene-OENI_0188, gene-OENI_0189, gene-OENI_0190, gene-OENI_0191,
  gene-OENI_0198, gene-OENI_0202, gene-OENI_0207, gene-OENI_0208,
  gene-OENI_0209, gene-OENI_0211, gene-OENI_0214, gene-OENI_0216,
  gene-OENI_0217, gene-OENI_0218, gene-OENI_0220, gene-OENI_0221,
  gene-OENI_0222, gene-OENI_0223, gene-OENI_0224, gene-OENI_0225,
  gene-OENI_0228, gene-OENI_0229, gene-OENI_0230, gene-OENI_0232,
Calls: <Anonymous> ... .extract_transcripts_from_GRanges -> .merge_transcript_parts
In addition: Warning messages:
1: In .extract_exons_from_GRanges(exon_IDX, gr, mcols0, tx_IDX, feature = "exon",  :
  555 exons couldn't be linked to a transcript so were dropped (showing
  only the first 6):
       seqid   start     end strand                     ID
1 CP002457.1  835785  837944      -   exon-Sput200_R0132-1
2 CP002457.1  838107  840054      +   exon-Sput200_R0131-1
3 CP011974.1 2349467 2349533      +       exon-BEH_24290-1
4 CP011968.1  168443  168622      + exon-CDIF1296T_00235-1
5 CP012373.2 2263358 2263427      -     exon-AL038_18910-1
6 CP012373.2 2264049 2264115      -     exon-AL038_18915-1
                    Name              Parent Parent_type
1   exon-Sput200_R0132-1   rna-Sput200_R0132        <NA>
2   exon-Sput200_R0131-1   rna-Sput200_R0131        <NA>
3       exon-BEH_24290-1       rna-BEH_24290        <NA>
4 exon-CDIF1296T_00235-1 rna-CDIF1296T_00235        <NA>
5     exon-AL038_18910-1     rna-AL038_18910        <NA>
6     exon-AL038_18915-1     rna-AL038_18915        <NA>
2: In .extract_exons_from_GRanges(cds_IDX, gr, mcols0, tx_IDX, feature = "cds",  :
  104 CDS couldn't be linked to a transcript so were dropped (showing
  only the first 6):
     seqid  start    end strand                    ID Name     Parent
1 U00096.3 257829 257899      + cds-gnl|b0240|CDS=288 <NA> gene-b0240
2 U00096.3 258676 259006      + cds-gnl|b0240|CDS=288 <NA> gene-b0240
3 U00096.3 270278 270540      + cds-gnl|b4587|CDS=313 <NA> gene-b4587
4 U00096.3 271764 272190      + cds-gnl|b4587|CDS=313 <NA> gene-b4587
5 U00096.3 382591 382872      - cds-gnl|b4579|CDS=473 <NA> gene-b4579
6 U00096.3 380844 381260      - cds-gnl|b4579|CDS=473 <NA> gene-b4579
  Parent_type
1        <NA>
2        <NA>
3        <NA>
4        <NA>
5        <NA>
6        <NA>
Execution halted
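The two warnings concern exons and CDS that makeTxDbFromGFF() could not attach to a transcript. One possible cause (an assumption; the output above does not say why) is a Parent= value that matches no ID= in the same file. The hypothetical helper find_orphan_children() below is a base-R sketch that lists such features for a single GFF3 file. It compares attribute values as-is (no percent-decoding), and it will not flag parents that exist but are of a feature type makeTxDbFromGFF() does not treat as a transcript.

```r
# Hypothetical helper: list features whose Parent= value matches no ID=
# in the same GFF3 file. Percent-encoded values are compared as-is.
find_orphan_children <- function(gff_file) {
  lines <- readLines(gff_file)
  fasta_start <- grep("^##FASTA", lines)
  if (length(fasta_start) > 0) lines <- lines[seq_len(fasta_start[1] - 1)]
  lines <- lines[!grepl("^#", lines) & nzchar(lines)]   # feature lines only
  fields <- strsplit(lines, "\t", fixed = TRUE)
  seqids <- vapply(fields, function(x) x[1], character(1))
  types  <- vapply(fields, function(x) x[3], character(1))
  attrs  <- vapply(fields, function(x) x[9], character(1))

  # Extract the value of one attribute key from column 9 (NA when absent).
  get_attr <- function(a, key) {
    out <- rep(NA_character_, length(a))
    has_key <- grepl(paste0("(^|;)", key, "="), a)
    out[has_key] <- sub(paste0("(^|.*;)", key, "=([^;]*).*$"), "\\2",
                        a[has_key])
    out
  }
  ids     <- get_attr(attrs, "ID")
  parents <- get_attr(attrs, "Parent")

  # A feature may list several comma-separated parents; flag it if any is unknown.
  child  <- which(!is.na(parents))
  orphan <- vapply(strsplit(parents[child], ",", fixed = TRUE),
                   function(p) !all(p %in% ids), logical(1))
  idx <- child[orphan]
  data.frame(seqid = seqids[idx], type = types[idx], ID = ids[idx],
             Parent = parents[idx], stringsAsFactors = FALSE)
}

# Usage (path is a placeholder):
# find_orphan_children("path/to/one_genome.gff")
```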

My questions: Is there an easy fix for this error, which seems to come from bad 'Name' fields in the GFF files? Or is the problem step 2, where I concatenated many separate GFF files before building the TxDb object? Also, why were the other conflicts in the output above reported only as warnings, with the offending features simply dropped on the spot?
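To check whether step 2 is responsible, one could look for ID= values that occur in more than one of the input files. GFF3 requires IDs to be unique only within a file (a discontinuous feature may repeat its ID on several lines), and the error above reports transcripts with "multiple parts", so a locus tag reused across assemblies would be a plausible, though unconfirmed, explanation. The sketch below uses base R only; ids_in_gff3() and find_cross_file_ids() are hypothetical helper names, and IDs repeated within a single file are deliberately not reported.

```r
# Hypothetical helpers: report ID= values shared by two or more GFF3 files.
ids_in_gff3 <- function(gff_file) {
  lines <- readLines(gff_file)
  fasta_start <- grep("^##FASTA", lines)
  if (length(fasta_start) > 0) lines <- lines[seq_len(fasta_start[1] - 1)]
  lines <- lines[!grepl("^#", lines) & nzchar(lines)]   # feature lines only
  attrs <- vapply(strsplit(lines, "\t", fixed = TRUE),
                  function(x) x[9], character(1))
  has_id <- grepl("(^|;)ID=", attrs)
  # unique() collapses legitimate within-file repeats (multi-line features)
  unique(sub("(^|.*;)ID=([^;]*).*$", "\\2", attrs[has_id]))
}

find_cross_file_ids <- function(gff_files) {
  ids_per_file <- lapply(gff_files, ids_in_gff3)
  all_ids  <- unlist(ids_per_file, use.names = FALSE)
  file_col <- rep(gff_files, lengths(ids_per_file))
  shared   <- unique(all_ids[duplicated(all_ids)])   # seen in 2+ files
  keep     <- all_ids %in% shared
  # one row per (ID, file) pair for every colliding ID
  data.frame(ID = all_ids[keep], file = file_col[keep],
             stringsAsFactors = FALSE)
}

# Usage (paths are placeholders):
# files <- list.files("path/to/gff_dir", pattern = "\\.gff$", full.names = TRUE)
# head(find_cross_file_ids(files))
```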

Any help here would be much appreciated. -Domenick
