domenick.braccia
Hello,
I want to create a single TxDb object that contains gene annotations for all RefSeq complete genomes. My working pipeline is:
1. Download all RefSeq genome annotation files (GFF).
2. Concatenate all GFF files into one "master" GFF file.
3. Build the TxDb from the master file:
   txdb <- GenomicFeatures::makeTxDbFromGFF("path/to/master.gff")
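For reference, steps 2 and 3 can be sketched in R roughly as follows. This is a minimal sketch, assuming uncompressed GFF3 inputs; `concat_gff` is a hypothetical helper that writes a single `##gff-version` header and otherwise copies every line unchanged, so, like plain concatenation, it does nothing to keep IDs unique across files.

```r
# Hypothetical helper (sketch of step 2): concatenate GFF3 files into one file.
# Assumes uncompressed GFF3 inputs; every line is copied unchanged except the
# per-file "##gff-version" header, which is written only once at the top.
concat_gff <- function(gff_files, out_file) {
  body <- unlist(lapply(gff_files, function(f) {
    lines <- readLines(f)
    lines[!grepl("^##gff-version", lines)]
  }))
  writeLines(c("##gff-version 3", body), out_file)
  invisible(out_file)
}

# Usage ("path/to/gffs" is a placeholder; makeTxDbFromGFF needs the
# Bioconductor package GenomicFeatures):
# concat_gff(list.files("path/to/gffs", full.names = TRUE), "path/to/master.gff")
# txdb <- GenomicFeatures::makeTxDbFromGFF("path/to/master.gff")
```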
Step 3 is where I encounter this error:
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... Error in .merge_transcript_parts(transcripts) :
The following transcripts have multiple parts that cannot be merged
because of incompatible Name: gene-OENI_0024, gene-OENI_0151,
gene-OENI_0152, gene-OENI_0153, gene-OENI_0154, gene-OENI_0155,
gene-OENI_0156, gene-OENI_0157, gene-OENI_0158, gene-OENI_0159,
gene-OENI_0160, gene-OENI_0161, gene-OENI_0162, gene-OENI_0163,
gene-OENI_0165, gene-OENI_0166, gene-OENI_0170, gene-OENI_0171,
gene-OENI_0174, gene-OENI_0175, gene-OENI_0176, gene-OENI_0177,
gene-OENI_0179, gene-OENI_0180, gene-OENI_0181, gene-OENI_0182,
gene-OENI_0183, gene-OENI_0184, gene-OENI_0185, gene-OENI_0187,
gene-OENI_0188, gene-OENI_0189, gene-OENI_0190, gene-OENI_0191,
gene-OENI_0198, gene-OENI_0202, gene-OENI_0207, gene-OENI_0208,
gene-OENI_0209, gene-OENI_0211, gene-OENI_0214, gene-OENI_0216,
gene-OENI_0217, gene-OENI_0218, gene-OENI_0220, gene-OENI_0221,
gene-OENI_0222, gene-OENI_0223, gene-OENI_0224, gene-OENI_0225,
gene-OENI_0228, gene-OENI_0229, gene-OENI_0230, gene-OENI_0232,
Calls: <Anonymous> ... .extract_transcripts_from_GRanges -> .merge_transcript_parts
In addition: Warning messages:
1: In .extract_exons_from_GRanges(exon_IDX, gr, mcols0, tx_IDX, feature = "exon", :
555 exons couldn't be linked to a transcript so were dropped (showing
only the first 6):
seqid start end strand ID
1 CP002457.1 835785 837944 - exon-Sput200_R0132-1
2 CP002457.1 838107 840054 + exon-Sput200_R0131-1
3 CP011974.1 2349467 2349533 + exon-BEH_24290-1
4 CP011968.1 168443 168622 + exon-CDIF1296T_00235-1
5 CP012373.2 2263358 2263427 - exon-AL038_18910-1
6 CP012373.2 2264049 2264115 - exon-AL038_18915-1
Name Parent Parent_type
1 exon-Sput200_R0132-1 rna-Sput200_R0132 <NA>
2 exon-Sput200_R0131-1 rna-Sput200_R0131 <NA>
3 exon-BEH_24290-1 rna-BEH_24290 <NA>
4 exon-CDIF1296T_00235-1 rna-CDIF1296T_00235 <NA>
5 exon-AL038_18910-1 rna-AL038_18910 <NA>
6 exon-AL038_18915-1 rna-AL038_18915 <NA>
2: In .extract_exons_from_GRanges(cds_IDX, gr, mcols0, tx_IDX, feature = "cds", :
104 CDS couldn't be linked to a transcript so were dropped (showing
only the first 6):
seqid start end strand ID Name Parent
1 U00096.3 257829 257899 + cds-gnl|b0240|CDS=288 <NA> gene-b0240
2 U00096.3 258676 259006 + cds-gnl|b0240|CDS=288 <NA> gene-b0240
3 U00096.3 270278 270540 + cds-gnl|b4587|CDS=313 <NA> gene-b4587
4 U00096.3 271764 272190 + cds-gnl|b4587|CDS=313 <NA> gene-b4587
5 U00096.3 382591 382872 - cds-gnl|b4579|CDS=473 <NA> gene-b4579
6 U00096.3 380844 381260 - cds-gnl|b4579|CDS=473 <NA> gene-b4579
Parent_type
1 <NA>
2 <NA>
3 <NA>
4 <NA>
5 <NA>
6 <NA>
Execution halted
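In GFF3, lines that share the same ID are treated as parts of a single discontinuous feature, so the error above suggests that makeTxDbFromGFF found several lines carrying one ID (e.g. gene-OENI_0024) whose Name values differ. One way this could happen is the same ID appearing in more than one of the concatenated files. A minimal base-R sketch for checking that, assuming uncompressed GFF3 files with standard `key=value` attributes in column 9 (the helper names `gff_ids` and `find_shared_ids` are hypothetical):

```r
# Hypothetical helpers: report GFF3 ID values that occur in more than one file.
# Assumes uncompressed GFF3 files with "key=value" attributes in column 9.

# Return the ID attribute of every feature line in one file (NA if absent).
gff_ids <- function(gff_file) {
  lines <- readLines(gff_file)
  lines <- lines[!grepl("^#", lines) & nzchar(lines)]  # skip comments/directives
  attrs <- vapply(strsplit(lines, "\t", fixed = TRUE),
                  function(cols) if (length(cols) >= 9) cols[9] else NA_character_,
                  character(1))
  has_id <- grepl("(^|;)ID=", attrs)
  ids <- rep(NA_character_, length(attrs))
  ids[has_id] <- sub(".*(^|;)ID=([^;]*).*", "\\2", attrs[has_id], perl = TRUE)
  ids
}

# IDs are de-duplicated within each file first, so legitimate multi-line
# features (e.g. a CDS split over several lines) are not reported.
find_shared_ids <- function(gff_files) {
  per_file <- lapply(gff_files, function(f) {
    ids <- gff_ids(f)
    unique(ids[!is.na(ids)])
  })
  counts <- table(as.character(unlist(per_file)))
  sort(as.character(names(counts)[counts > 1]))
}
```

Usage would look like `find_shared_ids(list.files("path/to/gffs", full.names = TRUE))`, where the directory is a placeholder for wherever the downloaded files live; an empty result would point away from the concatenation step.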
My question is: is there an easy fix for this error, which appears to be caused by bad 'Name' fields in some of the GFF files? Or is the problem step 2, where I concatenated many separate GFF files into one before making the TxDb object? Also, why were the other conflicts reported above only warnings, whose features could simply be dropped on the spot, while this one is a fatal error?
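If shared IDs do turn out to be the cause, one possible workaround is to make IDs unique per genome before concatenating, for example by prefixing every ID and Parent value with a per-file tag. A minimal sketch, assuming GFF3 attributes in column 9; `prefix_gff_ids` and its `tag` argument are hypothetical, and other attributes that refer to IDs (such as Derives_from) are left unchanged:

```r
# Hypothetical helper: prefix every ID and Parent value in the lines of one
# GFF3 file with `tag`, so IDs from different genomes cannot collide after
# concatenation. Other ID-referencing attributes (e.g. Derives_from) and the
# Name attribute are left unchanged.
prefix_gff_ids <- function(lines, tag) {
  prefix_attrs <- function(attr) {
    kv <- strsplit(attr, ";", fixed = TRUE)[[1]]
    for (i in seq_along(kv)) {
      if (startsWith(kv[i], "ID=")) {
        kv[i] <- paste0("ID=", tag, "_", substring(kv[i], 4))
      } else if (startsWith(kv[i], "Parent=")) {
        # Parent may list several comma-separated IDs
        parents <- strsplit(substring(kv[i], 8), ",", fixed = TRUE)[[1]]
        kv[i] <- paste0("Parent=", paste0(tag, "_", parents, collapse = ","))
      }
    }
    paste(kv, collapse = ";")
  }
  is_feature <- !grepl("^#", lines) & nzchar(lines)  # skip comments/directives
  lines[is_feature] <- vapply(strsplit(lines[is_feature], "\t", fixed = TRUE),
    function(cols) {
      if (length(cols) >= 9) cols[9] <- prefix_attrs(cols[9])
      paste(cols, collapse = "\t")
    }, character(1))
  lines
}
```

For each downloaded file `f`, something like `writeLines(prefix_gff_ids(readLines(f), tag), out)` would write a rewritten copy, with `tag` set to a value unique to that genome (for instance its assembly accession); the rewritten copies could then be concatenated as in step 2.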
Any help here would be much appreciated. -Domenick