rtracklayer: how to export gff file with gene names included?
Jon Bråte ▴ 250
@jon-brate-6263
Last seen 2.6 years ago
Norway

I'm having trouble understanding the export function in rtracklayer. I have created an object of 3'-UTRs and can export it as GFF, but I would also like to include the gene names in the GFF file. They are present in the Name column of the object, but I don't know how to get them into the exported file.

My script:

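library(GenomicFeatures)  # makeTxDbFromGFF(), threeUTRsByTranscript()
library(rtracklayer)      # asGFF(), export()

# Build a transcript database from the GTF annotation
txdb = makeTxDbFromGFF(file = "transcripts.gtf", format = "gtf")
txdb

# 3'-UTRs grouped by transcript, named by transcript
utr = threeUTRsByTranscript(txdb, use.names = TRUE)
asGFF(utr)  # preview the GFF-style GRanges shown below
export(utr, "3_UTR.gff", format = "GFF")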

 

I would like the gene names in the Name column to appear in my GFF file as well:

> asGFF(utr)
GRanges object with 14983 ranges and 7 metadata columns:
                   seqnames         ranges strand |        type          ID
                      <Rle>      <IRanges>  <Rle> | <character> <character>
      [1]   Supercontig_1.1 [ 8533,  8777]      + |        mRNA       mRNA1
      [2]   Supercontig_1.1 [22263, 22394]      + |        mRNA       mRNA2
      [3]   Supercontig_1.1 [34734, 34943]      + |        mRNA       mRNA3
      [4]   Supercontig_1.1 [56043, 56381]      + |        mRNA       mRNA4
      [5]   Supercontig_1.1 [70457, 70507]      + |        mRNA       mRNA5
      ...               ...            ...    ... .         ...         ...
  [14979] Supercontig_1.994 [ 8764,  9020]      - |        exon    exon7593
  [14980] Supercontig_1.995 [23924, 24302]      + |        exon    exon7594
  [14981] Supercontig_1.995 [25442, 25477]      - |        exon    exon7595
  [14982] Supercontig_1.997 [18118, 19806]      - |        exon    exon7596
  [14983] Supercontig_1.998 [22519, 23126]      - |        exon    exon7597
                  Name   exon_id   exon_name exon_rank      Parent
           <character> <integer> <character> <integer> <character>
      [1] SARC_00001T0      <NA>        <NA>      <NA>        <NA>
      [2] SARC_00003T0      <NA>        <NA>      <NA>        <NA>
      [3] SARC_00004T0      <NA>        <NA>      <NA>        <NA>
      [4] SARC_00008T0      <NA>        <NA>      <NA>        <NA>
      [5] SARC_00011T0      <NA>        <NA>      <NA>        <NA>
      ...          ...       ...         ...       ...         ...
  [14979]         <NA>     65051        <NA>         6    mRNA7382
  [14980]         <NA>     65078        <NA>         5    mRNA7383
  [14981]         <NA>     65079        <NA>         1    mRNA7384
  [14982]         <NA>     65115        <NA>         4    mRNA7385
  [14983]         <NA>     65133        <NA>         5    mRNA7386
  -------
  seqinfo: 7751 sequences from an unspecified genome; no seqlengths

 

Output of my current gff-file:

##gff-version 1
##source-version rtracklayer 1.28.10
##date 2016-03-03
Supercontig_1.1    rtracklayer    mRNA    8533    8777    .    +    .    Supercontig_1.1
Supercontig_1.1    rtracklayer    mRNA    22263    22394    .    +    .    Supercontig_1.1
Supercontig_1.1    rtracklayer    mRNA    34734    34943    .    +    .    Supercontig_1.1
Supercontig_1.1    rtracklayer    mRNA    56043    56381    .    +    .    Supercontig_1.1
Supercontig_1.1    rtracklayer    mRNA    70457    70507    .    +    .    Supercontig_1.1
Supercontig_1.1    rtracklayer    mRNA    86426    87450    .    +    .    Supercontig_1.1

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.3

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
[1] rtracklayer_1.32.2     GenomicFeatures_1.24.5 AnnotationDbi_1.34.4  
[4] Biobase_2.32.0         GenomicRanges_1.24.3   GenomeInfoDb_1.8.7    
[7] IRanges_2.6.1          S4Vectors_0.10.3       BiocGenerics_0.18.0   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8                XVector_0.12.1             zlibbioc_1.18.0           
 [4] GenomicAlignments_1.8.4    BiocParallel_1.6.6         tools_3.3.2               
 [7] SummarizedExperiment_1.2.3 DBI_0.5-1                  digest_0.6.10             
[10] bitops_1.0-6               RCurl_1.95-4.8             biomaRt_2.28.0            
[13] memoise_1.0.0              RSQLite_1.1                Biostrings_2.40.2         
[16] Rsamtools_1.24.0           XML_3.98-1.5      
rtracklayer gff • 4.2k views
@michael-lawrence-3846
Last seen 2.4 years ago
United States

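You probably want to use version 3 of the format, either by giving the file the extension "gff3" or by specifying format = "gff3" (or version = "3"). Otherwise, export() uses GFF version 1 by default, which is limited and so old that I can't even find the spec anymore. GFF1 has only a single group field in its last column, so extra columns such as Name have nowhere to go, whereas GFF3 stores them there as key=value attributes.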
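To make the suggestion concrete, here is a minimal sketch on a toy GRanges rather than the real `utr` object. The two ranges, IDs and names are copied from the preview in the question. The temporary output path is only for illustration, and the exact attribute layout may differ between rtracklayer versions.

```r
library(GenomicRanges)
library(rtracklayer)

# Toy ranges; coordinates, IDs and names are taken from the asGFF() preview
# in the question. This stands in for the real object and is not the full data.
gr <- GRanges(
  seqnames = "Supercontig_1.1",
  ranges   = IRanges(start = c(8533, 22263), end = c(8777, 22394)),
  strand   = "+",
  type     = "mRNA",
  ID       = c("mRNA1", "mRNA2"),
  Name     = c("SARC_00001T0", "SARC_00003T0")
)

# Write GFF3 to a temporary file (illustrative path, not "3_UTR.gff3").
gff3_path <- tempfile(fileext = ".gff3")
export(gr, gff3_path, format = "gff3")

# Inspect the raw text: metadata columns such as Name should show up in the
# ninth column as key=value pairs.
gff3_lines <- readLines(gff3_path)
print(gff3_lines)

# Round trip: read the file back and look at the Name column.
print(import(gff3_path)$Name)
```

If the Name column is missing from the printed lines, it is worth checking that the metadata column really is called `Name` on the object being exported.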


That solved it, thanks!

export(utr, "3_UTR.gff3", format = "GFF3")
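One way to confirm that the names made it into the file is to parse the ninth column with base R. The sketch below runs on a small hypothetical sample written to a temporary file. Its attribute strings are assumed, modeled on the ID and Name columns from the question, and `get_attr` is an illustrative helper, not an rtracklayer function. For a real check, point `path` at "3_UTR.gff3" instead.

```r
# Hypothetical sample standing in for "3_UTR.gff3"; the attribute strings are
# assumed, modeled on the ID and Name columns shown in the question.
sample_lines <- c(
  "##gff-version 3",
  "Supercontig_1.1\trtracklayer\tmRNA\t8533\t8777\t.\t+\t.\tID=mRNA1;Name=SARC_00001T0",
  "Supercontig_1.1\trtracklayer\tmRNA\t22263\t22394\t.\t+\t.\tID=mRNA2;Name=SARC_00003T0"
)
path <- tempfile(fileext = ".gff3")
writeLines(sample_lines, path)

# Read the nine tab-separated columns, skipping "#" header lines.
gff <- read.delim(path, header = FALSE, comment.char = "#",
                  quote = "", stringsAsFactors = FALSE)

# Illustrative helper: pull one attribute (e.g. "Name") out of each column-9
# string; returns NA where the key is absent.
get_attr <- function(attrs, key) {
  vapply(strsplit(attrs, ";", fixed = TRUE), function(fields) {
    hit <- fields[startsWith(fields, paste0(key, "="))]
    if (length(hit) == 0) NA_character_ else sub("^[^=]*=", "", hit[1])
  }, character(1))
}

gene_names <- get_attr(gff$V9, "Name")
print(gene_names)
```

Rows that come back as NA have no Name attribute, which is expected for the exon rows in the question's preview, since their Name is `<NA>`.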
