Question: rtracklayer: how to export gff file with gene names included?
Asked 21 months ago by Jon Bråte (Norway):

I'm having trouble understanding the export function in rtracklayer. I have created an object containing 3' UTRs and can export it as GFF, but I would also like to include the gene names in the GFF file. They are present in the Name column, but I don't know how to get them into the output file.

My script:

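library(GenomicFeatures)  # makeTxDbFromGFF(), threeUTRsByTranscript()
library(rtracklayer)      # export()

# Build a TxDb from the GTF annotation
txdb = makeTxDbFromGFF(file = "transcripts.gtf", format = "gtf")
txdb

# 3' UTRs grouped by transcript, labelled with transcript names
utr = threeUTRsByTranscript(txdb, use.names = TRUE)
asGFF(utr)
export(utr, "3_UTR.gff", format = "GFF")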

 

I would like the gene names in the Name column to appear in my GFF file:

> asGFF(utr)
GRanges object with 14983 ranges and 7 metadata columns:
                   seqnames         ranges strand |        type          ID
                      <Rle>      <IRanges>  <Rle> | <character> <character>
      [1]   Supercontig_1.1 [ 8533,  8777]      + |        mRNA       mRNA1
      [2]   Supercontig_1.1 [22263, 22394]      + |        mRNA       mRNA2
      [3]   Supercontig_1.1 [34734, 34943]      + |        mRNA       mRNA3
      [4]   Supercontig_1.1 [56043, 56381]      + |        mRNA       mRNA4
      [5]   Supercontig_1.1 [70457, 70507]      + |        mRNA       mRNA5
      ...               ...            ...    ... .         ...         ...
  [14979] Supercontig_1.994 [ 8764,  9020]      - |        exon    exon7593
  [14980] Supercontig_1.995 [23924, 24302]      + |        exon    exon7594
  [14981] Supercontig_1.995 [25442, 25477]      - |        exon    exon7595
  [14982] Supercontig_1.997 [18118, 19806]      - |        exon    exon7596
  [14983] Supercontig_1.998 [22519, 23126]      - |        exon    exon7597
                  Name   exon_id   exon_name exon_rank      Parent
           <character> <integer> <character> <integer> <character>
      [1] SARC_00001T0      <NA>        <NA>      <NA>        <NA>
      [2] SARC_00003T0      <NA>        <NA>      <NA>        <NA>
      [3] SARC_00004T0      <NA>        <NA>      <NA>        <NA>
      [4] SARC_00008T0      <NA>        <NA>      <NA>        <NA>
      [5] SARC_00011T0      <NA>        <NA>      <NA>        <NA>
      ...          ...       ...         ...       ...         ...
  [14979]         <NA>     65051        <NA>         6    mRNA7382
  [14980]         <NA>     65078        <NA>         5    mRNA7383
  [14981]         <NA>     65079        <NA>         1    mRNA7384
  [14982]         <NA>     65115        <NA>         4    mRNA7385
  [14983]         <NA>     65133        <NA>         5    mRNA7386
  -------
  seqinfo: 7751 sequences from an unspecified genome; no seqlengths

 

Output of my current gff-file:

##gff-version 1
##source-version rtracklayer 1.28.10
##date 2016-03-03
Supercontig_1.1    rtracklayer    mRNA    8533    8777    .    +    .    Supercontig_1.1
Supercontig_1.1    rtracklayer    mRNA    22263    22394    .    +    .    Supercontig_1.1
Supercontig_1.1    rtracklayer    mRNA    34734    34943    .    +    .    Supercontig_1.1
Supercontig_1.1    rtracklayer    mRNA    56043    56381    .    +    .    Supercontig_1.1
Supercontig_1.1    rtracklayer    mRNA    70457    70507    .    +    .    Supercontig_1.1
Supercontig_1.1    rtracklayer    mRNA    86426    87450    .    +    .    Supercontig_1.1

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.3

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
[1] rtracklayer_1.32.2     GenomicFeatures_1.24.5 AnnotationDbi_1.34.4  
[4] Biobase_2.32.0         GenomicRanges_1.24.3   GenomeInfoDb_1.8.7    
[7] IRanges_2.6.1          S4Vectors_0.10.3       BiocGenerics_0.18.0   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8                XVector_0.12.1             zlibbioc_1.18.0           
 [4] GenomicAlignments_1.8.4    BiocParallel_1.6.6         tools_3.3.2               
 [7] SummarizedExperiment_1.2.3 DBI_0.5-1                  digest_0.6.10             
[10] bitops_1.0-6               RCurl_1.95-4.8             biomaRt_2.28.0            
[13] memoise_1.0.0              RSQLite_1.1                Biostrings_2.40.2         
[16] Rsamtools_1.24.0           XML_3.98-1.5      

Answer, 21 months ago, by Michael Lawrence (United States):

You probably want to use version 3 of the format, either by giving the file the extension "gff3" or by specifying format = "gff3" (or version = "3"). Otherwise, export uses GFF version 1 by default, which is limited and so old that I can't even find the spec anymore.
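
To illustrate the difference, here is a minimal sketch, assuming GenomicRanges and rtracklayer are installed. It builds a two-range toy GRanges with a Name column and exports it as GFF3; the object name gr and the temporary output path are illustrative and not part of the original script. In GFF3, metadata columns such as Name should be written as key=value pairs in the ninth (attributes) field, which GFF version 1 has no room for.

```r
library(GenomicRanges)
library(rtracklayer)

# Toy ranges mimicking the first two rows of the asGFF() output above
gr <- GRanges(
  seqnames = c("Supercontig_1.1", "Supercontig_1.1"),
  ranges   = IRanges(start = c(8533, 22263), end = c(8777, 22394)),
  strand   = c("+", "+"),
  type     = c("mRNA", "mRNA"),
  Name     = c("SARC_00001T0", "SARC_00003T0")
)

# The ".gff3" extension lets export() pick GFF version 3
out <- tempfile(fileext = ".gff3")
export(gr, out)

# Inspect the written file; the last field of each record holds the attributes
gff_lines <- readLines(out)
cat(gff_lines, sep = "\n")
```

If the file name has to keep a plain ".gff" extension, passing the version explicitly should have the same effect.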


That solved it, thanks!

export(utr, "3_UTR.gff3", format = "GFF3")
Reply written 21 months ago by Jon Bråte