Gene information (GRanges + sequence) to GenBank
biomiha ▴ 20
@biomiha-11346

Hi,

Apologies for the naive post but I'm wondering if there is a way of exporting information I have on a gene acquired from ensembldb to a GenBank file?

I have the following code, where I fetch the sequence and annotation of the Actin beta gene:

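library(EnsDb.Hsapiens.v86)
library(dplyr)  # only needed for the %>% pipe

Hs_edb <- EnsDb.Hsapiens.v86
# Handle to the genome sequence (2bit file) matching this Ensembl release
Hs_dna <- getGenomeTwoBitFile(Hs_edb)

# Restrict the database to protein-coding ACTB transcripts
ACTB_db <- Hs_edb %>% 
  ensembldb::filter(filter = GeneNameFilter("ACTB")) %>% 
  ensembldb::filter(filter = ~tx_biotype == "protein_coding")
# Gene-level ranges (GRanges) and the corresponding sequence (DNAStringSet)
ACTB_gene <- genes(ACTB_db)
ACTB_seq <- getSeq(Hs_dna, ACTB_gene)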

This gives me a GRanges object with the features and a DNAStringSet object with the sequence. What I would like is to export the sequence and features into a .gb or .gbk file. The reason is that many of my colleagues use different software to view sequence information and are not skilled with R. GenBank is the only common interchange format that most of these programs can parse.

Thank you.

Tags: granges, genomic ranges, genbankr, ensembldb
@james-w-macdonald-5106

I would be surprised if there were any functionality to do this. I can see a use case for parsing a .gbk file, but what's the use case for generating one? If you want the β-actin .gbk file, why wouldn't you just go to NCBI and download it, particularly since what you will be providing your colleagues is just a small proportion of what is actually in a .gbk file?

Hi James,

Thank you for replying. Actin beta was just used as a representative example. What I am actually dealing with are molecules that are not in NCBI, e.g. chimeric molecules from transgenic mice. I've found a GitHub package (gschofl/biofiles), but it only writes out records that have been parsed from .gbk files.

There's nothing that I can find in the Bioconductor corpus. You can search here. The biofiles package might be useful, but you would have to instantiate a gbRecord object and then a gbFeatureTable and a seqinfo object and jam them into the gbRecord. Sounds like fun!

Or you could just roll yer own that just uses writeLines or cat to output a text file that is similar enough that your colleagues can read it in.
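
For what it's worth, here is a minimal sketch of what such a roll-your-own writer might look like in base R. Everything in it is an assumption made for illustration: format_genbank() is a hypothetical helper (not part of any package), the features are passed as a plain data.frame with columns type, start, end, strand and label, and the coordinates are taken to be 1-based positions relative to the sequence being written, so genomic coordinates from a GRanges would need to be shifted first. The /label qualifier is just one choice that many viewers display; /gene or /note may suit other tools better. Only a bare-bones subset of the GenBank flat-file layout is produced, so check that your colleagues' software accepts it.

```r
# Hypothetical helper (not from any package): builds the lines of a minimal
# GenBank flat-file record from a sequence string and a feature data.frame.
# Assumed columns in `features`: type, start, end, strand, label.
# Coordinates are assumed to be 1-based and relative to `sequence`.
format_genbank <- function(locus, sequence, features,
                           definition = ".", date = Sys.Date()) {
  sequence <- tolower(sequence)
  n <- nchar(sequence)
  stopifnot(n > 0, nchar(locus) <= 16)

  # Date stamp as DD-MON-YYYY, built from month.abb to stay locale-independent
  d <- as.POSIXlt(as.Date(date))
  stamp <- sprintf("%02d-%s-%d", d$mday, toupper(month.abb[d$mon + 1]),
                   d$year + 1900)

  header <- c(
    sprintf("LOCUS       %-16s %11d bp    %-6s  %-8s %-3s %s",
            locus, n, "DNA", "linear", "UNK", stamp),
    paste0("DEFINITION  ", definition),
    "ACCESSION   .",
    "VERSION     .",
    "FEATURES             Location/Qualifiers"
  )

  # One key/location line plus one qualifier line per feature
  feature_lines <- unlist(lapply(seq_len(nrow(features)), function(i) {
    loc <- sprintf("%d..%d", features$start[i], features$end[i])
    if (as.character(features$strand[i]) == "-") {
      loc <- sprintf("complement(%s)", loc)
    }
    c(sprintf("     %-16s%s", as.character(features$type[i]), loc),
      sprintf("%s/label=\"%s\"", strrep(" ", 21),
              as.character(features$label[i])))
  }))

  # ORIGIN block: 60 bases per line, in space-separated blocks of 10
  origin <- vapply(seq(1, n, by = 60), function(s) {
    row <- substr(sequence, s, min(s + 59, n))
    b <- seq(1, nchar(row), by = 10)
    sprintf("%9d %s", s, paste(substring(row, b, b + 9), collapse = " "))
  }, character(1))

  c(header, feature_lines, "ORIGIN", origin, "//")
}

# Toy input, made up purely for demonstration
features <- data.frame(
  type   = c("gene", "CDS"),
  start  = c(1, 11),
  end    = c(70, 40),
  strand = c("+", "-"),
  label  = c("demo_gene", "demo_cds")
)
sequence <- strrep("ACGTACGTAC", 7)

record <- format_genbank("demo_construct", sequence, features,
                         date = "2020-01-15")
writeLines(record, file.path(tempdir(), "demo_construct.gb"))
```

With real data, the sequence could come from as.character() on the DNAStringSet and the feature table from as.data.frame() on the GRanges, after converting the ranges to positions within the exported sequence.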

Thank you for the code search link, James. Yes, I had high hopes for the biofiles idea, but became progressively less enthusiastic the further I went down the gbRecord rabbit hole. Hence the impulse to ask the community whether anything else existed. writeLines it is, then. Cheers.
