Search
Question: Exporting a FASTA file from a getBM vs getSequence query
0
gravatar for Glubbdrubb
9 weeks ago by
Glubbdrubb0
Glubbdrubb0 wrote:

A summary:

I want to get sequences for a list of genes, with their stable gene IDs, transcript IDs, and gene symbols in the ID lines in the resultant fasta file, like so:

>MGP_SPRETEiJ_G0015912|MGP_SPRETEiJ_T0023337|Tab2
ATGACTTCTCTCAACTTGGACTTGCAGTCACAGAATGTTTACCACCATGGAAGAGAAGGA
AGTCGAGTGAATGGAAGCAG

MGP_SPRETEiJ is the species ID.

 

getSequence provides easy export of sequences to FASTA file, but does not provide all that data in the ID line for each sequence

getBM provides all the sections I need, but I can't export the sequences to FASTA.

 

My methods:

First, getSequence:

mart <- useMart("ensembl", dataset="caperea_gene_ensembl")
seq = getSequence(id = genes$V1,
                  seqType = "cdna",
                  type = "hgnc_symbol",
                  mart = mart)

This is exported to FASTA by:

exportFASTA(seq, file='Data/biomartExportTest.seq')

But the fasta ID lines only have the gene name, not the ensembl stable gene ID, or the transcript ID:

>PDPK1
ATCCAGTCCAGCGTGGTGTTATGTGCCTGCCCATCCCCATCAATGGTGAGGACACAGAC

 

Now the getBM method:

fasta = getBM(attributes=c("coding",
                           "ensembl_gene_id",
                           "ensembl_transcript_id",
                           "external_gene_name"),
                    filters = "hgnc_symbol",
                    values = genes$V1,
                    mart = ensembl)

The resultant dataframe has all the data I need, but I  don't know how to export it to FASTA:

 

ADD COMMENTlink modified 9 weeks ago by James W. MacDonald45k • written 9 weeks ago by Glubbdrubb0
2
gravatar for James W. MacDonald
9 weeks ago by
United States
James W. MacDonald45k wrote:
library(Biostrings)

z <- DNAStringSet(fasta[,1])

names(z) <- apply(fasta[,-1], 2, paste, collapse = "; ")

writeXStringSet(z, "biomartExportTest.seq")
ADD COMMENTlink written 9 weeks ago by James W. MacDonald45k

Biostrings looks like a very useful package! Thanks for your answer :-)

But something went wrong during the paste function.

Here's the error I get:

> writeXStringSet(z, 'Data/biomartEportTest.fasta')
Error in .Call2("write_XStringSet_to_fasta", x, filexp_list, width, lkup,  :
  'names(x)' contains NAs

 

When I view z:

> z
  A DNAStringSet instance of length 89
     width seq                                                                         names               
 [1]  1776 GACCGTGTGGCGCGCATCGGACTGGGCGTGGTTTTG...AGTTTGGAAGGGGCTGCACCCATGGGCCCACCTTAA ENSCAPG0000000376...
 [2]  3714 ATGACTACAAAACGAAGTTTGTTTGTGCGGTTGGTA...AAAATTATTCAGTCTGGCCTTGCACTCAGTATGTAG ENSCAPT0000000419...
 [3]  1176 ATGCTAACCCCTTCCCCGGTCCTGGTTCTCAACCTC...GCGCTCAGCCAGCCCCATCCGCCCTTCTTCAGGTAG SARM1; ERBIN; MAP...
 [4]  1851 ATGCAGAGCACTCCTAACCATCTGTGGCTTTTATCT...ATGGATGTTGGCCTACGGAATGTCGACGGTCTTTAG <NA>
 [5]  1611 GAACTTGATCTCAAAATAGCTATTAAGTCTTGCCGA...ATGATGAATCTTGATTGGAGTTGGTTAACTGAATGA <NA>
 ...   ... ...
[85]  1320 ATCCAGTCCAGCGTGGTGTTATGTGCCTGCCCATCC...CGATACCAGAGCCACCCAGATGCTGCTGTGCAGTGA <NA>
[86]  1113 ATGGACAGTAAATGTGTGTTCATCTCTGTCAGTAAA...AGCAAGAGAAACAGCTGGTATGTGATTACAACTTGA <NA>
[87]  2418 ATGTCTGACAGTGGATCACAACCTGGTTCAATGGGT...GCCATTGAAGAAACAGAAGGATTTGGACAAGAGTAA <NA>
[88]   999 ATCCAGTCCAGCGTGGTGTTATGTGCCTGCCCATCC...CGATACCAGAGCCACCCAGATGCTGCTGTGCAGTGA <NA>
[89]  1119 ACATTACTACCTCCCCATTCATACCCCAAAGAAACT...CCAGCTACTCCCAGCTCTCCCATGTATGTTGATTGA <NA>

Many of the sequences have NA instead of names. And many sequences have more than one name set.

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by Glubbdrubb0

I fixed it!

In the line:

names(z) <- apply(fasta[,-1], 2, paste, collapse = "; ")

I changed 2 to 1:

names(z) <- apply(fasta[,-1], 1, paste, collapse = "; ")

 

 

And it works now!

 

Thank you so much for your help!

 

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by Glubbdrubb0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 297 users visited in the last hour