Question: converting a fasta file to .2bit format
8 days ago, saripalligautam86 wrote:

Dear All,

I am using Biostrings and rtracklayer to convert a wheat reference sequence, which is a FASTA file, to a .2bit file so that I can use it to forge a BSgenome data package. I first used Biostrings to read the file into a DNAStringSet; the output is shown below:

> Biostrings::readDNAStringSet("wheatreference.fasta")
  A DNAStringSet instance of length 22
         width seq                                                                                                                                                                                                      names               
 [1] 594102056 CTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCCTAACCCTAAACCCTAACCAAAACCCTAAACCCTAAACCCCTAAACCCTAAACCCTAACCTAA...ATCGCTTCATGTGCTGCCTCCGAGGCTGCCATGTACTCCGCTTCACACGTAGATCCCGCCACGACGCTCTGCTTGCAGCTGCACCAGCTTACTGCTCC chr1A
 [2] 689851870 ACTGCCAAAACTATTGTTTTTCATCCTGTAGTCCCATTTAGAATTACTAAACGTCCTTTTTTTGGGCCGAGATTTAGAGGAACGCTTTCTCGGGGTTTG...GGCATATCATCGGCACGCCCTCCGACACGGCTTCCACCGTGGAGTTCCAGCCGCTGTGCGTGATGAAGGCGCCTACCGCAGGGTGGGCGAGCACCTCC chr1B
 [3] 495453186 CTAGGCTTCTTGGGCGTGTATGGGAAGACAAAAGATACACCTGAGGCACGGGAGGACCTGCAACGTTTGCACGAAAAAGACGGCATGCCTCCGAAGCAG...TTCGCTTACGTGTCGGTCATCAACAATCACAGTTTGGTCCGATTCTAGCATGTTTCATGGACTATTACTCAGTTTTGGGGTCCTGGAGGCATTTCCAT chr1D
 [4] 780798557 CAGTTCCTAAACTGCTCCAGTCGGCGCACGTTATCATTCTTGTTAGTTCTGAGCGAATTCCCTCGGTATGATCATTCTTTCATCAGTTTGTTCCCCGTT...GTGGGATGACAAGGATGCGTACTACTTCAGAGTGGTACATAATCCTGTCTTGGAAGTCATGTTGCTAGACAACAATGAACCTACGAGCTATGGAGAAG chr2A
 [5] 801256715 GCATAGCCACGCCCCGAAGGCCACCCCGAATCCCAGTTGAACAGGAGATCTAGCCCTTTGACTTTTGCCGGACGGGCTTTGACCAGTGGTCTTTTCCAC...ACTAGGGCGGCTTTGCCCGACCGAGAGAGTGCCGACCGAGAGACTAGGCGTGCCGACCGAGAGAGCGCCGCCCGCGCCGCTTCCCGGTGCAGGCGACT chr2B
 ...       ... ...
[18] 473592718 GCGGGCACATGTGCTGTAGGACTGATGGTAATTTTCATAATTGTTCGTGATGGAGTAGTATCTGAACAATTCCCTCTATGCGGATTGCTCTTCGCGTGT...TAGTCCACAAAACAGGCAAGAATTAGCCAAAACTGCGTGTGTTGATGACCGACACGTAAATGCACCCCGGGGTTCATCAATCGCGGAAATCAGCCCGG chr6D
[19] 736706236 ACCCTAAACCCTAAACCCTAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAATCCTAAACCCTAACCCTAACCC...CTATTTGTGTTGCTGATAAGGATTATGAATTCTCTGTTGATCCTGATATAATTACTTTGGTTGAATCTGATCGTTTCCATGGCTATGAATCTGAAACT chr7A
[20] 750620385 AACCCTAAACCCTAAACCGTAAACCCTAAACCCTAAACCCTAAACCCTAAAAACCCTAAACCCTAAACCCTAAACCCTAAAGCCTAAAAACCCTAAACC...TTACCAACAATAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAGCAACTACAATAACAGC chr7B
[21] 638686055 AACCCTAAACCCTAAACCCTAAACCCTAAACCCTAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCGGGTTCACAAA...TTATTTGCGAGTCGATGGTGGACTTGCAACTGAAGCGGAAAAAATCGAGCCTAGTCGTGAGTCAAAGGATGGACTTGTATACTGGGGCAAAAAAAAAA chr7D
[22] 480980714 TGTGTGTGTGCGCGCGCGCGCGTGTACTGCTATTTATGGTCTCCAGCCTTTCACCCCTCTAATTAGGTTCTACTCTGATAATATTTGTTCTTTCTGATA...TGAGTAATAGTACACGAAACGGGCCAGAATCGGCCAAAACTACGAGCGTTGATGACAGACAAGTAAGCGCACCTTGGGGTTCAACAACTATGCAAATC chrUn

Now, when I try to export it using the code below:

>  test_2bit_out <- file.path(tempdir(), "test_out.2bit")
> rtracklayer::export.2bit(DNAStringSet, test_2bit_out, 2bit)
Error: unexpected symbol in "rtracklayer::export.2bit(DNAStringSet, test_2bit_out, 2bit"
> rtracklayer::export.2bit(DNAStringSet, test_2bit_out)
Error in as.character(x) : 
  cannot coerce type 'closure' to vector of type 'character'

I get the above error.

The output of my sessionInfo() is also shown below:

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BSgenome_1.48.0      rtracklayer_1.40.6   Biostrings_2.48.0    XVector_0.20.0       GenomicRanges_1.32.7 GenomeInfoDb_1.16.0  IRanges_2.14.12      S4Vectors_0.18.3     BiocGenerics_0.26.0  BiocInstaller_1.30.0

loaded via a namespace (and not attached):
 [1] zlibbioc_1.26.0             GenomicAlignments_1.16.0    BiocParallel_1.14.2         lattice_0.20-35             tools_3.5.1                 SummarizedExperiment_1.10.1 grid_3.5.1                  Biobase_2.40.0             
 [9] matrixStats_0.54.0          Matrix_1.2-14               GenomeInfoDbData_1.1.0      bitops_1.0-6                RCurl_1.95-4.11             DelayedArray_0.6.6          compiler_3.5.1              Rsamtools_1.32.3           
[17] XML_3.98-1.16              

Please suggest code to export the DNAStringSet object to .2bit format.

Looking forward to hearing from you,

Gautam Saripalli


You can also try the faToTwoBit command-line program.

Reply written 8 days ago by Pengcheng Yang
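
For readers who want to try the command-line route from within R, here is a minimal sketch. It assumes the UCSC `faToTwoBit` binary is installed and on the PATH (it may not be readily available on Windows), and the wrapper name `fa_to_2bit` is hypothetical, introduced only for illustration. The tool is invoked in the form `faToTwoBit in.fa out.2bit`.

```r
# Hypothetical helper: run UCSC's faToTwoBit if it can be found on the PATH.
# Returns TRUE on success, FALSE if the tool is missing or exits with an error.
fa_to_2bit <- function(fasta, out) {
  exe <- Sys.which("faToTwoBit")
  if (!nzchar(exe)) {
    message("faToTwoBit not found on PATH; nothing was converted.")
    return(FALSE)
  }
  # system2() returns the exit status of the command (0 means success).
  status <- system2(exe, c(shQuote(fasta), shQuote(out)))
  status == 0L
}

# Example call (file names are placeholders):
# fa_to_2bit("wheatreference.fasta", "wheatreference.2bit")
```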
8 days ago, Hervé Pagès (United States) wrote:

Hi,

Three very basic programming concepts are the notions of value, class, and variable.

A call like Biostrings::readDNAStringSet("wheatreference.fasta") returns a value (in this particular case the value is a DNAStringSet instance, i.e. an object of class DNAStringSet). If you don't store this value somewhere, for example by assigning it to a variable, then the value gets displayed (that's why you can see the content of the DNAStringSet instance) but you "lose it", i.e. you won't be able to use it later. So if you are planning to do something with this DNAStringSet instance, you need to store it somewhere, e.g.:

wheat_genome_seqs <- Biostrings::readDNAStringSet("wheatreference.fasta")

Then you can export it with rtracklayer::export.2bit():

test_2bit_out <- file.path(tempdir(), "test_out.2bit")
rtracklayer::export.2bit(wheat_genome_seqs, test_2bit_out)
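
As a quick sanity check of the same workflow, the sketch below writes a tiny made-up DNAStringSet to a .2bit file and reads it back with rtracklayer::import.2bit(). The sequences and the names `toy_seqs`, `toy_2bit`, and `round_trip` are illustrative assumptions, not part of the wheat data; the real genome is far larger and will take correspondingly longer.

```r
library(Biostrings)
library(rtracklayer)

# Toy sequences standing in for the wheat chromosomes (names are made up).
toy_seqs <- DNAStringSet(c(chrA = "ACGTACGTACGTACGT", chrB = "TTGGCCAATTGG"))

# Export the stored value, not the class name, to a .2bit file.
toy_2bit <- file.path(tempdir(), "toy_out.2bit")
export.2bit(toy_seqs, toy_2bit)

# Read the file back and compare sequence names and lengths with the input.
round_trip <- import.2bit(toy_2bit)
names(round_trip)
width(round_trip)
```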

These basic concepts are not R specific but are found in programming languages in general. It's worth taking the time to read and familiarize yourself with such basic things. It will be time well spent as it will save you a great amount of time and frustration in the future.
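
To make this concrete for the two errors in the question, here is a small base-R sketch. With Biostrings attached, the bare name DNAStringSet refers to the constructor function of that name rather than to any sequences, so passing it to export.2bit() amounts to passing a function. The example uses `mean` as a stand-in function so that it runs without any packages; the variable names are illustrative.

```r
# A bare name evaluates to whatever object is bound to it.
# `mean` is bound to a function, just as `DNAStringSet` is when Biostrings is attached.
f <- mean
typeof(f)

# Coercing a function to character fails with the "closure" error from the question.
closure_msg <- tryCatch(as.character(f),
                        error = function(e) conditionMessage(e))
closure_msg

# `2bit` is not a valid R name because it starts with a digit,
# so the parser stops with "unexpected symbol" before anything is evaluated.
parse_msg <- tryCatch(parse(text = "f(x, 2bit)"),
                      error = function(e) conditionMessage(e))
parse_msg
```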

Hope this helps,

H.
