Question: Handling DNA sequence loaded from a file
gravatar for Agaz Hussain Wani
2.4 years ago by
Agaz Hussain Wani260 wrote:

I have a file with DNA sequence of length 900 characters. I use Biostrings package to read the sequence from the file into R.


ref <- readDNAStringSet(filepath)


I get something like


A DNAStringSet instance of length 1
width seq                               names               

I would like to know, how to get the full DNA sequence into other variable. How to access the full sequence.


ADD COMMENTlink written 2.4 years ago by Agaz Hussain Wani260

It is there, only some of the letters are showing. What do you want to do with your sequence now that you have read it in?

ADD REPLYlink written 2.4 years ago by Martin Morgan ♦♦ 22k

I want to have the full sequence as a character string and do some operation on that. Suppose I want to display the first 500 characters or do some string operation on that.

ADD REPLYlink written 2.4 years ago by Agaz Hussain Wani260

On the one hand you could display them with as.character(ref) but if you wanted to do something you'd typically use a function, e.g., reverseComplement(ref) or narrow(ref, 500). This would return another DNAStringSet, which you could then coerce to character if you wanted, but really there's not much to be gained by looking at 500 letters (or 3 billion, or 10 million 100-mers) with the naked eye. Check out the help page ?DNAStringSet including the examples to get some ideas; the infrastructure is very flexible.

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by Martin Morgan ♦♦ 22k

Looking at your DNAStringSet, I wonder what the file that you are reading from looks like? readDNAStringSet is expecting a FASTA file

> An.identifer

and the identifier is used as the 'name'; it seems like you have a 'name' that is a sequence. If you have a plain text file with 1 DNA sequence per line, you could do something like

ADD REPLYlink written 2.4 years ago by Martin Morgan ♦♦ 22k
Thanks for your comments. When I try as.character(ref), I get something like 


When I try to find the number of characters 


It returns 0. What is going wrong??. 

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by Agaz Hussain Wani260

Do you mean nchar(ref) ? Please cut-and-paste from your R session into the support site, because it is very important to be precise. I have

> ref = DNAStringSet(c("AAATT", "AAA"))
> nchar(ref)
[1] 5 3

using the current version of software. What is the output of your sessionInfo()?

> sessionInfo()
R version 3.2.4 Patched (2016-03-16 r70348)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] Biostrings_2.38.4    XVector_0.10.0       IRanges_2.4.8       
[4] S4Vectors_0.8.11     BiocGenerics_0.16.1  BiocInstaller_1.20.1

loaded via a namespace (and not attached):
[1] zlibbioc_1.16.0


ADD REPLYlink written 2.3 years ago by Martin Morgan ♦♦ 22k

Thanks a lot for writing . Its my bad that I could not notice the error early in my FASTA file. Everything looks fine now. 

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by Agaz Hussain Wani260
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 351 users visited in the last hour