Search
Question: Handling DNA sequence loaded from a file
0
gravatar for Agaz Hussain Wani
20 months ago by
India
Agaz Hussain Wani240 wrote:

I have a file with DNA sequence of length 900 characters. I use Biostrings package to read the sequence from the file into R.

library(Biostrings)

ref <- readDNAStringSet(filepath)

 

I get something like

 

A DNAStringSet instance of length 1
width seq                               names               
[1]   900 AACTGGTTACCTGCCGTGAGTAAATTAAAATT...GACGCAACGGTTCCGACTACTCTGCTGCGGTG AGCTTTTCATTCTGACT...

I would like to know, how to get the full DNA sequence into other variable. How to access the full sequence.

 

ADD COMMENTlink written 20 months ago by Agaz Hussain Wani240

It is there, only some of the letters are showing. What do you want to do with your sequence now that you have read it in?

ADD REPLYlink written 20 months ago by Martin Morgan ♦♦ 21k

I want to have the full sequence as a character string and do some operation on that. Suppose I want to display the first 500 characters or do some string operation on that.

ADD REPLYlink written 20 months ago by Agaz Hussain Wani240
1

On the one hand you could display them with as.character(ref) but if you wanted to do something you'd typically use a function, e.g., reverseComplement(ref) or narrow(ref, 500). This would return another DNAStringSet, which you could then coerce to character if you wanted, but really there's not much to be gained by looking at 500 letters (or 3 billion, or 10 million 100-mers) with the naked eye. Check out the help page ?DNAStringSet including the examples to get some ideas; the infrastructure is very flexible.

ADD REPLYlink modified 20 months ago • written 20 months ago by Martin Morgan ♦♦ 21k

Looking at your DNAStringSet, I wonder what the file that you are reading from looks like? readDNAStringSet is expecting a FASTA file

> An.identifer
AAACAACCA

and the identifier is used as the 'name'; it seems like you have a 'name' that is a sequence. If you have a plain text file with 1 DNA sequence per line, you could do something like

DNAStringSet(readLines(filepath))
ADD REPLYlink written 20 months ago by Martin Morgan ♦♦ 21k
Thanks for your comments. When I try as.character(ref), I get something like 

ATGGCCCCCTTTGCCCAGGGAAAGGGCCCCTGGGGGGGGTTTTTTTGCCCACGTGCAGTCAGTCAGTAAGTCAAGTCAATGACGTCAATGCAACGTGCAAAAAAAAAAAAAGGAAATTAAAGGGCCCTGACGTGCAGCCAGTCAGTCAAAGGG 
                                                                                                                                                       "" 

When I try to find the number of characters 

nchar(ref)

It returns 0. What is going wrong??. 

ADD REPLYlink modified 20 months ago • written 20 months ago by Agaz Hussain Wani240

Do you mean nchar(ref) ? Please cut-and-paste from your R session into the support site, because it is very important to be precise. I have

> ref = DNAStringSet(c("AAATT", "AAA"))
> nchar(ref)
[1] 5 3

using the current version of software. What is the output of your sessionInfo()?

> sessionInfo()
R version 3.2.4 Patched (2016-03-16 r70348)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] Biostrings_2.38.4    XVector_0.10.0       IRanges_2.4.8       
[4] S4Vectors_0.8.11     BiocGenerics_0.16.1  BiocInstaller_1.20.1

loaded via a namespace (and not attached):
[1] zlibbioc_1.16.0

 

ADD REPLYlink written 20 months ago by Martin Morgan ♦♦ 21k

Thanks a lot for writing . Its my bad that I could not notice the error early in my FASTA file. Everything looks fine now. 

ADD REPLYlink modified 20 months ago • written 20 months ago by Agaz Hussain Wani240
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 424 users visited in the last hour