Handling DNA sequence loaded from a file
0
0
Entering edit mode
@agaz-hussain-wani-7620
Last seen 6.1 years ago
India

I have a file with DNA sequence of length 900 characters. I use Biostrings package to read the sequence from the file into R.

library(Biostrings)

ref <- readDNAStringSet(filepath)

 

I get something like

 

A DNAStringSet instance of length 1
width seq                               names               
[1]   900 AACTGGTTACCTGCCGTGAGTAAATTAAAATT...GACGCAACGGTTCCGACTACTCTGCTGCGGTG AGCTTTTCATTCTGACT...

I would like to know, how to get the full DNA sequence into other variable. How to access the full sequence.

 

r biostrings DNAsequence • 2.4k views
ADD COMMENT
0
Entering edit mode

It is there, only some of the letters are showing. What do you want to do with your sequence now that you have read it in?

ADD REPLY
0
Entering edit mode

I want to have the full sequence as a character string and do some operation on that. Suppose I want to display the first 500 characters or do some string operation on that.

ADD REPLY
1
Entering edit mode

On the one hand you could display them with as.character(ref) but if you wanted to do something you'd typically use a function, e.g., reverseComplement(ref) or narrow(ref, 500). This would return another DNAStringSet, which you could then coerce to character if you wanted, but really there's not much to be gained by looking at 500 letters (or 3 billion, or 10 million 100-mers) with the naked eye. Check out the help page ?DNAStringSet including the examples to get some ideas; the infrastructure is very flexible.

ADD REPLY
0
Entering edit mode

Looking at your DNAStringSet, I wonder what the file that you are reading from looks like? readDNAStringSet is expecting a FASTA file

> An.identifer
AAACAACCA

and the identifier is used as the 'name'; it seems like you have a 'name' that is a sequence. If you have a plain text file with 1 DNA sequence per line, you could do something like

DNAStringSet(readLines(filepath))
ADD REPLY
0
Entering edit mode
Thanks for your comments. When I try as.character(ref), I get something like 

ATGGCCCCCTTTGCCCAGGGAAAGGGCCCCTGGGGGGGGTTTTTTTGCCCACGTGCAGTCAGTCAGTAAGTCAAGTCAATGACGTCAATGCAACGTGCAAAAAAAAAAAAAGGAAATTAAAGGGCCCTGACGTGCAGCCAGTCAGTCAAAGGG 
                                                                                                                                                       "" 

When I try to find the number of characters 

nchar(ref)

It returns 0. What is going wrong??. 

ADD REPLY
0
Entering edit mode

Do you mean nchar(ref) ? Please cut-and-paste from your R session into the support site, because it is very important to be precise. I have

> ref = DNAStringSet(c("AAATT", "AAA"))
> nchar(ref)
[1] 5 3

using the current version of software. What is the output of your sessionInfo()?

> sessionInfo()
R version 3.2.4 Patched (2016-03-16 r70348)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] Biostrings_2.38.4    XVector_0.10.0       IRanges_2.4.8       
[4] S4Vectors_0.8.11     BiocGenerics_0.16.1  BiocInstaller_1.20.1

loaded via a namespace (and not attached):
[1] zlibbioc_1.16.0

 

ADD REPLY
0
Entering edit mode

Thanks a lot for writing . Its my bad that I could not notice the error early in my FASTA file. Everything looks fine now. 

ADD REPLY
0
Entering edit mode

will this only work with a FATSA file or can I do this with a txt file?

ADD REPLY
0
Entering edit mode

Please don't add comments to a 6 year old post. If you have a question, please submit a new question.

ADD REPLY

Login before adding your answer.

Traffic: 783 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6