replace nucleotide at fixed position in a DNAStringSet object
Robert Castelo ★ 3.4k
@rcastelo
Barcelona/Universitat Pompeu Fabra
hi!!

i'd like to know if there is an efficient way to replace a nucleotide at a fixed position in a DNAStringSet object.

let's say we have the following toy DNAStringSet object with 3 DNA sequences:

    x <- DNAStringSet(c("ATGACCACG", "ACTGGGGAA", "GCCGATGCG"))
    x
      A DNAStringSet instance of length 3
        width seq
    [1]     9 ATGACCACG
    [2]     9 ACTGGGGAA
    [3]     9 GCCGATGCG

and a DNAStringSetList object with the following 3 nucleotides:

    y <- DNAStringSetList(DNAStringSet("G"), DNAStringSet("C"), DNAStringSet("C"))
    y
    DNAStringSetList of length 3
    [[1]] G
    [[2]] C
    [[3]] C

i'd like to replace, let's say, the fourth nucleotide of each DNA sequence in 'x' by the corresponding one in 'y'. i can imagine how to do it by coercing back and forth to character, but i guess there must be a more efficient way. my interest comes from the fact that the DNAStringSet object i have to work with can hold many DNA sequences.

thanks!!
robert.

--
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
fax: +34.933.160.550
@valerie-obenchain-4275
Last seen 2.9 years ago
United States
Hi,

On 09/13/2013 07:13 AM, Robert Castelo wrote:
> i'd like to replace the, let's say, fourth nucleotide along the DNA
> sequences in 'x' by those in 'y'. i can imagine how to do it coercing
> back and forth to character and so on but i guess there must be some
> more efficient way to do it.

I don't think so. XString objects are immutable. The data are accessed through an external pointer to an environment where they are stored as raw. To subset or replace positions in 'x' with values from 'y' you would need to go through the 'as.character' conversion and create a new DNAStringSet.

I've cc'd Hervé in case I've gotten this wrong or he has a different solution to the problem.

Valerie
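For reference, the character round trip mentioned above might look roughly like the sketch below. It assumes the Biostrings package is installed, and the names `repl`, `chr` and `x2` are only illustrative (they do not appear in the thread). The replacements are held in a plain character vector here rather than in a DNAStringSetList.

```r
library(Biostrings)

x <- DNAStringSet(c("ATGACCACG", "ACTGGGGAA", "GCCGATGCG"))

# Hypothetical helper vector: one replacement nucleotide per sequence.
repl <- c("G", "C", "C")

# Coerce to an ordinary character vector, overwrite position 4 of each
# element (substr<- is vectorised over both the strings and the values),
# then build a new DNAStringSet from the modified strings.
chr <- as.character(x)
substr(chr, 4, 4) <- repl
x2 <- DNAStringSet(chr)

x2
```

This creates a full character copy of the sequences, which is the overhead the original question was hoping to avoid for large objects.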
Hi guys,

With Bioc-devel, you can use replaceAt() for this:

    x <- DNAStringSet(c("ATGACCACG", "ACTGGGGAA", "GCCGATGCG"))
    y <- DNAStringSetList(DNAStringSet("G"), DNAStringSet("C"), DNAStringSet("C"))

Then:

    > replaceAt(x, IRanges(4, 4), y)
      A DNAStringSet instance of length 3
        width seq
    [1]     9 ATGGCCACG
    [2]     9 ACTCGGGAA
    [3]     9 GCCCATGCG

An important clarification: an XString or XStringSet object is no more immutable than a character vector, or an R object in general, in the sense that we are not supposed to modify it *in-place*, except in some particular situations where we know it's safe to do so. When it's not safe to do so, the object (or part of it) is copied and the copy is modified. Of course all this is transparent to the end user, who should never need to worry about whether it is safe to call [<-, [[<- or replaceAt() on his/her DNAStringSet object: copies are made if needed, so those operations are always safe.

Cheers,
H.

On 09/16/2013 09:46 AM, Valerie Obenchain wrote:
> I don't think so. XString objects are immutable. The data are accessed
> through an external pointer to an environment where they are
> written/stored as raw. To subset/replace positions in 'x' with values
> from 'y' you would need to go through the 'as.character' conversion and
> create a new DNAStringSet.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
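To make the copy semantics described above concrete, here is a small sketch that reuses the replaceAt() call from this reply and checks that the original object is left untouched. It assumes a Biostrings version that provides replaceAt() (Bioc-devel at the time of this thread); the name `x_new` is only illustrative.

```r
library(Biostrings)  # attaches IRanges as a dependency, which provides IRanges()

x <- DNAStringSet(c("ATGACCACG", "ACTGGGGAA", "GCCGATGCG"))
y <- DNAStringSetList(DNAStringSet("G"), DNAStringSet("C"), DNAStringSet("C"))

# Replace the range [4, 4] in every sequence of 'x' with the corresponding
# element of 'y'. The result is a new object bound to 'x_new'.
x_new <- replaceAt(x, IRanges(4, 4), y)

as.character(x)      # the original sequences are unchanged
as.character(x_new)  # position 4 now holds G, C and C respectively
```

Because replaceAt() returns its result rather than altering its argument, the caller decides whether to keep both objects or overwrite `x` with the result.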
Hervé, Valerie,

this function replaceAt() is doing exactly what i was looking for, thanks a lot!!!!

robert.

ps: FYI, the current version in SVN (.18) seems to break on the instruction below, y <- .., but after checking out the very latest (.19) the example works.

On 09/16/2013 08:40 PM, Hervé Pagès wrote:
> With Bioc-devel, you can use replaceAt() for this:
>
> x <- DNAStringSet(c("ATGACCACG", "ACTGGGGAA", "GCCGATGCG"))
> y <- DNAStringSetList(DNAStringSet("G"), DNAStringSet("C"),
> DNAStringSet("C"))