Question: Comparing AAStrings using identical() instead of ==
Posted 4.6 years ago by jdavison10 (United States):

Hi, I made a mistake using identical() to compare two AAStrings, as in

identical(AAString('A'), AAString('Z'))

which returns TRUE. It seems to compare only the lengths. It took me a while to notice that I was not getting the result I expected, and to learn that the correct comparison operator is ==. Maybe I should have been warned off? Thank you.

R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Biostrings_2.32.0    XVector_0.4.0        IRanges_1.22.7      
[4] BiocGenerics_0.10.0  BiocInstaller_1.14.3

loaded via a namespace (and not attached):
[1] compiler_3.1.1  stats4_3.1.1    tools_3.1.1     zlibbioc_1.10.0


Answer: Comparing AAStrings using identical() instead of ==
Posted 4.5 years ago by Hervé Pagès (United States):

Sorry, Jerry, for the late answer. The problem is that the sequence data in an XString object is stored behind an external pointer, and identical() always considers two external pointers to be identical (which IMO doesn't make sense, but I bet you'll find someone to argue it's a feature):

aa1 <- AAString("A")
aa2 <- AAString("Z")

Each object contains an external pointer to the sequence data:

aa1@shared@xp  # pointer to "A"
# <pointer: (nil)>
aa2@shared@xp  # pointer to "Z"
# <pointer: (nil)>

As expected, these external pointers have different addresses:

# [1] "0x9106b80"
# [1] "0x90cb760"

and the sequence data at these addresses is different. However, identical() reports that the 2 external pointers are identical:

identical(aa1@shared@xp, aa2@shared@xp)
# [1] TRUE

Turning identical() into a generic function and defining methods for XString objects would in theory address the problem, but doesn't seem advisable here. So there is not much I can do until someone in the R Core team fixes the behavior of identical() on external pointers. Note, however, that even with such a fix, identical() would still not be reliable on XString objects: it would no longer give false positives, but it would still give false negatives in some situations:

aa3 <- subseq(AAString("ZZA"), start=3, end=3)
aa3
#   1-letter "AAString" instance
# seq: A

identical(aa1, aa3)
# [1] FALSE

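That's because even though aa1 and aa3 represent the same sequence, their internal representations differ: subseq() does not copy the data, so aa3 is a 1-letter window (starting at offset 2) on the data of the original 3-letter sequence "ZZA", whereas aa1 has its own 1-letter buffer. So the advice really is to use ==, as you figured out.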
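A minimal sketch of the recommended approach, assuming Biostrings is installed. It recreates aa1, aa2 and aa3, peeks at the internal offset slot (inherited from XVector) to show why identical() sees aa1 and aa3 as different, and then compares the sequences with == instead. The helper same_sequence() is hypothetical, not part of Biostrings; it only adds a class check on top of ==.

```r
library(Biostrings)

aa1 <- AAString("A")
aa2 <- AAString("Z")
aa3 <- subseq(AAString("ZZA"), start=3, end=3)

## subseq() does not copy: aa3 is a 1-letter window on the "ZZA" data.
## The offset slot records where the window starts (0 for aa1, 2 for aa3).
## identical() compares slots, so this difference makes it return FALSE.
aa1@offset
aa3@offset

## == compares the sequence content, not the internal representation.
aa1 == aa3   # TRUE
aa1 == aa2   # FALSE

## Hypothetical helper (not part of Biostrings): same class and same letters.
same_sequence <- function(x, y) {
    identical(class(x), class(y)) && x == y
}
same_sequence(aa1, aa3)   # TRUE
same_sequence(aa1, aa2)   # FALSE
```

If a plain-R comparison is preferred, converting first also works, e.g. identical(as.character(aa1), as.character(aa3)), at the cost of copying the sequence data into character strings.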



