Comparing AAStrings using identical() instead of ==
jdavison ▴ 10
@jdavison-7081
United States

Hi, I made a mistake using identical() to compare 2 AAStrings, as in 

identical(AAString('A'), AAString('Z'))

which returns TRUE. It seems to compare only the lengths. It took me a while to notice I was not getting the result I expected and to learn that the correct comparison operator is '=='. Maybe there should be a warning about this? Thank you.

sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Biostrings_2.32.0    XVector_0.4.0        IRanges_1.22.7      
[4] BiocGenerics_0.10.0  BiocInstaller_1.14.3

loaded via a namespace (and not attached):
[1] compiler_3.1.1  stats4_3.1.1    tools_3.1.1     zlibbioc_1.10.0

 

@herve-pages-1542
Seattle, WA, United States

Sorry Jerry for the late answer. The problem is that the sequence data in an XString object is stored behind an external pointer and that identical() always considers 2 external pointers to be identical (which IMO doesn't make sense but I bet you'll find someone to argue it's a feature):

aa1 <- AAString("A")
aa2 <- AAString("Z")

Each object contains an external pointer to the sequence data:

aa1@shared@xp  # pointer to "A"
# <pointer: (nil)>
aa2@shared@xp  # pointer to "Z"
# <pointer: (nil)>

As expected these external pointers have different addresses:

XVector:::address(aa1@shared@xp)
# [1] "0x9106b80"
XVector:::address(aa2@shared@xp)
# [1] "0x90cb760"

and the sequence data at these addresses is different. However, identical() reports that the 2 external pointers are identical:

identical(aa1@shared@xp, aa2@shared@xp)
# [1] TRUE
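
The same behavior can be reproduced without Biostrings. This is only a minimal sketch, and it assumes R >= 4.2.0, where (as far as I know) identical() accepts an extptr.as.ref argument. It also assumes that new("externalptr") returns a fresh pointer object on each call. As I understand it, the default comparison only looks at the address stored inside the pointer, which is nil for both objects here, just like the pointers printed above:

```r
library(methods)  # provides new(); normally attached already

# Two separate external pointer objects, each wrapping a nil address.
p1 <- new("externalptr")
p2 <- new("externalptr")

# Default: compares the wrapped addresses, not the pointer objects.
same_by_address <- identical(p1, p2)

# Assumption: R >= 4.2.0. Treats external pointers as reference objects,
# so they are equal only if they are the very same object in memory.
same_by_reference <- identical(p1, p2, extptr.as.ref = TRUE)

print(c(by_address = same_by_address, by_reference = same_by_reference))
```

Comparing pointers by reference would remove this kind of false positive, but it says nothing about the sequence data itself.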

Turning identical() into a generic function and defining methods for XString objects would in theory address the problem but doesn't seem advisable here. So there is not much I can do until someone in the R Core team fixes the behavior of identical() on external pointers. But note that such a fix to identical() would still not make it reliable on XString objects because we would still get false negatives in some situations (but no more false positives):

aa3 <- subseq(AAString("ZZA"), start=3, end=3)
aa3
#   1-letter "AAString" instance
# seq: A

identical(aa1, aa3)
# [1] FALSE

That's because even though aa1 and aa3 represent the same sequence, their internal representations differ. So the advice is really to use == as you figured out.
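
To make the advice concrete, here is a small sketch of comparing by content rather than by internal representation. The helper same_sequence() is a hypothetical name introduced for illustration only and is not part of Biostrings. It simply converts both objects to plain character strings before calling identical():

```r
library(Biostrings)

aa1 <- AAString("A")
aa2 <- AAString("Z")
aa3 <- subseq(AAString("ZZA"), start = 3, end = 3)

# == compares the sequences themselves, not how they are stored.
different_letters <- aa1 == aa2
same_letters <- aa1 == aa3

# Hypothetical helper (not a Biostrings function): compare the sequences
# as ordinary character strings, which identical() handles reliably.
same_sequence <- function(x, y) {
  identical(as.character(x), as.character(y))
}

print(c(aa1_vs_aa2 = different_letters, aa1_vs_aa3 = same_letters))
print(same_sequence(aa1, aa3))
```

Going through as.character() copies the sequence, so for long sequences == is probably the better choice.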

Cheers,

H.

 
