Question: Comparing AAStrings using identical() instead of ==
jdavison10 (United States) wrote, 4.8 years ago:

Hi, I made a mistake using identical() to compare 2 AAStrings, as in

identical(AAString('A'), AAString('Z'))

which returns TRUE.  That seems to compare lengths.  It took me a while to see I was not getting the result I expected and to learn the correct comparison operator is '=='.  Maybe I should have been warned off?  Thank you.

sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Biostrings_2.32.0    XVector_0.4.0        IRanges_1.22.7
[4] BiocGenerics_0.10.0  BiocInstaller_1.14.3

loaded via a namespace (and not attached):
[1] compiler_3.1.1  stats4_3.1.1    tools_3.1.1     zlibbioc_1.10.0

Question modified 4.8 years ago by Hervé Pagès.

Hervé Pagès (United States) wrote, 4.8 years ago:

Sorry Jerry for the late answer. The problem is that the sequence data in an XString object is stored behind an external pointer, and identical() always considers two external pointers to be identical (which IMO doesn't make sense, but I bet you'll find someone to argue it's a feature):

aa1 <- AAString("A")
aa2 <- AAString("Z")

Each object contains an external pointer to the sequence data:

aa1@shared@xp  # pointer to "A"
# <pointer: (nil)>
aa2@shared@xp  # pointer to "Z"
# <pointer: (nil)>

As expected these external pointers have different addresses:

XVector:::address(aa1@shared@xp)
# [1] "0x9106b80"
XVector:::address(aa2@shared@xp)
# [1] "0x90cb760"

and the sequence data at these addresses is different. However, identical() reports that the 2 external pointers are identical:

identical(aa1@shared@xp, aa2@shared@xp)
# [1] TRUE
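A quick way to confirm that the underlying sequences really do differ, even though identical() says otherwise, is to compare the decoded strings instead. This is a minimal sketch, assuming Biostrings is installed: as.character() turns each AAString into an ordinary character vector, and on plain character vectors identical() compares the actual contents.

```r
library(Biostrings)

aa1 <- AAString("A")
aa2 <- AAString("Z")

# Decode the sequence data into ordinary character strings
s1 <- as.character(aa1)  # "A"
s2 <- as.character(aa2)  # "Z"

# On plain character vectors identical() looks at the letters themselves
identical(s1, s2)
# [1] FALSE
```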

Turning identical() into a generic function and defining methods for XString objects would in theory address the problem but doesn't seem advisable here. So there is not much I can do until someone in the R Core team fixes the behavior of identical() on external pointers. But note that such a fix to identical() would still not make it reliable on XString objects because we would still get false negatives in some situations (but no more false positives):

aa3 <- subseq(AAString("ZZA"), start=3, end=3)
aa3
#   1-letter "AAString" instance
# seq: A

identical(aa1, aa3)
# [1] FALSE

That's because even though aa1 and aa3 represent the same sequence, their internal representations differ (aa3 points into the shared "ZZA" data at an offset, whereas aa1 holds just "A"). So the advice really is to use ==, as you figured out.
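For completeness, here is what == reports for the pairs above. Because it compares sequence contents, it gives the expected answer both for the false positive (aa1 vs aa2) and for the false negative (aa1 vs aa3). The wrapper same_xstring() is a hypothetical helper, not a Biostrings function; it only adds a class check so that, for example, an AAString and a BString holding the same letters are not reported as equal.

```r
library(Biostrings)

aa1 <- AAString("A")
aa2 <- AAString("Z")
aa3 <- subseq(AAString("ZZA"), start=3, end=3)

aa1 == aa2  # different sequences
# [1] FALSE
aa1 == aa3  # same sequence, different internal representation
# [1] TRUE

## Hypothetical helper (not part of Biostrings): TRUE only when both
## objects have the same class AND hold the same sequence.
## && short-circuits, so == is never called on mismatched classes.
same_xstring <- function(x, y) {
    identical(class(x), class(y)) && x == y
}

same_xstring(aa1, aa3)
# [1] TRUE
same_xstring(aa1, BString("A"))
# [1] FALSE
```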

Cheers,

H.