Compare DNA sequence pairs and compute lengths and coordinates
Entering edit mode
Md Abrar • 0
Last seen 2.2 years ago

Dear altruists, the following code is running properly when I'm trying to compare AXT (from UCSC) sequence pairs and keep only [A, T, C, G] uppercases and compute the matched sequence lengths. But somehow, this function is producing a lot of NaN values. Actually it should not contain any NaN values at all. I implemented the same algorithm for Python and it worked properly. But it seems that Bioconductor's DNAStringSet has limitations like it can't compare lower cases. It converts lowercase to uppercase automatically and then compares. So, I converted the AXT file's lowercases to '-' and then used it as an input. Still, I'm getting NaN values now. How can I overcome this?

The full code can be found here:

N.B: I changed the logic for the identical_bases variable. After this change, I'm getting NaN values.

  #x must be an Axt object imported by CNEr
  qr<-BStringSet(apply(, 2, function(z){
    tmp<-paste0("M", z)
    out<-paste0(tmp, "N")
  tr<-BStringSet(apply(, 2, function(z){
    tmp<-paste0("K", z)
    out<-paste0(tmp, "V")

#  identical_bases <- mapply(x=qr, y=tr, function(x,y){
#    out<-which(as.raw(x)==as.raw(y) & (as.raw(x)==charToRaw('A') | as.raw(x)==charToRaw('T') | as.raw(x)==charToRaw('C') | as.raw(x)==charToRaw('G')) & (as.raw(y)==charToRaw('A') | as.raw(y)==charToRaw('T') | as.raw(y)==charToRaw('C') | as.raw(y)==charToRaw('G')))
#  })

  identical_bases <- mapply(x=qr, y=tr, function(x,y){
        out<-which((as.matrix(x) == as.matrix(y)) & 
                           (as.matrix(x) != '-')  & (as.matrix(y) != '-') & 
                           (as.matrix(x)=='A' | as.matrix(x)=='T' | as.matrix(x)=='C' | as.matrix(x)=='G') & 
                           (as.matrix(y)=='A' | as.matrix(y)=='T' | as.matrix(y)=='C' | as.matrix(y)=='G'))

  relative_ranges <- lapply(identical_bases, function(x){
    reduce(IRanges(x-2, x-2))

  absolute_ranges <- mapply(z=relative_ranges, y=as.list(x), function(z,y){
    if(length(z) == 0){
    } else {
                           "first.start"=(start(first(y)) + start(z)),
                           "first.end"=(start(first(y)) + end(z)),
                           "first.width"=(end(z) - start(z) + 1),
                           "second.start"=(start(second(y)) + start(z)),
                           "second.end"=(start(second(y)) + end(z)),
                           "second.width"=(end(z) - start(z) + 1),

DNASeq CNEr • 623 views

Login before adding your answer.

Traffic: 431 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6