Question about parallel manipulation on CharacterList objects
li lilingdu ▴ 450

Hi, I want to know how to manipulate CharacterList objects in parallel (element by element) in the IRanges package.

For example, for a very large list of letters:

library(IRanges)
ir = successiveIRanges(width=sample(1:26, 1000000, replace=TRUE))
dat = relist(sample(letters, sum(width(ir)), replace=TRUE), ir)

For each element of the length-1000000 CharacterList, I want to get the setdiff of the 26 letters and the members of that element. I tried the psetdiff function, but it doesn't work for CompressedCharacterList objects. Also, I don't know how to combine two CharacterList objects in parallel.

no.letters = psetdiff(letters, dat)  ## psetdiff does not work here
combined = punion(toupper(no.letters), dat)  ## try to combine two CharacterList objects

Any suggestions? Thanks.

s4vectors IRanges • 585 views

Probably there is something fast already. I did this naively as

CharacterList(lapply(dat, setdiff, x=letters))

It took about 30 seconds to evaluate. To be more clever, and assuming the data was not too large, I made a matrix of TRUE values in which rows represent the elements of dat and columns represent the letters.

m <- matrix(TRUE, nrow=length(dat), ncol=length(letters))

Then I indexed into the matrix and set to FALSE each position whose letter occurs in the corresponding element of dat

row = rep(seq_along(dat), lengths(dat))
col = match(unlist(dat), letters)
m[cbind(row, col)] = FALSE

And finally retrieved the remaining TRUE values and placed them in a list

row = row(m)[m]
col = letters[col(m)[m]]
splitAsList(col, row)

This takes about 3 seconds to evaluate. One problem arises when an element of dat contains all 26 letters: its row has no TRUE values left, so splitAsList creates no character(0) element for it and the result is shorter than dat. A solution is to update the original data structure in place

row = row(m)[m]; urow = unique(row)
col = letters[col(m)[m]]
dat[urow] = splitAsList(col, row)
dat[-urow] = list(character())
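
The matrix approach can be checked on a toy input with base R alone. The sketch below is an illustration, not part of the original answer: a plain list stands in for the CharacterList, base split() stands in for splitAsList(), and the name miss is made up for the example. Splitting on a factor with explicit levels is an alternative way to keep elements that contain every letter, since empty levels come back as character(0).

```r
# Toy data: a plain list stands in for the CharacterList `dat` (assumption)
dat <- list(c("a", "b", "a"), letters, c("x", "y", "z"))

# Mark every (element, letter) pair that is present in dat
m <- matrix(TRUE, nrow = length(dat), ncol = length(letters))
row <- rep(seq_along(dat), lengths(dat))
col <- match(unlist(dat), letters)
m[cbind(row, col)] <- FALSE

# Split the remaining letters by row; the explicit factor levels keep
# elements with nothing missing as character(0)
miss_row <- factor(row(m)[m], levels = seq_along(dat))
miss <- split(letters[col(m)[m]], miss_row)

# Compare with the naive per-element setdiff
naive <- lapply(dat, setdiff, x = letters)
stopifnot(identical(unname(miss), naive))
```

Wrapping the result in CharacterList() should give the IRanges type back. Whether splitAsList() behaves the same way when given this factor is an assumption worth checking before relying on it.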
