Question about parallel manipulation on CharacterList objects
li lilingdu ▴ 450
@li-lilingdu-1884
Last seen 6.7 years ago

Hi, I want to know how to manipulate CharacterList objects in parallel (element by element) in the IRanges package.

For example, for a very large list of letters:

ir = successiveIRanges(width=sample(1:26, 1000000, replace=TRUE))
dat = relist(sample(letters, sum(width(ir)), replace=TRUE), ir)

For each element of the length-1000000 CharacterList, I want to get the setdiff of the 26 letters and the members of that element. I tried the psetdiff function, but it doesn't work for CompressedCharacterList objects. Also, I don't know how to combine two CharacterList objects in parallel.

no.letters = psetdiff(letters, dat)  ## psetdiff does not work here
combined = punion(toupper(no.letters), dat)  ## attempt to combine two CharacterList objects

Any suggestions? Thanks.

s4vectors IRanges • 1.1k views
@martin-morgan-1513
Last seen 5 months ago
United States

Probably there is something fast already. I did this naively as

CharacterList(lapply(dat, setdiff, x=letters))

It took about 30 seconds to evaluate. To be more clever, and thinking that the data wasn't too large, I made a logical matrix of TRUE values, where rows represent elements of dat and columns represent the letters.

m <- matrix(TRUE, nrow=length(dat), ncol=length(letters))

Then I indexed into the matrix and set to FALSE each position that was in dat

row = rep(seq_along(dat), lengths(dat))
col = match(unlist(dat), letters)
m[cbind(row, col)] = FALSE
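
To see what the indexing step does, here is a minimal base-R sketch on toy data. It assumes a plain list in place of the CharacterList and a 4-letter alphabet so the matrix stays readable; the names toy and alphabet are made up for illustration.

```r
# Toy stand-in for `dat`: a plain base-R list rather than a CharacterList.
toy <- list(c("a", "c"), character(0), c("b", "b", "d"))
alphabet <- letters[1:4]   # hypothetical 4-letter alphabet

# One row per list element, one column per letter; TRUE means "letter absent".
m <- matrix(TRUE, nrow = length(toy), ncol = length(alphabet),
            dimnames = list(NULL, alphabet))

# A two-column integer matrix passed to `[` addresses individual (row, col) cells.
row <- rep(seq_along(toy), lengths(toy))   # which element each letter came from
col <- match(unlist(toy), alphabet)        # which column each letter maps to
m[cbind(row, col)] <- FALSE                # duplicates simply hit the same cell twice

print(m)
```

Each TRUE that remains marks a letter missing from the corresponding element, so the empty second element keeps a full row of TRUE values.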

And finally retrieved the remaining TRUE values and placed them in a list

row = row(m)[m]
col = letters[col(m)[m]]
splitAsList(col, row)

This takes about 3 seconds to evaluate. A problem arises when an element of dat contains all 26 letters: its row of m has no TRUE values left, so no character(0) element is created for it and the result is shorter than dat. A solution is to update the original data structure

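row = row(m)[m]; urow = sort(unique(row))  # sorted, to match the group order of the split
col = letters[col(m)[m]]
dat[urow] = splitAsList(col, row)
dat[-urow] = list(character())             # elements that already contained every letter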
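
Another way to keep the empty elements might be to split on a factor whose levels cover every row, so rows with no remaining TRUE still get an empty group. The sketch below uses base R only: split() stands in for splitAsList(), a plain list stands in for the CharacterList, and toy, alphabet and missing_letters are made-up names. Whether splitAsList() treats empty factor levels the same way is an assumption to check against its documentation.

```r
# Toy data: the second element contains every letter of the toy alphabet.
toy <- list(c("a", "c"), letters[1:4], c("b", "b", "d"))
alphabet <- letters[1:4]   # hypothetical 4-letter alphabet

# Mark the letters that are present, as in the steps above.
m <- matrix(TRUE, nrow = length(toy), ncol = length(alphabet))
m[cbind(rep(seq_along(toy), lengths(toy)),
        match(unlist(toy), alphabet))] <- FALSE

# Declaring all rows as factor levels keeps a group for rows with no TRUE left.
row <- factor(row(m)[m], levels = seq_along(toy))
col <- alphabet[col(m)[m]]

# Base split() returns one group per level, in level order.
missing_letters <- unname(split(col, row))
print(missing_letters)
```

Note that row(m)[m] walks the matrix column by column, so the row indices come out unsorted (3, 1, 3, 1 here), while the split result is ordered by level.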