Background
I have two drug utilization datasets, one for each of two drugs, A and B. Each row in the datasets represents a prescription, described by patient.id, start.date, and days.supply. Both datasets have been filtered so that they contain only seven-day-long prescriptions (days.supply = 7 for all rows) and only patients who consumed both drugs. All values of start.date are positive integers.
Question
Using IRanges but without resorting to loops, how can one find the patients who consumed both drugs simultaneously? More precisely, I am looking for a loop-free IRanges algorithm that identifies every patient.id with at least one prescription for A and one prescription for B that overlap by at least one day. Note that two A-prescriptions for the same patient should be treated as one longer A-prescription; the same goes for B-prescriptions.
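To make the target concrete, here is a small base-R sketch on made-up toy data (the values below are assumptions for illustration, not the real datasets). It is not the IRanges solution being asked for; it only computes the expected set of patient IDs by pairing each patient's A- and B-prescriptions with merge() and testing the day ranges for overlap. Treating the last covered day as start.date + days.supply - 1 is also an assumption, chosen to match how an IRanges width works.

```r
# Toy data (made up for illustration; not the real datasets).
A <- data.frame(patient.id  = c(1, 1, 2, 3),
                start.date  = c(1, 8, 1, 20),
                days.supply = 7)
B <- data.frame(patient.id  = c(1, 2, 3),
                start.date  = c(10, 8, 23),
                days.supply = 7)

# Pair every A-prescription with every B-prescription of the same patient.
pairs <- merge(A, B, by = "patient.id", suffixes = c(".A", ".B"))

# Last covered day of each prescription (inclusive, like an IRanges width).
end.A <- pairs$start.date.A + pairs$days.supply.A - 1
end.B <- pairs$start.date.B + pairs$days.supply.B - 1

# Two inclusive day ranges share at least one day exactly when
# each one starts no later than the other one ends.
overlaps <- pairs$start.date.A <= end.B & pairs$start.date.B <= end.A

# Patients with at least one overlapping A/B pair: here, patients 1 and 3.
expected.ids <- sort(unique(pairs$patient.id[overlaps]))
expected.ids
```

In this toy example patient 2 is excluded because the A-prescription ends on day 7 and the B-prescription starts on day 8, so they are adjacent but share no day. Merging same-drug prescriptions first would not change the result, since the merged A range overlaps the merged B range exactly when some individual A-prescription overlaps some individual B-prescription.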
Initial code and error message
This question seems to me to call for reduce() and either findOverlaps() or intersect(). While it is easy enough to apply these functions to each dataset as a whole, they must instead be applied patient by patient.
    ir.A <- IRanges(start = as.integer(A$start.date), width = as.integer(A$days.supply))
    ir.B <- IRanges(start = as.integer(B$start.date), width = as.integer(B$days.supply))
    split.A <- split(ir.A, A$patient.id)
    split.B <- split(ir.B, B$patient.id)
    red.A <- reduce(ir.A)
    red.B <- reduce(ir.B)
    x <- findOverlaps(red.A, red.B)
    #Warning in View :
    #'optional' and arguments in '...' are ignored
    #Error in View :
    #arguments imply differing number of rows: 1, 0, 2, 3, 4, 6
    R version 3.2.2 (2015-08-14)
    Platform: x86_64-apple-darwin13.4.0 (64-bit)
    Running under: OS X 10.11 (El Capitan)

    locale:
    en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    stats4 parallel stats graphics grDevices utils datasets methods base

    other attached packages:
    XVector_0.10.0 IRanges_2.4.4 S4Vectors_0.8.2 BiocGenerics_0.16.1 data.table_1.9.6 bit64_0.9-5 bit_1.1-12

    loaded via a namespace (and not attached):
    zlibbioc_1.16.0 tools_3.2.2 chron_2.3-47