I previously asked this question on Biostars, but thought this might be the more appropriate forum for it. Apologies in advance for double posting: I saw a warning on another forum that it is poor practice, but I still haven't received any suggestions on the first forum.
I am using the GenomicRanges package to analyse genomic segment enrichment. The first three columns of my dataset are chr, start and end, followed by three additional metadata columns. All columns are tab-separated.
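For illustration, a minimal sketch of how a table with this layout could be turned into a GRanges object. The toy rows and the metadata column names (id, score, label) are invented for the example and are not my real data; the sketch also assumes the start coordinates are 1-based.

```r
library(GenomicRanges)

# Hypothetical stand-in for the real tab-separated file:
# chr, start, end, then three metadata columns (names invented here).
tsv <- paste(
  "chr\tstart\tend\tid\tscore\tlabel",
  "chr1\t100\t200\ta\t1.5\tcase",
  "chr1\t300\t450\tb\t2.0\tcase",
  "chr2\t50\t80\tc\t0.7\tcase",
  sep = "\n"
)
df <- read.delim(text = tsv, stringsAsFactors = FALSE)

# Build a GRanges object; the extra columns are kept as metadata columns.
cases <- makeGRangesFromDataFrame(
  df,
  seqnames.field = "chr",
  start.field = "start",
  end.field = "end",
  keep.extra.columns = TRUE
)
cases
```

With a real file, read.delim() would be pointed at the file path instead of the text argument.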
I have successfully run
subsetByOverlaps(cases, controls, type="within", invert=TRUE). According to here, my output should be the genomic segments that lie within my chromosome start and end points and are exclusive to my cases. Conversely, I also ran
subsetByOverlaps(controls, cases, type="within", invert=TRUE) to look for segments exclusive to controls. I then looked for segments found in both by removing the invert option. In one instance my queryLength was approximately 4000 segments and my subjectLength about 200 segments. Given those sizes, if I run
subsetByOverlaps(cases, controls, type="within") I get more than 200 segments in the resulting GRanges object. Am I missing something about the behaviour of the function? I expected fewer than 200 segments in the output, assuming the segments are treated as sets.
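A toy sketch of the behaviour in question, with invented coordinates rather than my data. It assumes that subsetByOverlaps() returns the ranges of its first argument (the query) and that, with type="within", a query range is kept when it lies entirely inside at least one subject range. Under that reading, several case segments can fall inside the same control segment, so the result can be longer than the subject.

```r
library(GenomicRanges)

# Two control segments (subject) and five case segments (query), all made up.
controls <- GRanges("chr1", IRanges(start = c(1, 500), end = c(400, 900)))
cases <- GRanges("chr1", IRanges(start = c(10, 50, 120, 600, 950),
                                 end   = c(40, 100, 300, 700, 1000)))

# Case segments lying entirely within some control segment.
shared <- subsetByOverlaps(cases, controls, type = "within")

# Case segments NOT lying within any control segment.
exclusive <- subsetByOverlaps(cases, controls, type = "within", invert = TRUE)

length(controls)   # 2
length(shared)     # 4, more than the number of control segments
length(exclusive)  # 1
```

In this toy case three case segments sit inside the first control segment, which is why the output has more ranges than the subject.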
My second question: if I then swap cases and controls and run subsetByOverlaps(controls, cases, type="within"), how can I combine the data from the two runs? Finally, am I correct to assume that combining the two in a data frame would give me the equivalent of the union of genomic segments found within my cases and controls? If not, is there a way to use GRanges to obtain that union without doing it in two steps?
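To make the options concrete, here is a minimal sketch on invented coordinates of three ways the results might be combined: concatenating the two runs with c(), merging overlapping ranges with reduce(), and taking the set union of the two original objects with union(). Whether any of these matches the union I have in mind is an assumption on my part; note also that reduce() and union() return merged ranges without the metadata columns.

```r
library(GenomicRanges)

# Made-up segments; the last control lies inside the last case segment.
controls <- GRanges("chr1", IRanges(start = c(1, 500, 960),
                                    end   = c(400, 900, 990)))
cases <- GRanges("chr1", IRanges(start = c(10, 50, 120, 600, 950),
                                 end   = c(40, 100, 300, 700, 1000)))

# The two runs from the question.
cases_in_controls <- subsetByOverlaps(cases, controls, type = "within")
controls_in_cases <- subsetByOverlaps(controls, cases, type = "within")

# Option 1: plain concatenation of the two results (keeps every range).
combined <- c(cases_in_controls, controls_in_cases)

# Option 2: collapse overlapping or adjacent ranges in the concatenation.
merged <- reduce(combined)

# Option 3: one-step set union of everything covered by cases or controls.
covered <- union(cases, controls)

length(combined)  # 5
length(merged)    # 5 (no ranges overlap in this toy data)
length(covered)   # 3
```

The third option differs from the first two: it covers every position in either object, not only the segments that passed the "within" filter.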