Closed: FDR estimation for biological or technical ChIP-seq replicates in the context of GRanges
@jurat-shahidin-9488
Last seen 4.5 years ago
Chicago, IL, USA

Hi, everyone:

I have been working on my package, and it is nearly finished except for FDR estimation. I read and load three replicates (BED format) into GRanges objects, and I have to handle both cases: the chosen sample being a biological replicate or a technical replicate.

The package processes the three GRanges objects to find co-localization evidence across the samples. This is the general workflow I have implemented:

read and load multiple samples (BED format) into GRanges
-> find overlapping regions conditionally, in parallel
-> filter with a first threshold value (i.e., count the overall overlapping regions in parallel)
-> run chisq.test() on the data that passed the previous step
-> filter further with a second threshold value, based on the combined p-value
-> return the final output as GRanges (regions that passed the previous step are preserved, but not exported to disk)

In the first run of my package, a is the chosenSample and b, c are the supportingSamples:

ov_ab_1 <- as(findOverlaps(a, b), "List")
ov_ac_1 <- as(findOverlaps(a, c), "List")

In the second run, I switch the parameters so that b is the chosenSample and a, c are the supportingSamples:

ov_ba_2 <- as(findOverlaps(b,a), "List")
ov_bc_2 <- as(findOverlaps(b,c), "List")

In the third run, c is the chosenSample and a, b are the supportingSamples:

ov_ca_3 <- as(findOverlaps(c,a), "List")
ov_cb_3 <- as(findOverlaps(c,b), "List")
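A minimal sketch of how the three runs above could be produced by one loop instead of three manual calls. The toy peaks and the `samples` and `overlaps` names are hypothetical and not part of the package; only `findOverlaps()` and the coercion to "List" come from the calls above.

```r
library(GenomicRanges)

# Toy peaks standing in for the three replicates (hypothetical data)
samples <- list(
  a = GRanges("chr1", IRanges(c(1, 50, 100), width = 10)),
  b = GRanges("chr1", IRanges(c(5, 200), width = 10)),
  c = GRanges("chr1", IRanges(c(55, 105), width = 10))
)

# Each sample takes a turn as the chosen sample; the remaining
# samples act as supporting samples for that run.
overlaps <- lapply(names(samples), function(chosen) {
  supporting <- samples[setdiff(names(samples), chosen)]
  lapply(supporting, function(s) {
    as(findOverlaps(samples[[chosen]], s), "List")
  })
})
names(overlaps) <- names(samples)

# overlaps$a$b plays the role of ov_ab_1, overlaps$b$c of ov_bc_2, and so on
```

Because the results come back as one nested list keyed by the chosen sample, there may be no need for a separate environment per run.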

However, I still need to implement FDR estimation for a, b, and c across the first, second, and third runs, where each processed sample has three different outputs.

For example: a_preserved_first_test, a_preserved_second_test, a_preserved_third_test, and the same output format for b and c.

Objective: for biological replicates, I want to retrieve the common regions that are found in at least two of the runs (but how?), pass these regions to p.adjust() to get adjusted p-values, filter further with a third threshold parameter, and finally generate output for the regions that pass that step.
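One possible reading of this objective, as a rough sketch only. It assumes each run yields a GRanges of preserved regions, that the regions of the first run carry a combined p-value in a metadata column named `p.value`, and that "found in at least two runs" means overlapping a region from at least one other run. The names `run1`, `run2`, `run3`, the toy data, the BH method, and the 0.05 cutoff are all hypothetical.

```r
library(GenomicRanges)

# Toy preserved regions from three runs (hypothetical data)
run1 <- GRanges("chr1", IRanges(c(1, 50, 100), width = 10),
                p.value = c(0.001, 0.02, 0.4))
run2 <- GRanges("chr1", IRanges(c(5, 300), width = 10))
run3 <- GRanges("chr1", IRanges(c(55, 400), width = 10))

# For each region of run1, count how many of the other runs support it
support <- (countOverlaps(run1, run2) > 0) + (countOverlaps(run1, run3) > 0)

# Keep regions seen in run1 and in at least one other run
common <- run1[support >= 1]

# Adjust the p-values of the retained regions (Benjamini-Hochberg here)
common$p.adj <- p.adjust(common$p.value, method = "BH")

# Third threshold: keep regions whose adjusted p-value is small enough
final <- common[common$p.adj <= 0.05]
```

Whether p.adjust() should be applied before or after selecting the common regions changes the number of tests being corrected for, so that ordering is a design choice rather than something this sketch settles.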

Question:

To do the FDR estimation, I need to run my package three times (if three samples are the input), and I could put the result of each run into its own R environment (I am not sure this is the right thing to do). Is there a better way to handle running the package three times, for example automatically switching to the next run when the previous run is done?

I am not sure whether I should create a sub-environment to save the result of each run; I hope there is a better solution. Forgive me if my question is naive. Any approach, suggestion, simple solution, or recommended Bioconductor package that could help with the above would be highly appreciated. Thanks a lot.

 

Best regards,
Jurat

fdr granges r chipsq • 216 views
This thread is not open. No new answers may be added