Question

Computing the statistical significance of test scores using control scores from a shuffled population

Bade (Delaware)

Hi List,

I couldn't get an answer to this on other forums, so I am posting here in the hope of getting help with computing the statistical significance of my data. Suppose I have two baskets (B1 and B2) on a table, each holding a mix of apples and oranges, and there are 10 such tables (T1 to T10). I have computed the log-odds score of finding apples in B1 at each of the 10 tables:

    table       T1      T2      T3      T4      T5      T6      T7      T8      T9      T10
    real B1     2.95    5.56    6.025   7.225   7.37    7.39    7.54    7.54    6.82    7.295
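A log-odds score for apples in B1 is the log of the ratio of apples to oranges in that basket. The post does not say which "generalized" variant produced the numbers above, so the sketch below assumes a natural log and adds a pseudocount of 0.5 to each count so that a basket holding only one kind of fruit still gets a finite score. Both choices, the counts, and the helper name `log_odds_score` are hypothetical.

```python
import math


def log_odds_score(apples: int, oranges: int, pseudocount: float = 0.5) -> float:
    """Natural-log odds of picking an apple from one basket.

    Assumption: the 0.5 pseudocount and the natural log are illustrative
    choices, not the "generalized" score used in the original post.
    """
    return math.log((apples + pseudocount) / (oranges + pseudocount))


# Hypothetical basket B1 on one table: 30 apples and 5 oranges.
print(round(log_odds_score(30, 5), 3))
```

Equal numbers of apples and oranges give a score of 0; apple-rich baskets give positive scores and orange-rich baskets negative ones.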

To generate a control population, I randomly shuffled the fruits between B1 and B2 on every table, keeping the number of fruits in each basket the same as above, and again computed the log-odds score of finding apples in B1 at all 10 tables:

    table       T1      T2      T3      T4      T5      T6      T7      T8      T9      T10
    control-1   0.81    1.25    0.695   0.725   -0.23   -0.25   -0.27   0.2     0.04    0.035
    control-2   -0.81   0.94    0.855   0.41    0.37    0.755   0.78    0.78    -0.075  0.59

There are three more shuffled controls, so five controls with shuffled scores in total.
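The shuffling step described above can be sketched as follows, assuming each basket is a list of fruit labels: pool both baskets, shuffle, refill B1 with the same number of fruits it had before, and score the new B1. The score is assumed to be a natural-log odds with a 0.5 pseudocount; the function names and basket contents are hypothetical.

```python
import math
import random


def log_odds_score(apples: int, oranges: int, pseudocount: float = 0.5) -> float:
    # Assumed score: natural-log odds of apples with a 0.5 pseudocount.
    return math.log((apples + pseudocount) / (oranges + pseudocount))


def shuffled_b1_score(b1: list, b2: list, rng: random.Random) -> float:
    """Shuffle fruit between B1 and B2, keeping each basket's size, then score B1."""
    pooled = b1 + b2                  # new list, so the original baskets are untouched
    rng.shuffle(pooled)               # random reassignment of every fruit
    new_b1 = pooled[: len(b1)]        # B1 keeps its original number of fruits
    apples = new_b1.count("apple")
    return log_odds_score(apples, len(new_b1) - apples)


# Hypothetical table: B1 is apple-rich, B2 is orange-rich.
b1 = ["apple"] * 30 + ["orange"] * 5
b2 = ["apple"] * 5 + ["orange"] * 30
rng = random.Random(1)
controls = [shuffled_b1_score(b1, b2, rng) for _ in range(5)]
print(controls)
```

Each call yields one control score for one table; five calls correspond to the five controls in the question.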

How can I compute a p-value for each table that represents the statistical significance of the log-odds score of the real (B1) basket against the scores of the shuffled (control) baskets? Could you please suggest a test or an R package for this?
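One common way to turn shuffled scores into a per-table p-value is the one-sided permutation estimate (b + 1)/(m + 1), where m is the number of shuffled controls and b is how many of them score at least as high as the real basket. The sketch below assumes the alternative of interest is "real score higher than control" and uses the two controls listed above; the function names are hypothetical.

```python
def permutation_p_value(observed: float, controls: list) -> float:
    """One-sided permutation p-value (b + 1) / (m + 1).

    b counts control scores >= the observed score; m is the number of controls.
    The +1 terms count the observed arrangement itself, so p is never zero.
    """
    b = sum(1 for c in controls if c >= observed)
    return (b + 1) / (len(controls) + 1)


def per_table_p_values(real: list, controls_by_shuffle: list) -> list:
    """real: one score per table; controls_by_shuffle: one list of scores per shuffle."""
    return [
        permutation_p_value(obs, [shuffle[i] for shuffle in controls_by_shuffle])
        for i, obs in enumerate(real)
    ]


real = [2.95, 5.56, 6.025, 7.225, 7.37, 7.39, 7.54, 7.54, 6.82, 7.295]
control_1 = [0.81, 1.25, 0.695, 0.725, -0.23, -0.25, -0.27, 0.2, 0.04, 0.035]
control_2 = [-0.81, 0.94, 0.855, 0.41, 0.37, 0.755, 0.78, 0.78, -0.075, 0.59]
print(per_table_p_values(real, [control_1, control_2]))
```

With only these two controls, no shuffled score reaches any real score, so every table gets p = 1/3; with all five controls the smallest possible value would still be 1/6.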

Thanks

Bade

statistics hypotheses R

Answer · 2016-03-21

Gordon Smyth (WEHI, Melbourne, Australia)

You have not performed enough iterations to compute any useful permutation p-value. See

http://arxiv.org/abs/1603.05766
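A quick way to see the problem with five controls: the (b + 1)/(m + 1) estimate can never fall below 1/(m + 1), however extreme the real score is. The helper names below are illustrative.

```python
import math


def smallest_p_value(m: int) -> float:
    """Best case for (b + 1) / (m + 1): no shuffled score beats the real one (b = 0)."""
    return 1 / (m + 1)


def shuffles_needed(alpha: float) -> int:
    """Smallest m for which 1 / (m + 1) <= alpha."""
    return math.ceil(1 / alpha) - 1


for m in (5, 19, 999):
    print(m, smallest_p_value(m))
print(shuffles_needed(0.05))
```

With the five controls in the question, the smallest attainable p-value is 1/6 (about 0.167), so no table could reach 0.05 however large its real score.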

However, there must be more to your problem that your toy example obscures. In your toy example of apples and oranges, you could compute exact p-values immediately using either exact binomial tests or Fisher's exact tests applied to each table, depending on which null hypothesis you want to test. Performing random shuffles serves no purpose.
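Shuffling fruit between B1 and B2 with both basket sizes fixed (and hence the total number of apples fixed) produces the hypergeometric null distribution that Fisher's exact test uses, so the p-value the shuffles approximate can be computed directly from the 2×2 table of counts. Below is a minimal standard-library sketch returning both the two-sided p-value and the upper tail (more apples in B1 than chance); the counts are hypothetical, since the question reports only scores. In R, the built-in `fisher.test()` and `binom.test()` cover these tests.

```python
from math import comb


def fisher_exact(a: int, b: int, c: int, d: int) -> tuple:
    """Fisher's exact test for the 2x2 table

        B1: a apples, b oranges
        B2: c apples, d oranges

    Returns (two_sided_p, upper_p). upper_p is P(B1 gets >= a apples) when the
    fruit is shuffled between baskets with basket sizes fixed (hypergeometric).
    """
    n1, apples, n = a + b, a + c, a + b + c + d
    denom = comb(n, apples)

    def prob(x: int) -> float:
        # Probability that exactly x of the apples land in B1.
        return comb(n1, x) * comb(n - n1, apples - x) / denom

    lo, hi = max(0, apples - (n - n1)), min(n1, apples)
    p_obs = prob(a)
    # Two-sided: add up every table no more likely than the observed one
    # (small relative tolerance for floating-point ties).
    two_sided = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    upper = sum(prob(x) for x in range(a, hi + 1))
    return min(1.0, two_sided), min(1.0, upper)


# Hypothetical counts for one table: B1 = 8 apples, 2 oranges; B2 = 1 apple, 5 oranges.
print(fisher_exact(8, 2, 1, 5))
```

The two-sided rule here (sum the probabilities of all tables no more likely than the observed one) is the usual convention for 2×2 tables; the upper tail is the exact counterpart of asking how often a shuffle puts at least as many apples in B1.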

@Gordon Smyth: Many thanks for replying and for the link to your paper. I had almost lost hope of getting help with this.

Here the baskets (B1 and B2) represent the “w” and “c” strands, which are independent of each other as far as this study is concerned. In the example above we are concerned only with the “w” strand. The “apples” and “oranges” represent double-stranded (ds) and single-stranded (ss) reads, respectively. Finally, the “tables” are bins of fixed length (~100 nt) in an intergenic region. So the scores in my toy example are generalized log-odds ratios of ds reads against ss reads for bins 1 to 10 on strand “w”.
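With the real labels, one per-bin question on strand “w” can also be posed as an exact binomial test: does a bin contain more or fewer ds reads than a background ds fraction p0 would predict? Where p0 comes from (for example, the ds fraction over the whole intergenic region) is an assumption, as are the counts below. The sketch works in log space so that bins with thousands of reads do not overflow; R's `binom.test()` is the built-in counterpart.

```python
from math import exp, lgamma, log


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of k successes in n trials, computed in log space (0 < p < 1)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + k * log(p) + (n - k) * log(1 - p))


def binom_test_two_sided(k: int, n: int, p0: float) -> float:
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    no more likely than the observed k (small tolerance for float ties)."""
    pmf = [binom_pmf(i, n, p0) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)
    return min(1.0, sum(q for q in pmf if q <= cutoff))


# Hypothetical bin on strand "w": 70 ds reads out of 100, background ds fraction 0.5.
print(binom_test_two_sided(70, 100, 0.5))
```

This tests a different null hypothesis from the shuffling (a fixed background proportion rather than exchangeability between the two strands), which is the choice the answer above refers to.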

The scores from the shuffled controls are ds/ss-RNA log-odds scores from the same bins (1 to 10) on the same strand. These shuffled controls were generated with 1000 iterations of shuffling, and I can generate more of them if required.
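Assuming the score increases with the number of ds reads (apples) in the bin, which holds for an ordinary log-odds when the basket size is fixed, ranking shuffles by score is the same as ranking them by count, and that count follows a hypergeometric distribution under shuffling. The sketch below, with hypothetical counts and function names, compares the shuffle estimate (b + 1)/(m + 1) for increasing m with the exact hypergeometric tail, which needs no shuffling at all.

```python
import random
from math import comb


def exact_upper_tail(a1: int, n1: int, apples: int, n: int) -> float:
    """P(B1 holds >= a1 apples) when n fruits, `apples` of them apples, are
    shuffled and B1 keeps its n1 slots (hypergeometric upper tail)."""
    hi = min(n1, apples)
    total = sum(comb(apples, x) * comb(n - apples, n1 - x) for x in range(a1, hi + 1))
    return total / comb(n, n1)


def shuffle_p_value(a1: int, n1: int, apples: int, n: int, m: int, rng: random.Random) -> float:
    """(b + 1) / (m + 1) from m random shuffles, b = shuffles with >= a1 apples in B1."""
    pooled = [1] * apples + [0] * (n - apples)   # 1 = apple (ds read), 0 = orange (ss read)
    b = 0
    for _ in range(m):
        rng.shuffle(pooled)
        if sum(pooled[:n1]) >= a1:
            b += 1
    return (b + 1) / (m + 1)


# Hypothetical table: B1 has 8 apples out of 10 fruits; 9 apples among 16 fruits overall.
rng = random.Random(2016)
print("exact  :", exact_upper_tail(8, 10, 9, 16))
for m in (5, 1000, 20000):
    print(m, "shuffles:", shuffle_p_value(8, 10, 9, 16, m, rng))
```

More shuffles only shrink the Monte Carlo error around the exact value; when the counts are available, computing that value directly is both faster and exact.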

I need a test that compares the bin-specific log-odds scores of the real data with those of the shuffled controls and assigns each bin a significance p-value. Is this possible, and which test would suit it best? Is there an R package available? I would greatly appreciate your help.

I know there are other ways to compute p-values for these scores, such as pooling all the scores on a particular chromosome and using them in a statistical test, but those methods don’t really fit the biological context of the problem.

Bade
