Question

Fisher test in RNA-seq

0

g.k • 0

@gk-13275

Hello,

I would like to run Fisher's exact test on RNA-seq data for several genes, measured in control and treatment samples.

To do fisher.test I need a contingency table for each gene, is there a way to do this in R instead of computing a contingency table for each gene?
I am new to this so any advice can be helpful.

I have the count data and the sample info data:

                    control1 treated1 control2 treat2 control3 treat3
    ENSG00000000003        723        486        904        445       1170       1097
    ENSG00000000005          0          0          0          0          0          0
    ENSG00000000419        467        523        616        371        582        781

where the ENSG IDs are the genes.

Thank you

rna-seq rnaseq r • 22k views

ADD COMMENT • link updated 4.8 years ago by Gordon Smyth 50k • written 5.4 years ago by g.k • 0

2

Hello, it's not clear from your question what you are testing with fisher.test. Are you trying to test enrichment of a transcript in one condition versus another? If so, it would be better to use a dedicated RNA-seq package such as DESeq2 rather than fisher.test.

I have used fisher.test to test enrichment of candidate gene sets compared to the reference for things such as GO terms or similar classification terms. If you want some advice on setting that up let me know.

ADD REPLY • link 5.4 years ago anna_stavrinides ▴ 20

2

I'll echo what Anna said, but with more conviction: you absolutely should not use a fisher.test for this. Use edgeR, limma/voom, or DESeq2.

ADD REPLY • link 5.4 years ago Steve Lianoglou ★ 13k

score 0 · Answer 1 · 2019-07-22

The function binomTest in the edgeR package does the Fisher test you suggest. If counts is your matrix, then

    library(edgeR)
    Control <- rowSums( counts[,c(1,3,5)] )   # pooled control counts per gene
    Treated <- rowSums( counts[,c(2,4,6)] )   # pooled treated counts per gene
    out <- binomTest(Control, Treated)        # one p-value per gene

does all the Fisher tests. However, as Steve and Anna have commented, we strongly advise against this because it ignores biological variation and will drastically over-estimate the significance of any differences found.
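
To make concrete what a per-gene Fisher test on pooled counts involves, here is a minimal base-R sketch that builds one 2x2 table per gene (this gene versus all other genes, control versus treated) and calls fisher.test on each. It uses the three rows from the question as toy data, so the "library totals" are just sums over those three genes; with real data they would be the column totals of the full count matrix. The names `n_control`, `n_treated` and `fisher_p` are illustrative only, and the caveat above about ignoring biological variation applies equally to this version.

```r
# Toy counts copied from the question (three genes, six samples).
counts <- matrix(
  c(723, 486, 904, 445, 1170, 1097,
      0,   0,   0,   0,    0,    0,
    467, 523, 616, 371,  582,  781),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419"),
    c("control1", "treated1", "control2", "treat2", "control3", "treat3")
  )
)

# Pool the replicates within each group (this discards replicate information).
Control <- rowSums(counts[, c(1, 3, 5)])
Treated <- rowSums(counts[, c(2, 4, 6)])

# Pooled library totals; an assumption here, since only three genes are shown.
n_control <- sum(Control)
n_treated <- sum(Treated)

# One 2x2 contingency table per gene, built inside the loop rather than by hand.
fisher_p <- vapply(seq_along(Control), function(i) {
  tab <- matrix(
    c(Control[i], n_control - Control[i],
      Treated[i], n_treated - Treated[i]),
    nrow = 2,
    dimnames = list(c("gene", "other"), c("control", "treated"))
  )
  fisher.test(tab)$p.value
}, numeric(1))
names(fisher_p) <- rownames(counts)
```

The resulting p-values treat every read as an independent observation, which is why they come out far too small for real replicated experiments.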

It would seem from a casual look at the data you give that you actually have paired data whereby each treated sample is paired with a control sample. You should use limma, edgeR or DESeq2 to undertake a paired analysis with proper estimation of replicate to replicate variability.
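
As a rough sketch of how such a paired analysis could be set up, the code below builds an additive design matrix with the pair as a blocking factor and treatment as the effect of interest. The pairing (control1 with treated1, and so on) is an assumption read off the column names in the question, and `paired_edger` is a hypothetical helper, not a function from any package. It wraps a standard edgeR quasi-likelihood workflow and is meant to be called on the full count matrix, not on the three example rows.

```r
# Sample annotation inferred from the column names in the question
# (assumption: control1/treated1 form pair 1, control2/treat2 pair 2, ...).
samples <- data.frame(
  sample    = c("control1", "treated1", "control2", "treat2", "control3", "treat3"),
  pair      = factor(c(1, 1, 2, 2, 3, 3)),
  treatment = factor(rep(c("control", "treated"), 3),
                     levels = c("control", "treated"))
)

# Additive model: pair effects absorb sample-to-sample baseline differences,
# and the last coefficient is the treated-vs-control log fold change.
design <- model.matrix(~ pair + treatment, data = samples)

# Hypothetical helper wrapping an edgeR quasi-likelihood analysis.
# `counts` is the full genes-by-samples matrix, columns ordered as in `samples`.
paired_edger <- function(counts, design) {
  stopifnot(requireNamespace("edgeR", quietly = TRUE))
  y <- edgeR::DGEList(counts = counts)
  keep <- edgeR::filterByExpr(y, design)       # drop genes with too few counts
  y <- y[keep, , keep.lib.sizes = FALSE]
  y <- edgeR::calcNormFactors(y)               # library-size normalization
  y <- edgeR::estimateDisp(y, design)          # replicate-to-replicate variability
  fit <- edgeR::glmQLFit(y, design)
  edgeR::glmQLFTest(fit, coef = "treatmenttreated")
}
```

With the real data one would then call something like `res <- paired_edger(counts, design)` and inspect the top genes with `edgeR::topTags(res)`. limma/voom and DESeq2 accept the same kind of `~ pair + treatment` design.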