Question

Which test fits best here?

AZ ▴ 30

@fereshteh-15803

United Kingdom

I have two independent matrices, each with patients in rows and oncogenic signalling pathways in columns:

one for responders to a drug,

one for non-responders to the same drug.

If a patient has a mutation in pathway X, the entry is 1; otherwise it is 0.

I want to know whether the mutation frequency of pathway X differs significantly between the two groups.

I have tried three things:

wilcox.test(group1$pathwayX, group2$pathwayX)
t.test(group1$pathwayX, group2$pathwayX)
fisher.test(x = matrix(
  c(
    group1_sample_size,
    pathwayX_mutated_samples,
    group2_sample_size,
    pathwayX_mutated_samples
  ),
  nrow = 2
))

Basically, I have one boolean matrix per group,

and I am not sure which statistical test would tell me which pathways are significantly altered between the two groups.

Any help? Thanks

My matrices look like this:

    > head(group1)
                        patients BER CPF CR CS FA HR MMR NER NHEJ OD p53 TLS TM UR DR AM
    1 2SKsnsuD9my3Mona.vep.txt_1   0   0  0  0  0  0   0   0    0  0   1   0  0  0  0  0
    2 4Pyv3CFxV1xnub78.vep.txt_1   0   0  0  0  0  0   0   0    0  0   0   0  1  0  0  0
    3 8X6mBq2k2pJ07trv.vep.txt_1   0   1  0  0  1  1   0   0    0  0   0   0  0  0  0  0
    4 aoZMTHJebqIv4XPB.vep.txt_1   0   0  1  0  1  1   0   0    0  0   0   0  1  0  0  0
    5 eI178OJnaJgJiChV.vep.txt_1   1   0  0  0  0  0   0   1    1  0   0   0  0  0  0  0
    6 iwyHwDFnhwBqHpiY.vep.txt_1   0   0  0  0  1  0   0   0    0  0   1   0  0  0  0  0

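library(caret)     # createDataPartition()
library(magrittr)  # %>% pipe

# `data` is assumed to be a data frame with a binary `Response` column
# plus one 0/1 column per pathway.
set.seed(123)
training.samples <- data$Response %>%
  createDataPartition(p = 0.8, list = FALSE)  # 80/20 split, stratified by Response
train.data <- data[training.samples, ]
test.data <- data[-training.samples, ]

# Logistic regression of Response on all pathway columns
model <- glm(Response ~ ., data = train.data, family = binomial)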

fisher wilcox t.test • 926 views

4.8 years ago AZ ▴ 30

score 2 · Accepted Answer · 2020-09-10

Hi again,

I would try binary logistic regression, with all variables encoded as binary factors. In my view, a Wilcoxon or Student's t-test is not appropriate here, because the data are just 0 and 1.
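As a rough illustration of that suggestion, here is a minimal sketch in base R. It assumes the two matrices are stacked into one data frame with an added `Response` column; the toy data, the `combined` object and the choice of the two pathways `p53` and `HR` are made up for the example and are not taken from the real data set.

```r
# Toy stand-ins for the two matrices in the question (values are simulated).
set.seed(1)
group1 <- data.frame(p53 = rbinom(30, 1, 0.5), HR = rbinom(30, 1, 0.3))  # responders
group2 <- data.frame(p53 = rbinom(30, 1, 0.2), HR = rbinom(30, 1, 0.3))  # non-responders

# Stack both groups and record which group each patient came from.
combined <- rbind(
  cbind(group1, Response = 1),
  cbind(group2, Response = 0)
)
combined$Response <- factor(combined$Response, levels = c(0, 1),
                            labels = c("non_responder", "responder"))

# Encode each pathway as a binary factor (0 = not mutated, 1 = mutated).
pathways <- c("p53", "HR")
combined[pathways] <- lapply(combined[pathways], factor, levels = c(0, 1))

# With a factor outcome, glm() models the probability of the second level
# ("responder"); each coefficient is a log odds ratio for a mutated pathway.
fit <- glm(Response ~ p53 + HR, data = combined, family = binomial)
coefs <- summary(fit)$coefficients
print(coefs)
print(exp(coef(fit)))  # odds ratios
```

The `Pr(>|z|)` column of `coefs` gives a Wald p-value per pathway, adjusted for the other pathways in the model. With many pathways and few patients, a model containing every pathway may be unstable, so fitting fewer terms could be more sensible.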

Please also keep in mind that your question does not relate to any Bioconductor package.

Kevin

Edit: if you tabulate the data into counts (how many patients have the 0 condition versus the 1 condition in each group?), then you could justify a Fisher's exact or Chi-squared test here, in my opinion. You may have to think about what, exactly, you want to compare. To me, it seems like there will be many pairwise comparisons here.
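A possible way to do that tabulation, sketched in base R under some assumptions: the data are simulated, and the helper names `make_group`, `count01` and `fisher_one` are hypothetical. Each pathway gets its own 2x2 table of unmutated versus mutated counts per group, and because one test is run per pathway, the p-values are adjusted for multiple testing (Benjamini-Hochberg is used here as one common choice).

```r
# Simulated 0/1 mutation calls; the real group1/group2 would replace these.
set.seed(2)
make_group <- function(n, probs) {
  as.data.frame(lapply(probs, function(p) rbinom(n, 1, p)))
}
group1 <- make_group(40, c(BER = 0.2, HR = 0.3, p53 = 0.6))  # responders
group2 <- make_group(35, c(BER = 0.2, HR = 0.3, p53 = 0.2))  # non-responders
pathways <- c("BER", "HR", "p53")

# Count unmutated (0) and mutated (1) patients in one column.
count01 <- function(x) c(unmutated = sum(x == 0), mutated = sum(x == 1))

# Build the 2x2 table for one pathway and run Fisher's exact test on it.
fisher_one <- function(pathway) {
  counts <- rbind(
    responders     = count01(group1[[pathway]]),
    non_responders = count01(group2[[pathway]])
  )
  fisher.test(counts)$p.value
}

p_raw <- vapply(pathways, fisher_one, numeric(1))
p_adj <- p.adjust(p_raw, method = "BH")  # correct for testing several pathways
print(data.frame(pathway = pathways, p_raw = p_raw, p_adj = p_adj))
```

Note that each row of the table holds the mutated and unmutated counts for one group, rather than the group's total sample size, so the four cells add up to the total number of patients.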