Question

some questions about RNAi hit selection


Rajarshi Guha ▴ 120

@rajarshi-guha-3531


Hi,

I have recently started working with RNAi screening data and have been getting up to speed on the literature. I have a few questions, which are not directly related to Bioconductor (or R), but I figured that members of the list would probably be able to help out. If there are more appropriate places to post such questions, I'd appreciate pointers.

My main question is about hit selection. I'm working with assays in which each gene is targeted by 4 different siRNAs and the plates have no replicates. My understanding is that in this situation one cannot really use statistical tests to select siRNAs. Instead, one employs threshold approaches (mean, MAD, quartile, etc.). Is this correct? In such a thresholding approach, is there any way to attach some sort of significance or score to a selection of hits?

Would it be correct to say that hit selection is simply a first step, and that one should use other information (GO enrichment, pathway analysis) to further winnow an initial selection of hits?

I am also working on a sensitization screen, where I am trying to identify genes that are differentially knocked down. This problem seems analogous to microarray studies and, in that vein, I have been taking the 4 signals (i.e., 4 siRNAs) for each gene in the two conditions and using a t-test to determine whether there is a difference in the means.

What I'm a little confused about is to what extent I need to perform multiple test corrections on the p-values. Does the 'multiple' refer to the number of conditions in which the assay is run (drug and no drug) or to the number of genes being considered?

Thanks,

--
Rajarshi Guha


updated 16.5 years ago by Ian Sudbery ▴ 50 • written 16.5 years ago by Rajarshi Guha ▴ 120

score 0 · Answer 1 · 2009-07-02

If you test N genes comparing two conditions, you have run N tests, and yeah, adding more conditions adds more tests, but people tend to ignore that.

There are two questions on the table regarding multiple test correction, one being "is this effect caused by experimental conditions" and the other being "will people believe my results were caused by experimental conditions, or chalk them up to a random effect"?

The fact that you have spent the money to perform a certain screen suggests that you are not ambivalent about the probability that the null hypothesis may be universally true in your test. You have some prior knowledge that predisposes you to think otherwise. Also, from what I understand, you do not have enough replicate observations in hand to resolve this matter using data from your original screen.

Other people may not view your results the same way, and they may need independent verification of your effect in some other test, a test that would not suffer from the same sort of multiple hypothesis burden as your initial screen. In my own experience, these independent verifications generally pan out, and the issue of P value correction in the initial screen becomes a moot point.

I am hoping an independent test of some sort is not out of the question.

T

On Jul 2, 2009, at 6:13 PM, Rajarshi Guha wrote:

> What I'm a little confused about is to what extent I need to perform multiple test corrections on the p-values - does the 'multiple' refer to the number of conditions in which the assay is run (drug and no drug) or the number of genes being considered?
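To make the counting point in this answer concrete (N genes tested means N tests), here is a minimal Python sketch of a Benjamini-Hochberg adjustment applied across genes. The choice of Benjamini-Hochberg is an assumption made for illustration; the answer does not name a correction method. The gene names and p-values are invented, and `benjamini_hochberg` is a hypothetical helper, not part of any package mentioned in this thread.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    The number of tests m is simply the number of p-values supplied,
    i.e. one per gene in this illustration.
    """
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented per-gene p-values (one t-test per gene, drug vs. no drug).
gene_p = {"GENE_A": 0.001, "GENE_B": 0.02, "GENE_C": 0.03, "GENE_D": 0.5}
adj = dict(zip(gene_p, benjamini_hochberg(list(gene_p.values()))))

for gene, p in adj.items():
    print(f"{gene}: adjusted p = {p:.3f}")
```

In R, the same adjustment should be available as `p.adjust(p, method = "BH")`. Note that with only 4 values per condition the per-gene t-tests themselves have little power, which is part of why the answer above leans on independent verification rather than on the corrected p-values.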

score 0 · Answer 2 · 2009-07-03

Hi Rajarshi,

I spent a long time thinking about this problem when I did some screening. My problem was slightly different because I had 2 siRNAs for each gene and 2 replicates for each siRNA, but that was still not enough to do traditional stats. The first thing I suggest is that you analyse the data with the Bioconductor package cellHTS2, if you are not already doing so.

After performing several rounds of low-throughput confirmation experiments I came to the following conclusions:

1) Without more data you cannot really do better than a threshold for selecting hit siRNAs. The only significance you can put on an siRNA in this situation is its rank in the hit list. I have been thinking about some sort of FDR measure that considers the position of the siRNA in relation to the distributions of both the positive and negative controls, but I've never really taken it anywhere.

2) The fact that an siRNA is a hit doesn't mean that the gene is. When I looked at the correlation between the two siRNAs targeting the same gene, I saw that it was pretty much zero, while there was a substantial correlation between replicates. The reasons for this are probably twofold. Firstly, different siRNAs have different efficiencies in knocking down the gene. Secondly, different siRNAs have different off-target effects. If you are screening thousands of siRNAs, then those that have off-target effects relevant to your screen will score highly. If there are many of these (which there are likely to be when you are screening 20,000 genes x 4 siRNAs), off-target effects are likely to dominate the top end of your list. You could score genes based on the minimum/mean score for the 4 siRNAs. When I did this (using the minimum of the 2 siRNAs that I had), I found that I had to set my threshold so low that none of my putative hits confirmed. If you do find some that do, you could be finding cases where both siRNAs are having off-target effects (because of the massive multiple testing). This might seem unlikely, but I have seen it happen.

My conclusion from this is that, as you say, hit selection is just the first step. You could use other information to winnow the initial selection of hits, but I don't really think that there is any substitute for experimental confirmation of hits using independent siRNAs. Winnowing based on GO/pathway analysis might help you select which hits you want to confirm.

Hope all this waffle helps in some way,

Ian
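As a rough illustration of the two points in this answer (threshold-and-rank at the siRNA level, then a separate gene-level call), here is a minimal Python sketch using robust z-scores based on the median and MAD. Everything specific in it is an assumption: the well values are invented, the cutoff of -3 and the "at least 2 of 4 siRNAs" rule are arbitrary illustrative choices, and the count-based gene call is a swapped-in variant, not the minimum/mean scoring described above. It is also not how cellHTS2 does it; it only shows the shape of the calculation.

```python
from collections import defaultdict
from statistics import median


def robust_z(values):
    """Robust z-scores: (x - median) / (1.4826 * MAD).

    The factor 1.4826 makes the MAD comparable to a standard
    deviation when the data are roughly normal.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    scale = 1.4826 * mad
    return [(v - med) / scale for v in values]


# Invented, already-normalised signals: (gene, siRNA number, signal).
wells = [
    ("GENE_A", 1, 0.98), ("GENE_A", 2, 1.02), ("GENE_A", 3, 1.00), ("GENE_A", 4, 0.95),
    ("GENE_B", 1, 0.40), ("GENE_B", 2, 0.45), ("GENE_B", 3, 0.50), ("GENE_B", 4, 1.01),
    ("GENE_C", 1, 0.35), ("GENE_C", 2, 1.03), ("GENE_C", 3, 0.97), ("GENE_C", 4, 1.05),
]

THRESHOLD = -3.0   # assumed cutoff on the robust z-score (signal decrease)
MIN_SIRNAS = 2     # assumed number of siRNAs that must pass per gene

z_scores = robust_z([signal for _, _, signal in wells])
scored = [(gene, sirna, z) for (gene, sirna, _), z in zip(wells, z_scores)]

# siRNA-level hits, ranked from most to least extreme; the rank is the
# only "significance" available without replicates.
ranked_hits = sorted((w for w in scored if w[2] <= THRESHOLD), key=lambda w: w[2])

# Gene-level call: count how many independent siRNAs passed the cutoff.
hits_per_gene = defaultdict(int)
for gene, _, _ in ranked_hits:
    hits_per_gene[gene] += 1
gene_hits = sorted(g for g, n in hits_per_gene.items() if n >= MIN_SIRNAS)

for rank, (gene, sirna, z) in enumerate(ranked_hits, start=1):
    print(f"{rank}. {gene} siRNA {sirna}: z = {z:.2f}")
print("Gene-level hits:", gene_hits)
```

In this toy plate the single strongest siRNA belongs to GENE_C, yet only GENE_B passes the gene-level rule, which mirrors the warning that a hit siRNA is not the same thing as a hit gene. Even a gene that passes would still need confirmation with independent siRNAs.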