Dear all,
I am pretty much a novice, so any advice would be appreciated. After reading through the manuals of these three packages, I actually became more confused about their commonalities and differences. I am trying to select the "right" method to analyze data with the following structure. To give an example of the data, I wrote a quick table with some numbers (0-100). I actually have two biological replicates per cell in such a table, but for simplicity I show only one value.
treatment A (time) | none | 1 hour | 6 hours | none | 1 hour | 6 hours | none | 1 hour | 6 hours |
treatment drug | a | a | a | b | b | b | c | c | c |
protein A | 10 | 95 | 15 | 10 | 15 | 15 | 10 | 95 | 65 |
protein B | 80 | 80 | 15 | 80 | 78 | 79 | 85 | 83 | 85 |
In this toy example, protein A goes up after 1 hour and back down after 6 hours under drug a (let us assume that this is the "normal" pattern); protein A doesn't go up under drug b; and protein A goes up and stays high at 6 hours under drug c. Meanwhile, protein B goes down only at 6 hours under drug a (let us assume that no change, as seen under drugs b and c, is the "normal" pattern).
My biological question is to find the drugs (a, b or c) for which the pattern is not the "normal" pattern. So for protein A I would "get" drugs b and c, and for protein B I would "get" drug a. In practice I have many more drugs, and I am looking to test for which drugs (and which proteins) there is an interaction between treatment A and drug treatment.
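To see what "interaction" means for this design, the toy protein A values can be fit with a treatment-coded linear model on the log2 scale, taking drug a and the untreated time point as the baseline (drug a being the assumed "normal" pattern). This is a minimal NumPy sketch, not how any of the three packages works internally, and the log2 scale and column names are assumptions chosen for illustration. With one value per cell the model is saturated (9 observations, 9 coefficients), so it reproduces the data exactly and gives no p-values. Each `drug:time` coefficient is a difference of log2 fold changes: how far that drug's change from time zero departs from drug a's. This is roughly what an R-style formula such as `~ drug * time` encodes when drug a and "none" are the reference levels.

```python
import numpy as np

times = ["none", "1h", "6h"]
drugs = ["a", "b", "c"]
# Toy protein A values from the table, in column order:
# drug a (none, 1h, 6h), drug b (none, 1h, 6h), drug c (none, 1h, 6h).
protein_a = np.array([10, 95, 15, 10, 15, 15, 10, 95, 65], dtype=float)

rows = []
for d in drugs:
    for t in times:
        # Treatment coding: baseline is drug a at time "none".
        row = [1.0]                                              # intercept
        row += [float(t == lvl) for lvl in times[1:]]            # time main effects
        row += [float(d == lvl) for lvl in drugs[1:]]            # drug main effects
        row += [float(t == tl and d == dl)                       # time x drug interaction
                for dl in drugs[1:] for tl in times[1:]]
        rows.append(row)
X = np.array(rows)
names = (["intercept"]
         + [f"time{t}" for t in times[1:]]
         + [f"drug{d}" for d in drugs[1:]]
         + [f"drug{d}:time{t}" for d in drugs[1:] for t in times[1:]])

# Fit on the log2 scale (assumption: values are strictly positive).
y = np.log2(protein_a)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
for n, c in zip(names, coef):
    print(f"{n:>14s} {c:+.3f}")
```

Here the only non-zero interaction terms are drug b at 1 h (log2(15/95), about -2.66) and drug c at 6 h (log2(65/15), about 2.12), which are exactly the deviations described for protein A. With the two biological replicates per cell, these interaction coefficients are the quantities one would test.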
As far as I understand, all three packages can be used for such factorial designs, but I am not sure which one to use.
Your suggestions are welcome.
Kind regards, AB
Thank you for your comments. Indeed, I forgot to say this explicitly. For protein A under treatment A only, I would observe something like 10, 95, 15, meaning that drug a has no effect on protein A. I am therefore looking to formulate the model correctly, such that drug c at 6 hours would be the significant result for protein A, and drug a at 6 hours for protein B. If no treatment is applied, protein A would remain at 10 throughout and protein B would remain at 80 throughout. I hope this answers your question.
This is not precise enough. Ignore all drugs other than drug a. What exactly is your null hypothesis for this drug? From what you're writing, the null hypothesis is that after 1 hour you expect a 9.5-fold increase in expression compared with the zero time point, and after 6 hours this drops to an expected 1.5-fold increase. Is this correct? Saying "some increase after 1 hour followed by a drop at 6 hours" is too vague to construct a hypothesis test. For example, under the null, does the expression at 6 hours return to its level at time zero?
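Written out, with μ_t denoting the expected expression of protein A at time t (notation introduced here for illustration), the reading of the null hypothesis for drug a given above is:

```latex
H_0\ (\text{drug a, protein A}):\quad
\frac{\mu_{1\,\mathrm{h}}}{\mu_{0}} = \frac{95}{10} = 9.5,
\qquad
\frac{\mu_{6\,\mathrm{h}}}{\mu_{0}} = \frac{15}{10} = 1.5
\quad\Longleftrightarrow\quad
\log_2\frac{\mu_{1\,\mathrm{h}}}{\mu_{0}} \approx 3.25,
\qquad
\log_2\frac{\mu_{6\,\mathrm{h}}}{\mu_{0}} \approx 0.58
```

The open question is then whether the second ratio should be 1.5 or exactly 1 (a full return to the time-zero level).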
Besides, you say that "drug a has no effect on protein A". But I would say that a nearly 10-fold increase in the expression of protein A after 1 hour of treatment with drug a is, in fact, a pretty strong effect. Why is this not interesting?
Thank you. Let me clarify: the null hypothesis is that under treatment A alone, I would get the values 10, 95, 15. Drug a combined with treatment A shows the same pattern, and is therefore not "interesting" to me: I am not interested in how protein A responds to treatment A itself, but in the interactions. These are drug c, where there is a deviation at 6 h (10, 95, 65 rather than 10, 95, 15), and drug b, which deviates at 1 h (10, 15, 15 instead of 10, 95, 15).
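The null hypothesis as stated here compares each drug's time course with a reference time course under treatment A alone. A minimal Python sketch of that comparison, assuming the reference values (10, 95, 15 for protein A and 80, 80, 80 for protein B) come from separate control samples, measures each drug's deviation as a difference of log2 fold changes relative to time zero. The cutoff `CUTOFF` is a hypothetical 2-fold threshold for illustration only; it is not a statistical test, which would need the replicates and one of the packages.

```python
import math

# Reference time courses (none, 1 h, 6 h) under treatment A alone, as stated in the thread.
# Assumption: measured on separate control samples, not derived from the drug data.
reference = {"protein A": [10, 95, 15], "protein B": [80, 80, 80]}

# Toy drug data from the table (none, 1 h, 6 h), one value per cell.
data = {
    "protein A": {"a": [10, 95, 15], "b": [10, 15, 15], "c": [10, 95, 65]},
    "protein B": {"a": [80, 80, 15], "b": [80, 78, 79], "c": [85, 83, 85]},
}

CUTOFF = 1.0  # hypothetical |log2| cutoff (2-fold deviation); not a significance test


def deviation(drug_course, ref_course):
    """Difference of log2 fold changes (time t vs none) between a drug and the reference.

    Assumes strictly positive values; real data would need a pseudo-count or a count model.
    """
    return [
        math.log2(drug_course[t] / drug_course[0]) - math.log2(ref_course[t] / ref_course[0])
        for t in range(1, len(drug_course))
    ]


flagged = {}
for protein, by_drug in data.items():
    for drug, course in by_drug.items():
        dev = deviation(course, reference[protein])
        if any(abs(d) > CUTOFF for d in dev):
            flagged.setdefault(protein, []).append(drug)
        print(protein, drug, [round(d, 2) for d in dev])

print(flagged)
```

Because the deviation is computed relative to each drug's own time-zero value, drug c for protein B (starting at 85 rather than 80) is not flagged merely for its different baseline.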
Edited: Where are you getting the values 10, 95 and 15 from? You can't define the null hypothesis after looking at the data; you need to define it beforehand. In other words, these numbers must come from a separate piece of data. Is this the case? And are these numbers the same for all proteins? (I would find this hard to imagine.)
In any case, it is unwise to frame a null hypothesis in terms of absolute numbers. What happens if, upon treatment with drug a, gene X exhibits an expression pattern of 20, 190 and 30? Is this interesting or not?
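To make the point concrete: the hypothetical gene X pattern differs from 10, 95, 15 at every time point in absolute terms, yet has exactly the same fold changes relative to time zero. A minimal Python sketch, assuming fold change relative to the untreated time point is the quantity of interest:

```python
import math


def log2_fold_changes(series):
    """Log2 fold change of each later time point relative to the first (time zero)."""
    baseline = series[0]
    return [math.log2(x / baseline) for x in series[1:]]


reference = [10, 95, 15]  # pattern quoted in the thread for protein A
gene_x = [20, 190, 30]    # hypothetical gene X from the question above

# Absolute values differ at every time point...
abs_diff = [x - r for x, r in zip(gene_x, reference)]
# ...but the relative (fold-change) patterns are identical.
ref_lfc = log2_fold_changes(reference)
x_lfc = log2_fold_changes(gene_x)

print(abs_diff)
print([round(v, 3) for v in ref_lfc])
print([round(v, 3) for v in x_lfc])
```

A null hypothesis phrased in absolute numbers would call gene X a deviation at every time point, while one phrased in fold changes would call it identical to the reference pattern.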