Arne.Muller@sanofi-aventis.com
Hello,
I've a rather general question related to factors that I'd like to use
in linear models (for siRNA design):
I have two nucleotide positions for a gene (say NC1 and NC2), and 'A',
'T', 'C', 'G' are the 4 possible values for each position. There's a
normally distributed response that I measure for some genes, and I'd
like to know which nucleotide type (A, T, C or G) has a significant
effect at each position (NC1 and NC2, possibly with interactions).
I see two possibilities for coding these factors:
Two factors NC1 and NC2 each with levels A T C and G
or 8 factors with levels 0 or 1 (i.e. boolean):
A1 T1 C1 G1 A2 T2 C2 and G2
Which solution would be most appropriate?
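To make the two codings concrete, here is a minimal sketch in plain Python. The function names and the example dinucleotides are made up for illustration; they are not taken from any real data set or package.

```python
NUCLEOTIDES = ["A", "T", "C", "G"]


def code_as_factors(dinucleotide):
    """Coding 1: two factors, NC1 and NC2, each with levels A, T, C, G."""
    return {"NC1": dinucleotide[0], "NC2": dinucleotide[1]}


def code_as_indicators(dinucleotide):
    """Coding 2: eight 0/1 variables A1, T1, C1, G1, A2, T2, C2, G2."""
    row = {}
    for position, observed in enumerate(dinucleotide, start=1):
        for nucleotide in NUCLEOTIDES:
            row[f"{nucleotide}{position}"] = int(nucleotide == observed)
    return row


# Hypothetical dinucleotides, only to show what each coding looks like.
for seq in ["AT", "GC", "TT"]:
    print(seq, code_as_factors(seq), code_as_indicators(seq))
```

Note that in the second coding exactly one of the four indicators at each position is 1, so the eight variables are not free to vary independently.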
I think about two main differences between the two possibilities:
1. Interpretation of the results. The model with 2 factors and 4
levels might be more difficult to interpret, since e.g. a treatment
contrast gives the effect of each of the other 3 nucleotide types
(T, C or G) relative to the first nucleotide ('A'), whereas the model
with 8 factors directly tells you whether having a given nucleotide at
a given position (coded by 1) is significant or not. ANOVA would also
be easier to interpret for the 8-factor model.
2. Different degrees of freedom (more factors, each with fewer DFs).
When there are many nucleotide positions, many factors need to be
estimated from only relatively few measurements (of the dependent
variable).
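As a sanity check on the degrees-of-freedom point, here is a small sketch in plain Python that counts how many parameters each coding can actually estimate. It assumes a main-effects model with an intercept, 'A' as the reference level for the treatment coding, and one observation of each of the 16 possible dinucleotides; all helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

NUCLEOTIDES = ["A", "T", "C", "G"]


def indicator_row(seq):
    # Intercept plus all eight 0/1 indicators (A1..G1, A2..G2).
    return [1] + [int(seq[pos] == n) for pos in range(2) for n in NUCLEOTIDES]


def treatment_row(seq):
    # Intercept plus three dummies per position, with 'A' as reference.
    return [1] + [int(seq[pos] == n) for pos in range(2) for n in NUCLEOTIDES[1:]]


def matrix_rank(rows):
    """Rank of a matrix by Gauss-Jordan elimination over exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a non-zero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Clear this column in every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


all_seqs = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]  # 16 dinucleotides
indicator_design = [indicator_row(s) for s in all_seqs]
treatment_design = [treatment_row(s) for s in all_seqs]

print(len(indicator_design[0]), matrix_rank(indicator_design))  # 9 columns, rank 7
print(len(treatment_design[0]), matrix_rank(treatment_design))  # 7 columns, rank 7
```

Under these assumptions both codings span the same 7-dimensional model: the four indicators at each position sum to the intercept column, so two of the nine columns in the 8-indicator design are redundant.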
Any comments and discussion are appreciated,
kind regards,
Arne