Question

expanding factors for lm

@arnemullersanofi-aventiscom-1086

Last seen 10.2 years ago

Hello,

I have a rather general question about factors that I'd like to use in linear models (for siRNA design).

I have two nucleotide positions for a gene (say NC1 and NC2), and 'A', 'T', 'C' and 'G' are the 4 possible values at each position. I measure a normally distributed response for some genes, and I'd like to know which nucleotide type (A, T, C or G) is significant at each position (NC1 and NC2, possibly with interactions).

I see two possibilities for coding these factors: two factors, NC1 and NC2, each with levels A, T, C and G; or 8 factors with levels 0 or 1 (i.e. boolean): A1, T1, C1, G1, A2, T2, C2 and G2.

Which solution would be most appropriate? I can think of two main differences between the two possibilities:

1. Interpretation of the results. The model with 2 factors and 4 levels might be more difficult to interpret, since e.g. a treatment contrast gives the effect of each of the 3 other nucleotide types (T, C or G) relative to the first nucleotide ('A'), whereas the model with 8 factors directly tells you whether having a given nucleotide at a given position (coded by 1) is significant or not. ANOVA would also be easier to interpret for the 8-factor model.

2. Different degrees of freedom (more factors, each with fewer DFs). When there are many nucleotide positions, many factors need to be estimated from relatively few measurements (the dependent variable).

Any comments and discussion are appreciated.

Kind regards,
Arne
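To make the two codings concrete, here is a minimal sketch that builds both design matrices and compares their column counts and ranks. It is written in Python with only the standard library rather than in R. The balanced set of all 16 NC1/NC2 combinations and the helper names (`treatment_row`, `indicator_row`, `matrix_rank`) are assumptions for illustration only, not part of any real data set or package.

```python
from fractions import Fraction
from itertools import product

NUCLEOTIDES = "ATCG"  # 'A' is taken as the reference level (an assumption)

def treatment_row(nc1, nc2):
    """Intercept plus 3 dummy columns per position, 'A' as reference."""
    row = [1]
    for nt in (nc1, nc2):
        row += [int(nt == level) for level in NUCLEOTIDES[1:]]
    return row

def indicator_row(nc1, nc2):
    """Intercept plus one 0/1 column per nucleotide and position (A1..G1, A2..G2)."""
    row = [1]
    for nt in (nc1, nc2):
        row += [int(nt == level) for level in NUCLEOTIDES]
    return row

def matrix_rank(rows):
    """Rank by exact Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current pivot position with a nonzero entry.
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Clear this column in every other row.
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank

# Hypothetical balanced design: every NC1/NC2 combination observed once.
sequences = list(product(NUCLEOTIDES, repeat=2))
X_treatment = [treatment_row(a, b) for a, b in sequences]
X_indicator = [indicator_row(a, b) for a, b in sequences]

rank_treatment = matrix_rank(X_treatment)
rank_indicator = matrix_rank(X_indicator)

print(len(X_treatment[0]), rank_treatment)  # 7 columns, rank 7
print(len(X_indicator[0]), rank_indicator)  # 9 columns, rank 7
```

Under these assumptions both codings span the same 7-dimensional main-effects space: within each position the four indicator columns sum to the intercept column, so two of the nine columns in the 0/1 coding are redundant. In R, `model.matrix(~ NC1 + NC2)` gives the treatment-coded version by default, and `lm` reports `NA` for coefficients that are aliased in this way.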

Posted 19.6 years ago by Arne.Muller@sanofi-aventis.com