Analysis of 2*4 factorial design
1
0
Entering edit mode
Sudipta • 0
@eeaf45c1
Last seen 2 hours ago
Ireland

Hi, first I want to thank you for creating limma package.

I have a 2(Factor A)*4(Factor B) experimental design.

I am trying to ask three questions,

  1. Which genes are different between factor A (main affect, moderated t Statistic and log fold change),
  2. Which genes are different due to factor B (main effect, moderated F Statistic),
  3. Which genes are different between the levels of factor B given factor A (moderated t Statistic and log fold change), Example: (factorB_level2 - factorB_level1) under factorA_level1, etc.

A. Two solve first two questions (1 and 2), I used,

# Classic Interaction model with sum to zero parametrization

## setting contrasts
contrasts(factor_A) <- contr.sum(2)
contrasts(factor_B) <- contr.sum(4)

## creating design matrix
design_2 <- model.matrix(~factor_A*factor_B)

## model fitting
model2 <- lmFit(exprs_data, design_2)
model2_bayes <- eBayes(model2)

## extracting the main effect due to factor A and factor B
top2_A2_vs_A1 <- topTable(model2_bayes, coef = "factor_A1", n=Inf, adjust = "BH")
top2_B2_vs_B1 <- topTable(model2_bayes, coef = c("factor_B1", "factor_B2", "factor_B3"), n=Inf, adjust = "BH")

B. To solve the 3rd question with the above design, it became a bit complex, so I tried the below code (I believe the corresponding coeff of A2_vs_A1, B2_vs_B1_givenA2, etc., can be considered as an estimate of log FC)

# Analysis as a single factor

## creating the single factor
group <- paste(factor_A, factor_B, sep = "_")

## creating the design matrix
design <- model.matrix(~0+group)
colnames(design) <- gsub("group","", colnames(design))

## fitting the model
model <- lmFit(exprs_data, design)

## making the contrast
contrasts_group <- makeContrasts(
  A2_vs_A1 = (A2_B1+A2_B2+A2_B3+A2_B4)/8 - (A1_B1+A1_B2+A1_B3+A1_B4)/8,
  # B = ((A2_B2-A2_B1)+(A2_B3-A2_B1)+(A2_B4-A2_B1)+(A2_B3-A2_B2)+(A2_B4-A2_B3)+(A2_B4-A2_B2)) + ((A1_B2-A1_B1) + (A1_B3-A1_B1)+(A1_B4-A1_B1)+(A1_B3-A1_B2)+(A1_B4-A1_B3)+(A1_B4-A1_B2))/6,
  B2_vs_B1_givenA2 = (A2_B2-A2_B1),
  B3_vs_B1_givenA2 = (A2_B3-A2_B1),
  B4_vs_B1_givenA2 = (A2_B4-A2_B1),
  B3_vs_B2_givenA2 = (A2_B3-A2_B2),
  B4_vs_B3_givenA2 = (A2_B4-A2_B3),
  B4_vs_B2_givenA2 = (A2_B4-A2_B2),
  B2_vs_B1_givenA1 = (A1_B2-A1_B1),
  B3_vs_B1_givenA1 = (A1_B3-A1_B1),
  B4_vs_B1_givenA1 = (A1_B4-A1_B1),
  B3_vs_B2_givenA1 = (A1_B3-A1_B2),
  B4_vs_B3_givenA1 = (A1_B4-A1_B3),
  B4_vs_B2_givenA1 = (A1_B4-A1_B2),
  levels = colnames(design)
)

## fitting the model
model_contr <-  contrasts.fit(model, contrasts_group)
model_contr_bayes <- eBayes(model_contr)


## The main effect of factor A
top_A2_vs_A1 <- topTable(model_contr_bayes, coef = "A2_vs_A1", n=Inf, adjust = "BH")

## ~ factorB | factorA 
top_B2_vs_B1_givenA2 <- topTable(model_contr_bayes, coef = "B2_vs_B1_givenA2", n=Inf, adjust = "BH")
### same for others.

Here is how I created the data,

# fake data generation
set.seed(123) 

# Define parameters
num_genes <- 1000 # Number of genes
num_replicates <- 5 # Number of replicates per group

# Treatments/factors
factor_A <- factor(rep(c("A1", "A2"), each = num_replicates * 4))
factor_B <- factor(rep(c("B1", "B2", "B3", "B4"), times = num_replicates * 2))

# dummy data frame to hold the expression data
exprs_data <- matrix(nrow = num_genes, ncol = num_replicates * 8)

# Filling the matrix with random expression values
for (i in 1:(num_replicates * 8)) {
  exprs_data[, i] <- rnorm(num_genes, mean = sample(5:15, 1), sd = 2)
}

# Converting the matrix to data frame and naming the columns appropriately
colnames(exprs_data) <- paste(factor_A, factor_B, rep(1:num_replicates, times = 8), sep = "_")
exprs_data <- as.data.frame(exprs_data)

# Adding row names to represent gene IDs
rownames(exprs_data) <- paste0("Gene", 1:num_genes)

I am very new to limma trying to migrate from anova and posthoc based concept to linear modeling. I am wondering if I am doing it correct, also, for the second approach I am not sure, how to get the main effect of factor B. My real data is little more complex than this, it doesn't have values for one of the combination.

Thank you in advance.

limma • 46 views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 2 hours ago
WEHI, Melbourne, Australia

This is covered by Section 9.5 of the limma User's Guide.

The analysis with a single-factor is much better. Your definition of A2 vs A1 main (average) effect would be correct if you divided by 4 instead of by 8.

ADD COMMENT

Login before adding your answer.

Traffic: 766 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6