Hello, Its my first time to deal with Longitudinal analysis in Limma and I would like to make sure that my code reflects my research question. Research question: which gene-expressions are significantly differentially expressed over time with the change of FEV1 (continuous variable) overtime. I have 2 gene expression, FEV1, and other covariants readings for 2 timepoints.
Phenotype data
RID<-factor(rownames(combined_pheno)) #reading ID is different for each reading
ID<-factor(combined_pheno$sid) # each subject has 1 ID and 2 RID
time<-as.numeric(combined_pheno$Years_Phase1_Phase2)
#each subject will have 2 readings with 2 time points, time1 always = 0 and time2= 4-6 depends on the time difference between the 2 readings
#would you recommend to deal with time as factor : 1/2
Batch<-factor(combined_pheno$Lc.Batch)
Batch<-gsub("-", "_", Batch)
Batch<-factor(Batch)
gender<-factor(combined_pheno$gender)
# etc. (other covariants)
Gene Expression data
#First I combined the 2 gene expression files so I can normalize them together
combined_geneExpression <- cbind(geneExpression_time1, geneExpression_time2)
# First I Filtered low expressed genes
cpm_threshold=1
cpm_values <- cpm(combined_geneExpression)
selected_genes <- rownames(combined_geneExpression)[apply(cpm_values >= cpm_threshold, 1, function(x) sum(x) / length(x) >= 0.8)]
combined_geneExpression <- combined_geneExpression[selected_genes, ]
combined_geneExpression<-as.matrix(combined_geneExpression)
# data normalisation, CPM, log2 transformation
combined_geneExpression<- DGEList(counts=combined_geneExpression)
combined_geneExpression<- calcNormFactors(combined_geneExpression_Batch,method = "TMM")
Model Design
#do I have to add interaction term in my design i.e. FEV1:time
design <- model.matrix(~ ID + FEV1 + time + FEV1:time+ Batch +
other_covariants)
#voom tranformation
v <- voom(combined_geneExpression_Batch, design, plot = TRUE) #Coefficients not estimable: gender2 race2 Warning message: Partial NA coefficients for 13015 probe(s)
#fit linear model
fit <- lmFit(v, design)
# Contrasts: FEV1 coefficient for the interaction with Time
cont.matrix <- makeContrasts(Time:FEV1, levels = design)
fit <- contrasts.fit( fit, cont.matrix)
fit<- eBayes(fit)
result <- topTable(fit, coef = 1, n = Inf, adjust = "BH", p.value = 0.05 , sort.by = "logFC")
Please let me know if you have any suggestions, Thank you
Thank Gordon for your comment
Based on these suggestions I made the following changes:
-exclude batch correction before data normalization and include it as a covariant in the model design
-The design will be design <- model.matrix(~ ID + FEV1+ time + FEV1:time + Batch + other_covariants)
-I will exclude duplicateCorrection from the code since ID is added in the main design