**0**wrote:

The present analysis is an exercise physiology study with a mixed factorial design. The design is as follows: Time (pre vs. post intervention; repeated measures) x Metformin status (on drug vs off drug; independent subjects) x dietary intervention (control diet vs MUFA enriched vs PUFA enriched; independent subjects). The data under investigation comes from a custom preoteomics aptamer array (104 individual subjects; two timepoints; 857 x 208 expression matrix). The primary contrast of interest is the first one (pre-post). I have set up the analysis using the attached code, but others on the research team find the initial results highly improbable.

I am unclear on how to interpret this and I want to give their concerns due diligence. Any advice would be appreciated. My primary concern is that some aspect of the following code is inappropriate for the stated application due to faulty implementation/violated assumptions.

```
library(dplyr)
library(impute)
library(vsn)
library(limma)
#filter combined proteomics data to exclude MNAR probes where abundances are NA for more than 33.3% of samples
combined<-inner_join(join,bind,by="Order_Key_USE")
na<-apply(combined,2,is.na)
na_index<-apply(na,2,sum)/dim(combined)[1]
combined<-combined[,na_index<.333]
#join additional metadata to samples
add_data<-read.csv("meta.csv",check.names=FALSE,na.strings=".")[,1:10]
add_data$Sex<-as.factor(add_data$Sex)
add_data$Smoker<-as.factor(add_data$Smoker)
final_join<-dplyr::inner_join(add_data,combined,by="StudyID")
write.csv(final_join,file="Intermediate_one.csv")
#set up objects for limma and normalize expression data with vsn
R<-read.csv("Intermediate_one.csv",check.names=FALSE)
abunds<-justvsn(t(R[,-c(1:25)]))
# impute remaining NA assuming they are not MNAR
edata<-impute.knn(abunds,k = 10, rowmax = 1.0, colmax = 1.0, maxp = 1500, rng.seed=362436069)$data
rownames(edata)<-colnames(R[,-c(1:25)])
#set up model matrix for limma and estimate correlation of individual subject samples across repeated measures
pheno_table<-R[,c(2,3,14,18)]
colnames(pheno_table)<-c("StudyID","Treatment","Study","Main")
Treat<-factor(paste(pheno_table$Study,pheno_table$Main,sep="."))
Treat<-factor(paste(Treat,pheno_table$Treatment,sep="."))
levels(Treat)<-list("PUFApre.Metform"=c("PUFA.1.Metform"),
"PUFApost.Metform"=c("PUFA.2.Metform"),
"PUFApre.NoMetform"=c("PUFA.1.NoMetform"),
"PUFApost.NoMetform"=c("PUFAt.2.NoMetform"),
"NoDietpre.Metform"=c("NoDiet.1.Metform"),
"NoDietpost.Metform"=c("NoDiet.2.Metform"),
"NoDietpre.NoMetform"=c("NoDiet.1.NoMetform"),
"NoDietpost.NoMetform"=c("NoDiet.2.NoMetform"),
"MUFApre.Metform"=c("MUFA.1.Metform"),
"MUFApost.Metform"=c("MUFA.2.Metform"),
"MUFApre.NoMetform"=c("MUFA.1.NoMetform"),
"MUFApost.NoMetform"=c("MUFA.2.NoMetform"))
design<-model.matrix(~0+Treat)
colnames(design)<-levels(Treat)
corfit<-duplicateCorrelation(edata,design,block=pheno_table$StudyID)
#Fit linear model for within subjects contrast
fit<-lmFit(edata,design,block=pheno_table$StudyID,correlation=corfit$consensus)
cm <- makeContrasts(
`PUFApost.Metform-PUFApre.Metform` = PUFApost.Metform-PUFApre.Metform,
`NoDietpost.Metform-NoDietpre.Metform` = NoDietpost.Metform-NoDietpre.Metform,
`MUFApost.Metform-MUFApre.Metform` = MUFApost.Metform-MUFApre.Metform,
`PUFApost.NoMetform-PUFApre.NoMetform` = PUFApost.NoMetform-PUFApre.NoMetform,
`NoDietpost.NoMetform-NoDietpre.NoMetform` = NoDietpost.NoMetform-NoDietpre.NoMetform,
`MUFApost.NoMetform-MUFApre.NoMetform` = MUFApost.NoMetform-MUFApre.NoMetform,
levels=design)
#clean up DE table
fit2 <- contrasts.fit(fit, cm)
fit2 <- eBayes(fit2,robust=TRUE)
T<-topTable(fit2,number=100000)
T<-cbind(rownames(T),T)
colnames(T)[1]<-"FeatureID"
```

**37k**• written 6 weeks ago by JMallory •

**0**