I have data from an RNA-seq experiment in which 90 individuals were sequenced. Three biopsies were taken from each patient at different locations, i.e., for the same individual we have three RNA-seq samples corresponding to three locations (proximal, distal, rectum). We are interested in finding the expression differences between locations.
I thought of using patient as a blocking factor with duplicateCorrelation to account for the fact that each patient contributes samples from 3 different locations.
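For reference, this is my understanding of the model that the blocking sets up. It is a sketch, not quoted from the limma documentation. Here $g$ is a gene, $i$ a patient and $j$ a location; $\rho$ is the single consensus correlation shared by all genes and $\sigma_g^2$ is a gene-specific variance:

$$
y_{gij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}_g + \varepsilon_{gij}, \qquad
\operatorname{Var}(\varepsilon_{gij}) = \sigma_g^2, \qquad
\operatorname{Cor}(\varepsilon_{gij}, \varepsilon_{gij'}) = \rho \ \ (j \neq j'), \qquad
\operatorname{Cor}(\varepsilon_{gij}, \varepsilon_{gi'j'}) = 0 \ \ (i \neq i').
$$

With the samples grouped by patient, each patient contributes the same $3 \times 3$ block, and lmFit then solves a generalized least squares problem:

$$
R = \begin{pmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{pmatrix}, \qquad
V = I_{90} \otimes R, \qquad
\hat{\boldsymbol{\beta}}_g = \left(X^{\top} V^{-1} X\right)^{-1} X^{\top} V^{-1} \mathbf{y}_g .
$$

Because $\sigma_g^2$ only scales $V$, it cancels out of $\hat{\boldsymbol{\beta}}_g$. The within-patient correlation enters only through $\rho$.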
Here is an example of the metadata and the steps I'm planning to use:
```
> head(targets)
  Sample_Name Location    sex age patient_ID
1        2941 Proximal   Male  68        294
2        2942   Distal   Male  68        294
3        2943   Rectum   Male  66        294
4        1331 Proximal Female  24        133
5        1332   Distal Female  24        133
6        1333   Rectum Female  24        133
```

```r
library(limma)

# The column is "Location" (capital L), matching the name used in the formula
Location <- factor(targets$Location)
sex <- factor(targets$sex)
age <- targets$age

# dataset is the expression data object
# No-intercept design: one coefficient per location, plus sex and age
design <- model.matrix(~0 + Location + sex + age)

# Estimate the within-patient correlation, then fit with patient as a block
corfit <- duplicateCorrelation(dataset, design, block = targets$patient_ID)
fit <- lmFit(dataset, design, block = targets$patient_ID,
             correlation = corfit$consensus)
efit <- eBayes(fit, robust = TRUE)
```
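With a no-intercept design like this, the Location coefficients are per-location means. Running eBayes directly on that fit tests each mean against zero, which is not a comparison between locations. Below is a minimal sketch of adding the location comparisons with makeContrasts and contrasts.fit. It runs on simulated data only: the toy layout (6 patients), the simulated values, and the contrast names (DistalVsProximal etc.) are hypothetical. The simulated matrix stands in for log-expression values. For RNA-seq counts this would typically be the output of voom().

```r
library(limma)
# duplicateCorrelation() with a block argument also needs the statmod package installed

set.seed(1)

# Hypothetical toy layout mimicking `targets`: 6 patients x 3 locations
n_patients <- 6
targets <- data.frame(
  patient_ID = rep(seq_len(n_patients), each = 3),
  Location   = rep(c("Proximal", "Distal", "Rectum"), times = n_patients),
  sex        = rep(rep(c("Male", "Female"), each = 3), times = n_patients / 2),
  age        = rep(sample(20:80, n_patients), each = 3)
)

# Simulated log-expression: shared patient effect + noise; genes 1-10 are up in Rectum
n_genes <- 200
patient_effect <- matrix(rnorm(n_genes * n_patients), n_genes, n_patients)
dataset <- patient_effect[, targets$patient_ID] +
  matrix(rnorm(n_genes * nrow(targets), sd = 0.5), n_genes)
is_rectum <- targets$Location == "Rectum"
dataset[1:10, is_rectum] <- dataset[1:10, is_rectum] + 2
rownames(dataset) <- paste0("gene", seq_len(n_genes))

# Same design and blocking as in the question
Location <- factor(targets$Location, levels = c("Proximal", "Distal", "Rectum"))
sex <- factor(targets$sex)
age <- targets$age
design <- model.matrix(~0 + Location + sex + age)

corfit <- duplicateCorrelation(dataset, design, block = targets$patient_ID)
fit <- lmFit(dataset, design, block = targets$patient_ID,
             correlation = corfit$consensus.correlation)

# Location coefficients are per-location means, so compare them via contrasts
cm <- makeContrasts(
  DistalVsProximal = LocationDistal - LocationProximal,
  RectumVsProximal = LocationRectum - LocationProximal,
  RectumVsDistal   = LocationRectum - LocationDistal,
  levels = design
)
efit <- eBayes(contrasts.fit(fit, cm), robust = TRUE)

summary(decideTests(efit))
topTable(efit, coef = "RectumVsProximal", number = 5)
```

The contrasts.fit step goes between lmFit and eBayes. The blocking and the consensus correlation are carried through unchanged. topTable(efit, coef = ...) then lists genes for one location comparison at a time.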
Are these steps appropriate to account for the patient effect?