Advice on using a blocking factor and duplicateCorrelation in an RNA-seq experiment
jfertaj ▴ 20
@jfertaj-8566
United Kingdom

Hi all,

I have data from an RNA-seq experiment in which 90 individuals were sequenced. Three biopsies were taken from each patient at different locations, i.e., for each individual we have three RNA-seq samples corresponding to three locations (proximal, distal, rectum). We are interested in finding the differences between locations.

I thought of using a blocking factor with duplicateCorrelation to account for the fact that each patient has 3 different locations.

An example of the metadata and of the steps I'm thinking of using:

head(targets)
  Sample_Name Location    sex age patient_ID
1        2941 Proximal   Male  68        294
2        2942   Distal   Male  68        294
3        2943   Rectum   Male  66        294
4        1331 Proximal Female  24        133
5        1332   Distal Female  24        133
6        1333   Rectum Female  24        133

Location <- factor(targets$Location)
sex <- factor(targets$sex)
age <- targets$age
# dataset is the expression data object
design <- model.matrix(~ 0 + Location + sex + age)
corfit <- duplicateCorrelation(dataset, design, block = targets$patient_ID)

fit <- lmFit(dataset, design, block = targets$patient_ID, correlation = corfit$consensus.correlation)
efit <- eBayes(fit, robust = TRUE)


Are these steps appropriate to account for the patient effect?

Thanks

limma • 348 views

Can you please clarify why the ages are so different for different samples from the same patient? For patient 133, for example, is it really true that the distal sample was collected 33 years after the proximal sample? Or am I misinterpreting the age column?


Sorry Gordon Smyth, it was a copy/paste error from the console; it is now fixed. The age column is the patient's age.

@gordon-smyth
WEHI, Melbourne, Australia

The usual design matrix for this sort of experiment is

model.matrix(~ 0 + Location + patient_ID)


There's no need to adjust for age or sex. There's no gain to be had from duplicateCorrelation unless you don't have all three biopsies from all the patients.
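
As a rough illustration of that design, here is a minimal base-R sketch on toy metadata. The `targets` values below are made up to mirror the columns in the question (patient 501 is hypothetical), and it assumes `patient_ID` has to be converted to a factor because it is stored as a number. Since age and sex are constant within a patient, they would be absorbed by the patient columns.

```r
# Toy metadata mirroring the columns in the question (illustrative values only).
targets <- data.frame(
  Location   = rep(c("Proximal", "Distal", "Rectum"), times = 3),
  patient_ID = rep(c(294, 133, 501), each = 3)
)

# patient_ID is numeric, so convert it to a factor; otherwise it would be
# fitted as a single numeric covariate rather than as a blocking factor.
Location   <- factor(targets$Location, levels = c("Proximal", "Distal", "Rectum"))
patient_ID <- factor(targets$patient_ID)

# One column per location, plus one column per patient except a baseline patient.
design <- model.matrix(~ 0 + Location + patient_ID)
colnames(design)
```

With the location columns named this way, comparisons such as Distal versus Proximal could then be formed as contrasts between the corresponding coefficients.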


Thanks Gordon Smyth. I have 3 biopsies for all patients except one. If I don't need to use duplicateCorrelation, it will save me some computing time when I analyse the methylation array data.
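
One way to find which patients lack a full set of biopsies is to cross-tabulate patient against location. This is only a sketch on toy data: the `targets` values are made up, and patient 501 is a hypothetical patient with a missing rectum sample.

```r
# Toy metadata: patient 501 (hypothetical) is missing the Rectum biopsy.
targets <- data.frame(
  Location   = c("Proximal", "Distal", "Rectum",
                 "Proximal", "Distal", "Rectum",
                 "Proximal", "Distal"),
  patient_ID = c(294, 294, 294, 133, 133, 133, 501, 501)
)

# Rows are patients, columns are locations, entries are sample counts.
counts <- table(targets$patient_ID, targets$Location)

# Patients with fewer than three distinct locations sampled.
incomplete <- rownames(counts)[rowSums(counts > 0) < 3]
incomplete
```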