I have the following study design:
PatientID Time_Resp R or NR at T1 Timepoint
1 T0_no No T0
1 T1_no No T1
2 T0_yes Yes T0
2 T1_yes Yes T1
3 T0_no No T0
3 T1_no No T1
4 T0_yes Yes T0
4 T1_yes Yes T1
Patients are sampled at two timepoints; before and after treatment with a study drug. I have 2 (T0 T1) x20 patients. Some of them respond (n = 9) to the drug and some do not (n = 11). Are there any differences in genexpression over time in general or between groups over time.
Q1) As I have repeated sampling of the same individuals I consider it would be appropriate to use Limma and Duplicate correlation (in comparisons involving paired samples) - correct?
Q2) We have discussed using Deseq2 as well, but if I understand, the duplicate correlation function is not available there and therefore it is better to use the code above and limma in the entire project (even in comparisons only involving unpaired data such as within T0).
Q3) I have basic skills in bioinformnatics and have followed the limma manual 9.7, thus can you check that my code is correct please, especially regarding paired data and dupl corr?
Q4) Can it be improved to fx exclude potential outliers (if that is not done by default), I have also read in other Bioconductor messages fx https://support.bioconductor.org/p/59700/, that it is recommended to do voom twise (although the manual doesnt say anything about repeated voom). Is that what you recommend and how and where can I incoorporate it correctly in my code?
Q5) Also I have read that the Duplicate correlation function takes time to excecute - that is not my experience and therefore my question regarding the code in Q3 - with my data this step takes less than a minute -- have I done anything wrong when defining block variables etc?
I really appreciate your comments
All the best //Sofie
# load packages and get version numbers using sapply() sapply(c("Rsubread", "limma", "edgeR", "Glimma", "RColorBrewer", "gplots"), library, character.only = TRUE) sapply(c("Rsubread", "limma", "edgeR", "Glimma", "RColorBrewer", "gplots"), packageVersion) #there is a lot of code here between (based on Law 2017 RNAseq is easy as 1-2-3 ...) not relevant to the query #Groupcomparison using duplicate correlation. first design matrix and contrasts: Fourgroup <- factor(targets$Time_Resp, levels = c('T0_yes','T1_yes','T0_no','T1_no')) design <- model.matrix( ~ 0 + Fourgroup) colnames(design) <- gsub("Fourgroup", "", colnames(design)) contr.matrix <- makeContrasts( Responder_T1vsT0 = T1_yes-T0_yes, Nonresponder_T1vsT0 = T1_no-T0_no, DiffbwDiffRespNonresp = (T1_yes-T0_yes)-(T1_no-T0_no), levels=design) # show the number of samples per category colSums(design) #Removing heteroscedascity with voom v <- voom(x, design, plot=TRUE) #x = DGElist #Fitting linear models (lmFit) for comparisons of interest, and statistics using eBayes #Unpaired data - not used in my query but appreciate check code correct? #vfit <- lmFit(v, design) #vfit <- contrasts.fit(vfit, contrasts=contr.matrix) #efit <- eBayes(vfit) #PAIRED DATA IN THE EXAMPLE - check code correct? PatientID <- factor(targets$PatientID) #to be used as block-variable table(PatientID) #just to check 2 datapoints each corfit <- duplicateCorrelation(v, design, block=PatientID) corfit$consensus ## = 0.614, excpected to be positive in study with repeated sampling of the same patient at different time points vfit <- lmFit(v, design, block=PatientID, correlation=corfit$consensus) vfit <- contrasts.fit(vfit, contrasts=contr.matrix) efit <- eBayes(vfit) View(summary(decideTests(efit))) head(efit$coefficients, n = 2) #List all genes and corrsponding Log2FC, P-values, BH adjusted P-values to investigate further C1 <- topTable(efit, coef=1, n=Inf, sort.by = "P") C2 <- topTable(efit, coef=2, n=Inf, sort.by = "P") C3 <- topTable(efit, coef=3, n=Inf, sort.by = "P") #and more code follows fx ranking of lists for GSEA etc