Hello All,
I am trying to implement LIMMA room duplicate correlation strategy for my RNA-seq samples using blocking setup. I need to consider following points into account.
1. My samples are biologically paired and dependent also not all patients has its paired sample.
for instance, this is how my design table looks like,
Patient | Condition | Time |
PE01 | PH | ID |
PE01 | PH | REL |
AE01 | Non | ID |
PE05 | Non | REL |
I have 82 samples with more than half of patients has its paired sample and few of them doesn't. According to Limma from user guide, I have adapted to the following workflow. But I have a question, how limma voom fits into the workflow for a paired experimental design that uses duplicateCorrelation() to handle blocking on the same subject, how is the appropriate experimental design (taking subject blocking into account) fed to voom?
My design looks like,
combined_fac <- factor(paste( design_table$Condition)) design <- model.matrix(~0 + combined_fac) rownames(design)<-colnames(data) colnames(design)<-c("Non","PH")
## Then followed with estimating correlation, blocking is input with design and then the Contract for finding DE genes
y <- DGEList(data)
y <- calcNormFactors(y)
v <- voom(y, design)
corfit <- duplicateCorrelation(v, design, block =design_table$Patient
)
v <- voom(y, design, block =design_table$Patient, correlation =
corfit$consensus)
fit <- lmFit(v, design, block = design_table$Patient, correlation =
corfit$consensus)
cont_matrix <- makeContrasts( PHvsNon = PH-Non, levels=design)
fit2 <- contrasts.fit(fit, cont_matrix)
fit2 <- eBayes(fit2)
result_double <- topTable(fit2, number = nrow(data), sort.by ="none")
I wanted to make the understanding the workflow is right with for my experiment.? or what is the appropriate way to run voom such that it takes all of the necessary design information into its account?
Thanks
Thanks Aaron for your time, yes Phlike is PH as it is in my design_table, I just corrected that in my primary question, also model$block. So yes I need the DE genes between PH and Non. Their data is my count and the column names of samples don't have "." so I don't understand why combined_fac should be defined as you define? Also regarding time, is it because statistically, the variance inflation matters in the analysis? Aslo as you see few patients doesnt have either REL or ID as pairs. In this case is it important to take the time level into account
Does gene expression change between levels of
Time
? Presumably whoever designed the experiment thought that it does, otherwise they wouldn't have bothered collecting samples for the two time points. If there is DE between time points, you need to consider theTime
factor in your design matrix. Otherwise, DE between time points will manifest as an increase in the estimated variance, which will reduce your power to detect DE between conditions.Does gene expression change between levels of Time? No, we first started with that (to look into the difference between 2-time levels), but we didn't see any changes between time level but the subtype is driving the difference between samples.
Honestly speaking, I don't see the harm in putting
Time
into the model. It's only one factor, so you hardly lose any residual d.f., and it does protect you against potential variance inflation. I don't know how you initially concluded that time had no effect; but if you used the same approach as you did in your original post, you obviously won't observe any DE because the variance inflation due toCondition
is likely to be substantial.I don't know how you initially concluded that time had no effect; --We initially started to find the DE genes (using Deseq2) between ID and REL of all samples. And the DE genes were very heterogeneous in their expression pattern. And it was very difficult to call it as distinguish expression pattern between ID and REL. But the between subtype is showing a very distinguish pattern compared to ID versus REL. Yes I can incorporate the Time into model. Thank you
Sorry, I am getting error here, at makeContrasts : Error: unexpected symbol in " makeContrasts( ( combined_facPhlike ."
Why is there a dot by itself? The dot in my post is part of the column name of
design
.In my design there is also a dot, as combined_facPhlike . REL in the design matrix
Ah, you should use
paste0
rather thanpaste
when makingcombined_fac
.Yeah thanks it works :)