Question: Multi-factorial design for patient-replicates from different time points using duplicate correlation LIMMA Voom
0
2.6 years ago by
Germany
alva.james0 wrote:

Hello All,

I am trying to implement LIMMA room duplicate correlation strategy for my RNA-seq samples using blocking setup. I need to consider following points into account.

1. My samples are biologically paired and dependent also not all patients has its paired sample.

for instance, this is how my design table looks like,

 Patient Condition Time PE01 PH ID PE01 PH REL AE01 Non ID PE05 Non REL

I have 82 samples with more than half of patients has its paired sample and few of them doesn't. According to Limma from user guide, I have adapted to the following workflow. But I have a question,   how limma voom fits into the workflow for a paired experimental design that uses duplicateCorrelation() to handle blocking on the same subject, how is the appropriate experimental design (taking subject blocking into account) fed to voom?

My design looks like,

combined_fac <- factor(paste( design_table$Condition)) design <- model.matrix(~0 + combined_fac) rownames(design)<-colnames(data) colnames(design)<-c("Non","PH") ## Then followed with estimating correlation, blocking is input with design and then the Contract for finding DE genes y <- DGEList(data) y <- calcNormFactors(y) v <- voom(y, design) corfit <- duplicateCorrelation(v, design, block =design_table$Patient

)
v <- voom(y, design, block =design_table$Patient, correlation = corfit$consensus)
fit <- lmFit(v, design, block = design_table$Patient, correlation = corfit$consensus)
cont_matrix <- makeContrasts( PHvsNon = PH-Non,  levels=design)

fit2 <- contrasts.fit(fit, cont_matrix)
fit2 <- eBayes(fit2)
result_double <- topTable(fit2, number = nrow(data), sort.by ="none")

I wanted to make the understanding the workflow is right with for my experiment.? or what is the appropriate way to run voom such that it takes all of the necessary design information into its account?

Thanks

modified 2.6 years ago • written 2.6 years ago by alva.james0
Answer: Multi-factorial design for patient-replicates from different time points using d
2
2.6 years ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

Time seems to be an important factor, so perhaps you should be doing:

combined_fac <- factor(paste0(design_table$Condition, ".", design_table$Time))


Even if you're only interested in DE between conditions, you need to include time in the design to avoid inflating the variance. It is simple to modify the contrasts to test for DE between conditions in the presence of time:

con <- makeContrasts( (PH.ID + PH.REL)/2 - (Non.ID + Non.REL)/2, levels=design)

In any case, your design matrix doesn't have a Phlike column, so I don't know what that is or where it comes from. I also don't know what model$Block is in the call to duplicateCorrelation, perhaps you mean design_tab$Patient.

Otherwise, the code looks fine.

Thanks Aaron for your time, yes Phlike is PH as it is in my design_table, I just corrected that in my primary question, also model\$block. So yes I need the DE genes between PH and Non. Their data is my count and the column names of samples don't have "." so I don't understand why combined_fac should be defined as you define? Also regarding time, is it because statistically, the variance inflation matters in the analysis? Aslo as you see few patients doesnt have either REL or ID as pairs. In this case is it important  to take the time level into account

Does gene expression change between levels of Time? Presumably whoever designed the experiment thought that it does, otherwise they wouldn't have bothered collecting samples for the two time points. If there is DE between time points, you need to consider the Time factor in your design matrix. Otherwise, DE between time points will manifest as an increase in the estimated variance, which will reduce your power to detect DE between conditions.

Does gene expression change between levels of Time? No, we first started with that (to look into the difference between 2-time levels), but we didn't see any changes between time level but the subtype is driving the difference between samples.

Honestly speaking, I don't see the harm in putting Time into the model. It's only one factor, so you hardly lose any residual d.f., and it does protect you against potential variance inflation. I don't know how you initially concluded that time had no effect; but if you used the same approach as you did in your original post, you obviously won't observe any DE because the variance inflation due to Condition is likely to be substantial.

I don't know how you initially concluded that time had no effect; --We initially started to find the DE genes (using Deseq2) between ID and REL of all samples. And the DE genes were very heterogeneous in their expression pattern. And it was very difficult to call it as distinguish expression pattern between ID and REL. But the between subtype is showing a very distinguish pattern compared to ID versus REL. Yes I can incorporate the Time into model. Thank you

Sorry, I am getting error here, at makeContrasts : Error: unexpected symbol in " makeContrasts( ( combined_facPhlike ."

Why is there a dot by itself? The dot in my post is part of the column name of design.

In my design there is also a dot, as  combined_facPhlike . REL in the design matrix

Ah, you should use paste0 rather than paste when making combined_fac.

Yeah thanks it works :)

Content
Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.