Question

Time course experiment with limma voom

0

Entering edit mode

mahes.muniandy • 0

@mahesmuniandy-7955

Last seen 6 days ago

Helsinki

I am working on a medication switch-control comparison (RNA sequence data, human tissue) and the results have confused me somewhat. I wonder if anyone can help me make sense of my findings.

I have data for 30 people (half of them have been on the same medication they have taken for years, half of them have had their medication switched to a new one). For each one of them, I have RNA seq data at baseline and 6 months after medication switch/continued medication. I have performed analysis with limma-voom and find that there is a 30% overlap (significantly different genes) between people who have stayed on the same medication and people who have switched medication. Top 20 genes (sorted by p-value) have 80% overlap – fold changes (LFC more than 1) that are quite similar, in the same direction. The clinician has advised me that the results are not possible to explain in any biological sense - nor with any time effect. We started thinking that there could be plate effect (technical effect) in the sequencing platform and found that the genes that are differentially expressed are also correlating with the plate position (correlations as high as 0.8). I can’t adjust for plate position in my model because my phenotype is also correlating with plate position; both sample type (medication switch, control) and visit (baseline, 6 months). My question is on how to remove this plate effect when I have such a strong correlation to phenotype. I fear I now have data I can’t work with - any advice is appreciated.

Thank you in advance,

Mahes Muniandy (University of Helsinki)

design <- model.matrix(~ 0 + TS +  Age + Sex + Smoking  + BMI )
v <- voom(dge_AT, design)
corfit <- duplicateCorrelation(v,design,block=ID)
v <- voom(dge_AT,design,block= ID,correlation=corfit$consensus)
fit <- lmFit(v,design,block=ID,correlation=corfit$consensus)
cm <- makeContrasts(
          ctrl = TScontrol_V2-TScontrol_V1,
          treat = TStreatment_change _V2- TStreatment_change _V1,
          difference = ((TStreatment_change _V2- TStreatment_change _V1)-( TScontrol_V2-TScontrol_V1
, levels=design)

limma • 856 views

ADD COMMENT • link updated 17 months ago by Gordon Smyth 50k • written 17 months ago by mahes.muniandy • 0

0

Entering edit mode

I've never known plate position to be an important factor in an RNA-seq experiment. What sequencing technology are you using and how does plate position enter into it?

How have you correlated genes with plate position? It isn't obvious to me what you mean by that.

ADD REPLY • link 17 months ago Gordon Smyth 50k

0

Entering edit mode

Illumina Stranded total RNA prep (with ribo depletion) was the platform used. The top genes that were different between time points in both treatment and control were mostly mt-DNA genes. Same fold changes in both conditions. Many other genes also behaved similarly in both conditions. Because we couldn’t find a biological explanation, we wondered if the platform caused a huge technical effect. Initially, we thought that maybe the ribo depletion step did not act on all samples in the same manner. We plotted the mt-genes against the rRNA counts and didn’t see any patterns. Then we plotted the mt-DNA genes against the plate column position (10 columns) and saw a linear relationship – the mt-DNA gene counts increased steadily as the column number progressed from left to right. We then did Spearman correlation to correlate the column number with all genes. Out of 20,000 genes, 4,000 correlated significantly with column position.

Samples on the plate Columns 1-4 : baseline (both people who stayed on the same medication and people who switched medication) Columns 5-8 : 6 months (both people who stayed on the same medication and people who switched medication) Columns 9-10 : people without the disease.

For the top genes, there was steady increase in gene counts from baseline to 6 months to people without the disease. I didn’t plot all the genes to check, just the top 10.

Any ideas or thoughts on how I might approach this is hugely appreciated. Thank you so much for your time.

ADD REPLY • link 17 months ago mahes.muniandy • 0

0

Entering edit mode

I asked you for information about the sequencing technology. You've told me the library preparation protocol but not the sequencing technology. I still don't know how you have sequenced the samples or for what purpose you use a plate with 10 columns. I do not know of any sequencing technology with 10 plate columns.

Regarding the correlations, you would need to adjust read counts for sequencing depth before correlating with plate position. Perhaps you did that, but you don't mention doing so.

Problems ribo depletion would decrease counts sizes and statistical power but would not bias the DE analysis for most genes. Even if ribo depletion was inconsistent between samples, it would not seriously affect a limma analysis because TMM normalization would correct for unequal ribo depletion.

ADD REPLY • link 17 months ago Gordon Smyth 50k

score 0 · Answer 1 · 2022-11-19

Your question relates to a delicate research question that is far beyond what could be addressed in an online forum like this. If you're not a bioinformatics specialist yourself, then I recommend that you find a good statistical bioinformatician to collaborate with to explore whether your analysis questions can be solved. If the analysis can be improved then the bioinformatician would be a co-author of the final paper. If not, then they might be able to recommend how to redesign your experiment in the future.

I am still not clear for what purpose you are using a plate with 10 columns. However, if you have confounded the treatment groups in your experiment with plate position, and plate position has a strong technical effect, then it may not be possible to rescue your experiment.

Having said that, I would personally not analyse a balanced RNA-seq experiment like this using duplicateCorrelation. I would prefer to include patient ID in the linear model and estimate separate treatment effects for cases and active treatments. Then it would not be necessary to adjust for any patient characteristics like age, sex, smoking or BMI. Such an approach would not eliminate a plate effect, if one exists, but it would be more robust to baseline differences between the subjects.