Search
Question: How to handle mix of samples from the same subject and from different subjects during differential expression analysis of RNA-seq using GLMs?
1
2.7 years ago by
Leif Väremo70
Leif Väremo70 wrote:

Hi, I am performing differential expression analysis of RNA-seq data using GLMs in edgeR.

I have an experimental design with 4 different groups of subjects in a 2k-factorial design (6 different subjects per group). In other words, a control group, a group with condition A, a group with condition B, and a group with both condition A and B. Each subject is also treated with a compound and we have samples at baseline and 3 time-points after treatment.

I am using a model like this: ~condA+condB+condA:condB+treatment

Now to my question: In e.g. condition A there will be samples from different subjects, but also samples from the same subject (4 samples due to the treatment). How should this be properly handled when e.g. identifying differential expression for condition A vs control (or vs condition B)?

I cannot include factors for each subject. I have been suggested to use mixed effects models and use random effects to handle subjects. I am not sure how to implement this for RNA-seq data though.

I have run a separate analysis on only the baseline samples (before treatment) and removing the treatment factor from the model. The factor p-values correlate well with the ones I get from an analysis of all samples including treatment according to the model above, but very few (or no) genes can be called significant. Is the reason that I get significant genes using the complete data due to increased sample size, or is it influenced by the fact that I have replicates within subjects (not really replicates, since they are treated) that the model does not handle?

Any thoughts on how to properly analyze this data, or arguments for that it is ok to use my current strategy, would be very welcome!

Thanks

Leif Väremo

modified 2.7 years ago by Ryan C. Thompson6.8k • written 2.7 years ago by Leif Väremo70
2
2.7 years ago by
The Scripps Research Institute, La Jolla, CA
Ryan C. Thompson6.8k wrote:

You can fit a mixed model using voom to prepare your data for limma analysis, and then following the Limma User's Guide, Section 9.7 Multi-level Experiments. This is the solution I'd recommend for your data.

Can one of the limma authors comment on whether anything special needs to be done when using voom with duplicateCorrelation? voom obviously needs to be run first, but after running duplicateCorrelation, should one run voom a second time and pass it the correlation and blocking structure from duplicateCorrelation?

Thanks for your fast suggestion! I had totally missed this section in the limma guide.

I have run voom only once, since the attempt to run a second pass after the duplicateCorrelation returned an error:

Error in lowess(sx, sy, f = span) : 'delta' must be finite and > 0
1: In t(log2(t(counts + 0.5)/(lib.size + 1) * 1e+06)) : NaNs produced
2: Partial NA coefficients for 2512 probe(s)

Running lmFit with the correlation and blocking structure after this worked fine regardless.

You wouldn't run the second voom on the result of the first voom. You'd run it on the original counts a second time, but with passing the correlation and block arguments. Again, though, I'm not sure whether this is necessary.