How to handle mix of samples from the same subject and from different subjects during differential expression analysis of RNA-seq using GLMs?
1
1
Entering edit mode
Leif Väremo ▴ 70
@leif-varemo-5898
Last seen 8.4 years ago

Hi, I am performing differential expression analysis of RNA-seq data using GLMs in edgeR.

I have an experimental design with 4 different groups of subjects in a 2k-factorial design (6 different subjects per group). In other words, a control group, a group with condition A, a group with condition B, and a group with both condition A and B. Each subject is also treated with a compound and we have samples at baseline and 3 time-points after treatment.

I am using a model like this: ~condA+condB+condA:condB+treatment

Now to my question: In e.g. condition A there will be samples from different subjects, but also samples from the same subject (4 samples due to the treatment). How should this be properly handled when e.g. identifying differential expression for condition A vs control (or vs condition B)?

I cannot include factors for each subject. I have been suggested to use mixed effects models and use random effects to handle subjects. I am not sure how to implement this for RNA-seq data though.

I have run a separate analysis on only the baseline samples (before treatment) and removing the treatment factor from the model. The factor p-values correlate well with the ones I get from an analysis of all samples including treatment according to the model above, but very few (or no) genes can be called significant. Is the reason that I get significant genes using the complete data due to increased sample size, or is it influenced by the fact that I have replicates within subjects (not really replicates, since they are treated) that the model does not handle?

Any thoughts on how to properly analyze this data, or arguments for that it is ok to use my current strategy, would be very welcome!

Thanks

Leif Väremo

rnaseq rna-seq differential gene expression differential expression edger • 2.1k views
ADD COMMENT
2
Entering edit mode
@ryan-c-thompson-5618
Last seen 8 months ago
Scripps Research, La Jolla, CA

You can fit a mixed model using voom to prepare your data for limma analysis, and then following the Limma User's Guide, Section 9.7 Multi-level Experiments. This is the solution I'd recommend for your data.

ADD COMMENT
0
Entering edit mode

Can one of the limma authors comment on whether anything special needs to be done when using voom with duplicateCorrelation? voom obviously needs to be run first, but after running duplicateCorrelation, should one run voom a second time and pass it the correlation and blocking structure from duplicateCorrelation?

ADD REPLY
0
Entering edit mode

Thanks for your fast suggestion! I had totally missed this section in the limma guide.

I have run voom only once, since the attempt to run a second pass after the duplicateCorrelation returned an error:

Error in lowess(sx, sy, f = span) : 'delta' must be finite and > 0
In addition: Warning messages:
1: In t(log2(t(counts + 0.5)/(lib.size + 1) * 1e+06)) : NaNs produced
2: Partial NA coefficients for 2512 probe(s)

Running lmFit with the correlation and blocking structure after this worked fine regardless.

ADD REPLY
0
Entering edit mode

You wouldn't run the second voom on the result of the first voom. You'd run it on the original counts a second time, but with passing the correlation and block arguments. Again, though, I'm not sure whether this is necessary.

ADD REPLY
0
Entering edit mode

Ah, yes of course! For reference, I tried both alternatives (single and double voom) and the results (adjusted p-values of the coefficients) are very consistent (Pearson correlation >0.99).

ADD REPLY

Login before adding your answer.

Traffic: 643 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6