Question

Which Package Should I Use to Predict Model Output Using RNA Seq Data?

theorist • 0

I'm asking for some general input on how to analyze some data.

I'm trying to figure out how to fit the following statistical model to the RNA-seq data in the SummarizedExperiment (SE) object for this dataset. Briefly, the SE object contains expression values for individual genes across the different C. elegans life stages. I want to find the weighted average of these stage-specific expression levels that best fits predictions I've made using a model from our lab.

$$ Y_g = \sum_l \beta_l X_{l,g} + \epsilon_g $$

where $X_{l,g}$ is the true expression level of gene $g$ in life stage $l$, $\beta_l$ is the weight given to life stage $l$, $Y_g$ is my model prediction, and $\epsilon_g$ is proportional to the standard error in my model prediction. We don't know $X_{l,g}$ exactly (of course); instead we have multiple estimates of it from replicates of the RNA-seq experiments, that is, $\{x_{l,g,1}, x_{l,g,2}, \ldots\}$.

Before I started looking into Bioconductor, I was planning on using a weighted least squares approach, which deals with the imprecision in $Y_g$ (but not in $x_{l,g,i}$), but I'm wondering if there's a better way.

Many thanks,

Mike

regression • 557 views

updated 3.8 years ago by Gordon Smyth 50k • written 3.8 years ago by theorist • 0

score 0 · Answer 1 · 2020-07-05

Example workflows using limma, edgeR or DESeq2 respectively:

limma fits weighted linear models, as suggested in your question. The other two packages, edgeR and DESeq2, fit negative binomial generalized linear models to the read counts.
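
For illustration only, the weighted least squares idea from the question can be sketched in a few lines of NumPy. This is not limma's method or API. It assumes that $X_{l,g}$ is estimated by the mean of the replicates $x_{l,g,i}$ (so the uncertainty in the expression values is ignored) and that each gene is weighted by the inverse square of the standard error of its prediction. The function name `fit_stage_weights`, its arguments, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_stage_weights(replicates, y, y_se):
    """Weighted least squares fit of Y_g = sum_l beta_l * X_{l,g} + eps_g.

    replicates: list with one array per life stage, each of shape
                (n_genes, n_replicates_for_that_stage)
    y:          model predictions, shape (n_genes,)
    y_se:       standard errors of the predictions, shape (n_genes,)
    """
    # Assumption: estimate X_{l,g} by the replicate mean, ignoring its own error.
    X = np.column_stack([np.mean(r, axis=1) for r in replicates])
    # Scaling each gene's row by 1/se turns the weighted problem
    # (weights 1/se^2) into an ordinary least squares problem.
    w = 1.0 / np.asarray(y_se, dtype=float)
    beta, *_ = np.linalg.lstsq(X * w[:, None], np.asarray(y, dtype=float) * w, rcond=None)
    return beta


# Synthetic example (made-up numbers, not real RNA-seq data).
rng = np.random.default_rng(0)
n_genes, n_stages, n_reps = 200, 3, 4
true_X = rng.gamma(2.0, 2.0, size=(n_genes, n_stages))
true_beta = np.array([0.5, 0.3, 0.2])
y_se = rng.uniform(0.05, 0.2, size=n_genes)
y = true_X @ true_beta + rng.normal(0.0, y_se)
# Each stage gets n_reps noisy measurements of its true expression column.
replicates = [
    true_X[:, [l]] + rng.normal(0.0, 0.1, size=(n_genes, n_reps))
    for l in range(n_stages)
]

beta_hat = fit_stage_weights(replicates, y, y_se)
print(beta_hat)
```

Because the replicate means are treated as exact, noise in the expression estimates is not propagated into the fit; with noisy or few replicates, that is the limitation the question points out.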