RNA-Seq and disease outcome analysis

JonasEnder • 0

I am hoping to get some help thinking through an analysis I want to perform. I have RNA-seq count data from tissue collected at an early-life timepoint, and I want to find potential associations with a later-life outcome, such as disease status. I was advised to use DESeq2 and do a typical DE analysis. But I don't think that makes sense here: DESeq2 models the counts as the dependent variable, which to me is not temporally accurate, because the outcome (the treatment/condition in the DESeq2 framework) occurs years after the RNA-seq data were measured.

My thought would be to just use a regression model, but then you are ignoring things that DESeq2 handles, like dispersion. And in the post below, Mike Love suggested not putting counts into a model as an independent variable (with or without normalization). Does anyone have suggestions on the best way to model this problem? Any suggestions or discussion are greatly appreciated!

This is a very similar question, but I'd love help figuring out a more fleshed-out solution if possible: Switching RNAseq count data to explanatory variable?

rnaseqAnalysis RNASeqData • 620 views

score 0 · Answer 1 · 2023-02-24

As Mike points out in the post you link to, this isn't a conventional thing to do, particularly since you will have tens of thousands of gene expression values with varying degrees of correlation among them. Also, taking a snapshot of gene expression at time X and using it to infer results at time X + Y, where Y might be a long time, is probably going to be difficult. I've done some analyses trying to use miRNA expression as a predictor of drug response, and it has never worked well.

Anyway, the conventional way to do this sort of thing is to use penalized regression methods like ridge regression or lasso regression. The lasso penalty in particular sets most coefficients to exactly zero, which leaves a sparse set of largely uncorrelated predictors (this is how people generate methylation clock signatures from EWAS data). The go-to package is glmnet, which is available on CRAN and simplifies the process. You will most likely want to convert the counts to logCPM, and glmnet accepts weights, so running the data through limma::voom to get both is probably not a horrible idea.
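To make the idea concrete, here is a minimal sketch of the two steps described above: convert counts to logCPM, then fit a lasso-penalized logistic regression with disease status as the outcome. It is plain Python written for illustration only, not glmnet or voom. The function names, the toy counts, the penalty value `lam=0.3`, and the simple proximal-gradient fitting loop are all assumptions of this sketch. In practice you would let `cv.glmnet` (with `family = "binomial"`) choose the penalty by cross-validation.

```python
import math

def log_cpm(counts):
    """log2 counts-per-million for a samples x genes table of raw counts.

    Uses the offsets voom applies: (count + 0.5) / (library size + 1) * 1e6.
    """
    table = []
    for sample in counts:
        lib_size = sum(sample)
        table.append([math.log2((c + 0.5) / (lib_size + 1.0) * 1e6) for c in sample])
    return table

def standardize(x):
    """Center and scale each gene (column); constant genes become all zeros."""
    n, p = len(x), len(x[0])
    scaled = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in x]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        if sd > 0:
            for i in range(n):
                scaled[i][j] = (col[i] - mean) / sd
    return scaled

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; this is what zeroes coefficients."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_logistic(x, y, lam, n_iter=2000):
    """L1-penalized logistic regression fitted by proximal gradient descent.

    x: standardized samples x genes table, y: 0/1 outcomes, lam: penalty strength.
    The intercept is not penalized.
    """
    n, p = len(x), len(x[0])
    step = 4.0 / (p + 1)  # safe step size when every column has unit variance
    intercept, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        resid = [sigmoid(intercept + sum(b * v for b, v in zip(beta, row))) - yi
                 for row, yi in zip(x, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, x)) / n for j in range(p)]
        intercept -= step * sum(resid) / n
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return intercept, beta

# Made-up counts (8 samples x 4 genes); gene 0 is higher in samples that
# later develop the outcome. Real data would have thousands of genes.
counts = [
    [40, 300, 1200, 5000],
    [44, 500, 800, 5200],
    [38, 350, 900, 4800],
    [42, 450, 1100, 5000],
    [10, 320, 1150, 5100],
    [12, 480, 850, 4900],
    [9, 360, 1050, 5000],
    [11, 440, 950, 5000],
]
outcome = [1, 1, 1, 1, 0, 0, 0, 0]  # later-life disease status (hypothetical)

x = standardize(log_cpm(counts))
intercept, beta = lasso_logistic(x, outcome, lam=0.3)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("genes with nonzero coefficients:", selected)
```

Raising `lam` pushes more coefficients to exactly zero, and lowering it lets more genes into the model, which is why the penalty is normally tuned by cross-validation and not fixed by hand. One caveat on the weights, as far as I understand the two packages: voom returns a genes-by-samples matrix of precision weights, while the `weights` argument of glmnet expects one weight per sample, so the voom weights cannot be passed through unchanged.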