RNA-Seq and disease outcome analysis
JonasEnder • 0
@14f64205
Last seen 14 months ago
United States

I am hoping to get some help thinking through an analysis I want to perform. I have RNA-seq count data from tissue collected at an early-life timepoint, and I want to find potential associations with a later-life outcome, such as disease status. I was advised to use DESeq2 and run a typical DE analysis. However, I don't think DESeq2 makes sense here: it models the counts as the dependent variable, while the outcome (the treatment/condition in the DESeq2 framework) occurs years after the RNA-seq data were measured, so the model would not reflect the temporal order.

My first thought was to just use a regression model, but then you ignore things that DESeq2 handles, such as dispersion. Also, in the post linked below, Mike Love advised against putting counts into a model as an independent variable (with or without normalization). Does anyone have suggestions on the best way to model this problem? Any suggestions or discussion are greatly appreciated!

This is a very similar question, but I'd love help working out a more fleshed-out solution if possible: Switching RNAseq count data to explanatory variable?

rnaseqAnalysis RNASeqData • 620 views
@james-w-macdonald-5106
Last seen 10 hours ago
United States

As Mike points out in the post you link to, this isn't a conventional thing to do, particularly since you will have tens of thousands of gene expression values with variable levels of correlation. Also, measuring a snapshot of gene expression at time X and using that to infer results at time X + Y, where Y might be a long time, is probably going to be difficult. I've done some analyses trying to use miRNA expression as a predictor of response to drugs, and it hasn't ever worked well.

Anyway, the conventional way to do this sort of thing is to use penalized regression methods like ridge regression or lasso regression to identify uncorrelated predictors (this is how people generate methylation clock signatures from EWAS data). The go-to package is glmnet, which you can get on CRAN, and which simplifies the process. You will most likely want to convert to logCPM, and you can provide weights to glmnet, so running the data through limma::voom to get both is probably not a horrible idea.
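To make the idea above concrete, here is a minimal, language-agnostic sketch in plain Python rather than glmnet itself. It converts counts to log2 CPM using the 0.5 and +1 offsets that voom applies, then fits a lasso by cyclic coordinate descent. Everything here is an assumption for illustration: the function names, the toy counts, and the penalty value are made up, the binary outcome is treated as numeric for simplicity (with glmnet one would more likely use `family = "binomial"`), and the observation weights mentioned above are left out.

```python
import math

def log_cpm(counts):
    """counts[i][j] = raw count for sample i, gene j. Returns log2 CPM
    with the (count + 0.5) / (library size + 1) offsets used by voom."""
    out = []
    for sample in counts:
        lib_size = sum(sample)
        out.append([math.log2((c + 0.5) / (lib_size + 1.0) * 1e6) for c in sample])
    return out

def standardize(X):
    """Center each gene and scale it to unit variance; returns columns."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
        cols.append([(v - mean) / sd for v in col])
    return cols

def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 on standardized genes
    by cyclic coordinate descent (intercept handled by centering y)."""
    n = len(y)
    cols = standardize(X)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]
    beta = [0.0] * len(cols)
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            # correlation of gene j with the partial residual
            rho = sum(c * r for c, r in zip(col, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
    return beta

# Hypothetical toy data: 6 samples x 4 genes, outcome observed later.
counts = [
    [10, 200, 50, 30],
    [12, 180, 55, 28],
    [9, 210, 48, 35],
    [40, 190, 52, 31],
    [45, 205, 47, 29],
    [38, 195, 53, 33],
]
outcome = [0, 0, 0, 1, 1, 1]

X = log_cpm(counts)
beta = lasso(X, outcome, lam=0.3)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("genes with non-zero coefficients:", selected)
```

In practice the penalty would be chosen by cross-validation (which glmnet provides), not fixed by hand as in this toy example.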

Thanks for the input, I really appreciate it. And I do understand and completely agree about the timing issue and the predictability of an outcome many years later.

I was hoping I might be able to ask you a question about your suggestion. I assume you are suggesting putting all gene expression values into the regression model at once? I had been thinking about it more like a GWAS, where I would fit one gene at a time, i.e., roughly 15k separate models, and then apply a multiple testing correction. Is there a reason not to at least consider trying this? Thanks for any input or suggestions!
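For readers following along, the one-gene-at-a-time idea described above might look roughly like the sketch below. It is only an illustration under stated assumptions, not a recommendation from the thread: each gene's expression value (e.g. logCPM) is the single predictor in a logistic regression on the later outcome, the p-value is a Wald test, and Benjamini-Hochberg is assumed as the multiple testing correction. The function names and toy values are hypothetical, no covariates are included, and perfectly separated data (which would make the fit diverge) are not handled.

```python
import math

def logistic_wald_p(x, y, n_iter=25):
    """Fit logit P(y=1) = a + b*x by Newton-Raphson and return
    (b, two-sided Wald p-value for b). Assumes no perfect separation."""
    a = b = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    se = math.sqrt(h00 / det)       # from the inverse information matrix
    z = b / se
    return b, math.erfc(abs(z) / math.sqrt(2.0))

def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):    # walk from largest to smallest p-value
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical toy data: expression per gene across 10 samples,
# with the outcome observed at a later timepoint.
outcome = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
genes = {
    "geneA": [1.0, 1.5, 2.0, 2.5, 3.2, 2.2, 3.0, 3.5, 4.0, 4.5],
    "geneB": [2.0, 3.1, 1.2, 4.0, 2.6, 2.4, 3.3, 1.5, 3.8, 2.9],
}

names = list(genes)
pvals = [logistic_wald_p(genes[g], outcome)[1] for g in names]
for name, p, q in zip(names, pvals, benjamini_hochberg(pvals)):
    print(name, "p =", round(p, 4), "BH-adjusted =", round(q, 4))
```

With around 15k genes the loop is the same, just longer; the correlation between genes noted earlier in the thread is not accounted for by per-gene tests like these.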
