Normalization and transformation of RNA-seq raw data for regularized linear regression models
1
0
Entering edit mode
@panagiotismokos-9709
Last seen 7.5 years ago

Dear Bioconductor users,

I am working with RNA-seq data. I want to perform regularized elastic net linear regression modeling using RNA-seq data. First, I have transformed my data using VOOM transformation and then i would use the log-cpm values of the E component of EList output object.The inverse variance weights, after voom, is a numeric matrix and in the glmnet function could be input a vector with size equal to the number of observations.Could I (and how to) use the precision weights at the observation level in glmnet function

Thank you very much for your time in advance!!!

 

Sincerely,

Panagiotis Mokos

rnaseq glmnet limma voom deseq2 • 2.7k views
ADD COMMENT
0
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 4 minutes ago
The city by the bay

If you want to model each gene separately, you'll probably need to supply each row of v$E as the y argument in glmnet, along with the corresponding row of v$weights as weights (assuming v is the output of voom). I'm not completely sure that the precision weights from voom are applicable as the "observation weights" mentioned in ?glmnet; it seems that there's weighted least squares in there somewhere, so it's probably fine. Anyway, this is the standard analysis approach where each gene is considered on its own merits (excepting situations involving empirical Bayes, or when learning latent variables). Of course, summarizing the output across thousands of genes is its own challenge; you should know what statistics are of interest a priori (e.g., percentage of genes where a particular variable is non-zero, the ideal lambda from cross-validation).

The other option is to model all genes together, presumably using family="mgaussian" where each gene is a separate response. In this setting, it doesn't seem possible to supply a weight matrix to weights, so if you want to do this you'll just have to make do without weighting. Note that coercing v$E and v$weights to vectors and stuffing them into glmnet will almost definitely not do what you want - you'd either be assuming that each variable affects all genes in the same manner (which is very unlikely) or you'd have to include gene-specific variables in the design matrix (in which case you might as well just fit each gene separately).

Of course, the bigger question is why you want to do this in the first place. Are you trying to use elastic nets to learn which variables should be put into a model for a DE analysis? This is generally considered to be a Bad idea; automatic model selection methods will happily learn models that provide great predictive power but whose variables cannot be easily interpreted in the context of the questions of interest.

P.S. Split the limma and voom tags in your post, otherwise maintainers won't be notified.

P.P.S. Respond to answers using the "add comment" button, don't add a new answer.

ADD COMMENT
0
Entering edit mode

Dear Aaron Lun,

Thank you very much for your quick and informative reply!

My main problem is how to perform regularized linear cox regression modeling using RNA-seq data as the log-transformed counts do not fulfill the statistical assumptions required for linear modeling (i.e. equal variance). Do you beleive that vst transformation of RNA-seq data using varianceStabilizingTransformation function of DEseq2 package is sufficient for further analysis of RNA-seq data in glmnet package?

I would appreciate hearing your opinion!!

Thank you very much for your time in advance!!

Sincerely,

Panagiotis Mokos

ADD REPLY
0
Entering edit mode

I'm not going to say too much, because we're quickly getting out of the scope of the support site, but it seems like you're trying to use variable selection to identify genes that contribute to the hazard. You could make life easier for yourself by instead using the time-to-event (death, presumably?) as a covariate in the DE analysis. This will identify genes that are associated with time and can be done with standard pipelines.

ADD REPLY
0
Entering edit mode

Dear Aaron,

Thank you for your response!

The main purpose of my analysis is to perform unbiased prediction and feature selection in high-dimensional survival regression applying bootstrap enhanced elastic net cox regression modeling. Although I use RNA-seq data to perform this analysis. Do you beleive that varianceStabilizingTransformation or voom function could better transform my data in order to perform this analysis or could you recommend a better transformation option to me?

Thank you very much for your time in advance!!

Sincerely,

Panagiotis Mokos

ADD REPLY

Login before adding your answer.

Traffic: 304 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6