Question: ABOUT TRANSFORMATION OF RNA-SEQ DATA FOR GLMNET COX SURVIVAL ANALYSIS
8 months ago by
panagiotis.mokos10 wrote:

Dear Bioconductor users,

I am working with RNA-seq data (raw counts) and I want to fit a regularized Cox regression model using the glmnet package. First, I applied the VST, which makes RNA-seq data homoscedastic. Next, do I have to set the glmnet argument standardize=TRUE, so that all variables are standardized to unit variance before the model sequence is fitted, and then use the resulting unstandardized coefficients to rank the selected features (genes)? Or is the default standardization not appropriate in my case?

Thank you for your time in advance!!

Sincerely,

Panagiotis Mokos

ADD COMMENTlink modified 8 months ago by Michael Love15k • written 8 months ago by panagiotis.mokos10
8 months ago by
Michael Love15k
United States
Michael Love15k wrote:

hi Panagiotis,

The glmnet software is optimized for predictors with unit variance, so I can see how you got to this dilemma.

Scaling (for each gene, across samples) and the VST are to some degree at odds. The VST shrinks technical variance so that biological differences are not overwhelmed, and in doing so it outperforms simple transformations such as log(x + 1). But if you then force all genes to have unit variance, you undo that effect and inflate the technical noise that was just shrunk.
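
To illustrate the point about scaling, here is a small sketch in base R. The two genes are simulated and hypothetical (one with only small, noise-like variation and one with large variation across samples); they are not taken from any real VST output.

```r
# Two simulated genes on a VST-like scale (hypothetical values, for illustration only)
set.seed(3)
n_samples <- 30
noise_gene  <- rnorm(n_samples, mean = 6, sd = 0.05)  # almost flat across samples
signal_gene <- rnorm(n_samples, mean = 9, sd = 1.5)   # varies a lot across samples
mat <- rbind(noise_gene = noise_gene, signal_gene = signal_gene)

# Standard deviation per gene before scaling: very different
sd_before <- apply(mat, 1, sd)

# Scale each gene (row) to mean 0 and unit variance across samples
scaled <- t(scale(t(mat)))

# After scaling both genes have standard deviation 1, so the small
# fluctuations of the near-flat gene are blown up to the same size
sd_after <- apply(scaled, 1, sd)

print(round(sd_before, 3))
print(round(sd_after, 3))
```

After scaling, the near-flat gene contributes as much variance as the variable one, which is why a variance filter before standardization can help.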

I'd suggest you use the VST, then use a variance filter on the VST data to remove genes with minimal variance (take a look at the meanSdPlot to get a sense of the genes which likely have no biological signal, see vignette), then feed the remaining genes to glmnet with standardize=TRUE.
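
A minimal sketch of that workflow, under several assumptions: the matrix `vst_mat` is simulated here as a stand-in for the real VST output (in practice it would come from DESeq2, for example `assay(vst(dds))`), the helper `filter_by_variance` and the cutoff `n_keep = 50` are hypothetical choices for illustration, and the survival times are simulated. The glmnet call is only run if the package is installed.

```r
# Simulated stand-in for VST data: genes in rows, samples in columns
set.seed(1)
n_genes <- 200
n_samples <- 40
vst_mat <- matrix(rnorm(n_genes * n_samples, mean = 8, sd = 0.1),
                  nrow = n_genes,
                  dimnames = list(paste0("gene", seq_len(n_genes)),
                                  paste0("s", seq_len(n_samples))))
# Give the first 20 genes extra variability across samples
vst_mat[1:20, ] <- vst_mat[1:20, ] + rnorm(20 * n_samples, sd = 1)

# Hypothetical helper: keep the n_keep genes with the highest variance
filter_by_variance <- function(mat, n_keep) {
  gene_var <- apply(mat, 1, var)
  keep <- order(gene_var, decreasing = TRUE)[seq_len(min(n_keep, nrow(mat)))]
  mat[keep, , drop = FALSE]
}
filtered <- filter_by_variance(vst_mat, n_keep = 50)

# glmnet expects samples in rows and predictors (genes) in columns
x <- t(filtered)

# Simulated survival outcome (assumption: real data would supply these)
time <- rexp(n_samples, rate = 0.1)
status <- rbinom(n_samples, size = 1, prob = 0.7)
y <- cbind(time = time, status = status)

# Fit the Cox lasso path with glmnet's own standardization switched on
if (requireNamespace("glmnet", quietly = TRUE)) {
  fit <- glmnet::glmnet(x, y, family = "cox", standardize = TRUE)
  print(fit)
}
```

The cutoff for the variance filter is a judgment call; the meanSdPlot mentioned above is one way to pick it. To choose the penalty strength, `cv.glmnet` from the same package is the usual next step.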

ADD COMMENTlink written 8 months ago by Michael Love15k

Dear Love,

Thank you very much for your useful information!!

Please, could you explain more about this gene filtering (based on variance) or send me a link (the above-mentioned vignette)?

Also, in your opinion, is it better to standardize the VST-filtered data to unit variance beforehand and then pass them to glmnet with standardize=FALSE? In other words, do you believe that the final coefficient sizes (which will be used to rank the selected features) should reflect the differences in gene variances?

Thank you for your time !!!

Sincerely,

Panagiotis

ADD REPLYlink written 8 months ago by panagiotis.mokos10

The DESeq2 vignette is available by typing into R:

vignette("DESeq2")

You should definitely read this over, particularly the part about transformations. It's the detailed user guide for the software, which has grown over 7 years of DESeq1/2.

(All Bioconductor software is required to have a detailed software vignette.)

I don't really have any extra opinion on the downstream usage beyond my suggestion above. If this seems too confusing or difficult, you could just filter out low-count genes based on some heuristic you define and then use glmnet on log counts.
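
A rough sketch of that simpler route in base R. Everything specific here is an assumption for illustration: the counts are simulated, the heuristic (at least `min_count = 10` reads in at least `min_samples = 5` samples) is just one possible rule, and the pseudocount of 1 follows the log(x + 1) form mentioned earlier in the thread. Library-size normalization is not included.

```r
# Simulated raw counts: genes in rows, samples in columns (hypothetical data)
set.seed(2)
n_genes <- 100
n_samples <- 20
gene_mu <- rep(c(0.5, 100), each = n_genes / 2)  # half low, half well expressed
counts <- matrix(rnbinom(n_genes * n_samples,
                         mu = rep(gene_mu, times = n_samples), size = 5),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("s", seq_len(n_samples))))

# Heuristic filter (an assumption, adjust to your data):
# keep genes with at least min_count reads in at least min_samples samples
min_count <- 10
min_samples <- 5
keep <- rowSums(counts >= min_count) >= min_samples

# Log-transform the retained genes with a pseudocount of 1
log_counts <- log2(counts[keep, , drop = FALSE] + 1)

# Samples in rows, genes in columns, ready to pass to glmnet as x
x <- t(log_counts)
dim(x)
```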

ADD REPLYlink written 8 months ago by Michael Love15k

Dear Love, 

Thank you very much for your response!!

Panagiotis

ADD REPLYlink written 8 months ago by panagiotis.mokos10