Question: Difference between the "be" and "leek" methods for choosing the number of surrogate variables to estimate with SVA?
Asked 2.5 years ago by brhead:

I am trying to estimate sources of heterogeneity in methylation data in addition to some known sources (i.e., I have batch and age, but I would also like to correct for smoking, unmeasured technical artifacts, and cellular heterogeneity). When I run num.sv with the default "be" method, I get 12 SVs; when I specify the "leek" method, I get 0 SVs. Is there a reason why the two methods might behave so differently?

I am confused about whether one method is generally recommended over the other, as the SVA vignette shows an example with "leek": https://www.bioconductor.org/packages/devel/bioc/vignettes/sva/inst/doc/sva.pdf

...while the documentation for the sva function says it uses "be" when the number of SVs is not specified, and cautions that the "numSVmethod" parameter "... should not be adapted by the user unless they are an expert": https://www.rdocumentation.org/packages/sva/versions/3.20.0/topics/sva

My question is partially answered in the thread "svaseq: how many and which surrogate variables to pick", and perhaps there is no single "best" way to estimate the number of SVs to include. Still, I would like to better understand the differences between the two methods.
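To make the "be" side of the comparison concrete, here is a rough Python sketch of the permutation idea that, as far as I understand, underlies the "be" method (Buja and Eyuboglu): regress out the known covariates, then count how many singular values of the residuals explain more variance than they do after each feature's residuals are shuffled across samples. This is not the sva source code. The function name num_sv_be, the defaults of 20 permutations and a 0.1 cutoff, and the simulated data are assumptions for illustration. The "leek" method, as I understand it, uses a different asymptotic approach and is not shown here.

```python
import numpy as np


def num_sv_be(dat, mod, n_perm=20, sv_sig=0.1, seed=None):
    """Hypothetical permutation estimate of the number of surrogate variables.

    dat: features x samples array (e.g. CpG M-values, one row per CpG)
    mod: samples x covariates design matrix (intercept + known covariates)
    """
    rng = np.random.default_rng(seed)
    n = dat.shape[1]
    # Projection (hat) matrix onto the column space of the design.
    H = mod @ np.linalg.solve(mod.T @ mod, mod.T)
    # Residuals after regressing every feature on the known covariates.
    res = dat - dat @ H.T
    # Residual degrees of freedom; later components are numerically zero.
    ndf = n - np.linalg.matrix_rank(mod)

    # Observed share of residual variance explained by each component.
    d = np.linalg.svd(res, compute_uv=False)[:ndf]
    dstat = d**2 / np.sum(d**2)

    # Null distribution: shuffle each feature's residuals independently
    # across samples, which breaks structure shared between features.
    dstat0 = np.empty((n_perm, ndf))
    for b in range(n_perm):
        res0 = rng.permuted(res, axis=1)
        res0 = res0 - res0 @ H.T
        d0 = np.linalg.svd(res0, compute_uv=False)[:ndf]
        dstat0[b] = d0**2 / np.sum(d0**2)

    # Permutation p-value per component, made non-decreasing so that
    # counting stops at the first non-significant component.
    psv = np.maximum.accumulate((dstat0 >= dstat).mean(axis=0))
    return int(np.sum(psv <= sv_sig))


# Simulated (hypothetical) data: 500 features, 20 samples,
# one known binary covariate plus one strong unmeasured factor.
rng = np.random.default_rng(1)
m, n = 500, 20
group = np.repeat([0.0, 1.0], n // 2)
mod = np.column_stack([np.ones(n), group])
latent = rng.normal(size=n)
dat = rng.normal(size=(m, n)) + np.outer(rng.normal(scale=3.0, size=m), latent)
n_sv = num_sv_be(dat, mod, seed=2)
print(n_sv)
```

Because the null in this sketch is built by breaking up correlation between features, any structure shared across many features can register as a significant component, which may be part of why a permutation-based count can come out much larger on data with many correlated features.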

Thanks,

Brooke


Hi Brooke,

Did you get an answer? I have the same question. I am working with TCGA breast cancer DNA methylation data: I downloaded the beta values and converted them to M-values. With the "be" method I got 94 surrogate variables, while with "leek" I got 3. I am not sure which one to choose for further analysis.

Thanks

Srikant

Reply by vermasrikant, 2.4 years ago

Hi Srikant,

Sorry I didn't see your message earlier! I found this paper helpful in deciding what to do:

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0808-5

As the article points out, a large number of surrogate variables may capture more than you intend. Plus, including a huge number of covariates in a regression model is not ideal, since each one uses up a residual degree of freedom. In your case, I would definitely go for the method that resulted in 3 SVs rather than 94!
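The degrees-of-freedom cost is easy to see with a little arithmetic. Below is a small Python sketch assuming a hypothetical study of 100 samples and a model with an intercept plus two known covariates, each coded as a single column (a multi-level batch factor would take more columns). The sample size and the helper name residual_df are made up for illustration; the SV counts are the ones reported in this thread.

```python
def residual_df(n_samples, n_known, n_sv):
    """Hypothetical helper: residual degrees of freedom of a linear model
    with an intercept, n_known single-column covariates, and n_sv SVs."""
    return n_samples - (1 + n_known + n_sv)


# Hypothetical study: 100 samples, 2 known covariates (e.g. batch, age),
# and the SV counts reported in this thread (0, 3, 12, 94).
for n_sv in (0, 3, 12, 94):
    print(f"{n_sv:>3} SVs -> {residual_df(100, 2, n_sv)} residual df")
```

In this hypothetical setup, 94 SVs would leave only 3 residual degrees of freedom, compared with 94 when using 3 SVs.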

-Brooke

Reply by brhead, 2.2 years ago