Difference between "be" and "leek" methods when deciding number of surrogate variables to estimate with SVA?
brhead ▴ 40
@brhead-11927

I am trying to estimate sources of heterogeneity in methylation data in addition to some known sources (i.e., I have batch and age but would also like to correct for smoking, unmeasured technical artifacts, and cellular heterogeneity). When I use num.sv and the default "be" method, I get 12 SVs; when I specify the "leek" method, I get 0 SVs. Is there a reason why the two methods might behave so differently?
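For concreteness, the comparison described above can be sketched roughly as follows. This is a minimal illustration on simulated data, not the poster's actual analysis: the matrix `mvals`, the data frame `pheno`, and its columns `batch` and `age` are all hypothetical stand-ins, and the model here contains only the known covariates mentioned in the question.

```r
library(sva)

set.seed(1)
n_samples <- 20
n_probes  <- 500

# Simulated M-values (probes in rows, samples in columns); a stand-in for real data
mvals <- matrix(rnorm(n_probes * n_samples), nrow = n_probes)

# Hypothetical sample annotation with the known covariates
pheno <- data.frame(
  batch = factor(rep(c("a", "b"), each = n_samples / 2)),
  age   = runif(n_samples, 30, 70)
)

# Model matrix containing the known sources of variation
mod <- model.matrix(~ batch + age, data = pheno)

# Estimate the number of surrogate variables with each method
n_be   <- num.sv(mvals, mod, method = "be")    # permutation-based default
n_leek <- num.sv(mvals, mod, method = "leek")  # asymptotic alternative

cat("be:", n_be, " leek:", n_leek, "\n")
```

Running both calls on the same `dat` and `mod` makes it easy to see how far the two estimates diverge for a given data set.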

I am confused about whether one method is generally recommended over the other, as the SVA vignette shows an example with "leek": https://www.bioconductor.org/packages/devel/bioc/vignettes/sva/inst/doc/sva.pdf

...while the documentation for the sva function says it defaults to "be" if the number of SVs is not specified, and cautions that the "numSVmethod" parameter "... should not be adapted by the user unless they are an expert": https://www.rdocumentation.org/packages/sva/versions/3.20.0/topics/sva

My question is partially answered in the thread "svaseq: how many and which surrogate variables to pick", and maybe there is no "best" way to estimate the number of SVs to include. Still, I would like to better understand the differences between the two methods.

Thanks,

Brooke

sva num.sv • 3.8k views

Hi Brooke,

Did you get an answer? I have the same question. I am working on TCGA breast cancer DNA methylation data. I downloaded the beta values and converted them to M-values. With the "be" method I got 94 surrogate variables, while with "leek" I got 3. I am not sure which one to choose for further analysis.
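The beta-to-M conversion mentioned above is usually the logit transform on a log2 scale, M = log2(beta / (1 - beta)). A minimal sketch in base R, assuming beta values in [0, 1]; the helper name `beta_to_m` and the clamping `offset` are illustrative choices, not part of any package:

```r
# Convert methylation beta values to M-values: M = log2(beta / (1 - beta)).
# `offset` (an assumed small constant) keeps betas away from exactly 0 or 1,
# where the transform would return -Inf or Inf.
beta_to_m <- function(beta, offset = 1e-6) {
  beta <- pmin(pmax(beta, offset), 1 - offset)
  log2(beta / (1 - beta))
}

betas <- c(0.2, 0.5, 0.8)
m_vals <- beta_to_m(betas)
print(m_vals)
```

The function is vectorised, so it can be applied directly to a full probes-by-samples matrix of beta values.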

Thanks

Srikant

 


Hi Srikant,

Sorry I didn't see your message earlier! I found this paper to be helpful in deciding what to do:

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0808-5

As the article points out, a large number of surrogate variables may be capturing more than you intend. Plus, including a huge number of covariates in a regression model is not ideal. In your case, I would definitely go for the method that resulted in 3 SVs rather than 94!
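If one settles on a particular number of SVs, that number can be passed to `sva` directly through its `n.sv` argument rather than letting it be estimated. The following is a rough sketch on simulated data; `mvals`, `group`, `batch`, `age`, and the hidden factor are all made up for illustration, and the choice of 3 simply mirrors the number discussed above.

```r
library(sva)

set.seed(1)
n_samples <- 20
n_probes  <- 1000

# Hypothetical sample annotation
group <- factor(rep(c("case", "control"), times = n_samples / 2))
batch <- factor(rep(c("a", "b"), each = n_samples / 2))
age   <- runif(n_samples, 30, 70)

# Simulated M-values with one unmeasured factor affecting the first 200 probes
hidden <- rnorm(n_samples)
mvals  <- matrix(rnorm(n_probes * n_samples), nrow = n_probes)
mvals[1:200, ] <- mvals[1:200, ] + outer(rnorm(200), hidden)

# Full model (variable of interest plus known covariates) and null model
mod  <- model.matrix(~ group + batch + age)
mod0 <- model.matrix(~ batch + age)

# Fix the number of surrogate variables instead of estimating it
svobj <- sva(mvals, mod, mod0, n.sv = 3)

# Append the surrogate variables to the design for downstream modelling
mod_sv <- cbind(mod, svobj$sv)
dim(mod_sv)
```

Keeping the SV count small, as suggested above, also keeps `mod_sv` well below the number of samples, which matters for any regression fitted with it.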

-Brooke
