Question: Taking surrogate variable into account before running PCA
5 months ago, wamiqsaifi wrote:

I ran SVA to get surrogate variables. I want to visualise my data taking the surrogate variables into consideration. I have learned that modifying the original expression matrix is not a good idea. How can I go about doing this?

5 months ago, Guido Hooiveld (Wageningen University, Wageningen, the Netherlands) wrote:

In the few cases where I need to visualize the 'cleaned' data, I use the approaches mentioned in the Biostars thread (AFAIK they are the same). However, if you would like to identify differentially expressed genes, you should include the surrogate variables as covariates in your (linear) model rather than using the 'cleaned' data for that. This is actually also what is stated in that thread...
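To make the visualization route concrete, here is a minimal sketch in Python/NumPy (not the R code from the thread, and not the sva API). It assumes the surrogate variables are already available as a samples x k matrix; the simulated data and the helper names `remove_svs` and `pca_scores` are hypothetical. The full model (design plus SVs) is fitted per gene, and only the SV component is subtracted, so the group effect stays in the data that goes into the PCA.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated inputs (assumptions for illustration only).
n_genes, n_samples = 200, 12
group = np.repeat([0.0, 1.0], n_samples // 2)   # effect of interest
sv = rng.normal(size=(n_samples, 2))            # stand-in for SVs estimated elsewhere

# Log-expression matrix, genes x samples: group effect + SV effect + noise.
expr = (rng.normal(size=(n_genes, 1)) * group
        + rng.normal(size=(n_genes, 2)) @ sv.T
        + rng.normal(scale=0.5, size=(n_genes, n_samples)))


def remove_svs(expr, design, sv):
    """Fit expr ~ design + sv per gene and subtract only the sv part."""
    full = np.column_stack([design, sv])
    coef, *_ = np.linalg.lstsq(full, expr.T, rcond=None)  # (p + k) x genes
    sv_coef = coef[design.shape[1]:, :]                   # k x genes
    return expr - (sv @ sv_coef).T


def pca_scores(expr, n_components=2):
    """Sample scores from a PCA with genes as variables."""
    centered = expr - expr.mean(axis=1, keepdims=True)    # center each gene
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components].T * s[:n_components]         # samples x components


design = np.column_stack([np.ones(n_samples), group])    # intercept + group
cleaned = remove_svs(expr, design, sv)                    # for plotting only
scores = pca_scores(cleaned)                              # plot scores[:, 0] vs scores[:, 1]
```

The `cleaned` matrix is meant for plots such as PCA or heatmaps only. For differential expression, the original `expr` would be modelled with the SVs as extra design columns.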


Thanks. But do you know why we shouldn't use the cleaned data for differential expression, yet can use it for visualisations?

Reply written 5 months ago by wamiqsaifi

As is nicely phrased in e.g. this paper:

"... SVs can be regressed out of the data to obtain “cleaned” data for visualization (as we do in this report), however differential expression statistics should not be performed on this “clean” data, as this too can lead to anti-conservative bias resulting from between-sample correlation being introduced by regressing out the SVs and from inflating variance partitioning related to the effect of interest, as the total variance of the system has been reduced without being taken into account during the linear modeling."

From: Jaffe et al. Practical impacts of genomic data "cleaning" on biological discovery using surrogate variable analysis. BMC Bioinformatics. 2015 Nov 6;16:372.
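The variance part of that argument can be checked with a small sketch in Python/NumPy. It uses one simulated gene, one hypothetical surrogate variable and ordinary (not limma-moderated) t-statistics; all names and numbers are assumptions for illustration. It does not address the between-sample correlation the quote also mentions. Regressing the SV out first and then testing the group effect gives the same coefficient and the same residual sum of squares as the full model, but the second fit no longer knows that a parameter was already estimated, so the standard error shrinks and the t-statistic grows.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 12
group = np.repeat([0.0, 1.0], n // 2)
sv = 0.5 * group + rng.normal(size=n)   # one SV, partly correlated with group
y = 1.0 * sv + rng.normal(size=n)       # one gene with no true group effect


def ols(design, y):
    """Ordinary least squares: coefficients, standard errors, residual df."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    df = len(y) - design.shape[1]
    sigma2 = resid @ resid / df
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(design.T @ design)))
    return coef, se, df


X = np.column_stack([np.ones(n), group])   # intercept + group
X_full = np.column_stack([X, sv])          # intercept + group + SV

# Recommended route: SV as a covariate in the model.
coef_full, se_full, df_full = ols(X_full, y)

# "Cleaned data" route: subtract the fitted SV part, then model group only.
y_clean = y - sv * coef_full[2]
coef_clean, se_clean, df_clean = ols(X, y_clean)

t_full = coef_full[1] / se_full[1]
t_clean = coef_clean[1] / se_clean[1]

print(f"group coefficient: full={coef_full[1]:.4f}, cleaned={coef_clean[1]:.4f}")
print(f"residual df:       full={df_full}, cleaned={df_clean}")
print(f"t-statistic:       full={t_full:.4f}, cleaned={t_clean:.4f}")
```

In this setup the cleaned-data fit reports one more residual degree of freedom than the full model and a smaller standard error for the same coefficient, which is the anti-conservative direction described in the quote.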


See also, e.g., these threads: "batch effect : comBat or blocking in limma ?" and "Using of limma moderated t-test with "corrected" expression matrix resulting from ComBat batch effect correction".


Along the same lines:
Nygaard et al. Methods that remove batch effects while retaining group differences may lead to exaggerated confidence in downstream analyses. Biostatistics. 2016 Jan;17(1):29-39.

This thread at Biostars is also an interesting read, and links to the Nygaard paper.


Reply written 5 months ago by Guido Hooiveld

Thanks! That helps.

Reply written 5 months ago by wamiqsaifi