Taking surrogate variables into account before running PCA
wamiqsaifi • 0

I ran SVA to obtain surrogate variables, and I want to visualise my data with those surrogate variables taken into account. I have read that modifying the original expression matrix is not a good idea (https://www.biostars.org/p/121489/). How can I go about doing this?

sva svaseq • 2.0k views
Guido Hooiveld ★ 4.1k

In the few cases where I need to visualize the 'cleaned' data, I use the approaches mentioned in the Biostars thread (AFAIK they are the same). However, if you want to identify differentially expressed genes, you should include the surrogate variables as covariates in your (linear) model rather than running the analysis on the 'cleaned' data. This is also what is stated in that thread...
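To make the "clean for visualization only" idea concrete, here is a minimal sketch in Python/NumPy. It is not the sva or limma API: the function names `regress_out_svs` and `pca_scores`, the toy data, and the single random column standing in for a surrogate variable are all assumptions made for illustration. The fitted surrogate-variable component is subtracted from each gene while the design (intercept and group) is kept in the model so that the effect of interest is protected, and PCA is then run on the cleaned matrix.

```python
import numpy as np

def regress_out_svs(expr, design, svs):
    """Subtract the fitted surrogate-variable component from expr.

    expr:   genes x samples expression matrix
    design: samples x p matrix (intercept + variables of interest); kept in
            the fit so their effect is not removed
    svs:    samples x k matrix of surrogate variables
    """
    full = np.hstack([design, svs])                        # samples x (p + k)
    coef, *_ = np.linalg.lstsq(full, expr.T, rcond=None)   # (p + k) x genes
    sv_coef = coef[design.shape[1]:, :]                    # k x genes
    return expr - (svs @ sv_coef).T

def pca_scores(expr, n_components=2):
    """Sample scores on the leading principal components (via SVD)."""
    centered = expr - expr.mean(axis=1, keepdims=True)     # centre each gene
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    return (vt[:n_components, :] * s[:n_components, None]).T  # samples x comps

# Toy data (hypothetical): 200 genes, 12 samples, two groups of six.
rng = np.random.default_rng(0)
n_genes, n_samples = 200, 12
group = np.repeat([0.0, 1.0], 6)
design = np.column_stack([np.ones(n_samples), group])
svs = rng.normal(size=(n_samples, 1))                  # stand-in for an estimated SV
expr = rng.normal(size=(n_genes, n_samples))
expr += 3.0 * rng.normal(size=(n_genes, 1)) @ svs.T    # unwanted variation
expr[:20] += 2.0 * group                               # group effect in 20 genes

cleaned = regress_out_svs(expr, design, svs)   # for plotting only
scores = pca_scores(cleaned)                   # one row per sample
```

The `cleaned` matrix is meant for plots such as PCA or heatmaps, not for testing. In R, limma's removeBatchEffect() takes `covariates` and `design` arguments that can be used for the same kind of cleaning.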


Thanks. But do you know why we shouldn't use the cleaned data for differential expression, yet can use it for visualisation?


As is nicely phrased in e.g. this paper:

"... SVs can be regressed out of the data to obtain “cleaned” data for visualization (as we do in this report), however differential expression statistics should not be performed on this “clean” data, as this too can lead to anti-conservative bias resulting from between-sample correlation being introduced by regressing out the SVs and from inflating variance partitioning related to the effect of interest, as the total variance of the system has been reduced without being taken into account during the linear modeling."

From: Jaffe et al. Practical impacts of genomic data "cleaning" on biological discovery using surrogate variable analysis. BMC Bioinformatics. 2015 Nov 6;16:372.
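The variance argument in the quote can be illustrated with a short derivation. This is a sketch under assumptions not stated in the thread: ordinary least squares per gene, n samples, a design matrix X with p columns for the effect of interest, k surrogate variables collected in S, and SV coefficients estimated jointly with X. The symbols are chosen here for illustration.

```latex
% Full model for one gene, with the SVs as covariates
y = X\beta + S\gamma + \varepsilon, \qquad
\hat\sigma^2_{\text{full}} = \frac{\mathrm{RSS}}{n - p - k}

% "Cleaned" data, then a model that no longer contains the SVs
\tilde y = y - S\hat\gamma, \qquad
\tilde y = X\beta + \tilde\varepsilon

% The normal equations of the full fit give X^\top (y - X\hat\beta - S\hat\gamma) = 0, so
\tilde\beta = (X^\top X)^{-1} X^\top \tilde y = \hat\beta, \qquad
\tilde y - X\tilde\beta = y - X\hat\beta - S\hat\gamma

% Same residuals, but the k degrees of freedom spent on the SVs are not counted
% (strict inequality for k \ge 1 and RSS > 0)
\hat\sigma^2_{\text{cleaned}} = \frac{\mathrm{RSS}}{n - p}
  \;<\; \frac{\mathrm{RSS}}{n - p - k} = \hat\sigma^2_{\text{full}}
```

Under these assumptions the coefficient estimates are identical, but the residual variance from the cleaned data is too small, and the standard errors also ignore any correlation between X and S. Test statistics therefore come out too large, which is the anti-conservative bias described above. Keeping the SVs in the model as covariates avoids this.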

 

See also, e.g., these threads: "batch effect : comBat or blocking in limma ?" and "Using of limma moderated t-test with "corrected" expression matrix resulting from ComBat batch effect correction".

Along the same line:
Nygaard et al. Methods that remove batch effects while retaining group differences may lead to exaggerated confidence in downstream analyses. Biostatistics. 2016 Jan;17(1):29-39.

This thread at Biostars is also an interesting read, and links to the Nygaard paper.


Thanks! That helps.
