DESeq2 and ComBat
ribioinfo ▴ 100
@ribioinfo-9434

Hi, is it possible to remove batch effects with ComBat and then do a differential expression analysis with DESeq2? If yes, what steps should I follow?

Thank you.

@andrewjskelton73-7074
United Kingdom

Don't use ComBat on raw counts; I believe ComBat requires log-transformed data anyway. Check out the DESeq2 Users Guide, section 3.12.1 Linear Combinations, to see how to add batch effects to your model design.

@mikelove
United States

Here's another link to show how you can use estimated batch effect variables with DESeq2 (here svaseq, but the principle would be the same)

http://www.bioconductor.org/help/workflows/rnaseqGene/#batch

ribioinfo ▴ 100
@ribioinfo-9434
Thank you. In that example svaseq is used, but if I have two datasets and I know the batches, is ComBat better than svaseq?

I can't really give any more specific advice without a more specific description of what your data looks like and what you are trying to do (what biological question do you want to ask, and in what way does batch effect correction enter the picture).


I will try to explain my experiments:

1) I have sequenced some cells in different states of differentiation: 1, 2, and 5;

2) I have a separate sequencing run of other cells in states 3, 4, and 5.

I want to use DESeq2. So far I have used it to analyze experiments 1 and 2 separately, but I would like to compare the genes they have in common.

I do not know whether it is correct to compare the two analyses directly, whether I have to remove the batch effects (with svaseq or ComBat), or whether I have to normalize all the experiments together and use contrasts.

Thank you
 


My first question would be how many replicates of each, but the design you've described means that you'd only be able to adequately estimate the batch effect of cells in state 5, as they're shared across experiment (this also requires that they were sequenced with the same machine, prep, chemistry, etc). 

I think your best bet is to analyze the experiments independently (as you've done so far), then perhaps use a non-parametric, rank-based approach. Either that, or simply look at the overlap in what is significantly differentially expressed between the two experiments.
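The "look at the overlap" suggestion can be sketched in a few lines. This is only an illustration in plain Python, not DESeq2 code: the gene names, adjusted p-values, and the 0.05 threshold are all made up, and in practice the values would come from the two separate DESeq2 results tables.

```python
# Hypothetical adjusted p-values from two independent analyses
# (gene IDs and numbers are invented for illustration).
padj_exp1 = {"geneA": 0.001, "geneB": 0.20, "geneC": 0.03, "geneD": 0.04}
padj_exp2 = {"geneA": 0.01, "geneB": 0.02, "geneC": 0.04, "geneE": 0.001}

ALPHA = 0.05  # assumed significance threshold


def significant(padj, alpha=ALPHA):
    """Return the set of genes whose adjusted p-value is below alpha."""
    return {gene for gene, p in padj.items() if p < alpha}


# Only genes tested in both experiments can meaningfully overlap.
tested_in_both = padj_exp1.keys() & padj_exp2.keys()

# Genes called significant in both experiments.
shared = significant(padj_exp1) & significant(padj_exp2) & tested_in_both

print(sorted(shared))
```

Because the two experiments compare different states, such an overlap is a qualitative indication only; it says nothing about whether the effect sizes agree.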


I have 3 replicates for every condition. Are the fold changes (FC) comparable between the two experiments if I choose to compare the overlapping genes?
 


Not comparable directly, but the fact that something is differentially expressed in two separate experiments should tell you something. 


Can you also tell us the biological question you want to answer? What comparisons do you want to make?


I want to investigate the role of some genes in the different stages.

If I consider the two analyses separately, I think I can only extract which genes are differentially expressed in both analyses.

If I wanted to do a differential analysis of 1 vs. 3 (or another combination of conditions from the two experiments), could I normalize the table with all conditions and use a contrast on it?

Should I use svaseq, considering that only condition 5 is present in both experiments?


While it's not the ideal experimental design (better would be to have distributed all states within each library preparation batch in a block design, or even randomized), it is still possible to analyze all the samples together using a design ~batch + state. I assume the colData looks something like this (with replicates in addition):

batch state
    1     1
    1     2
    1     5
    2     3
    2     4
    2     5

Be sure that these columns are factors, not numerics.

When you run DESeq2 with a design of ~batch + state, it will use the samples from state 5 to estimate the batch effect. If you only have a few samples, this can be a very noisy estimate of the batch differences for each gene, but it's the best you can do given that you want to make comparisons across batches.
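To see why the shared state 5 is what makes the batch effect estimable, here is a small sketch in plain Python (not DESeq2 code). It builds a treatment-coded design matrix for ~batch + state from the table above, treating batch and state as categories rather than numbers, and checks its rank. The helper names `design_row` and `rank` are hypothetical, and the choice of batch 1 and state 1 as reference levels is an assumption.

```python
from fractions import Fraction

# One row per (batch, state) cell from the table above; replicates would
# just repeat rows and do not change the rank.
cells = [(1, 1), (1, 2), (1, 5), (2, 3), (2, 4), (2, 5)]


def design_row(batch, state, states=(2, 3, 4, 5)):
    """Treatment-coded row: intercept, batch 2 indicator, state indicators."""
    return [1, int(batch == 2)] + [int(state == s) for s in states]


def rank(rows):
    """Matrix rank by Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


# With state 5 in both batches: 6 coefficients, rank 6, so all are estimable.
X = [design_row(b, s) for b, s in cells]
print(rank(X), len(X[0]))

# Without the shared state: 5 coefficients but rank 4, so batch is
# confounded with state and cannot be separated from it.
X_no5 = [design_row(b, s, states=(2, 3, 4)) for b, s in cells if s != 5]
print(rank(X_no5), len(X_no5[0]))
```

The number of columns equals the rank only when state 5 is present in both batches, which is the sense in which the state 5 samples carry all the information about the batch effect.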


Thank you. Can I also use svaseq, or is this method more appropriate in this case?


Hi Riccardo,

If you use Mike's proposal of including a batch effect coefficient, you don't need to use svaseq anymore.

Bernd

 


Hi Riccardo,

you might try to compute surrogate variables (SVs) using the condition 5 samples only. Then you get 3 values of the SVs for data set 1, and 3 values for data set 2. You can then create SVs for the whole data set by simply repeating the values appropriately for the other samples.

This way, you infer the SVs only from a condition that is shared, and you can analyse the data jointly rather than separately.

However, I am not sure whether this idea is really super brilliant. If Mike, Andrew, or others have comments on it, I would love to hear them :)

 

Bernd

 


Hi, how can I choose which method is best, yours or Mike's?


Hi Riccardo,

if you use exactly one SV, my proposal and Mike's approach will likely be quite similar.

However, Mike's proposal is more robust and close to a "textbook" solution, so it is also easy to communicate.

sva uses quite a complex algorithm, so the additional variability it introduces might offset the potential advantages. So I would recommend Mike's proposal.

Bernd

 


I agree with Bernd. They are both probably going to give similar answers, and perhaps doing it with fixed effects (the ~batch + condition approach) sounds simpler and so more palatable to reviewers.

Trying to remove batch effects with only a few samples to rely on is a tough statistical challenge, and I just want to stress for future experiments it would make inference more powerful with full block designs or randomization of conditions across library preparation batches. (Sometimes the data is as it is and this can't be avoided, or it was handed down to the analyst as such, but that's my attempt at a PSA.)

OK, thank you. But in this case, with only condition 5 in both batches, is the correction applied only to condition 5, or are the other conditions corrected as well?

All conditions are corrected, but the estimation comes from only the condition 5 samples. Few samples => noisier estimates and worse inference.
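A toy calculation may make this concrete. It is a sketch under simplifying assumptions: an ordinary linear model on the log2 scale rather than the negative binomial GLM that DESeq2 actually fits, with invented cell means for a single gene. With this design the model ~batch + state has one coefficient per (batch, state) cell, so the fit reproduces the cell means and the batch coefficient is just the difference between the two state 5 means.

```python
# Hypothetical mean log2 expression of one gene in each (batch, state) cell.
cell_mean = {
    (1, 1): 5.0, (1, 2): 6.0, (1, 5): 8.0,
    (2, 3): 7.5, (2, 4): 9.0, (2, 5): 9.0,
}

# The batch effect is estimated from the shared state 5 only.
batch_effect = cell_mean[(2, 5)] - cell_mean[(1, 5)]


def corrected(batch, state):
    """Cell mean with the batch 2 shift removed (batch 1 is the reference)."""
    shift = batch_effect if batch == 2 else 0.0
    return cell_mean[(batch, state)] - shift


# The same shift is applied to every batch 2 state, not just state 5.
lfc_3_vs_1 = corrected(2, 3) - corrected(1, 1)

print(batch_effect)  # shift between batches, from state 5 alone
print(lfc_3_vs_1)    # state 3 vs state 1 after removing that shift
```

Any error in `batch_effect` carries over unchanged into every across-batch comparison such as state 3 vs state 1, which is why a batch estimate based on few samples makes those comparisons noisier.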

Bernd Klaus ▴ 610
@bernd-klaus-6281
Germany

Hi Riccardo,

you could try to apply sva to both datasets together, plot a PCA, and see whether you can detect a clustering by data set.

 

Usually, if there is, e.g., a strong dataset-specific effect, sva will capture it anyway, even though it works "unsupervised", so it might not be necessary to use ComBat.

Simply apply sva and then inspect the computed surrogate variables to see whether they capture a difference between the two data sets. For an example, see how the surrogate variables capture the cell line effect in the RNA-Seq gene workflow:

http://bioconductor.org/help/workflows/rnaseqGene/#batch

and then include the SVs in your usual DE workflow.

As a side note, ComBat has the disadvantage that it regresses out the batch effect beforehand, which might lead to spurious or overoptimistic DE results, as shown in this recent paper by Nygaard et al.:

http://dx.doi.org/10.1093/biostatistics/kxv027

So I personally would always prefer to include the batch effect in the model, rather than regressing it out beforehand.

Bernd

 

 


Hi, thank you. Could SVA help me in this situation?

1) I have sequenced some cells in different states of differentiation: 1, 2, and 5;

2) I have a separate sequencing run of other cells in states 3, 4, and 5.

I want to use DESeq2. So far I have used it to analyze experiments 1 and 2 separately, but I would like to compare the genes they have in common.

I do not know whether it is correct to compare the two analyses directly, whether I have to remove the batch effects (with svaseq or ComBat), or whether I have to normalize all the experiments together and use contrasts.

Thank you
