ComBat: Working with no replicates
@pedro-furio-tari-6207
Last seen 10.4 years ago
Dear all,

We have a mix of cell lines run by 12 different labs (more than 150 samples in total), and we have found a strong batch effect by laboratory that we would like to correct. Of those 12 labs, 3 contributed only one cell line with no replicates at all (a single sample each).

If we remove the samples from those 3 labs, we are able to run ComBat, but we would like to keep them if possible. Is there any way to do this? If we simulate a "false replicate" by simply copying the same expression values, it runs. Could this be the way to go? Could these results be trustworthy?

We would also like to use the cell-line names as covariates, but some of the cell lines have no replicates, so this does not work. Is there any way we could also use them as categorical covariates? Right now we are not giving any covariate information.

Any help would be much appreciated :)

Thanks in advance,
Pedro
Pekka Kohonen
@pekka-kohonen-5862
Last seen 7.0 years ago
Sweden
Dear Pedro,

If you have just one sample from a lab, how do you differentiate between the cell line-specific effect and the lab-specific effect? I don't see how what you are trying to do with these 3 samples makes any sense. If you have the same cell lines measured in a different lab (one with enough samples to run ComBat), why not just use those instead? Also, I wonder what the minimum number of samples is to estimate a lab-specific distribution for each gene (which is what ComBat is doing). Probably 5-10 samples or so?

I think statistics should not be treated as just a way to hack your data so that it appears to be OK. This sounds a bit like doing that. :-)

Best,
Pekka

P.S. My name in Finnish means "Pedro".

2013/10/28 Pedro Furió Tarí <pfurio at cipf.es>:
> [...]
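Pekka's first question (how to tell a cell-line effect from a lab effect when a lab sent a single sample) can be made concrete with a deliberately simplified Python sketch. It applies only a per-gene location adjustment, subtracting each lab's mean and adding back the overall mean, with no covariates, no variance scaling and no empirical Bayes shrinkage, so it is not ComBat's actual arithmetic. The lab labels and expression values are hypothetical.

```python
import statistics

def center_by_batch(values, batches):
    """Location-only batch correction for one gene: remove each batch's mean,
    then add back the overall mean. No covariates, no variance scaling,
    no empirical Bayes shrinkage (a simplification, not ComBat itself)."""
    overall = statistics.fmean(values)
    batch_mean = {
        b: statistics.fmean([v for v, vb in zip(values, batches) if vb == b])
        for b in set(batches)
    }
    return [v - batch_mean[b] + overall for v, b in zip(values, batches)]

labs = ["A", "A", "B", "B", "C"]   # hypothetical: lab C sent a single sample
gene = [5.0, 5.4, 6.0, 6.2, 9.0]   # hypothetical log-expression of one gene

adjusted = center_by_batch(gene, labs)
# Lab C's only value is replaced by the overall mean, whatever its cell line was.
print(adjusted[-1] == statistics.fmean(gene))  # True
```

Applied gene by gene, this replaces lab C's sample with the across-lab average profile, erasing whatever made that cell line distinctive. ComBat's shrinkage would presumably pull the estimated lab shift toward the average shift across genes, so the collapse would be partial rather than exact, but without a covariate the lab and cell-line contributions still cannot be separated.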
@pedro-furio-tari-6207
Last seen 10.4 years ago
Pekka Kohonen <pkpekka at ...> writes:
> [...]

Dear Pekka,

Perhaps we did not explain the problem well. We do not intend to perform any statistical test on the data after correcting the batch effect, so we do not need replicates for every cell line. We want to run a different kind of analysis, for which we first need to correct the batch effect. The problem is that we have this strong "lab effect" that we would like to remove, but unfortunately some of the labs produced only one sample, and that makes ComBat return an error. Perhaps ComBat cannot be applied in this situation, but we wanted to be sure before switching to another strategy.

Thanks so much for your kind response.

Best regards,
Pedro
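The error Pedro describes is consistent with a per-lab variance estimate being undefined. Location/scale adjustments such as ComBat need, for each gene, an estimate of the spread within each batch. The minimal sketch below is plain Python `statistics` code on hypothetical values, not the sva implementation, and it does not reproduce ComBat's actual error message; it only shows what happens to a within-lab variance with one sample and with a copied "false replicate".

```python
import statistics

def within_lab_variance(values):
    """Sample variance (n - 1 denominator) of one gene within one lab.

    Returns None for a single sample: with n = 1 there are zero degrees of
    freedom, so the lab-specific spread cannot be estimated at all.
    """
    if len(values) < 2:
        return None
    return statistics.variance(values)

lab_a = [7.9, 8.4, 8.1, 8.6]   # hypothetical values for one gene
lab_c = [9.2]                  # hypothetical lab that contributed one sample
lab_c_copied = lab_c * 2       # the "false replicate": same values twice

print(within_lab_variance(lab_a))         # about 0.097
print(within_lab_variance(lab_c))         # None
print(within_lab_variance(lab_c_copied))  # 0.0
```

The copy adds no information about lab C: it turns "not estimable" into "exactly zero", a fabricated within-lab spread that any downstream scale adjustment would then take at face value.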
Dear Pedro,

In my previous lab we were also doing cell line profiling in this manner. But similar considerations apply to ComBat as to statistical testing in general (e.g. for differential expression). ComBat estimates gene-specific means and variances for each batch, but it uses an empirical Bayes (shrinkage) method that pools information across genes. That is why, in contrast to other methods, ComBat works with fewer than 10 samples per batch (doi: 10.1093/biostatistics/kxj037).

Based on my understanding, you cannot use ComBat to correct the lab-specific effects when a lab has only one sample, and in any case cell line should be a covariate so that cell-line effects are not "normalized away". Preferably, there should also be more than 2 samples per lab.

Best,
Pekka

2013/10/29 Pedro Furió <pfurio at cipf.es>:
> [...]
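Pekka's point about using cell line as a covariate comes down to identifiability. A minimal sketch, assuming a linear model with lab and cell-line indicator columns (the kind of design one would supply as covariates; this is plain NumPy least squares, not ComBat's code), is to check whether the design matrix has full column rank. The lab and cell-line labels and the expression values are hypothetical.

```python
import numpy as np

def design_matrix(labs, cell_lines):
    """One row per sample: intercept, lab indicators, cell-line indicators.
    The alphabetically first level of each factor is the reference."""
    lab_levels = sorted(set(labs))[1:]
    line_levels = sorted(set(cell_lines))[1:]
    rows = [
        [1.0]
        + [float(lab == level) for level in lab_levels]
        + [float(line == level) for level in line_levels]
        for lab, line in zip(labs, cell_lines)
    ]
    return np.array(rows)

def lab_and_line_separable(labs, cell_lines):
    """True if lab and cell-line effects can both be estimated (full column rank)."""
    X = design_matrix(labs, cell_lines)
    return np.linalg.matrix_rank(X) == X.shape[1]

labs = ["A", "A", "B", "B", "C"]  # hypothetical: lab C contributed a single sample

# Case 1: lab C's cell line was measured nowhere else, so lab and line are confounded.
print(lab_and_line_separable(labs, ["HeLa", "MCF7", "HeLa", "MCF7", "K562"]))  # False

# Case 2: lab C ran a cell line other labs also ran, so the effects are separable,
# but lab C's only sample is fitted exactly, leaving no residual to estimate its spread.
lines = ["HeLa", "MCF7", "HeLa", "MCF7", "HeLa"]
X = design_matrix(labs, lines)
y = np.array([5.0, 5.4, 6.0, 6.2, 7.5])  # hypothetical values for one gene
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta
print(lab_and_line_separable(labs, lines))  # True
print(abs(residuals[-1]) < 1e-9)            # True
```

In the first case lab C's indicator column is identical to the K562 column, so no method can split that sample's deviation between lab and cell line. In the second case the lab shift is estimable, but the single sample leaves zero residual degrees of freedom for lab C's variance, which is consistent with Pekka's advice to have more than two samples per lab.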