Use ComBat to adjust for hidden variables without knowing the batch effect


W. Evan Johnson ▴ 870


Shirley,

Michael has given you some good advice; definitely do these things.

Also, one other thing to try is to apply SVA and see whether any of the surrogate variables seem to be correlated with your missing variables (maybe you have some idea which samples were collected together or at the same time?).

Hope this helps,

Evan

On Jul 18, 2013, at 4:00 AM, <bioconductor-request at r-project.org> wrote:

> Hi Michael,
>
> Many thanks for your great suggestions. They are very helpful.
>
> Best,
> Shirley
>
> On Tue, Jul 16, 2013 at 11:56 PM, Michael Breen
> <breenbioinformatics at gmail.com> wrote:
>> Hi Shirley,
>>
>> It's often not recommended to batch correct without considerable evidence of
>> a batch effect (e.g. date, cohorts, etc.).
>>
>> What is recommended is to proceed with various sorts of quality assessment
>> to visualize potential batch effects. For example, we will often produce:
>>
>> - 3D PCA plots wrapping 1, 2, and 3 standard deviations around the data points
>> - Hierarchical clustering using Pearson's correlation
>> - Array-to-array distance plots
>>
>> (For the PCA and clustering plots, it helps to overlay a color scheme for the
>> potential batches to aid in visualizing.)
>>
>> If you find no evidence of batches, then skip the batch adjustment. If there is
>> a potential effect, correct with ComBat or SCAN and proceed with your
>> analysis.
>>
>> Good luck,
>>
>> Michael
>>
>>
>> On Mon, Jul 15, 2013 at 6:10 PM, shirley zhang <shirley0818 at gmail.com>
>> wrote:
>>>
>>> I know that if the batch effect is known, we can use ComBat to adjust for
>>> it. However, if the batch effect is unknown, could I
>>> still use ComBat or SVA to adjust for some hidden variables? We know
>>> that our blood samples were NOT
>>> drawn at the same time from individuals, and RNA was NOT extracted at
>>> the same time.
>>>
>>> Many thanks,
>>> Shirley
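As a rough illustration of the Pearson-correlation check Michael mentions, here is a minimal sketch in plain Python. The sample names, the toy expression values, and the helper functions `pearson` and `correlation_distance` are all made up for illustration; they are not part of sva or any Bioconductor package. The idea is only that samples processed together should sit closer (1 - r) to each other than to samples from another batch, if a batch effect exists.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_distance(samples):
    """Return {(sample_a, sample_b): 1 - r} for every pair of samples."""
    names = list(samples)
    return {
        (a, b): 1.0 - pearson(samples[a], samples[b])
        for a in names
        for b in names
    }


# Hypothetical toy data: four genes per sample. Samples a1/a2 and b1/b2 are
# assumed to come from two different (suspected) batches.
samples = {
    "a1": [5.0, 7.1, 6.0, 8.0],
    "a2": [5.1, 7.0, 6.1, 7.9],
    "b1": [7.0, 5.0, 7.1, 7.0],
    "b2": [7.1, 5.1, 7.0, 6.9],
}

dist = correlation_distance(samples)
for pair in [("a1", "a2"), ("b1", "b2"), ("a1", "b1"), ("a2", "b2")]:
    print(pair, round(dist[pair], 3))
```

A distance matrix like this could be fed to any hierarchical clustering routine; coloring the leaves by a suspected batch variable (collection date, extraction date) is then the visual check described above.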

Clustering sva • 1.6k views

11.4 years ago • W. Evan Johnson ▴ 870


W. Evan Johnson ▴ 870


Also, Tim Triche (at USC) just pointed out that your data may be PCR data. Is this correct? For PCR data, ComBat should work fine if you have a few hundred genes and there aren't any egregious outliers. However, I think that SVA requires a large number of genes (>1,000 or so), but I'll let Jeff Leek confirm or refute this!

Thanks!

Evan
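To make the idea of a batch adjustment concrete, here is a deliberately simplified sketch in plain Python. It is not ComBat: ComBat fits a location and scale model with empirical Bayes shrinkage across genes, whereas this sketch only re-centers each batch of a single gene on the overall mean. It also assumes a batch label is known for every sample; the labels `run1`/`run2`, the values, and the function `center_by_batch` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def center_by_batch(values, batches):
    """Shift each batch so that its mean matches the overall mean of the gene.

    values  -- expression values of ONE gene, one entry per sample
    batches -- batch label of each sample, same order as values
    """
    grand_mean = mean(values)

    # Collect the values belonging to each batch.
    by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        by_batch[batch].append(value)
    batch_means = {batch: mean(vals) for batch, vals in by_batch.items()}

    # Remove the batch-specific mean, then restore the overall level.
    return [
        value - batch_means[batch] + grand_mean
        for value, batch in zip(values, batches)
    ]


# Hypothetical gene measured in two runs; run2 reads about 4 units higher.
gene = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
batches = ["run1", "run1", "run1", "run2", "run2", "run2"]

adjusted = center_by_batch(gene, batches)
print(adjusted)
```

With only a few hundred genes, a per-gene adjustment like this is noisy, which is one reason a method that borrows information across genes may be preferable when batch labels are available.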

11.4 years ago • W. Evan Johnson ▴ 870


Dear Dr. Johnson,

Many thanks for your suggestions. Yes, my data is rtPCR data (200 genes x 2,000 samples). Besides the number of genes, my data differ from microarray data in two other ways:

1. In my qPCR data, if the cycle threshold (CT) value of a gene in a sample is > 30, the value is set to NA. As a result, my expression data matrix (200 genes x 2,000 samples) contains many NA values. For example, 10% of genes have NA values in 50% of samples. Different genes might have NA values in different samples.

2. My qPCR data contain negative values. The expression level of each gene is first adjusted by a housekeeping gene, which is run repeatedly in each plate/run. If the expression level of the gene is higher than that of the housekeeping gene, the value in my data matrix is negative.

I really appreciate your input.

Shirley

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
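The two properties described in the post can be sketched in a few lines of plain Python. This assumes the housekeeping adjustment is a simple subtraction (gene CT minus housekeeping CT); the post does not state the exact formula, so treat that as an assumption. The CT values and the helper names `delta_ct` and `na_fraction` are hypothetical. Because a lower CT means higher expression, a gene expressed more strongly than the housekeeping gene gets a negative value.

```python
CT_CUTOFF = 30.0  # from the post: CT values above 30 are set to NA


def delta_ct(gene_ct, housekeeping_ct, cutoff=CT_CUTOFF):
    """Gene CT minus housekeeping CT, or None (NA) when the gene CT exceeds the cutoff.

    The subtraction is an assumed form of the adjustment described in the post.
    """
    if gene_ct > cutoff:
        return None
    return gene_ct - housekeeping_ct


def na_fraction(values):
    """Fraction of samples in which a gene is NA (None)."""
    return sum(v is None for v in values) / len(values)


# Hypothetical raw CT values for one gene and the housekeeping gene in four samples.
gene_ct = [22.0, 31.5, 18.0, 35.0]
housekeeping_ct = [20.0, 20.5, 19.5, 20.0]

adjusted = [delta_ct(g, h) for g, h in zip(gene_ct, housekeeping_ct)]
print(adjusted)
print(na_fraction(adjusted))
```

A per-gene NA fraction like this is one way to see how many genes would have enough observed samples left before attempting any batch adjustment.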

11.4 years ago • shirley zhang ★ 1.0k
