Delta CT data distribution and cluster analyses; machine learning or other
john herbert ▴ 560
@john-herbert-4612
Dear Bioconductors,

I have a set of Delta CT values for several tissues. If I boxplot the data, it looks very similar to microarray data, with a lot of congestion around zero.

Likewise, if I log2-transform the data, as with microarray data, the distributions look close to normal and like microarray data.

Please see the image here for the different plots:
https://docs.google.com/leaf?id=0B9IUGsKecS4GNDc0OWVlNzEtZjE5Yi00Y2Q4LWI0M2MtMGFiNzZhMDU0YTFm&hl=en

My question is whether manipulating this type of data in this manner is OK, and whether it will affect or invalidate any unsupervised machine learning/clustering.

Can I quantile normalise the data and still do valid clustering?
@richard-friedman-513
Dear John,

Is the Delta CT data from PCR or from some other method? If it is from PCR, in my experience Delta Delta CT is usually normally distributed, where the first delta refers to the difference between the experiment and an internal reference (e.g. GAPDH) and the second delta refers to the difference between 2 experimental conditions.

With hopes that the above helps,
Rich

------------------------------------------------------------
Richard A. Friedman, PhD
Associate Research Scientist, Biomedical Informatics Shared Resource
Herbert Irving Comprehensive Cancer Center (HICCC)
Lecturer, Department of Biomedical Informatics (DBMI)
Educational Coordinator, Center for Computational Biology and Bioinformatics (C2B2)/
National Center for Multiscale Analysis of Genomic Networks (MAGNet)
Room 824, Irving Cancer Research Center
Columbia University
1130 St. Nicholas Ave
New York, NY 10032
(212)851-4765 (voice)
friedman at cancercenter.columbia.edu
http://cancercenter.columbia.edu/~friedman/

I am a Bayesian. When I see a multiple-choice question on a test and I don't know the answer I say "eeney-meaney-miney-moe".
Rose Friedman, Age 14

On May 13, 2011, at 10:46 AM, john herbert wrote:
> Can I quantile normalise the data and still do valid clustering?
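The two deltas described in this reply can be sketched in a few lines of Python. This is only an illustration: the function names and the Ct values are hypothetical and are not taken from the data discussed in the thread.

```python
def delta_ct(target_ct, reference_ct):
    """First delta: target gene Ct minus internal reference (e.g. GAPDH) Ct."""
    return target_ct - reference_ct


def delta_delta_ct(dct_condition, dct_baseline):
    """Second delta: difference in delta Ct between two experimental conditions."""
    return dct_condition - dct_baseline


# Hypothetical Ct values for one gene in two conditions.
treated = delta_ct(target_ct=24.0, reference_ct=18.0)
untreated = delta_ct(target_ct=26.0, reference_ct=18.5)
ddct = delta_delta_ct(treated, untreated)

print(treated, untreated, ddct)
```

A negative delta delta Ct means the target crosses the threshold earlier, relative to the reference, in the first condition.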
Hi Richard,

Thank you. It is from TaqMan real-time PCR. I have sent a mail asking how exactly they normalised the data. We only have biological replicates and no common reference, so I was told we can only use Delta CT values.

I assume, maybe wrongly, that if Delta Delta CT values are normally distributed, then Delta CT values will also be normally distributed?

I will make plots of the raw data and Delta CT as I know it.

On Fri, May 13, 2011 at 3:53 PM, Richard Friedman <friedman@cancercenter.columbia.edu> wrote:
> If it is from PCR in my experience Delta Delta CT is usually normally
> distributed.
What is the range of the data that you received?

In most TaqMan real-time PCR experiments, the Ct values range from about 10 (for really abundant things like 18S) to 40. These measurements are in cycles. In principle, if you had a perfectly efficient probe-primer combination, the number of copies of the target sequence would double every cycle. As a result, cycle values are already essentially on the "negative log base two" scale.

As Richard already pointed out, the Delta-Ct or Delta-Delta-Ct values on this scale are usually normal.

If your data are not in a range that makes sense as cycles, then it is likely that someone exponentiated the data to get it back to the "raw" scale, and thus converted it from normally distributed to log-normal.

Kevin
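A minimal sketch of why Ct sits on a "negative log base two" scale, assuming perfect doubling every cycle. The threshold of 2**40 copies and the starting copy numbers are arbitrary, hypothetical values chosen so the arithmetic is exact; only differences between Ct values are meaningful here.

```python
from math import log2

# Hypothetical detection threshold, in copies of the target.
THRESHOLD_COPIES = 2 ** 40


def ideal_ct(start_copies, threshold_copies=THRESHOLD_COPIES):
    """Cycles needed to reach the threshold if copies double every cycle."""
    return log2(threshold_copies / start_copies)


ct_low = ideal_ct(1024)    # fewer starting copies, later crossing
ct_high = ideal_ct(2048)   # twice the starting copies

print(ct_low, ct_high, ct_low - ct_high)
```

Doubling the starting amount lowers the idealised Ct by exactly one cycle, which is why a difference of Ct values behaves like a log2 ratio with the sign flipped.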
The range of raw CT values is around 15 to 35. The 2^-deltaCT values are very small, less than one. An example is 0.079703285.

I have 5 case samples and 5 control samples. For all samples, there are CT measures for target genes and housekeeper genes. Our approach is to use the housekeeper CT of each sample as the reference in the Delta CT calculation.

E.g.
Sample Case 1 target CT = 15
Sample Case 1 housekeeper CT = 10
Delta CT = 15-10 = 5
A = 2 to the power of minus delta CT, as in Excel =power(2,-5) = 0.03125

Then the normal sample is the same....
Sample normal 1 target CT = 10
Sample normal 1 housekeeper CT = 4
Delta CT = 10-4 = 6
2 to the power of minus delta CT, as in Excel =power(2,-6) = 0.015625

I have lots of these small values. These values don't look normally distributed.

My view is that maybe I should make an M value (log2 ratios) and do t-tests etc.

Is this the best way to go for gene expression and subsequent clustering?

Thank you.

On Fri, May 13, 2011 at 9:06 PM, Kevin R. Coombes <kevin.r.coombes@gmail.com> wrote:
> What is the range of the data that you received?
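The arithmetic in the worked example can be checked with a short Python sketch. The function name is hypothetical; the Ct values are the ones given in the message.

```python
def relative_expression(target_ct, housekeeper_ct):
    """Return (delta Ct, 2^-delta Ct) for one sample."""
    delta_ct = target_ct - housekeeper_ct
    return delta_ct, 2.0 ** (-delta_ct)


case_1 = relative_expression(target_ct=15, housekeeper_ct=10)
normal_1 = relative_expression(target_ct=10, housekeeper_ct=4)

print(case_1)
print(normal_1)
```

The quoted results 0.03125 and 0.015625 correspond to 2^-5 and 2^-6, so the exponent passed to the power function is minus the delta Ct.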
John,

Do not raise 2 to the power of -deltaCt and then do a t-test. To test the hypothesis deltaCt(condition 1) = deltaCt(condition 2), do a t-test on the deltaCt values themselves.

deltaCt = -log2(M), where M = 2^-deltaCt, and will be closer to normally distributed than 2^-deltaCt.

I hope this helps.

Best wishes,
Rich

On Sat, 14 May 2011, john herbert wrote:
> My view is maybe I should make an M value (log2 ratios) do ttests etc.

--
------------------------------------------------------------
Richard A. Friedman, PhD
Associate Research Scientist
Herbert Irving Comprehensive Cancer Center
Biomedical Informatics Shared Resource
Lecturer
Department of Biomedical Informatics
Box 95, Room 130BB or P&S 1-420C
Columbia University Medical Center
630 W. 168th St.
New York, NY 10032
(212)305-6901 (5-6901) (voice)
friedman at cancercenter.columbia.edu
http://cancercenter.columbia.edu/~friedman/

"The last 250 pages of the last Harry Potter book took place in one day because a lot happened in that day. All of Ulysses takes place in one day and nothing happened in that day."
-Rose Friedman, age 11
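A rough sketch of the comparison suggested here, run directly on deltaCt values. The five case and five control values are invented for illustration. The sketch computes Welch's t statistic and its approximate degrees of freedom with the standard library only, which is one possible flavour of t-test and not necessarily the one intended in the reply; a p-value would normally come from a statistics package, for example `scipy.stats.ttest_ind(case_dct, control_dct, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    vx = variance(x) / len(x)   # squared standard error of mean(x)
    vy = variance(y) / len(y)   # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical deltaCt values (target Ct minus housekeeper Ct), 5 per group.
case_dct = [5.0, 5.4, 4.8, 5.1, 5.2]
control_dct = [6.0, 6.3, 5.9, 6.1, 6.2]

t_stat, df = welch_t(case_dct, control_dct)
print(round(t_stat, 3), round(df, 2))
```

Because the test is done on the cycle scale, the difference in group means is itself a log2-scale quantity and no exponentiation is needed before testing.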
Hi John,

Richard's suggestion is correct: Delta Ct (or Ct itself) is on the logarithmic scale, so the error is hopefully additive and the values may be treated like microarray data after log transformation.

However, there is one important difference. A microarray usually contains thousands of genes, so in most cases we may expect most of them not to change, and hence quantile normalization should be all right. PCR data usually cover a few hundred genes or fewer, so one must be much more careful when deciding whether it is reasonable to expect most of these genes not to change between conditions (especially if these genes belong to some set of interest). If this assumption is wrong (i.e. a high percentage of genes may change), quantile normalization should be avoided.

Regards,
Moshe.
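To make the caveat about quantile normalization concrete, here is a small pure-Python sketch. The function is a simplified, hypothetical implementation (ties are broken arbitrarily by sort order) and the deltaCt values are invented. In the toy data every gene is shifted by two cycles in the second sample, which is the situation where the "most genes do not change" assumption fails.

```python
def quantile_normalize(samples):
    """Quantile-normalise equal-length samples (one list of values per sample)."""
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [sum(s[k] for s in sorted_samples) / len(samples) for k in range(n)]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # gene indices, smallest value first
        out = [0.0] * n
        for rank, gene_index in enumerate(order):
            out[gene_index] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical deltaCt values for 4 genes; sample_b is shifted by 2 cycles for every gene.
sample_a = [5.0, 6.0, 7.0, 8.0]
sample_b = [7.0, 8.0, 9.0, 10.0]

norm_a, norm_b = quantile_normalize([sample_a, sample_b])
print(norm_a)
print(norm_b)
```

After normalization the two samples are identical, so a change affecting all (or most) genes is removed along with any technical variation.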
@richard-friedman-513
John,

I inadvertently took the correspondence off line.

By control gene do you mean something that is the same control in all experiments? Or is the control gene the same as the Target gene under different conditions?

Best wishes,
Rich

On May 13, 2011, at 1:14 PM, john herbert wrote:
> The exact values I have are;
> Step 1 = Delta CT = CT of a target gene - CT of a control gene
>
> Normalised values in the data are 2^-(Delta CT) (2 to the power - DCT)
>
> I am wondering, as this data in this form, looks like microarray
> data. So can I quantile normalise? And can I cluster based on the
> normalised data?
>
> Sorry if I was not clear.
>
> On Fri, May 13, 2011 at 5:29 PM, Richard Friedman <friedman at cancercenter.columbia.edu> wrote:
>
> What does your Delta CT represent? The change of what and what?
John and Rich,

By definition, delta Ct for TaqMan data is <detector Ct value> - <Ct for a reference>. In some cases the reference may be a single detector (e.g., the Ct of GAPDH), or it may be a collection of detectors (e.g., the average of the Ct of GAPDH and the Ct of BACTIN).

Ct values are normally distributed. Delta Ct, being a difference of Ct values, is then also normally distributed, and therefore 2^-(delta Ct) would have a log-normal distribution. The latter values should resemble typical gene expression data (e.g., Affymetrix signal).

Hope this helps,
Bill
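Bill's two points (a reference built from several detectors, and the log-normal shape of 2^-(delta Ct)) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the Ct means and standard deviations are invented, Ct values are drawn as independent normals because that is the premise stated above, and the helper names are hypothetical.

```python
import math
import random
import statistics


def delta_ct_multi(detector_ct, reference_cts):
    """Delta Ct against the average Ct of one or more reference detectors."""
    return detector_ct - statistics.mean(reference_cts)


def simulate(n=5000, seed=1):
    """Draw normal Ct values (assumed parameters) and derive the three scales."""
    rng = random.Random(seed)
    dcts = []
    for _ in range(n):
        target = rng.gauss(26.0, 1.5)  # hypothetical target detector Ct
        refs = [rng.gauss(20.0, 0.5), rng.gauss(21.0, 0.5)]  # two references
        dcts.append(delta_ct_multi(target, refs))
    linear = [2.0 ** (-d) for d in dcts]        # 2^-(delta Ct) scale
    log2_values = [math.log2(v) for v in linear]  # back on the log2 scale
    return dcts, linear, log2_values


dcts, linear, log2_values = simulate()

# On the linear scale the mean sits above the median (right skew),
# while on the log2 scale mean and median nearly coincide.
print("linear mean/median:", statistics.mean(linear), statistics.median(linear))
print("log2 mean/median:", statistics.mean(log2_values), statistics.median(log2_values))
```

Taking log2 of 2^-(delta Ct) simply returns -(delta Ct), so under these assumptions any method that expects roughly normal input can work on -(delta Ct) directly.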