htqPCR question
2
0
Entering edit mode
@andreia-fonseca-3796
Last seen 7.1 years ago
Dear Heidi, So I have each of my biological replicate in a plate, I have used the following command to read the data: > raw <- readCtData(files=files$File, path=path, n.features = 96, flag = 4, feature = 1, type = 2, position = 3, Ct = 5, header = FALSE, SDS = FALSE) my files: File Treatment 1 Emb1.txt Embrio 2 Emb2.txt Embrio 3 Emb3.txt Embrio 4 lin1a.txt BM 5 lin2.txt BM 6 lin_1a.txt BM 7 lin1b.txt BM 8 Sca3.txt Heart I did the filtering raw.cat<-setCategory(raw, groups=files$Treatment, quantile=0.8) and now I want to normalize: d.norm<-normalizeCtDataraw.cat,norm="deltaCt", deltaCt.genes=c("U6")) and then I get the message: Calculating deltaCt values Using control gene(s): U6 Card 1: Mean=33.62 Stdev=NA Card 2: Mean=32.65 Stdev=NA Card 3: Mean=32.47 Stdev=NA Card 4: Mean=35.46 Stdev=NA Card 5: Mean=34.09 Stdev=NA Card 6: Mean=34.39 Stdev=NA Card 7: Mean=35.79 Stdev=NA Card 8: Mean=33.72 Stdev=NA so, for the normalization, the package is not taking into account the variation of the endogenous gene U6 across technical and biological replicates.How can I make the normalization taking into account the mean and standard deviation of the endogenous genes at least across the biolgical replicates of the same treatment? Thanks Kind regards, Andreia On Fri, Apr 30, 2010 at 8:53 PM, Heidi Dvinge <heidi@ebi.ac.uk> wrote: > Hi Andreia > > > Dear Heidi, > > > > Sorry it seems that I have not understood your table well. All the > samples > > were analyzed using the same platform, but each sample is on a different > > plate. This way the commands are different right? > > So going back to your table: > > > > Sample Cell_type Platform RT_reaction > > ------------------------------ > > ------ > > 1 A type1 RT1 > > 1 A type1 RT1 > > 1 A type1 RT2 > > 2 A type1 RT1 > > 2 B type1 RT1 > > 3 B type1 RT1 > > 4 B type1 RT1 > > > So, you actually just have cell type A and B you want to compare? That > makes it all easier. In that case you can use either of the three > functions limmaCtData, ttestCtData and mannwhitneyCtData depending on what > you prefer. Both the vignette and the help pages has examples about how to > compare two groups. ttestCtData and mannwhitneyCtData are probably easier > if you just have two conditions, with mannwhitney perhaps being most > suitable if you only have a small number of genes on each qPCR card. > > There's no way to account for the single different RT reaction though. You > can check if this is an outlier in any of the QC plots, like plotCtDensity > or plotCtBoxes, or perhaps try running the A versus B comparison both > with/without this card, and see if it skews the result in any unexpected > ways. Hopefully different RT reactions should be quite similar though. > > HTH > \Heidi > > > Thanks > > Andreia > > > > > > On Tue, Apr 27, 2010 at 11:11 AM, Heidi Dvinge <heidi@ebi.ac.uk> wrote: > > > >> > >> On 26 Apr 2010, at 09:41, Andreia Fonseca wrote: > >> > >> Dear heidi, > >> thank you for your reply. The datasets come from a 96 well qPCR plate, > >> but > >> only 80 reactions have good quality. Each line corresponds to a separate > >> plate, with the same probes. I would like to compare A and B. Thanks for > >> your help. > >> Kind regards, > >> Andreia > >> > >> Hello Andreia, > >> > >> so, what you probably want is to include the "Platform" as a andom > >> effect > >> using the duplicateCorrelation() function from limma with block > >> argument. If > >> you type > >> > >> > limmaUsersGuide() > >> > >> you'll get the full limma users guide, which includes examples of how to > >> include different kinds of technical and biological variation. This is > >> basically what the limmaCtData() funciton is based on. So generally > >> you'd > >> want to first do something like this (with "norm" being your normalised > >> qPCRset data): > >> > >> > targets <- data.frame(Cell=rep(c("A", "B"), times=3:4), > >> Platform=rep(c("type1", "type2"), times=c(2,5))) > >> > design <- model.matrix(~0+targets$Cell) > >> > colnames(design) <- c("CellTypeA", "CellTypeB") > >> > dupcor <- duplicateCorrelation(exprs(norm), design=design, > >> block=targets$Platform) > >> > >> and then include the dupcor$correlation and block in the > >> limmaCtData.There's a small bug in limmaCtData for including dupcor > >> though - > >> I'll get that fixed. > >> > >> However, int his case that doesn't really matter, since I don't think > >> you > >> can analyse you data this way, if you experiment design really is as > >> outlined in "targets" here. All platforms of type1 are used for cell > >> type A, > >> there are none present for B. > >> > >> You might have to just ignore that the samples are measured on 2 > >> different > >> platforms, and simply do a standard test between cell type A and B, even > >> though this is not optimal. How different are these platforms - would > >> you > >> expect it to be significant for your analysis? Have you tried testing > >> for > >> significance between Ct values measured on type1 and type2? Or what's > >> the > >> dupcor$correlation between them? These are some of the things you can > >> look > >> at before deciding how to continue. > >> > >> Best wishes > >> \Heidi > >> > >> On Sat, Apr 24, 2010 at 12:45 PM, Heidi Dvinge <heidi@ebi.ac.uk> wrote: > >> > >>> Hello Andreia, > >>> > >>> hm, slightly tricky. Are these data sets from some sort of commercial > >>> platform, like the ABI TLDA cards, or from normal qPCR reactions on a > >>> 384 > >>> well plate or similar? Just to make sure I understand you experiment > >>> right, do you have something like this? > >>> > >>> Sample Cell_type Platform RT_reaction > >>> ------------------------------------ > >>> 1 A type1 RT1 > >>> 1 A type1 RT1 > >>> 1 A type2 RT2 > >>> 2 A type2 RT1 > >>> 2 B type2 RT1 > >>> 3 B type2 RT1 > >>> 4 B type2 RT1 > >>> > >>> Is each line a separate plate? Or would the "Platform" column > >>> correspond > >>> to 2 individual plates, where multiple samples can be loaded onto each > >>> plate? > >>> What exactly is your primary interest here? Difference between cell > >>> types > >>> A and B? I think we need some specification here, for us to help you > >>> properly. > >>> > >>> If you have different kinds of plates, then you could potentially > >>> include > >>> pate as a batch effect when doing your analysis. This will require you > >>> to > >>> use limmaCtData() for your test. This is based on the function lmFit > >>> from > >>> the limma package, which has some nice examples of how to take this > >>> sort > >>> of information into account. Although it looks like you have too many > >>> variables here to consider all of them. > >>> > >>> By the way, you don't necessarily need one file per sample. As long as > >>> you > >>> have the sample number of genes per sample, e.g. 384, then with HTqPCR > >>> version 1.2.0 you can use the n.data parameter in readCtData() to > >>> indicate > >>> how many sets of data are in each file. > >>> > >>> HTH > >>> \Heidi > >>> > >>> > Dear Heidi, > >>> > > >>> > I have received a data set from a qPCR miRNA set where I have > >>> > > >>> > sample1 - 2 replicates from the same RT reaction > >>> > 1 replicate from an independen RT reaction > >>> cell_type:A > >>> > sample 2 - cell type : A > >>> > sample2 cell_type:B > >>> > sample3 cell_type:B > >>> > sample 4 cell_type:B > >>> > > >>> > I saw that I should prepare one file per sample, but the replicates > >>> of > >>> > sample1 have been produced in different plates and one is even from > >>> > another > >>> > RT reaction. How can I account for this using your package? > >>> > Thanks in advacne for your reply. > >>> > With kind regards, > >>> > Andreia > >>> > > >>> > > >>> > -- > >>> > -------------------------------------------- > >>> > Andreia J. Amaral > >>> > Unidade de Imunologia Clínica > >>> > Instituto de Medicina Molecular > >>> > Universidade de Lisboa > >>> > email: andreiaamaral@fm.ul.pt > >>> > andreia.fonseca@gmail.com > >>> > > >>> > >>> > >>> > >> > >> > >> -- > >> -------------------------------------------- > >> Andreia J. Amaral > >> Unidade de Imunologia Clínica > >> Instituto de Medicina Molecular > >> Universidade de Lisboa > >> email: andreiaamaral@fm.ul.pt > >> andreia.fonseca@gmail.com > >> > >> > >> > > > > > > -- > > -------------------------------------------- > > Andreia J. Amaral > > Unidade de Imunologia Clínica > > Instituto de Medicina Molecular > > Universidade de Lisboa > > email: andreiaamaral@fm.ul.pt > > andreia.fonseca@gmail.com > > > > > -- -------------------------------------------- Andreia J. Amaral Unidade de Imunologia Clínica Instituto de Medicina Molecular Universidade de Lisboa email: andreiaamaral@fm.ul.pt andreia.fonseca@gmail.com [[alternative HTML version deleted]]
miRNA qPCR Normalization limma miRNA qPCR Normalization limma • 1.8k views
ADD COMMENT
0
Entering edit mode
@michael-imbeault-3593
Last seen 9.6 years ago
Hi everyone, Is there a way to get a changelog of what changed between bioC 2.5 and 2.6 for individual packages? New functions, bugs fixed, etc? The associated documentation of packages I looked at don't seem to include this. Cheers, Michael
ADD COMMENT
0
Entering edit mode
news() is the tool for that... and emailing the maintainer is the backup approach, in case a NEWS file is not present in the package. b On Sun, May 2, 2010 at 12:34 AM, Michael Imbeault <michael.imbeault at="" sympatico.ca=""> wrote: > Hi everyone, > > Is there a way to get a changelog of what changed between bioC 2.5 and 2.6 > for individual packages? New functions, bugs fixed, etc? The associated > documentation of packages I looked at don't seem to include this. > > Cheers, > Michael > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD REPLY
0
Entering edit mode
Heidi Dvinge ★ 2.0k
@heidi-dvinge-2195
Last seen 9.6 years ago
Hi Andreia, > Dear Heidi, > > So I have each of my biological replicate in a plate, I have used the > following command to read the data: > >> raw <- readCtData(files=files$File, path=path, n.features = 96, flag = >> 4, > feature = 1, type = 2, position = 3, Ct = 5, header = FALSE, SDS = FALSE) > my files: > > File Treatment > 1 Emb1.txt Embrio > 2 Emb2.txt Embrio > 3 Emb3.txt Embrio > 4 lin1a.txt BM > 5 lin2.txt BM > 6 lin_1a.txt BM > 7 lin1b.txt BM > 8 Sca3.txt Heart > > I did the filtering > raw.cat<-setCategory(raw, groups=files$Treatment, quantile=0.8) > > and now I want to normalize: > > d.norm<-normalizeCtDataraw.cat,norm="deltaCt", deltaCt.genes=c("U6")) > and then I get the message: > > Calculating deltaCt values > Using control gene(s): U6 > Card 1: Mean=33.62 Stdev=NA > Card 2: Mean=32.65 Stdev=NA > Card 3: Mean=32.47 Stdev=NA > Card 4: Mean=35.46 Stdev=NA > Card 5: Mean=34.09 Stdev=NA > Card 6: Mean=34.39 Stdev=NA > Card 7: Mean=35.79 Stdev=NA > Card 8: Mean=33.72 Stdev=NA > > so, for the normalization, the package is not taking into account the > variation of the endogenous gene U6 across technical and biological > replicates.How can I make the normalization taking into account the mean > and > standard deviation of the endogenous genes at least across the biolgical > replicates of the same treatment? With deltaCt normalisation I'm afraid you cannot use information across biological replicates in HTqPCR, since these are placed on individual plates. In general, deltaCt is trying to normalise for the total amount of RNA in a single sample, using some sort of reference RNA such as GAPDH/actin or whatever is suitable in a given experiment, but that considers each sample individually. Technical replicates within plates should automatically be taken into account though - how many copies of U6 are on each plate? If you want to normalise between samples, you'll have to use either quantile normalisation, or some of the rank invariant methods. However, these take all genes on a card as input, not a manually selected set I'm afraid. HTH \Heidi > Thanks > Kind regards, > Andreia > > > > > On Fri, Apr 30, 2010 at 8:53 PM, Heidi Dvinge <heidi at="" ebi.ac.uk=""> wrote: > >> Hi Andreia >> >> > Dear Heidi, >> > >> > Sorry it seems that I have not understood your table well. All the >> samples >> > were analyzed using the same platform, but each sample is on a >> different >> > plate. This way the commands are different right? >> > So going back to your table: >> > >> > Sample Cell_type Platform RT_reaction >> > ------------------------------ >> > ------ >> > 1 A type1 RT1 >> > 1 A type1 RT1 >> > 1 A type1 RT2 >> > 2 A type1 RT1 >> > 2 B type1 RT1 >> > 3 B type1 RT1 >> > 4 B type1 RT1 >> > >> So, you actually just have cell type A and B you want to compare? That >> makes it all easier. In that case you can use either of the three >> functions limmaCtData, ttestCtData and mannwhitneyCtData depending on >> what >> you prefer. Both the vignette and the help pages has examples about how >> to >> compare two groups. ttestCtData and mannwhitneyCtData are probably >> easier >> if you just have two conditions, with mannwhitney perhaps being most >> suitable if you only have a small number of genes on each qPCR card. >> >> There's no way to account for the single different RT reaction though. >> You >> can check if this is an outlier in any of the QC plots, like >> plotCtDensity >> or plotCtBoxes, or perhaps try running the A versus B comparison both >> with/without this card, and see if it skews the result in any unexpected >> ways. Hopefully different RT reactions should be quite similar though. >> >> HTH >> \Heidi >> >> > Thanks >> > Andreia >> > >> > >> > On Tue, Apr 27, 2010 at 11:11 AM, Heidi Dvinge <heidi at="" ebi.ac.uk=""> >> wrote: >> > >> >> >> >> On 26 Apr 2010, at 09:41, Andreia Fonseca wrote: >> >> >> >> Dear heidi, >> >> thank you for your reply. The datasets come from a 96 well qPCR >> plate, >> >> but >> >> only 80 reactions have good quality. Each line corresponds to a >> separate >> >> plate, with the same probes. I would like to compare A and B. Thanks >> for >> >> your help. >> >> Kind regards, >> >> Andreia >> >> >> >> Hello Andreia, >> >> >> >> so, what you probably want is to include the "Platform" as a andom >> >> effect >> >> using the duplicateCorrelation() function from limma with block >> >> argument. If >> >> you type >> >> >> >> > limmaUsersGuide() >> >> >> >> you'll get the full limma users guide, which includes examples of how >> to >> >> include different kinds of technical and biological variation. This >> is >> >> basically what the limmaCtData() funciton is based on. So generally >> >> you'd >> >> want to first do something like this (with "norm" being your >> normalised >> >> qPCRset data): >> >> >> >> > targets <- data.frame(Cell=rep(c("A", "B"), times=3:4), >> >> Platform=rep(c("type1", "type2"), times=c(2,5))) >> >> > design <- model.matrix(~0+targets$Cell) >> >> > colnames(design) <- c("CellTypeA", "CellTypeB") >> >> > dupcor <- duplicateCorrelation(exprs(norm), design=design, >> >> block=targets$Platform) >> >> >> >> and then include the dupcor$correlation and block in the >> >> limmaCtData.There's a small bug in limmaCtData for including dupcor >> >> though - >> >> I'll get that fixed. >> >> >> >> However, int his case that doesn't really matter, since I don't think >> >> you >> >> can analyse you data this way, if you experiment design really is as >> >> outlined in "targets" here. All platforms of type1 are used for cell >> >> type A, >> >> there are none present for B. >> >> >> >> You might have to just ignore that the samples are measured on 2 >> >> different >> >> platforms, and simply do a standard test between cell type A and B, >> even >> >> though this is not optimal. How different are these platforms - would >> >> you >> >> expect it to be significant for your analysis? Have you tried testing >> >> for >> >> significance between Ct values measured on type1 and type2? Or what's >> >> the >> >> dupcor$correlation between them? These are some of the things you can >> >> look >> >> at before deciding how to continue. >> >> >> >> Best wishes >> >> \Heidi >> >> >> >> On Sat, Apr 24, 2010 at 12:45 PM, Heidi Dvinge <heidi at="" ebi.ac.uk=""> >> wrote: >> >> >> >>> Hello Andreia, >> >>> >> >>> hm, slightly tricky. Are these data sets from some sort of >> commercial >> >>> platform, like the ABI TLDA cards, or from normal qPCR reactions on >> a >> >>> 384 >> >>> well plate or similar? Just to make sure I understand you experiment >> >>> right, do you have something like this? >> >>> >> >>> Sample Cell_type Platform RT_reaction >> >>> ------------------------------------ >> >>> 1 A type1 RT1 >> >>> 1 A type1 RT1 >> >>> 1 A type2 RT2 >> >>> 2 A type2 RT1 >> >>> 2 B type2 RT1 >> >>> 3 B type2 RT1 >> >>> 4 B type2 RT1 >> >>> >> >>> Is each line a separate plate? Or would the "Platform" column >> >>> correspond >> >>> to 2 individual plates, where multiple samples can be loaded onto >> each >> >>> plate? >> >>> What exactly is your primary interest here? Difference between cell >> >>> types >> >>> A and B? I think we need some specification here, for us to help you >> >>> properly. >> >>> >> >>> If you have different kinds of plates, then you could potentially >> >>> include >> >>> pate as a batch effect when doing your analysis. This will require >> you >> >>> to >> >>> use limmaCtData() for your test. This is based on the function lmFit >> >>> from >> >>> the limma package, which has some nice examples of how to take this >> >>> sort >> >>> of information into account. Although it looks like you have too >> many >> >>> variables here to consider all of them. >> >>> >> >>> By the way, you don't necessarily need one file per sample. As long >> as >> >>> you >> >>> have the sample number of genes per sample, e.g. 384, then with >> HTqPCR >> >>> version 1.2.0 you can use the n.data parameter in readCtData() to >> >>> indicate >> >>> how many sets of data are in each file. >> >>> >> >>> HTH >> >>> \Heidi >> >>> >> >>> > Dear Heidi, >> >>> > >> >>> > I have received a data set from a qPCR miRNA set where I have >> >>> > >> >>> > sample1 - 2 replicates from the same RT reaction >> >>> > 1 replicate from an independen RT reaction >> >>> cell_type:A >> >>> > sample 2 - cell type : A >> >>> > sample2 cell_type:B >> >>> > sample3 cell_type:B >> >>> > sample 4 cell_type:B >> >>> > >> >>> > I saw that I should prepare one file per sample, but the >> replicates >> >>> of >> >>> > sample1 have been produced in different plates and one is even >> from >> >>> > another >> >>> > RT reaction. How can I account for this using your package? >> >>> > Thanks in advacne for your reply. >> >>> > With kind regards, >> >>> > Andreia >> >>> > >> >>> > >> >>> > -- >> >>> > -------------------------------------------- >> >>> > Andreia J. Amaral >> >>> > Unidade de Imunologia Cl?nica >> >>> > Instituto de Medicina Molecular >> >>> > Universidade de Lisboa >> >>> > email: andreiaamaral at fm.ul.pt >> >>> > andreia.fonseca at gmail.com >> >>> > >> >>> >> >>> >> >>> >> >> >> >> >> >> -- >> >> -------------------------------------------- >> >> Andreia J. Amaral >> >> Unidade de Imunologia Cl?nica >> >> Instituto de Medicina Molecular >> >> Universidade de Lisboa >> >> email: andreiaamaral at fm.ul.pt >> >> andreia.fonseca at gmail.com >> >> >> >> >> >> >> > >> > >> > -- >> > -------------------------------------------- >> > Andreia J. Amaral >> > Unidade de Imunologia Cl?nica >> > Instituto de Medicina Molecular >> > Universidade de Lisboa >> > email: andreiaamaral at fm.ul.pt >> > andreia.fonseca at gmail.com >> > >> >> >> > > > -- > -------------------------------------------- > Andreia J. Amaral > Unidade de Imunologia Cl?nica > Instituto de Medicina Molecular > Universidade de Lisboa > email: andreiaamaral at fm.ul.pt > andreia.fonseca at gmail.com >
ADD COMMENT
0
Entering edit mode
Hi Heidi, thanks for your response. I only have one qPCR per probe in each plate, so I have only one copy of U6 in each plate. I have only technical replicates for one sample, lin1a.txt, lin_1a.txt and lin1b.txt, but each technical replicate was analyzed in a different plate. So to use the quantile normalization or the rank invariant methods, do I have to, add an option saying to do normalization between samples, or I just have to follow the vignette? Another question, do these methods make the normalization between samples of the same treatment? Thanks once again. With kind regards, Andreia On Sun, May 2, 2010 at 9:34 AM, Heidi Dvinge <heidi@ebi.ac.uk> wrote: > Hi Andreia, > > > Dear Heidi, > > > > So I have each of my biological replicate in a plate, I have used the > > following command to read the data: > > > >> raw <- readCtData(files=files$File, path=path, n.features = 96, flag = > >> 4, > > feature = 1, type = 2, position = 3, Ct = 5, header = FALSE, SDS = FALSE) > > my files: > > > > File Treatment > > 1 Emb1.txt Embrio > > 2 Emb2.txt Embrio > > 3 Emb3.txt Embrio > > 4 lin1a.txt BM > > 5 lin2.txt BM > > 6 lin_1a.txt BM > > 7 lin1b.txt BM > > 8 Sca3.txt Heart > > > > I did the filtering > > raw.cat<-setCategory(raw, groups=files$Treatment, quantile=0.8) > > > > and now I want to normalize: > > > > d.norm<-normalizeCtDataraw.cat,norm="deltaCt", deltaCt.genes=c("U6")) > > and then I get the message: > > > > Calculating deltaCt values > > Using control gene(s): U6 > > Card 1: Mean=33.62 Stdev=NA > > Card 2: Mean=32.65 Stdev=NA > > Card 3: Mean=32.47 Stdev=NA > > Card 4: Mean=35.46 Stdev=NA > > Card 5: Mean=34.09 Stdev=NA > > Card 6: Mean=34.39 Stdev=NA > > Card 7: Mean=35.79 Stdev=NA > > Card 8: Mean=33.72 Stdev=NA > > > > so, for the normalization, the package is not taking into account the > > variation of the endogenous gene U6 across technical and biological > > replicates.How can I make the normalization taking into account the mean > > and > > standard deviation of the endogenous genes at least across the biolgical > > replicates of the same treatment? > > With deltaCt normalisation I'm afraid you cannot use information across > biological replicates in HTqPCR, since these are placed on individual > plates. In general, deltaCt is trying to normalise for the total amount of > RNA in a single sample, using some sort of reference RNA such as > GAPDH/actin or whatever is suitable in a given experiment, but that > considers each sample individually. Technical replicates within plates > should automatically be taken into account though - how many copies of U6 > are on each plate? > > If you want to normalise between samples, you'll have to use either > quantile normalisation, or some of the rank invariant methods. However, > these take all genes on a card as input, not a manually selected set I'm > afraid. > > HTH > \Heidi > > > Thanks > > Kind regards, > > Andreia > > > > > > > > > > On Fri, Apr 30, 2010 at 8:53 PM, Heidi Dvinge <heidi@ebi.ac.uk> wrote: > > > >> Hi Andreia > >> > >> > Dear Heidi, > >> > > >> > Sorry it seems that I have not understood your table well. All the > >> samples > >> > were analyzed using the same platform, but each sample is on a > >> different > >> > plate. This way the commands are different right? > >> > So going back to your table: > >> > > >> > Sample Cell_type Platform RT_reaction > >> > ------------------------------ > >> > ------ > >> > 1 A type1 RT1 > >> > 1 A type1 RT1 > >> > 1 A type1 RT2 > >> > 2 A type1 RT1 > >> > 2 B type1 RT1 > >> > 3 B type1 RT1 > >> > 4 B type1 RT1 > >> > > >> So, you actually just have cell type A and B you want to compare? That > >> makes it all easier. In that case you can use either of the three > >> functions limmaCtData, ttestCtData and mannwhitneyCtData depending on > >> what > >> you prefer. Both the vignette and the help pages has examples about how > >> to > >> compare two groups. ttestCtData and mannwhitneyCtData are probably > >> easier > >> if you just have two conditions, with mannwhitney perhaps being most > >> suitable if you only have a small number of genes on each qPCR card. > >> > >> There's no way to account for the single different RT reaction though. > >> You > >> can check if this is an outlier in any of the QC plots, like > >> plotCtDensity > >> or plotCtBoxes, or perhaps try running the A versus B comparison both > >> with/without this card, and see if it skews the result in any unexpected > >> ways. Hopefully different RT reactions should be quite similar though. > >> > >> HTH > >> \Heidi > >> > >> > Thanks > >> > Andreia > >> > > >> > > >> > On Tue, Apr 27, 2010 at 11:11 AM, Heidi Dvinge <heidi@ebi.ac.uk> > >> wrote: > >> > > >> >> > >> >> On 26 Apr 2010, at 09:41, Andreia Fonseca wrote: > >> >> > >> >> Dear heidi, > >> >> thank you for your reply. The datasets come from a 96 well qPCR > >> plate, > >> >> but > >> >> only 80 reactions have good quality. Each line corresponds to a > >> separate > >> >> plate, with the same probes. I would like to compare A and B. Thanks > >> for > >> >> your help. > >> >> Kind regards, > >> >> Andreia > >> >> > >> >> Hello Andreia, > >> >> > >> >> so, what you probably want is to include the "Platform" as a andom > >> >> effect > >> >> using the duplicateCorrelation() function from limma with block > >> >> argument. If > >> >> you type > >> >> > >> >> > limmaUsersGuide() > >> >> > >> >> you'll get the full limma users guide, which includes examples of how > >> to > >> >> include different kinds of technical and biological variation. This > >> is > >> >> basically what the limmaCtData() funciton is based on. So generally > >> >> you'd > >> >> want to first do something like this (with "norm" being your > >> normalised > >> >> qPCRset data): > >> >> > >> >> > targets <- data.frame(Cell=rep(c("A", "B"), times=3:4), > >> >> Platform=rep(c("type1", "type2"), times=c(2,5))) > >> >> > design <- model.matrix(~0+targets$Cell) > >> >> > colnames(design) <- c("CellTypeA", "CellTypeB") > >> >> > dupcor <- duplicateCorrelation(exprs(norm), design=design, > >> >> block=targets$Platform) > >> >> > >> >> and then include the dupcor$correlation and block in the > >> >> limmaCtData.There's a small bug in limmaCtData for including dupcor > >> >> though - > >> >> I'll get that fixed. > >> >> > >> >> However, int his case that doesn't really matter, since I don't think > >> >> you > >> >> can analyse you data this way, if you experiment design really is as > >> >> outlined in "targets" here. All platforms of type1 are used for cell > >> >> type A, > >> >> there are none present for B. > >> >> > >> >> You might have to just ignore that the samples are measured on 2 > >> >> different > >> >> platforms, and simply do a standard test between cell type A and B, > >> even > >> >> though this is not optimal. How different are these platforms - would > >> >> you > >> >> expect it to be significant for your analysis? Have you tried testing > >> >> for > >> >> significance between Ct values measured on type1 and type2? Or what's > >> >> the > >> >> dupcor$correlation between them? These are some of the things you can > >> >> look > >> >> at before deciding how to continue. > >> >> > >> >> Best wishes > >> >> \Heidi > >> >> > >> >> On Sat, Apr 24, 2010 at 12:45 PM, Heidi Dvinge <heidi@ebi.ac.uk> > >> wrote: > >> >> > >> >>> Hello Andreia, > >> >>> > >> >>> hm, slightly tricky. Are these data sets from some sort of > >> commercial > >> >>> platform, like the ABI TLDA cards, or from normal qPCR reactions on > >> a > >> >>> 384 > >> >>> well plate or similar? Just to make sure I understand you experiment > >> >>> right, do you have something like this? > >> >>> > >> >>> Sample Cell_type Platform RT_reaction > >> >>> ------------------------------------ > >> >>> 1 A type1 RT1 > >> >>> 1 A type1 RT1 > >> >>> 1 A type2 RT2 > >> >>> 2 A type2 RT1 > >> >>> 2 B type2 RT1 > >> >>> 3 B type2 RT1 > >> >>> 4 B type2 RT1 > >> >>> > >> >>> Is each line a separate plate? Or would the "Platform" column > >> >>> correspond > >> >>> to 2 individual plates, where multiple samples can be loaded onto > >> each > >> >>> plate? > >> >>> What exactly is your primary interest here? Difference between cell > >> >>> types > >> >>> A and B? I think we need some specification here, for us to help you > >> >>> properly. > >> >>> > >> >>> If you have different kinds of plates, then you could potentially > >> >>> include > >> >>> pate as a batch effect when doing your analysis. This will require > >> you > >> >>> to > >> >>> use limmaCtData() for your test. This is based on the function lmFit > >> >>> from > >> >>> the limma package, which has some nice examples of how to take this > >> >>> sort > >> >>> of information into account. Although it looks like you have too > >> many > >> >>> variables here to consider all of them. > >> >>> > >> >>> By the way, you don't necessarily need one file per sample. As long > >> as > >> >>> you > >> >>> have the sample number of genes per sample, e.g. 384, then with > >> HTqPCR > >> >>> version 1.2.0 you can use the n.data parameter in readCtData() to > >> >>> indicate > >> >>> how many sets of data are in each file. > >> >>> > >> >>> HTH > >> >>> \Heidi > >> >>> > >> >>> > Dear Heidi, > >> >>> > > >> >>> > I have received a data set from a qPCR miRNA set where I have > >> >>> > > >> >>> > sample1 - 2 replicates from the same RT reaction > >> >>> > 1 replicate from an independen RT reaction > >> >>> cell_type:A > >> >>> > sample 2 - cell type : A > >> >>> > sample2 cell_type:B > >> >>> > sample3 cell_type:B > >> >>> > sample 4 cell_type:B > >> >>> > > >> >>> > I saw that I should prepare one file per sample, but the > >> replicates > >> >>> of > >> >>> > sample1 have been produced in different plates and one is even > >> from > >> >>> > another > >> >>> > RT reaction. How can I account for this using your package? > >> >>> > Thanks in advacne for your reply. > >> >>> > With kind regards, > >> >>> > Andreia > >> >>> > > >> >>> > > >> >>> > -- > >> >>> > -------------------------------------------- > >> >>> > Andreia J. Amaral > >> >>> > Unidade de Imunologia Clínica > >> >>> > Instituto de Medicina Molecular > >> >>> > Universidade de Lisboa > >> >>> > email: andreiaamaral@fm.ul.pt > >> >>> > andreia.fonseca@gmail.com > >> >>> > > >> >>> > >> >>> > >> >>> > >> >> > >> >> > >> >> -- > >> >> -------------------------------------------- > >> >> Andreia J. Amaral > >> >> Unidade de Imunologia Clínica > >> >> Instituto de Medicina Molecular > >> >> Universidade de Lisboa > >> >> email: andreiaamaral@fm.ul.pt > >> >> andreia.fonseca@gmail.com > >> >> > >> >> > >> >> > >> > > >> > > >> > -- > >> > -------------------------------------------- > >> > Andreia J. Amaral > >> > Unidade de Imunologia Clínica > >> > Instituto de Medicina Molecular > >> > Universidade de Lisboa > >> > email: andreiaamaral@fm.ul.pt > >> > andreia.fonseca@gmail.com > >> > > >> > >> > >> > > > > > > -- > > -------------------------------------------- > > Andreia J. Amaral > > Unidade de Imunologia Clínica > > Instituto de Medicina Molecular > > Universidade de Lisboa > > email: andreiaamaral@fm.ul.pt > > andreia.fonseca@gmail.com > > > > > -- -------------------------------------------- Andreia J. Amaral Unidade de Imunologia Clínica Instituto de Medicina Molecular Universidade de Lisboa email: andreiaamaral@fm.ul.pt andreia.fonseca@gmail.com [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hi Andreia, On 2 May 2010, at 20:22, Andreia Fonseca wrote: > Hi Heidi, > thanks for your response. I only have one qPCR per probe in each > plate, so I have only one copy of U6 in each plate. Ok, so that why it says standard deviation NA in: > > > Calculating deltaCt values > > Using control gene(s): U6 > > Card 1: Mean=33.62 Stdev=NA > > Card 2: Mean=32.65 Stdev=NA > I have only technical replicates for one sample, lin1a.txt, > lin_1a.txt and lin1b.txt, but each technical replicate was analyzed > in a different plate. So to use the quantile normalization or the > rank invariant methods, do I have to, add an option saying to do > normalization between samples, or I just have to follow the vignette? Just follow the vignette, or ?normalizeCtData. > Another question, do these methods make the normalization between > samples of the same treatment? No, across-plate normalisation is always done for the entire qPCR set object combined! If you specifically want to do it for different sample types individually, then you'd have to do it for subsets of your data individually (e.g. norm1<-normalizeCtData(qPCRset[,1:3, norm="quantile") etc.), and then join the normalised results again afterwards using cbind(). Please note though that this "normalise together or separate" issue is similar to a discussion that was going on in the microarray community for a while. Normalising everything together has the disadvantage of potentially smoothing out small differences between sample types, so they're no longer detectable. However, if you normalise your data separately in groups, you risk inducing artificial differences between them. Therefore, the recommendation is generally to keep everything together! HTH \Heidi > Thanks once again. > With kind regards, > Andreia > > On Sun, May 2, 2010 at 9:34 AM, Heidi Dvinge <heidi@ebi.ac.uk> wrote: > Hi Andreia, > > > Dear Heidi, > > > > So I have each of my biological replicate in a plate, I have used > the > > following command to read the data: > > > >> raw <- readCtData(files=files$File, path=path, n.features = 96, > flag = > >> 4, > > feature = 1, type = 2, position = 3, Ct = 5, header = FALSE, SDS > = FALSE) > > my files: > > > > File Treatment > > 1 Emb1.txt Embrio > > 2 Emb2.txt Embrio > > 3 Emb3.txt Embrio > > 4 lin1a.txt BM > > 5 lin2.txt BM > > 6 lin_1a.txt BM > > 7 lin1b.txt BM > > 8 Sca3.txt Heart > > > > I did the filtering > > raw.cat<-setCategory(raw, groups=files$Treatment, quantile=0.8) > > > > and now I want to normalize: > > > > d.norm<-normalizeCtDataraw.cat,norm="deltaCt", deltaCt.genes=c > ("U6")) > > and then I get the message: > > > > Calculating deltaCt values > > Using control gene(s): U6 > > Card 1: Mean=33.62 Stdev=NA > > Card 2: Mean=32.65 Stdev=NA > > Card 3: Mean=32.47 Stdev=NA > > Card 4: Mean=35.46 Stdev=NA > > Card 5: Mean=34.09 Stdev=NA > > Card 6: Mean=34.39 Stdev=NA > > Card 7: Mean=35.79 Stdev=NA > > Card 8: Mean=33.72 Stdev=NA > > > > so, for the normalization, the package is not taking into account > the > > variation of the endogenous gene U6 across technical and biological > > replicates.How can I make the normalization taking into account > the mean > > and > > standard deviation of the endogenous genes at least across the > biolgical > > replicates of the same treatment? > > With deltaCt normalisation I'm afraid you cannot use information > across > biological replicates in HTqPCR, since these are placed on individual > plates. In general, deltaCt is trying to normalise for the total > amount of > RNA in a single sample, using some sort of reference RNA such as > GAPDH/actin or whatever is suitable in a given experiment, but that > considers each sample individually. Technical replicates within plates > should automatically be taken into account though - how many copies > of U6 > are on each plate? > > If you want to normalise between samples, you'll have to use either > quantile normalisation, or some of the rank invariant methods. > However, > these take all genes on a card as input, not a manually selected > set I'm > afraid. > > HTH > \Heidi > > > Thanks > > Kind regards, > > Andreia > > > > > > > > > > On Fri, Apr 30, 2010 at 8:53 PM, Heidi Dvinge <heidi@ebi.ac.uk> > wrote: > > > >> Hi Andreia > >> > >> > Dear Heidi, > >> > > >> > Sorry it seems that I have not understood your table well. All > the > >> samples > >> > were analyzed using the same platform, but each sample is on a > >> different > >> > plate. This way the commands are different right? > >> > So going back to your table: > >> > > >> > Sample Cell_type Platform RT_reaction > >> > ------------------------------ > >> > ------ > >> > 1 A type1 RT1 > >> > 1 A type1 RT1 > >> > 1 A type1 RT2 > >> > 2 A type1 RT1 > >> > 2 B type1 RT1 > >> > 3 B type1 RT1 > >> > 4 B type1 RT1 > >> > > >> So, you actually just have cell type A and B you want to > compare? That > >> makes it all easier. In that case you can use either of the three > >> functions limmaCtData, ttestCtData and mannwhitneyCtData > depending on > >> what > >> you prefer. Both the vignette and the help pages has examples > about how > >> to > >> compare two groups. ttestCtData and mannwhitneyCtData are probably > >> easier > >> if you just have two conditions, with mannwhitney perhaps being > most > >> suitable if you only have a small number of genes on each qPCR > card. > >> > >> There's no way to account for the single different RT reaction > though. > >> You > >> can check if this is an outlier in any of the QC plots, like > >> plotCtDensity > >> or plotCtBoxes, or perhaps try running the A versus B comparison > both > >> with/without this card, and see if it skews the result in any > unexpected > >> ways. Hopefully different RT reactions should be quite similar > though. > >> > >> HTH > >> \Heidi > >> > >> > Thanks > >> > Andreia > >> > > >> > > >> > On Tue, Apr 27, 2010 at 11:11 AM, Heidi Dvinge <heidi@ebi.ac.uk> > >> wrote: > >> > > >> >> > >> >> On 26 Apr 2010, at 09:41, Andreia Fonseca wrote: > >> >> > >> >> Dear heidi, > >> >> thank you for your reply. The datasets come from a 96 well qPCR > >> plate, > >> >> but > >> >> only 80 reactions have good quality. Each line corresponds to a > >> separate > >> >> plate, with the same probes. I would like to compare A and B. > Thanks > >> for > >> >> your help. > >> >> Kind regards, > >> >> Andreia > >> >> > >> >> Hello Andreia, > >> >> > >> >> so, what you probably want is to include the "Platform" as a > andom > >> >> effect > >> >> using the duplicateCorrelation() function from limma with block > >> >> argument. If > >> >> you type > >> >> > >> >> > limmaUsersGuide() > >> >> > >> >> you'll get the full limma users guide, which includes > examples of how > >> to > >> >> include different kinds of technical and biological > variation. This > >> is > >> >> basically what the limmaCtData() funciton is based on. So > generally > >> >> you'd > >> >> want to first do something like this (with "norm" being your > >> normalised > >> >> qPCRset data): > >> >> > >> >> > targets <- data.frame(Cell=rep(c("A", "B"), times=3:4), > >> >> Platform=rep(c("type1", "type2"), times=c(2,5))) > >> >> > design <- model.matrix(~0+targets$Cell) > >> >> > colnames(design) <- c("CellTypeA", "CellTypeB") > >> >> > dupcor <- duplicateCorrelation(exprs(norm), design=design, > >> >> block=targets$Platform) > >> >> > >> >> and then include the dupcor$correlation and block in the > >> >> limmaCtData.There's a small bug in limmaCtData for including > dupcor > >> >> though - > >> >> I'll get that fixed. > >> >> > >> >> However, int his case that doesn't really matter, since I > don't think > >> >> you > >> >> can analyse you data this way, if you experiment design > really is as > >> >> outlined in "targets" here. All platforms of type1 are used > for cell > >> >> type A, > >> >> there are none present for B. > >> >> > >> >> You might have to just ignore that the samples are measured on 2 > >> >> different > >> >> platforms, and simply do a standard test between cell type A > and B, > >> even > >> >> though this is not optimal. How different are these platforms > - would > >> >> you > >> >> expect it to be significant for your analysis? Have you tried > testing > >> >> for > >> >> significance between Ct values measured on type1 and type2? > Or what's > >> >> the > >> >> dupcor$correlation between them? These are some of the things > you can > >> >> look > >> >> at before deciding how to continue. > >> >> > >> >> Best wishes > >> >> \Heidi > >> >> > >> >> On Sat, Apr 24, 2010 at 12:45 PM, Heidi Dvinge <heidi@ebi.ac.uk> > >> wrote: > >> >> > >> >>> Hello Andreia, > >> >>> > >> >>> hm, slightly tricky. Are these data sets from some sort of > >> commercial > >> >>> platform, like the ABI TLDA cards, or from normal qPCR > reactions on > >> a > >> >>> 384 > >> >>> well plate or similar? Just to make sure I understand you > experiment > >> >>> right, do you have something like this? > >> >>> > >> >>> Sample Cell_type Platform RT_reaction > >> >>> ------------------------------------ > >> >>> 1 A type1 RT1 > >> >>> 1 A type1 RT1 > >> >>> 1 A type2 RT2 > >> >>> 2 A type2 RT1 > >> >>> 2 B type2 RT1 > >> >>> 3 B type2 RT1 > >> >>> 4 B type2 RT1 > >> >>> > >> >>> Is each line a separate plate? Or would the "Platform" column > >> >>> correspond > >> >>> to 2 individual plates, where multiple samples can be loaded > onto > >> each > >> >>> plate? > >> >>> What exactly is your primary interest here? Difference > between cell > >> >>> types > >> >>> A and B? I think we need some specification here, for us to > help you > >> >>> properly. > >> >>> > >> >>> If you have different kinds of plates, then you could > potentially > >> >>> include > >> >>> pate as a batch effect when doing your analysis. This will > require > >> you > >> >>> to > >> >>> use limmaCtData() for your test. This is based on the > function lmFit > >> >>> from > >> >>> the limma package, which has some nice examples of how to > take this > >> >>> sort > >> >>> of information into account. Although it looks like you have > too > >> many > >> >>> variables here to consider all of them. > >> >>> > >> >>> By the way, you don't necessarily need one file per sample. > As long > >> as > >> >>> you > >> >>> have the sample number of genes per sample, e.g. 384, then with > >> HTqPCR > >> >>> version 1.2.0 you can use the n.data parameter in readCtData > () to > >> >>> indicate > >> >>> how many sets of data are in each file. > >> >>> > >> >>> HTH > >> >>> \Heidi > >> >>> > >> >>> > Dear Heidi, > >> >>> > > >> >>> > I have received a data set from a qPCR miRNA set where I have > >> >>> > > >> >>> > sample1 - 2 replicates from the same RT reaction > >> >>> > 1 replicate from an independen RT reaction > >> >>> cell_type:A > >> >>> > sample 2 - cell type : A > >> >>> > sample2 cell_type:B > >> >>> > sample3 cell_type:B > >> >>> > sample 4 cell_type:B > >> >>> > > >> >>> > I saw that I should prepare one file per sample, but the > >> replicates > >> >>> of > >> >>> > sample1 have been produced in different plates and one is > even > >> from > >> >>> > another > >> >>> > RT reaction. How can I account for this using your package? > >> >>> > Thanks in advacne for your reply. > >> >>> > With kind regards, > >> >>> > Andreia > >> >>> > > >> >>> > > >> >>> > -- > >> >>> > -------------------------------------------- > >> >>> > Andreia J. Amaral > >> >>> > Unidade de Imunologia Clínica > >> >>> > Instituto de Medicina Molecular > >> >>> > Universidade de Lisboa > >> >>> > email: andreiaamaral@fm.ul.pt > >> >>> > andreia.fonseca@gmail.com > >> >>> > > >> >>> > >> >>> > >> >>> > >> >> > >> >> > >> >> -- > >> >> -------------------------------------------- > >> >> Andreia J. Amaral > >> >> Unidade de Imunologia Clínica > >> >> Instituto de Medicina Molecular > >> >> Universidade de Lisboa > >> >> email: andreiaamaral@fm.ul.pt > >> >> andreia.fonseca@gmail.com > >> >> > >> >> > >> >> > >> > > >> > > >> > -- > >> > -------------------------------------------- > >> > Andreia J. Amaral > >> > Unidade de Imunologia Clínica > >> > Instituto de Medicina Molecular > >> > Universidade de Lisboa > >> > email: andreiaamaral@fm.ul.pt > >> > andreia.fonseca@gmail.com > >> > > >> > >> > >> > > > > > > -- > > -------------------------------------------- > > Andreia J. Amaral > > Unidade de Imunologia Clínica > > Instituto de Medicina Molecular > > Universidade de Lisboa > > email: andreiaamaral@fm.ul.pt > > andreia.fonseca@gmail.com > > > > > > > > -- > -------------------------------------------- > Andreia J. Amaral > Unidade de Imunologia Clínica > Instituto de Medicina Molecular > Universidade de Lisboa > email: andreiaamaral@fm.ul.pt > andreia.fonseca@gmail.com [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Greetings I have encountered a small problem when using the setCategories method: s2010004660.cat <- setCategory(s2010004660, quantile=0.9, groups=sampleNames(s2010004660)) Categories after Ct.max and Ct.min filtering: wt_4wks mt_4wks wt_17wks mt_17wks OK 582 571 566 578 Undetermined 186 197 202 190 Categories after standard deviation filtering: wt_4wks mt_4wks wt_17wks mt_17wks OK 581 569 565 577 Undetermined 186 197 202 190 There were 50 or more warnings (use warnings() to see the first 50) > warnings() Warning messages: 1: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") : invalid factor level, NAs generated 2: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") : invalid factor level, NAs generated 3: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") : invalid factor level, NAs generated other 47 warnings are the same The categories that you get in the SDS data are "OK" and "Undetermined" and it seems to be unwilling to add the new level "Unreliable". I tried to manually add the levels: featureCategory(s2010004660)$wt_4wks <- factor(featureCategory(s2010004660)$wt_4wks, levels=c(levels(featureCategory(s2010004660)$wt_4wks),"Unreliable")) featureCategory(s2010004660)$mt_4wks <- factor(featureCategory(s2010004660)$mt_4wks, levels=c(levels(featureCategory(s2010004660)$mt_4wks),"Unreliable")) featureCategory(s2010004660)$wt_17wks <- factor(featureCategory(s2010004660)$wt_17wks, levels=c(levels(featureCategory(s2010004660)$wt_17wks),"Unreliable")) featureCategory(s2010004660)$mt_17wks <- factor(featureCategory(s2010004660)$mt_17wks, levels=c(levels(featureCategory(s2010004660)$mt_17wks),"Unreliable")) and get another error Error in count[names(tab), i] <- tab : subscript out of bounds Is this a bug or operator error? Thanks Mike > sessionInfo() R version 2.11.0 (2010-04-22) i386-apple-darwin9.8.0 locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] HTqPCR_1.2.0 limma_3.4.0 RColorBrewer_1.0-2 Biobase_2.8.0 loaded via a namespace (and not attached): [1] affy_1.26.0 affyio_1.16.0 gdata_2.7.1 gplots_2.7.4 gtools_2.6.1 preprocessCore_1.10.0 [7] tools_2.11.0 Michael Muratet, Ph.D. Senior Scientist HudsonAlpha Institute for Biotechnology mmuratet at hudsonalpha.org (256) 327-0473 (p) (256) 327-0966 (f) Room 4005 601 Genome Way Huntsville, Alabama 35806
ADD REPLY
0
Entering edit mode
Hello Mike, On 7 May 2010, at 17:11, Michael Muratet wrote: > Greetings > > I have encountered a small problem when using the setCategories > method: > > s2010004660.cat <- setCategory(s2010004660, quantile=0.9, > groups=sampleNames(s2010004660)) > Categories after Ct.max and Ct.min filtering: > wt_4wks mt_4wks wt_17wks mt_17wks > OK 582 571 566 578 > Undetermined 186 197 202 190 > Categories after standard deviation filtering: > wt_4wks mt_4wks wt_17wks mt_17wks > OK 581 569 565 577 > Undetermined 186 197 202 190 > There were 50 or more warnings (use warnings() to see the first 50) > > warnings() > Warning messages: > 1: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") : > invalid factor level, NAs generated > 2: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") : > invalid factor level, NAs generated > 3: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") : > invalid factor level, NAs generated > Hm, I can't reproduce this with the standard data sets in HTqPCR, although this also only contains "OK" and "Undetermined" initially: > data(qPCRraw) > apply(featureCategory(qPCRraw), 2, table) sample1 sample2 sample3 sample4 sample5 sample6 OK 353 312 360 339 335 336 Undetermined 31 72 24 45 49 48 > setCategory(qPCRraw, quantile=0.9, groups=rep(c("A", "B"), 3)) Categories after Ct.max and Ct.min filtering: sample1 sample2 sample3 sample4 sample5 sample6 OK 313 264 327 295 296 286 Undetermined 68 119 56 86 86 96 Unreliable 3 1 1 3 2 2 Categories after standard deviation filtering: sample1 sample2 sample3 sample4 sample5 sample6 OK 309 258 323 291 292 281 Undetermined 68 119 56 86 86 96 Unreliable 7 7 5 7 6 7 How does featureCategory() of your new object look after you run setCategory? Also, it seems like all your Ct values fall within the default range given by Ct.max and Ct.min in setCategory(), hence none of the categories are adjusted during the "first round", before the standard deviation filtering. What happens if you set one of Ct.max or Ct.min so that some values are called as "Unreliable" based on this, e.g. by saying Ct.max=25? Do you still get the same warning? Cheers \Heidi > other 47 warnings are the same > > The categories that you get in the SDS data are "OK" and > "Undetermined" and it seems to be unwilling to add the new level > "Unreliable". I tried to manually add the levels: > > featureCategory(s2010004660)$wt_4wks <- factor(featureCategory > (s2010004660)$wt_4wks, levels=c(levels(featureCategory(s2010004660) > $wt_4wks),"Unreliable")) > featureCategory(s2010004660)$mt_4wks <- factor(featureCategory > (s2010004660)$mt_4wks, levels=c(levels(featureCategory(s2010004660) > $mt_4wks),"Unreliable")) > featureCategory(s2010004660)$wt_17wks <- factor(featureCategory > (s2010004660)$wt_17wks, levels=c(levels(featureCategory(s2010004660) > $wt_17wks),"Unreliable")) > featureCategory(s2010004660)$mt_17wks <- factor(featureCategory > (s2010004660)$mt_17wks, levels=c(levels(featureCategory(s2010004660) > $mt_17wks),"Unreliable")) > > and get another error > > Error in count[names(tab), i] <- tab : subscript out of bounds > > Is this a bug or operator error? > > Thanks > > Mike > > > sessionInfo() > R version 2.11.0 (2010-04-22) > i386-apple-darwin9.8.0 > > locale: > [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] HTqPCR_1.2.0 limma_3.4.0 RColorBrewer_1.0-2 > Biobase_2.8.0 > > loaded via a namespace (and not attached): > [1] affy_1.26.0 affyio_1.16.0 > gdata_2.7.1 gplots_2.7.4 gtools_2.6.1 > preprocessCore_1.10.0 > [7] tools_2.11.0 > > Michael Muratet, Ph.D. > Senior Scientist > HudsonAlpha Institute for Biotechnology > mmuratet at hudsonalpha.org > (256) 327-0473 (p) > (256) 327-0966 (f) > > Room 4005 > 601 Genome Way > Huntsville, Alabama 35806 > > > >
ADD REPLY
0
Entering edit mode
On May 10, 2010, at 9:05 AM, Heidi Dvinge wrote: > Hello Mike, > > On 7 May 2010, at 17:11, Michael Muratet wrote: > >> Greetings >> >> I have encountered a small problem when using the setCategories >> method: >> >> s2010004660.cat <- setCategory(s2010004660, quantile=0.9, >> groups=sampleNames(s2010004660)) >> Categories after Ct.max and Ct.min filtering: >> wt_4wks mt_4wks wt_17wks mt_17wks >> OK 582 571 566 578 >> Undetermined 186 197 202 190 >> Categories after standard deviation filtering: >> wt_4wks mt_4wks wt_17wks mt_17wks >> OK 581 569 565 577 >> Undetermined 186 197 202 190 >> There were 50 or more warnings (use warnings() to see the first 50) >> > warnings() >> Warning messages: >> 1: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") : >> invalid factor level, NAs generated >> 2: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") : >> invalid factor level, NAs generated >> 3: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") : >> invalid factor level, NAs generated >> > Hm, I can't reproduce this with the standard data sets in HTqPCR, > although this also only contains "OK" and "Undetermined" initially: > > > data(qPCRraw) > > apply(featureCategory(qPCRraw), 2, table) > sample1 sample2 sample3 sample4 sample5 sample6 > OK 353 312 360 339 335 336 > Undetermined 31 72 24 45 49 48 > > setCategory(qPCRraw, quantile=0.9, groups=rep(c("A", "B"), 3)) > Categories after Ct.max and Ct.min filtering: > sample1 sample2 sample3 sample4 sample5 sample6 > OK 313 264 327 295 296 286 > Undetermined 68 119 56 86 86 96 > Unreliable 3 1 1 3 2 2 > Categories after standard deviation filtering: > sample1 sample2 sample3 sample4 sample5 sample6 > OK 309 258 323 291 292 281 > Undetermined 68 119 56 86 86 96 > Unreliable 7 7 5 7 6 7 > > How does featureCategory() of your new object look after you run > setCategory? Also, it seems like all your Ct values fall within the > default range given by Ct.max and Ct.min in setCategory(), hence > none of the categories are adjusted during the "first round", before > the standard deviation filtering. What happens if you set one of > Ct.max or Ct.min so that some values are called as "Unreliable" > based on this, e.g. by saying Ct.max=25? Do you still get the same > warning? Heidi I have traced the problem to this line in setCategory by setting the options so that warnings are errors: featureCategory(out)[split.by==gene,groups==g][index] <- "Unreliable" I have never seen dual indices like this on a data frame before, can you tell me what this does? If I force the data to be called "Unreliable" by setting Ct.min I can get the same failure here: featureCategory(out)[data < Ct.min] <- "Unreliable" If you reference a single dimension from a data frame like this, don't you set/get just a column? I thought you had to use something like sapply in a case like this. I note that any of the 'primitive' data sets, i.e., the ones that I read with readCtData can be used in setCategory without a problem. Once I rbind the two data types together, I get the problem. I have looked at the rbind.qPCRset code and I don't see anything that would cause this to be the case but I suspect my problem is somehow tied up some artifact of data assembly. Thanks Mike > > Cheers > \Heidi > > >> other 47 warnings are the same >> >> The categories that you get in the SDS data are "OK" and >> "Undetermined" and it seems to be unwilling to add the new level >> "Unreliable". I tried to manually add the levels: >> >> featureCategory(s2010004660)$wt_4wks <- >> factor(featureCategory(s2010004660)$wt_4wks, >> levels=c(levels(featureCategory(s2010004660)$wt_4wks),"Unreliable")) >> featureCategory(s2010004660)$mt_4wks <- >> factor(featureCategory(s2010004660)$mt_4wks, >> levels=c(levels(featureCategory(s2010004660)$mt_4wks),"Unreliable")) >> featureCategory(s2010004660)$wt_17wks <- >> factor(featureCategory(s2010004660)$wt_17wks, >> levels=c(levels(featureCategory(s2010004660)$wt_17wks),"Unreliable")) >> featureCategory(s2010004660)$mt_17wks <- >> factor(featureCategory(s2010004660)$mt_17wks, >> levels=c(levels(featureCategory(s2010004660)$mt_17wks),"Unreliable")) >> >> and get another error >> >> Error in count[names(tab), i] <- tab : subscript out of bounds >> >> Is this a bug or operator error? >> >> Thanks >> >> Mike >> >> > sessionInfo() >> R version 2.11.0 (2010-04-22) >> i386-apple-darwin9.8.0 >> >> locale: >> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] HTqPCR_1.2.0 limma_3.4.0 RColorBrewer_1.0-2 >> Biobase_2.8.0 >> >> loaded via a namespace (and not attached): >> [1] affy_1.26.0 affyio_1.16.0 >> gdata_2.7.1 gplots_2.7.4 gtools_2.6.1 >> preprocessCore_1.10.0 >> [7] tools_2.11.0 >> >> Michael Muratet, Ph.D. >> Senior Scientist >> HudsonAlpha Institute for Biotechnology >> mmuratet at hudsonalpha.org >> (256) 327-0473 (p) >> (256) 327-0966 (f) >> >> Room 4005 >> 601 Genome Way >> Huntsville, Alabama 35806 >> >> >> >> > Michael Muratet, Ph.D. Senior Scientist HudsonAlpha Institute for Biotechnology mmuratet at hudsonalpha.org (256) 327-0473 (p) (256) 327-0966 (f) Room 4005 601 Genome Way Huntsville, Alabama 35806
ADD REPLY
0
Entering edit mode
Hi Mike, you're absolutely right that the issue turned out to be with rbind of qPCRset! The feature categories were intended to be stored as characters, not factors, so that uses can easily add their own categories if they want. However, in rbind I've omitted the parameter "stringsAsFactors=FALSE" in a call to as.data.frame(), so I get: > data(qPCRraw) > class(featureCategory(qPCRraw)) [1] "data.frame" > class(featureCategory(qPCRraw)[,1]) [1] "character" > test <- rbind(qPCRraw[1:4,], qPCRraw[1:40,]) > class(featureCategory(test)) [1] "data.frame" > class(featureCategory(test)[,1]) [1] "factor" I'll sanitise the rbind (and probably also the cbind function), but I'm afraid it'll take a while for these changes to get into BioC devel. Unfortunately, in the meantime, you'll have to manually transfer your featureCategory back to a data frame of characters instead of factors: > featureCategory(test) <- data.frame(apply(featureCategory(test), 2, as.character), stringsAsFactors=FALSE) > class(featureCategory(test)[,1]) [1] "character" > setCategory(test, groups=rep(c("A", "B"),3), Ct.min=20) Categories after Ct.max and Ct.min filtering: sample1 sample2 sample3 sample4 sample5 sample6 OK 29 32 34 32 32 33 Undetermined 6 7 5 4 7 5 Unreliable 9 5 5 8 5 6 Categories after standard deviation filtering: sample1 sample2 sample3 sample4 sample5 sample6 OK 29 32 34 32 32 33 Undetermined 6 7 5 4 7 5 Unreliable 9 5 5 8 5 6 Yet another item on the ever-growing HTqPCR to-do list... \Heidi > > On May 10, 2010, at 9:05 AM, Heidi Dvinge wrote: > >> Hello Mike, >> >> On 7 May 2010, at 17:11, Michael Muratet wrote: >> >>> Greetings >>> >>> I have encountered a small problem when using the setCategories >>> method: >>> >>> s2010004660.cat <- setCategory(s2010004660, quantile=0.9, >>> groups=sampleNames(s2010004660)) >>> Categories after Ct.max and Ct.min filtering: >>> wt_4wks mt_4wks wt_17wks mt_17wks >>> OK 582 571 566 578 >>> Undetermined 186 197 202 190 >>> Categories after standard deviation filtering: >>> wt_4wks mt_4wks wt_17wks mt_17wks >>> OK 581 569 565 577 >>> Undetermined 186 197 202 190 >>> There were 50 or more warnings (use warnings() to see the first 50) >>> > warnings() >>> Warning messages: >>> 1: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") : >>> invalid factor level, NAs generated >>> 2: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") : >>> invalid factor level, NAs generated >>> 3: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") : >>> invalid factor level, NAs generated >>> >> Hm, I can't reproduce this with the standard data sets in HTqPCR, >> although this also only contains "OK" and "Undetermined" initially: >> >> > data(qPCRraw) >> > apply(featureCategory(qPCRraw), 2, table) >> sample1 sample2 sample3 sample4 sample5 sample6 >> OK 353 312 360 339 335 336 >> Undetermined 31 72 24 45 49 48 >> > setCategory(qPCRraw, quantile=0.9, groups=rep(c("A", "B"), 3)) >> Categories after Ct.max and Ct.min filtering: >> sample1 sample2 sample3 sample4 sample5 sample6 >> OK 313 264 327 295 296 286 >> Undetermined 68 119 56 86 86 96 >> Unreliable 3 1 1 3 2 2 >> Categories after standard deviation filtering: >> sample1 sample2 sample3 sample4 sample5 sample6 >> OK 309 258 323 291 292 281 >> Undetermined 68 119 56 86 86 96 >> Unreliable 7 7 5 7 6 7 >> >> How does featureCategory() of your new object look after you run >> setCategory? Also, it seems like all your Ct values fall within the >> default range given by Ct.max and Ct.min in setCategory(), hence >> none of the categories are adjusted during the "first round", before >> the standard deviation filtering. What happens if you set one of >> Ct.max or Ct.min so that some values are called as "Unreliable" >> based on this, e.g. by saying Ct.max=25? Do you still get the same >> warning? > > Heidi > > I have traced the problem to this line in setCategory by setting the > options so that warnings are errors: > > featureCategory(out)[split.by==gene,groups==g][index] <- "Unreliable" > > I have never seen dual indices like this on a data frame before, can > you tell me what this does? > > If I force the data to be called "Unreliable" by setting Ct.min I can > get the same failure here: > > featureCategory(out)[data < Ct.min] <- "Unreliable" > > If you reference a single dimension from a data frame like this, don't > you set/get just a column? I thought you had to use something like > sapply in a case like this. > > I note that any of the 'primitive' data sets, i.e., the ones that I > read with readCtData can be used in setCategory without a problem. > Once I rbind the two data types together, I get the problem. I have > looked at the rbind.qPCRset code and I don't see anything that would > cause this to be the case but I suspect my problem is somehow tied up > some artifact of data assembly. > > Thanks > > Mike > > > > > >> >> Cheers >> \Heidi >> >> >>> other 47 warnings are the same >>> >>> The categories that you get in the SDS data are "OK" and >>> "Undetermined" and it seems to be unwilling to add the new level >>> "Unreliable". I tried to manually add the levels: >>> >>> featureCategory(s2010004660)$wt_4wks <- >>> factor(featureCategory(s2010004660)$wt_4wks, >>> levels=c(levels(featureCategory(s2010004660)$wt_4wks),"Unreliable")) >>> featureCategory(s2010004660)$mt_4wks <- >>> factor(featureCategory(s2010004660)$mt_4wks, >>> levels=c(levels(featureCategory(s2010004660)$mt_4wks),"Unreliable")) >>> featureCategory(s2010004660)$wt_17wks <- >>> factor(featureCategory(s2010004660)$wt_17wks, >>> levels=c(levels(featureCategory(s2010004660)$wt_17wks),"Unreliable")) >>> featureCategory(s2010004660)$mt_17wks <- >>> factor(featureCategory(s2010004660)$mt_17wks, >>> levels=c(levels(featureCategory(s2010004660)$mt_17wks),"Unreliable")) >>> >>> and get another error >>> >>> Error in count[names(tab), i] <- tab : subscript out of bounds >>> >>> Is this a bug or operator error? >>> >>> Thanks >>> >>> Mike >>> >>> > sessionInfo() >>> R version 2.11.0 (2010-04-22) >>> i386-apple-darwin9.8.0 >>> >>> locale: >>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] HTqPCR_1.2.0 limma_3.4.0 RColorBrewer_1.0-2 >>> Biobase_2.8.0 >>> >>> loaded via a namespace (and not attached): >>> [1] affy_1.26.0 affyio_1.16.0 >>> gdata_2.7.1 gplots_2.7.4 gtools_2.6.1 >>> preprocessCore_1.10.0 >>> [7] tools_2.11.0 >>> >>> Michael Muratet, Ph.D. >>> Senior Scientist >>> HudsonAlpha Institute for Biotechnology >>> mmuratet at hudsonalpha.org >>> (256) 327-0473 (p) >>> (256) 327-0966 (f) >>> >>> Room 4005 >>> 601 Genome Way >>> Huntsville, Alabama 35806 >>> >>> >>> >>> >> > > Michael Muratet, Ph.D. > Senior Scientist > HudsonAlpha Institute for Biotechnology > mmuratet at hudsonalpha.org > (256) 327-0473 (p) > (256) 327-0966 (f) > > Room 4005 > 601 Genome Way > Huntsville, Alabama 35806 > > > > >
ADD REPLY
0
Entering edit mode
On May 10, 2010, at 5:11 PM, Heidi Dvinge wrote: > Hi Mike, > > you're absolutely right that the issue turned out to be with rbind of > qPCRset! The feature categories were intended to be stored as > characters, > not factors, so that uses can easily add their own categories if they > want. However, in rbind I've omitted the parameter > "stringsAsFactors=FALSE" in a call to as.data.frame(), so I get: Thanks Heidi! Manual is OK, as long as there's a path I can keep moving forward Mike > >> data(qPCRraw) >> class(featureCategory(qPCRraw)) > [1] "data.frame" >> class(featureCategory(qPCRraw)[,1]) > [1] "character" >> test <- rbind(qPCRraw[1:4,], qPCRraw[1:40,]) >> class(featureCategory(test)) > [1] "data.frame" >> class(featureCategory(test)[,1]) > [1] "factor" > > I'll sanitise the rbind (and probably also the cbind function), but > I'm > afraid it'll take a while for these changes to get into BioC devel. > > Unfortunately, in the meantime, you'll have to manually transfer your > featureCategory back to a data frame of characters instead of factors: > >> featureCategory(test) <- data.frame(apply(featureCategory(test), 2, > as.character), stringsAsFactors=FALSE) >> class(featureCategory(test)[,1]) > [1] "character" >> setCategory(test, groups=rep(c("A", "B"),3), Ct.min=20) > Categories after Ct.max and Ct.min filtering: > sample1 sample2 sample3 sample4 sample5 sample6 > OK 29 32 34 32 32 33 > Undetermined 6 7 5 4 7 5 > Unreliable 9 5 5 8 5 6 > Categories after standard deviation filtering: > sample1 sample2 sample3 sample4 sample5 sample6 > OK 29 32 34 32 32 33 > Undetermined 6 7 5 4 7 5 > Unreliable 9 5 5 8 5 6 > > Yet another item on the ever-growing HTqPCR to-do list... > \Heidi > >> >> On May 10, 2010, at 9:05 AM, Heidi Dvinge wrote: >> >>> Hello Mike, >>> >>> On 7 May 2010, at 17:11, Michael Muratet wrote: >>> >>>> Greetings >>>> >>>> I have encountered a small problem when using the setCategories >>>> method: >>>> >>>> s2010004660.cat <- setCategory(s2010004660, quantile=0.9, >>>> groups=sampleNames(s2010004660)) >>>> Categories after Ct.max and Ct.min filtering: >>>> wt_4wks mt_4wks wt_17wks mt_17wks >>>> OK 582 571 566 578 >>>> Undetermined 186 197 202 190 >>>> Categories after standard deviation filtering: >>>> wt_4wks mt_4wks wt_17wks mt_17wks >>>> OK 581 569 565 577 >>>> Undetermined 186 197 202 190 >>>> There were 50 or more warnings (use warnings() to see the first 50) >>>>> warnings() >>>> Warning messages: >>>> 1: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") : >>>> invalid factor level, NAs generated >>>> 2: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") : >>>> invalid factor level, NAs generated >>>> 3: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") : >>>> invalid factor level, NAs generated >>>> >>> Hm, I can't reproduce this with the standard data sets in HTqPCR, >>> although this also only contains "OK" and "Undetermined" initially: >>> >>>> data(qPCRraw) >>>> apply(featureCategory(qPCRraw), 2, table) >>> sample1 sample2 sample3 sample4 sample5 sample6 >>> OK 353 312 360 339 335 336 >>> Undetermined 31 72 24 45 49 48 >>>> setCategory(qPCRraw, quantile=0.9, groups=rep(c("A", "B"), 3)) >>> Categories after Ct.max and Ct.min filtering: >>> sample1 sample2 sample3 sample4 sample5 sample6 >>> OK 313 264 327 295 296 286 >>> Undetermined 68 119 56 86 86 96 >>> Unreliable 3 1 1 3 2 2 >>> Categories after standard deviation filtering: >>> sample1 sample2 sample3 sample4 sample5 sample6 >>> OK 309 258 323 291 292 281 >>> Undetermined 68 119 56 86 86 96 >>> Unreliable 7 7 5 7 6 7 >>> >>> How does featureCategory() of your new object look after you run >>> setCategory? Also, it seems like all your Ct values fall within the >>> default range given by Ct.max and Ct.min in setCategory(), hence >>> none of the categories are adjusted during the "first round", before >>> the standard deviation filtering. What happens if you set one of >>> Ct.max or Ct.min so that some values are called as "Unreliable" >>> based on this, e.g. by saying Ct.max=25? Do you still get the same >>> warning? >> >> Heidi >> >> I have traced the problem to this line in setCategory by setting the >> options so that warnings are errors: >> >> featureCategory(out)[split.by==gene,groups==g][index] <- "Unreliable" >> >> I have never seen dual indices like this on a data frame before, can >> you tell me what this does? >> >> If I force the data to be called "Unreliable" by setting Ct.min I can >> get the same failure here: >> >> featureCategory(out)[data < Ct.min] <- "Unreliable" >> >> If you reference a single dimension from a data frame like this, >> don't >> you set/get just a column? I thought you had to use something like >> sapply in a case like this. >> >> I note that any of the 'primitive' data sets, i.e., the ones that I >> read with readCtData can be used in setCategory without a problem. >> Once I rbind the two data types together, I get the problem. I have >> looked at the rbind.qPCRset code and I don't see anything that would >> cause this to be the case but I suspect my problem is somehow tied up >> some artifact of data assembly. >> >> Thanks >> >> Mike >> >> >> >> >> >>> >>> Cheers >>> \Heidi >>> >>> >>>> other 47 warnings are the same >>>> >>>> The categories that you get in the SDS data are "OK" and >>>> "Undetermined" and it seems to be unwilling to add the new level >>>> "Unreliable". I tried to manually add the levels: >>>> >>>> featureCategory(s2010004660)$wt_4wks <- >>>> factor(featureCategory(s2010004660)$wt_4wks, >>>> levels >>>> =c(levels(featureCategory(s2010004660)$wt_4wks),"Unreliable")) >>>> featureCategory(s2010004660)$mt_4wks <- >>>> factor(featureCategory(s2010004660)$mt_4wks, >>>> levels >>>> =c(levels(featureCategory(s2010004660)$mt_4wks),"Unreliable")) >>>> featureCategory(s2010004660)$wt_17wks <- >>>> factor(featureCategory(s2010004660)$wt_17wks, >>>> levels >>>> =c(levels(featureCategory(s2010004660)$wt_17wks),"Unreliable")) >>>> featureCategory(s2010004660)$mt_17wks <- >>>> factor(featureCategory(s2010004660)$mt_17wks, >>>> levels >>>> =c(levels(featureCategory(s2010004660)$mt_17wks),"Unreliable")) >>>> >>>> and get another error >>>> >>>> Error in count[names(tab), i] <- tab : subscript out of bounds >>>> >>>> Is this a bug or operator error? >>>> >>>> Thanks >>>> >>>> Mike >>>> >>>>> sessionInfo() >>>> R version 2.11.0 (2010-04-22) >>>> i386-apple-darwin9.8.0 >>>> >>>> locale: >>>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 >>>> >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods >>>> base >>>> >>>> other attached packages: >>>> [1] HTqPCR_1.2.0 limma_3.4.0 RColorBrewer_1.0-2 >>>> Biobase_2.8.0 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] affy_1.26.0 affyio_1.16.0 >>>> gdata_2.7.1 gplots_2.7.4 gtools_2.6.1 >>>> preprocessCore_1.10.0 >>>> [7] tools_2.11.0 >>>> >>>> Michael Muratet, Ph.D. >>>> Senior Scientist >>>> HudsonAlpha Institute for Biotechnology >>>> mmuratet at hudsonalpha.org >>>> (256) 327-0473 (p) >>>> (256) 327-0966 (f) >>>> >>>> Room 4005 >>>> 601 Genome Way >>>> Huntsville, Alabama 35806 >>>> >>>> >>>> >>>> >>> >> >> Michael Muratet, Ph.D. >> Senior Scientist >> HudsonAlpha Institute for Biotechnology >> mmuratet at hudsonalpha.org >> (256) 327-0473 (p) >> (256) 327-0966 (f) >> >> Room 4005 >> 601 Genome Way >> Huntsville, Alabama 35806 >> >> >> >> >> > > Michael Muratet, Ph.D. Senior Scientist HudsonAlpha Institute for Biotechnology mmuratet at hudsonalpha.org (256) 327-0473 (p) (256) 327-0966 (f) Room 4005 601 Genome Way Huntsville, Alabama 35806
ADD REPLY
0
Entering edit mode
Heidi Works famously now. Thanks for all the help. I have another question which may be more about how to use R than HTqPCR, but here it is. My samples consist of some number of wells with one replicate, some with two replicates (experimental) and some number with up to 8 replicates (controls). I have tried to use ttestCtData to calculate p values and get: Error in t.test.default(x[, g1], x[, g2], alternative = alternative, paired = paired, : not enough 'x' observations I am assuming this is because the method cannot calculate a t test on a single replicate and does not partition the features into replicates itself. I don't see anything in the vignette or manual about splitting data like this and so I set about to split the data into features with only one and two or more replicates as it is in setCategory. Now I have a list where each element is a data frame of one or more entries. The question is: what's the best way to select from the list all those elements where the data.frame has more than one element and make a new qPCRset object? Or have I been using the ttestCtData method incorrectly? Thanks Mike On May 10, 2010, at 5:11 PM, Heidi Dvinge wrote: > Hi Mike, > > you're absolutely right that the issue turned out to be with rbind of > qPCRset! The feature categories were intended to be stored as > characters, > not factors, so that uses can easily add their own categories if they > want. However, in rbind I've omitted the parameter > "stringsAsFactors=FALSE" in a call to as.data.frame(), so I get: > >> data(qPCRraw) >> class(featureCategory(qPCRraw)) > [1] "data.frame" >> class(featureCategory(qPCRraw)[,1]) > [1] "character" >> test <- rbind(qPCRraw[1:4,], qPCRraw[1:40,]) >> class(featureCategory(test)) > [1] "data.frame" >> class(featureCategory(test)[,1]) > [1] "factor" > > I'll sanitise the rbind (and probably also the cbind function), but > I'm > afraid it'll take a while for these changes to get into BioC devel. > > Unfortunately, in the meantime, you'll have to manually transfer your > featureCategory back to a data frame of characters instead of factors: > >> featureCategory(test) <- data.frame(apply(featureCategory(test), 2, > as.character), stringsAsFactors=FALSE) >> class(featureCategory(test)[,1]) > [1] "character" >> setCategory(test, groups=rep(c("A", "B"),3), Ct.min=20) > Categories after Ct.max and Ct.min filtering: > sample1 sample2 sample3 sample4 sample5 sample6 > OK 29 32 34 32 32 33 > Undetermined 6 7 5 4 7 5 > Unreliable 9 5 5 8 5 6 > Categories after standard deviation filtering: > sample1 sample2 sample3 sample4 sample5 sample6 > OK 29 32 34 32 32 33 > Undetermined 6 7 5 4 7 5 > Unreliable 9 5 5 8 5 6 > > Yet another item on the ever-growing HTqPCR to-do list... > \Heidi > >> >> On May 10, 2010, at 9:05 AM, Heidi Dvinge wrote: >> >>> Hello Mike, >>> >>> On 7 May 2010, at 17:11, Michael Muratet wrote: >>> >>>> Greetings >>>> >>>> I have encountered a small problem when using the setCategories >>>> method: >>>> >>>> s2010004660.cat <- setCategory(s2010004660, quantile=0.9, >>>> groups=sampleNames(s2010004660)) >>>> Categories after Ct.max and Ct.min filtering: >>>> wt_4wks mt_4wks wt_17wks mt_17wks >>>> OK 582 571 566 578 >>>> Undetermined 186 197 202 190 >>>> Categories after standard deviation filtering: >>>> wt_4wks mt_4wks wt_17wks mt_17wks >>>> OK 581 569 565 577 >>>> Undetermined 186 197 202 190 >>>> There were 50 or more warnings (use warnings() to see the first 50) >>>>> warnings() >>>> Warning messages: >>>> 1: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") : >>>> invalid factor level, NAs generated >>>> 2: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") : >>>> invalid factor level, NAs generated >>>> 3: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") : >>>> invalid factor level, NAs generated >>>> >>> Hm, I can't reproduce this with the standard data sets in HTqPCR, >>> although this also only contains "OK" and "Undetermined" initially: >>> >>>> data(qPCRraw) >>>> apply(featureCategory(qPCRraw), 2, table) >>> sample1 sample2 sample3 sample4 sample5 sample6 >>> OK 353 312 360 339 335 336 >>> Undetermined 31 72 24 45 49 48 >>>> setCategory(qPCRraw, quantile=0.9, groups=rep(c("A", "B"), 3)) >>> Categories after Ct.max and Ct.min filtering: >>> sample1 sample2 sample3 sample4 sample5 sample6 >>> OK 313 264 327 295 296 286 >>> Undetermined 68 119 56 86 86 96 >>> Unreliable 3 1 1 3 2 2 >>> Categories after standard deviation filtering: >>> sample1 sample2 sample3 sample4 sample5 sample6 >>> OK 309 258 323 291 292 281 >>> Undetermined 68 119 56 86 86 96 >>> Unreliable 7 7 5 7 6 7 >>> >>> How does featureCategory() of your new object look after you run >>> setCategory? Also, it seems like all your Ct values fall within the >>> default range given by Ct.max and Ct.min in setCategory(), hence >>> none of the categories are adjusted during the "first round", before >>> the standard deviation filtering. What happens if you set one of >>> Ct.max or Ct.min so that some values are called as "Unreliable" >>> based on this, e.g. by saying Ct.max=25? Do you still get the same >>> warning? >> >> Heidi >> >> I have traced the problem to this line in setCategory by setting the >> options so that warnings are errors: >> >> featureCategory(out)[split.by==gene,groups==g][index] <- "Unreliable" >> >> I have never seen dual indices like this on a data frame before, can >> you tell me what this does? >> >> If I force the data to be called "Unreliable" by setting Ct.min I can >> get the same failure here: >> >> featureCategory(out)[data < Ct.min] <- "Unreliable" >> >> If you reference a single dimension from a data frame like this, >> don't >> you set/get just a column? I thought you had to use something like >> sapply in a case like this. >> >> I note that any of the 'primitive' data sets, i.e., the ones that I >> read with readCtData can be used in setCategory without a problem. >> Once I rbind the two data types together, I get the problem. I have >> looked at the rbind.qPCRset code and I don't see anything that would >> cause this to be the case but I suspect my problem is somehow tied up >> some artifact of data assembly. >> >> Thanks >> >> Mike >> >> >> >> >> >>> >>> Cheers >>> \Heidi >>> >>> >>>> other 47 warnings are the same >>>> >>>> The categories that you get in the SDS data are "OK" and >>>> "Undetermined" and it seems to be unwilling to add the new level >>>> "Unreliable". I tried to manually add the levels: >>>> >>>> featureCategory(s2010004660)$wt_4wks <- >>>> factor(featureCategory(s2010004660)$wt_4wks, >>>> levels >>>> =c(levels(featureCategory(s2010004660)$wt_4wks),"Unreliable")) >>>> featureCategory(s2010004660)$mt_4wks <- >>>> factor(featureCategory(s2010004660)$mt_4wks, >>>> levels >>>> =c(levels(featureCategory(s2010004660)$mt_4wks),"Unreliable")) >>>> featureCategory(s2010004660)$wt_17wks <- >>>> factor(featureCategory(s2010004660)$wt_17wks, >>>> levels >>>> =c(levels(featureCategory(s2010004660)$wt_17wks),"Unreliable")) >>>> featureCategory(s2010004660)$mt_17wks <- >>>> factor(featureCategory(s2010004660)$mt_17wks, >>>> levels >>>> =c(levels(featureCategory(s2010004660)$mt_17wks),"Unreliable")) >>>> >>>> and get another error >>>> >>>> Error in count[names(tab), i] <- tab : subscript out of bounds >>>> >>>> Is this a bug or operator error? >>>> >>>> Thanks >>>> >>>> Mike >>>> >>>>> sessionInfo() >>>> R version 2.11.0 (2010-04-22) >>>> i386-apple-darwin9.8.0 >>>> >>>> locale: >>>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 >>>> >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods >>>> base >>>> >>>> other attached packages: >>>> [1] HTqPCR_1.2.0 limma_3.4.0 RColorBrewer_1.0-2 >>>> Biobase_2.8.0 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] affy_1.26.0 affyio_1.16.0 >>>> gdata_2.7.1 gplots_2.7.4 gtools_2.6.1 >>>> preprocessCore_1.10.0 >>>> [7] tools_2.11.0 >>>> >>>> Michael Muratet, Ph.D. >>>> Senior Scientist >>>> HudsonAlpha Institute for Biotechnology >>>> mmuratet at hudsonalpha.org >>>> (256) 327-0473 (p) >>>> (256) 327-0966 (f) >>>> >>>> Room 4005 >>>> 601 Genome Way >>>> Huntsville, Alabama 35806 >>>> >>>> >>>> >>>> >>> >> >> Michael Muratet, Ph.D. >> Senior Scientist >> HudsonAlpha Institute for Biotechnology >> mmuratet at hudsonalpha.org >> (256) 327-0473 (p) >> (256) 327-0966 (f) >> >> Room 4005 >> 601 Genome Way >> Huntsville, Alabama 35806 >> >> >> >> >> > > Michael Muratet, Ph.D. Senior Scientist HudsonAlpha Institute for Biotechnology mmuratet at hudsonalpha.org (256) 327-0473 (p) (256) 327-0966 (f) Room 4005 601 Genome Way Huntsville, Alabama 35806
ADD REPLY
0
Entering edit mode
Greetings I have answered my own earlier question. I split the expression data on row names, selected for data frames with more than one row in the resulting list with sapply, selected from the qPCRset object using rownames(exprs(qpcr)) %in% names(duplicate_list). I can now calculate t test values. Thanks Mike Michael Muratet, Ph.D. Senior Scientist HudsonAlpha Institute for Biotechnology mmuratet at hudsonalpha.org (256) 327-0473 (p) (256) 327-0966 (f) Room 4005 601 Genome Way Huntsville, Alabama 35806
ADD REPLY

Login before adding your answer.

Traffic: 735 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6