HTqPCR problems
1
0
Entering edit mode
Simon Melov ▴ 340
@simon-melov-266
Last seen 9.6 years ago
Hi, I'm having some troubles selectively sub-setting, and graphing up QPCR data within HTqPCR for Biomark plates (both 48.48 and 96.96 plates). I'd like to be able to visualize specific genes, with specific groups we run routinely on our Biomark system. Typical runs are across multiple plates, and have multiple biological replicates, and usually 2 or more technical replicates (although we are moving away from technical reps, as the CVs are so tight). Can anyone help with this? Heidi? raw6=readCtData(files="Chip6.csv", format="BioMark", n.features=48, n.data=48, samples=samples) #Ive read the samples in from a separate file, as when you read it in, it doesnt take the sample names supplied in the biomark output# #Next, I want to plot genes of interest, with samples of interest, and I'm having trouble getting an appropriate output# g=featureNames(raw6)[1:2] plotCtOverview(raw6, genes=g, groups=groupID$Treatment, col=rainbow(5)) #This plots 1 gene across all 48 samples# #but the legend doesnt behave, its placed on top of the plot, and I cant get it to display in a non-overlapping fashion# #I've tried all sorts of things in par, but nothing seems to shift the legend's position# #I now want to plot a subset of the samples for specific genes# > LOY=subset(groupID,groupID$Treatment=="LO" | groupID$Treatment== "LFY") > LOY Sample Treatment 2 L20 LFY 5 L30 LFY 7 L45 LO 20 L40 LO 27 L43 LO 33 L29 LFY 36 L38 LO 40 L39 LO 43 L23 LFY > plotCtOverview(raw6, genes=g, groups=LOY, col=rainbow(5)) Warning messages: 1: In split.default(t(x), sample.split) : data length is not a multiple of split variable 2: In qt(p, df, lower.tail, log.p) : NaNs produced > #it displays the two groups defined by treatment, but doesnt do so nicely, very skinny bars, and the legend doesnt reflect what its displaying# #again, I've tried monkeying around with par, but not sure what HTqPCR is calling to make the plots# please help! thanks Simon.
HTqPCR HTqPCR • 1.5k views
ADD COMMENT
0
Entering edit mode
Heidi Dvinge ★ 2.0k
@heidi-dvinge-2195
Last seen 9.6 years ago
> Hi, > I'm having some troubles selectively sub-setting, and graphing up QPCR > data within HTqPCR for Biomark plates (both 48.48 and 96.96 plates). I'd > like to be able to visualize specific genes, with specific groups we run > routinely on our Biomark system. Typical runs are across multiple plates, > and have multiple biological replicates, and usually 2 or more technical > replicates (although we are moving away from technical reps, as the CVs > are so tight). > > Can anyone help with this? Heidi? > > raw6=readCtData(files="Chip6.csv", format="BioMark", n.features=48, > n.data=48, samples=samples) > #Ive read the samples in from a separate file, as when you read it in, it > doesnt take the sample names supplied in the biomark output# > #Next, I want to plot genes of interest, with samples of interest, and I'm > having trouble getting an appropriate output# > > g=featureNames(raw6)[1:2] > plotCtOverview(raw6, genes=g, groups=groupID$Treatment, col=rainbow(5)) > > #This plots 1 gene across all 48 samples# > #but the legend doesnt behave, its placed on top of the plot, and I cant > get it to display in a non-overlapping fashion# > #I've tried all sorts of things in par, but nothing seems to shift the > legend's position# > As the old saying goes, whenever you want a job done well, you'll have to do it yourself ;). In this case, the easiest thing is probably to use legend=FALSE in plotCtOverview, and then afterwards add it yourself in the desired location using legend(). That way, if you have a lot of different features or groups to display, you can also use the ncol parameter in legend to make several columns within the legend, such as 3x4 instead of the default 12x1. Alternatively, you can use either xlim or ylim in plotCtOverview to add some empty space on the side where there's then room for the legend. > #I now want to plot a subset of the samples for specific genes# >> LOY=subset(groupID,groupID$Treatment=="LO" | groupID$Treatment== "LFY") >> LOY > Sample Treatment > 2 L20 LFY > 5 L30 LFY > 7 L45 LO > 20 L40 LO > 27 L43 LO > 33 L29 LFY > 36 L38 LO > 40 L39 LO > 43 L23 LFY > > >> plotCtOverview(raw6, genes=g, groups=LOY, col=rainbow(5)) > Warning messages: > 1: In split.default(t(x), sample.split) : > data length is not a multiple of split variable > 2: In qt(p, df, lower.tail, log.p) : NaNs produced >> Does it make sense if you split by groups=LOY$Treatment? It looks like the object LOY itself is a data frame, rather than the expected vector. Also, you may have to 'repeat' the col=rainbow() argument to fit your number of features. > > #it displays the two groups defined by treatment, but doesnt do so nicely, > very skinny bars, and the legend doesnt reflect what its displaying# > #again, I've tried monkeying around with par, but not sure what HTqPCR is > calling to make the plots# > If the bars are very skinny, it's probably because you're displaying a lot of features. Nothing much to do about that, except increasing the width or your plot :(. \Heidi > please help! > > thanks > > Simon. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD COMMENT
0
Entering edit mode
Thanks! I will work on it some more, didn't realize groups was defined by vector. On Jun 26, 2012, at 12:48 PM, Heidi Dvinge <heidi at="" ebi.ac.uk=""> wrote: >> Hi, >> I'm having some troubles selectively sub-setting, and graphing up QPCR >> data within HTqPCR for Biomark plates (both 48.48 and 96.96 plates). I'd >> like to be able to visualize specific genes, with specific groups we run >> routinely on our Biomark system. Typical runs are across multiple plates, >> and have multiple biological replicates, and usually 2 or more technical >> replicates (although we are moving away from technical reps, as the CVs >> are so tight). >> >> Can anyone help with this? Heidi? >> >> raw6=readCtData(files="Chip6.csv", format="BioMark", n.features=48, >> n.data=48, samples=samples) >> #Ive read the samples in from a separate file, as when you read it in, it >> doesnt take the sample names supplied in the biomark output# >> #Next, I want to plot genes of interest, with samples of interest, and I'm >> having trouble getting an appropriate output# >> >> g=featureNames(raw6)[1:2] >> plotCtOverview(raw6, genes=g, groups=groupID$Treatment, col=rainbow(5)) >> >> #This plots 1 gene across all 48 samples# >> #but the legend doesnt behave, its placed on top of the plot, and I cant >> get it to display in a non-overlapping fashion# >> #I've tried all sorts of things in par, but nothing seems to shift the >> legend's position# >> > As the old saying goes, whenever you want a job done well, you'll have to > do it yourself ;). In this case, the easiest thing is probably to use > legend=FALSE in plotCtOverview, and then afterwards add it yourself in the > desired location using legend(). That way, if you have a lot of different > features or groups to display, you can also use the ncol parameter in > legend to make several columns within the legend, such as 3x4 instead of > the default 12x1. > > Alternatively, you can use either xlim or ylim in plotCtOverview to add > some empty space on the side where there's then room for the legend. > >> #I now want to plot a subset of the samples for specific genes# >>> LOY=subset(groupID,groupID$Treatment=="LO" | groupID$Treatment== "LFY") >>> LOY >> Sample Treatment >> 2 L20 LFY >> 5 L30 LFY >> 7 L45 LO >> 20 L40 LO >> 27 L43 LO >> 33 L29 LFY >> 36 L38 LO >> 40 L39 LO >> 43 L23 LFY >> >> >>> plotCtOverview(raw6, genes=g, groups=LOY, col=rainbow(5)) >> Warning messages: >> 1: In split.default(t(x), sample.split) : >> data length is not a multiple of split variable >> 2: In qt(p, df, lower.tail, log.p) : NaNs produced >>> > > Does it make sense if you split by groups=LOY$Treatment? It looks like the > object LOY itself is a data frame, rather than the expected vector. > > Also, you may have to 'repeat' the col=rainbow() argument to fit your > number of features. > >> >> #it displays the two groups defined by treatment, but doesnt do so nicely, >> very skinny bars, and the legend doesnt reflect what its displaying# >> #again, I've tried monkeying around with par, but not sure what HTqPCR is >> calling to make the plots# >> > If the bars are very skinny, it's probably because you're displaying a lot > of features. Nothing much to do about that, except increasing the width or > your plot :(. > > \Heidi > >> please help! >> >> thanks >> >> Simon. >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > >
ADD REPLY
0
Entering edit mode
Thanks for the help Heidi, but I'm still having troubles, your comments on the plotting helped me solve the outputs. But if I want to just display some groups (for example the LO group in the example below), how do I associate a group with multiple samples (ie biological reps)? Currently I'm associating genes with samples by reading in the file as below plate6=read.delim("plate6Sample.txt", header=FALSE) #this is a file to associate sample ID with the genes in the biomark data, as currently HTqPCR does not seem to associate the sample info in the Biomark output to the gene IDs samples=as.vector(t(plate6)) raw6=readCtData(files="Chip6.csv", format="BioMark", n.features=48, n.data=48, samples=samples) #now I have samples and genes similar to your example in the guide, but I want to associate samples to groups now. In the guide, you have an example where you have entire files as distinct samples, but in our runs, we never have that situation. I have a file which associates samples to groups, which I read in... groupID=read.csv("plate6key.csv") but how do I associate the samples with their appropriate groups for biological replicates with any of the functions in HtQPCR? You recommend below using a vector, but I dont see how that helps me associate the samples in the Expression set. thanks again! s On Jun 26, 2012, at 12:48 PM, Heidi Dvinge wrote: >> Hi, >> I'm having some troubles selectively sub-setting, and graphing up QPCR >> data within HTqPCR for Biomark plates (both 48.48 and 96.96 plates). I'd >> like to be able to visualize specific genes, with specific groups we run >> routinely on our Biomark system. Typical runs are across multiple plates, >> and have multiple biological replicates, and usually 2 or more technical >> replicates (although we are moving away from technical reps, as the CVs >> are so tight). >> >> Can anyone help with this? Heidi? >> >> raw6=readCtData(files="Chip6.csv", format="BioMark", n.features=48, >> n.data=48, samples=samples) >> #Ive read the samples in from a separate file, as when you read it in, it >> doesnt take the sample names supplied in the biomark output# >> #Next, I want to plot genes of interest, with samples of interest, and I'm >> having trouble getting an appropriate output# >> >> g=featureNames(raw6)[1:2] >> plotCtOverview(raw6, genes=g, groups=groupID$Treatment, col=rainbow(5)) >> >> #This plots 1 gene across all 48 samples# >> #but the legend doesnt behave, its placed on top of the plot, and I cant >> get it to display in a non-overlapping fashion# >> #I've tried all sorts of things in par, but nothing seems to shift the >> legend's position# >> > As the old saying goes, whenever you want a job done well, you'll have to > do it yourself ;). In this case, the easiest thing is probably to use > legend=FALSE in plotCtOverview, and then afterwards add it yourself in the > desired location using legend(). That way, if you have a lot of different > features or groups to display, you can also use the ncol parameter in > legend to make several columns within the legend, such as 3x4 instead of > the default 12x1. > > Alternatively, you can use either xlim or ylim in plotCtOverview to add > some empty space on the side where there's then room for the legend. > >> #I now want to plot a subset of the samples for specific genes# >>> LOY=subset(groupID,groupID$Treatment=="LO" | groupID$Treatment== "LFY") >>> LOY >> Sample Treatment >> 2 L20 LFY >> 5 L30 LFY >> 7 L45 LO >> 20 L40 LO >> 27 L43 LO >> 33 L29 LFY >> 36 L38 LO >> 40 L39 LO >> 43 L23 LFY >> >> >>> plotCtOverview(raw6, genes=g, groups=LOY, col=rainbow(5)) >> Warning messages: >> 1: In split.default(t(x), sample.split) : >> data length is not a multiple of split variable >> 2: In qt(p, df, lower.tail, log.p) : NaNs produced >>> > > Does it make sense if you split by groups=LOY$Treatment? It looks like the > object LOY itself is a data frame, rather than the expected vector. > > Also, you may have to 'repeat' the col=rainbow() argument to fit your > number of features. > >> >> #it displays the two groups defined by treatment, but doesnt do so nicely, >> very skinny bars, and the legend doesnt reflect what its displaying# >> #again, I've tried monkeying around with par, but not sure what HTqPCR is >> calling to make the plots# >> > If the bars are very skinny, it's probably because you're displaying a lot > of features. Nothing much to do about that, except increasing the width or > your plot :(. > > \Heidi > >> please help! >> >> thanks >> >> Simon. >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > >
ADD REPLY
0
Entering edit mode
Hi Simon, > Thanks for the help Heidi, > but I'm still having troubles, your comments on the plotting helped me > solve the outputs. But if I want to just display some groups (for example > the LO group in the example below), how do I associate a group with > multiple samples (ie biological reps)? > > Currently I'm associating genes with samples by reading in the file as > below > plate6=read.delim("plate6Sample.txt", header=FALSE) > #this is a file to associate sample ID with the genes in the biomark data, > as currently HTqPCR does not seem to associate the sample info in the > Biomark output to the gene IDs > Erm, no, it doesn't :-/ > samples=as.vector(t(plate6)) > raw6=readCtData(files="Chip6.csv", format="BioMark", n.features=48, > n.data=48, samples=samples) > #now I have samples and genes similar to your example in the guide, but I > want to associate samples to groups now. In the guide, you have an example > where you have entire files as distinct samples, but in our runs, we never > have that situation. I have a file which associates samples to groups, > which I read in... > > groupID=read.csv("plate6key.csv") > > but how do I associate the samples with their appropriate groups for > biological replicates with any of the functions in HtQPCR? I'm afraid I'm slightly confused here (sorry, long day). Exactly how is your data formatted? I.e. are the columns either individual samples, or from files containing multiple samples? So for example for a single 48.48 arrays, is your qPCRset object 2304 x 1 or 48 x 48? >From your readCtData command I'm guessing you have 48 x 48, i.e. all 48 samples from your 1 array are in columns. In that case the 'groups' parameter in plotCtOverview will need to be a vector of length 48, indicating how you want the 48 columns in your qPCRset object to be grouped together. Below is an example of (admittedly ugly) plots. I don't know if that's similar to your situation at all. \Heidi > # Reading in data > exPath <- system.file("exData", package = "HTqPCR") > raw1 <- readCtData(files = "BioMark_sample.csv", path = exPath, format = "BioMark", n.features = 48, n.data = 48) > # Check sample names > head(sampleNames(raw1)) [1] "Sample1" "Sample2" "Sample3" "Sample4" "Sample5" "Sample6" > # Associate samples with (randomly chosen) groups > anno <- data.frame(sampleID=sampleNames(raw1), Group=rep(c("A", "B", "C", "D"), times=c(4,24,5,15))) > head(anno) sampleID Group 1 Sample1 A 2 Sample2 A 3 Sample3 A 4 Sample4 A 5 Sample5 B 6 Sample6 B > # Plot the first gene - for each sample individually > plotCtOverview(raw1, genes=featureNames(raw1)[1], legend=FALSE, col=1:nrow(anno)) > # Plot the first gene - for each group > plotCtOverview(raw1, genes=featureNames(raw1)[1], group=anno$Group, legend=FALSE, col=1:length(unique(anno$Group))) > # Plot the first gene, using group "A" as a control > plotCtOverview(raw1, genes=featureNames(raw1)[1], group=anno$Group, legend=FALSE, col=1:length(unique(anno$Group)), calibrator="A") > You recommend below using a vector, but I dont see how that helps me > associate the samples in the Expression set. > > thanks again! > > s > > On Jun 26, 2012, at 12:48 PM, Heidi Dvinge wrote: > >>> Hi, >>> I'm having some troubles selectively sub-setting, and graphing up QPCR >>> data within HTqPCR for Biomark plates (both 48.48 and 96.96 plates). >>> I'd >>> like to be able to visualize specific genes, with specific groups we >>> run >>> routinely on our Biomark system. Typical runs are across multiple >>> plates, >>> and have multiple biological replicates, and usually 2 or more >>> technical >>> replicates (although we are moving away from technical reps, as the CVs >>> are so tight). >>> >>> Can anyone help with this? Heidi? >>> >>> raw6=readCtData(files="Chip6.csv", format="BioMark", n.features=48, >>> n.data=48, samples=samples) >>> #Ive read the samples in from a separate file, as when you read it in, >>> it >>> doesnt take the sample names supplied in the biomark output# >>> #Next, I want to plot genes of interest, with samples of interest, and >>> I'm >>> having trouble getting an appropriate output# >>> >>> g=featureNames(raw6)[1:2] >>> plotCtOverview(raw6, genes=g, groups=groupID$Treatment, col=rainbow(5)) >>> >>> #This plots 1 gene across all 48 samples# >>> #but the legend doesnt behave, its placed on top of the plot, and I >>> cant >>> get it to display in a non-overlapping fashion# >>> #I've tried all sorts of things in par, but nothing seems to shift the >>> legend's position# >>> >> As the old saying goes, whenever you want a job done well, you'll have >> to >> do it yourself ;). In this case, the easiest thing is probably to use >> legend=FALSE in plotCtOverview, and then afterwards add it yourself in >> the >> desired location using legend(). That way, if you have a lot of >> different >> features or groups to display, you can also use the ncol parameter in >> legend to make several columns within the legend, such as 3x4 instead of >> the default 12x1. >> >> Alternatively, you can use either xlim or ylim in plotCtOverview to add >> some empty space on the side where there's then room for the legend. >> >>> #I now want to plot a subset of the samples for specific genes# >>>> LOY=subset(groupID,groupID$Treatment=="LO" | groupID$Treatment== >>>> "LFY") >>>> LOY >>> Sample Treatment >>> 2 L20 LFY >>> 5 L30 LFY >>> 7 L45 LO >>> 20 L40 LO >>> 27 L43 LO >>> 33 L29 LFY >>> 36 L38 LO >>> 40 L39 LO >>> 43 L23 LFY >>> >>> >>>> plotCtOverview(raw6, genes=g, groups=LOY, col=rainbow(5)) >>> Warning messages: >>> 1: In split.default(t(x), sample.split) : >>> data length is not a multiple of split variable >>> 2: In qt(p, df, lower.tail, log.p) : NaNs produced >>>> >> >> Does it make sense if you split by groups=LOY$Treatment? It looks like >> the >> object LOY itself is a data frame, rather than the expected vector. >> >> Also, you may have to 'repeat' the col=rainbow() argument to fit your >> number of features. >> >>> >>> #it displays the two groups defined by treatment, but doesnt do so >>> nicely, >>> very skinny bars, and the legend doesnt reflect what its displaying# >>> #again, I've tried monkeying around with par, but not sure what HTqPCR >>> is >>> calling to make the plots# >>> >> If the bars are very skinny, it's probably because you're displaying a >> lot >> of features. Nothing much to do about that, except increasing the width >> or >> your plot :(. >> >> \Heidi >> >>> please help! >>> >>> thanks >>> >>> Simon. >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> > >
ADD REPLY
0
Entering edit mode
Hi Heidi, you are correct, yes 48.48. The example you provide below is exactly what I needed for clarification for groups. I was trying to reverse engineer what you had done with the original expression set package for microarrays, but from below, I can get this to work now. Just to be clear, I have 5 48.48 plates. Should I normalize each individually prior to combining, or should I reformat to a 2304x1 each, combine, then normalize (not sure if you can do that or not) thanks again for your prompt responses! best s On Jun 27, 2012, at 2:26 PM, Heidi Dvinge wrote: > Hi Simon, > >> Thanks for the help Heidi, >> but I'm still having troubles, your comments on the plotting helped me >> solve the outputs. But if I want to just display some groups (for example >> the LO group in the example below), how do I associate a group with >> multiple samples (ie biological reps)? >> >> Currently I'm associating genes with samples by reading in the file as >> below >> plate6=read.delim("plate6Sample.txt", header=FALSE) >> #this is a file to associate sample ID with the genes in the biomark data, >> as currently HTqPCR does not seem to associate the sample info in the >> Biomark output to the gene IDs >> > Erm, no, it doesn't :-/ > >> samples=as.vector(t(plate6)) >> raw6=readCtData(files="Chip6.csv", format="BioMark", n.features=48, >> n.data=48, samples=samples) >> #now I have samples and genes similar to your example in the guide, but I >> want to associate samples to groups now. In the guide, you have an example >> where you have entire files as distinct samples, but in our runs, we never >> have that situation. I have a file which associates samples to groups, >> which I read in... >> >> groupID=read.csv("plate6key.csv") >> >> but how do I associate the samples with their appropriate groups for >> biological replicates with any of the functions in HtQPCR? > > I'm afraid I'm slightly confused here (sorry, long day). Exactly how is > your data formatted? I.e. are the columns either individual samples, or > from files containing multiple samples? So for example for a single 48.48 > arrays, is your qPCRset object 2304 x 1 or 48 x 48? > > From your readCtData command I'm guessing you have 48 x 48, i.e. all 48 > samples from your 1 array are in columns. In that case the 'groups' > parameter in plotCtOverview will need to be a vector of length 48, > indicating how you want the 48 columns in your qPCRset object to be > grouped together. > > Below is an example of (admittedly ugly) plots. I don't know if that's > similar to your situation at all. > > \Heidi > >> # Reading in data >> exPath <- system.file("exData", package = "HTqPCR") >> raw1 <- readCtData(files = "BioMark_sample.csv", path = exPath, format = > "BioMark", n.features = 48, n.data = 48) >> # Check sample names >> head(sampleNames(raw1)) > [1] "Sample1" "Sample2" "Sample3" "Sample4" "Sample5" "Sample6" >> # Associate samples with (randomly chosen) groups >> anno <- data.frame(sampleID=sampleNames(raw1), Group=rep(c("A", "B", > "C", "D"), times=c(4,24,5,15))) >> head(anno) > sampleID Group > 1 Sample1 A > 2 Sample2 A > 3 Sample3 A > 4 Sample4 A > 5 Sample5 B > 6 Sample6 B >> # Plot the first gene - for each sample individually >> plotCtOverview(raw1, genes=featureNames(raw1)[1], legend=FALSE, > col=1:nrow(anno)) >> # Plot the first gene - for each group >> plotCtOverview(raw1, genes=featureNames(raw1)[1], group=anno$Group, > legend=FALSE, col=1:length(unique(anno$Group))) >> # Plot the first gene, using group "A" as a control >> plotCtOverview(raw1, genes=featureNames(raw1)[1], group=anno$Group, > legend=FALSE, col=1:length(unique(anno$Group)), calibrator="A") > > > >> You recommend below using a vector, but I dont see how that helps me >> associate the samples in the Expression set. >> >> thanks again! >> >> s >> >> On Jun 26, 2012, at 12:48 PM, Heidi Dvinge wrote: >> >>>> Hi, >>>> I'm having some troubles selectively sub-setting, and graphing up QPCR >>>> data within HTqPCR for Biomark plates (both 48.48 and 96.96 plates). >>>> I'd >>>> like to be able to visualize specific genes, with specific groups we >>>> run >>>> routinely on our Biomark system. Typical runs are across multiple >>>> plates, >>>> and have multiple biological replicates, and usually 2 or more >>>> technical >>>> replicates (although we are moving away from technical reps, as the CVs >>>> are so tight). >>>> >>>> Can anyone help with this? Heidi? >>>> >>>> raw6=readCtData(files="Chip6.csv", format="BioMark", n.features=48, >>>> n.data=48, samples=samples) >>>> #Ive read the samples in from a separate file, as when you read it in, >>>> it >>>> doesnt take the sample names supplied in the biomark output# >>>> #Next, I want to plot genes of interest, with samples of interest, and >>>> I'm >>>> having trouble getting an appropriate output# >>>> >>>> g=featureNames(raw6)[1:2] >>>> plotCtOverview(raw6, genes=g, groups=groupID$Treatment, col=rainbow(5)) >>>> >>>> #This plots 1 gene across all 48 samples# >>>> #but the legend doesnt behave, its placed on top of the plot, and I >>>> cant >>>> get it to display in a non-overlapping fashion# >>>> #I've tried all sorts of things in par, but nothing seems to shift the >>>> legend's position# >>>> >>> As the old saying goes, whenever you want a job done well, you'll have >>> to >>> do it yourself ;). In this case, the easiest thing is probably to use >>> legend=FALSE in plotCtOverview, and then afterwards add it yourself in >>> the >>> desired location using legend(). That way, if you have a lot of >>> different >>> features or groups to display, you can also use the ncol parameter in >>> legend to make several columns within the legend, such as 3x4 instead of >>> the default 12x1. >>> >>> Alternatively, you can use either xlim or ylim in plotCtOverview to add >>> some empty space on the side where there's then room for the legend. >>> >>>> #I now want to plot a subset of the samples for specific genes# >>>>> LOY=subset(groupID,groupID$Treatment=="LO" | groupID$Treatment== >>>>> "LFY") >>>>> LOY >>>> Sample Treatment >>>> 2 L20 LFY >>>> 5 L30 LFY >>>> 7 L45 LO >>>> 20 L40 LO >>>> 27 L43 LO >>>> 33 L29 LFY >>>> 36 L38 LO >>>> 40 L39 LO >>>> 43 L23 LFY >>>> >>>> >>>>> plotCtOverview(raw6, genes=g, groups=LOY, col=rainbow(5)) >>>> Warning messages: >>>> 1: In split.default(t(x), sample.split) : >>>> data length is not a multiple of split variable >>>> 2: In qt(p, df, lower.tail, log.p) : NaNs produced >>>>> >>> >>> Does it make sense if you split by groups=LOY$Treatment? It looks like >>> the >>> object LOY itself is a data frame, rather than the expected vector. >>> >>> Also, you may have to 'repeat' the col=rainbow() argument to fit your >>> number of features. >>> >>>> >>>> #it displays the two groups defined by treatment, but doesnt do so >>>> nicely, >>>> very skinny bars, and the legend doesnt reflect what its displaying# >>>> #again, I've tried monkeying around with par, but not sure what HTqPCR >>>> is >>>> calling to make the plots# >>>> >>> If the bars are very skinny, it's probably because you're displaying a >>> lot >>> of features. Nothing much to do about that, except increasing the width >>> or >>> your plot :(. >>> >>> \Heidi >>> >>>> please help! >>>> >>>> thanks >>>> >>>> Simon. >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>> >>> >> >> > >
ADD REPLY
0
Entering edit mode
> Hi Heidi, > you are correct, yes 48.48. > The example you provide below is exactly what I needed for clarification > for groups. I was trying to reverse engineer what you had done with the > original expression set package for microarrays, but from below, I can get > this to work now. > Glad it works. Hopefully by the next BioConductor release I'll remember to clarify the plotCtOverview help file. > Just to be clear, I have 5 48.48 plates. Should I normalize each > individually prior to combining, or should I reformat to a 2304x1 each, > combine, then normalize (not sure if you can do that or not) > Hm, that's one of the questions I've also been asking myself, so I would be curious to hear what your results from this are. If you suspect that there are some major factors influencing the 5 plates systematically, then normalising them in a 2304 x 5 object should (hopefully) correct for that. For example, they may have been run on different days, by different people, or perhaps there was a short power cut during the processing of one of them. This might be visible if you have for example a boxplot of Ct from all 48*5 samples, and you see blocks of them shifted up or down. Obviously, this doesn't take care of normalisation between your samples within each plate though. If you suspect your samples to have some systematic variation that you need to account for, then you can normalise each plate individually (as a 48 x 48) object. Alternatively, you can try to combine within- and between-sample normalisation by taking all 48 x 240 values at once. In principle, you can split, reformat and the recombine the data in however many ways you like. Personally, with any sort of data I prefer to go with as little preprocessing as possible, since each additional step can potentially introduce its own biases into the data. So unless there are some obvious variation between your 5 plates, I'd probably stick with just normalisation between the samples, e.. using a 48 x 240 object. Of course, you may have different preferences, or find out that a completely different approach is required for this particular data set. \Heidi > thanks again for your prompt responses! > > best > > s > > On Jun 27, 2012, at 2:26 PM, Heidi Dvinge wrote: > >> Hi Simon, >> >>> Thanks for the help Heidi, >>> but I'm still having troubles, your comments on the plotting helped me >>> solve the outputs. But if I want to just display some groups (for >>> example >>> the LO group in the example below), how do I associate a group with >>> multiple samples (ie biological reps)? >>> >>> Currently I'm associating genes with samples by reading in the file as >>> below >>> plate6=read.delim("plate6Sample.txt", header=FALSE) >>> #this is a file to associate sample ID with the genes in the biomark >>> data, >>> as currently HTqPCR does not seem to associate the sample info in the >>> Biomark output to the gene IDs >>> >> Erm, no, it doesn't :-/ >> >>> samples=as.vector(t(plate6)) >>> raw6=readCtData(files="Chip6.csv", format="BioMark", n.features=48, >>> n.data=48, samples=samples) >>> #now I have samples and genes similar to your example in the guide, but >>> I >>> want to associate samples to groups now. In the guide, you have an >>> example >>> where you have entire files as distinct samples, but in our runs, we >>> never >>> have that situation. I have a file which associates samples to groups, >>> which I read in... >>> >>> groupID=read.csv("plate6key.csv") >>> >>> but how do I associate the samples with their appropriate groups for >>> biological replicates with any of the functions in HtQPCR? >> >> I'm afraid I'm slightly confused here (sorry, long day). Exactly how is >> your data formatted? I.e. are the columns either individual samples, or >> from files containing multiple samples? So for example for a single >> 48.48 >> arrays, is your qPCRset object 2304 x 1 or 48 x 48? >> >> From your readCtData command I'm guessing you have 48 x 48, i.e. all 48 >> samples from your 1 array are in columns. In that case the 'groups' >> parameter in plotCtOverview will need to be a vector of length 48, >> indicating how you want the 48 columns in your qPCRset object to be >> grouped together. >> >> Below is an example of (admittedly ugly) plots. I don't know if that's >> similar to your situation at all. >> >> \Heidi >> >>> # Reading in data >>> exPath <- system.file("exData", package = "HTqPCR") >>> raw1 <- readCtData(files = "BioMark_sample.csv", path = exPath, format >>> = >> "BioMark", n.features = 48, n.data = 48) >>> # Check sample names >>> head(sampleNames(raw1)) >> [1] "Sample1" "Sample2" "Sample3" "Sample4" "Sample5" "Sample6" >>> # Associate samples with (randomly chosen) groups >>> anno <- data.frame(sampleID=sampleNames(raw1), Group=rep(c("A", "B", >> "C", "D"), times=c(4,24,5,15))) >>> head(anno) >> sampleID Group >> 1 Sample1 A >> 2 Sample2 A >> 3 Sample3 A >> 4 Sample4 A >> 5 Sample5 B >> 6 Sample6 B >>> # Plot the first gene - for each sample individually >>> plotCtOverview(raw1, genes=featureNames(raw1)[1], legend=FALSE, >> col=1:nrow(anno)) >>> # Plot the first gene - for each group >>> plotCtOverview(raw1, genes=featureNames(raw1)[1], group=anno$Group, >> legend=FALSE, col=1:length(unique(anno$Group))) >>> # Plot the first gene, using group "A" as a control >>> plotCtOverview(raw1, genes=featureNames(raw1)[1], group=anno$Group, >> legend=FALSE, col=1:length(unique(anno$Group)), calibrator="A") >> >> >> >>> You recommend below using a vector, but I dont see how that helps me >>> associate the samples in the Expression set. >>> >>> thanks again! >>> >>> s >>> >>> On Jun 26, 2012, at 12:48 PM, Heidi Dvinge wrote: >>> >>>>> Hi, >>>>> I'm having some troubles selectively sub-setting, and graphing up >>>>> QPCR >>>>> data within HTqPCR for Biomark plates (both 48.48 and 96.96 plates). >>>>> I'd >>>>> like to be able to visualize specific genes, with specific groups we >>>>> run >>>>> routinely on our Biomark system. Typical runs are across multiple >>>>> plates, >>>>> and have multiple biological replicates, and usually 2 or more >>>>> technical >>>>> replicates (although we are moving away from technical reps, as the >>>>> CVs >>>>> are so tight). >>>>> >>>>> Can anyone help with this? Heidi? >>>>> >>>>> raw6=readCtData(files="Chip6.csv", format="BioMark", n.features=48, >>>>> n.data=48, samples=samples) >>>>> #Ive read the samples in from a separate file, as when you read it >>>>> in, >>>>> it >>>>> doesnt take the sample names supplied in the biomark output# >>>>> #Next, I want to plot genes of interest, with samples of interest, >>>>> and >>>>> I'm >>>>> having trouble getting an appropriate output# >>>>> >>>>> g=featureNames(raw6)[1:2] >>>>> plotCtOverview(raw6, genes=g, groups=groupID$Treatment, >>>>> col=rainbow(5)) >>>>> >>>>> #This plots 1 gene across all 48 samples# >>>>> #but the legend doesnt behave, its placed on top of the plot, and I >>>>> cant >>>>> get it to display in a non-overlapping fashion# >>>>> #I've tried all sorts of things in par, but nothing seems to shift >>>>> the >>>>> legend's position# >>>>> >>>> As the old saying goes, whenever you want a job done well, you'll have >>>> to >>>> do it yourself ;). In this case, the easiest thing is probably to use >>>> legend=FALSE in plotCtOverview, and then afterwards add it yourself in >>>> the >>>> desired location using legend(). That way, if you have a lot of >>>> different >>>> features or groups to display, you can also use the ncol parameter in >>>> legend to make several columns within the legend, such as 3x4 instead >>>> of >>>> the default 12x1. >>>> >>>> Alternatively, you can use either xlim or ylim in plotCtOverview to >>>> add >>>> some empty space on the side where there's then room for the legend. >>>> >>>>> #I now want to plot a subset of the samples for specific genes# >>>>>> LOY=subset(groupID,groupID$Treatment=="LO" | groupID$Treatment== >>>>>> "LFY") >>>>>> LOY >>>>> Sample Treatment >>>>> 2 L20 LFY >>>>> 5 L30 LFY >>>>> 7 L45 LO >>>>> 20 L40 LO >>>>> 27 L43 LO >>>>> 33 L29 LFY >>>>> 36 L38 LO >>>>> 40 L39 LO >>>>> 43 L23 LFY >>>>> >>>>> >>>>>> plotCtOverview(raw6, genes=g, groups=LOY, col=rainbow(5)) >>>>> Warning messages: >>>>> 1: In split.default(t(x), sample.split) : >>>>> data length is not a multiple of split variable >>>>> 2: In qt(p, df, lower.tail, log.p) : NaNs produced >>>>>> >>>> >>>> Does it make sense if you split by groups=LOY$Treatment? It looks like >>>> the >>>> object LOY itself is a data frame, rather than the expected vector. >>>> >>>> Also, you may have to 'repeat' the col=rainbow() argument to fit your >>>> number of features. >>>> >>>>> >>>>> #it displays the two groups defined by treatment, but doesnt do so >>>>> nicely, >>>>> very skinny bars, and the legend doesnt reflect what its displaying# >>>>> #again, I've tried monkeying around with par, but not sure what >>>>> HTqPCR >>>>> is >>>>> calling to make the plots# >>>>> >>>> If the bars are very skinny, it's probably because you're displaying a >>>> lot >>>> of features. Nothing much to do about that, except increasing the >>>> width >>>> or >>>> your plot :(. >>>> >>>> \Heidi >>>> >>>>> please help! >>>>> >>>>> thanks >>>>> >>>>> Simon. >>>>> >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at r-project.org >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>> >>>> >>>> >>> >>> >> >> > >
ADD REPLY
0
Entering edit mode
Hi Heidi, getting there, hopefully if you can clarify the following issue, all will be well and good. After yesterdays correspondence, I'm now producing nice plots, when I check the actual values being plotted, they dont match up to the sample ID's. In fact, if I dont bother assigning groups, the sample ID's dont match to their respective gene CT values. I'm worried there is some underlying problem with the data structure I'm not understanding. I understand the code, its just the samples dont match the reported gene values in the csv file. for example > head(groupID) Sample Treatment 1 S28 SMY 2 L20 LFY 3 M26 MMY 4 L1 LFR 5 L30 LFY 6 K13 KMO >plotCtOverview(raw6, genes=featureNames(raw6)[1], group=groupID$Treatment,legend=FALSE, col=1:length(unique(groupID$Treatment))) produces a nice plot of a tubulin gene across the groups, as you suggested yesterday . Yet if I look at the values, they dont match the CSV values for specific genes/samples I used. If I turn off groups, and look at samples without merging by group, I can see that the values dont match the appropriate gene being displayed. My question is, where is the sample order being drawn from in the CSV file? Is there a simple check I can use to see that what is being plotted, is what I think is being plotted? The group ID sample-Treatment is correct, and all the samples in the original CSV file are correct. Is it possible that the package is assigning gene/sample ID in some other order than that I've supplied? I just want to be sure that when HTqPCR pulls the sample ID and maps it to the appropriate gene/Group, some transformation is not happening. Fluidigm suggests a particular order in loading samples and genes. These are numbered 1-48 (sample), and 1-48 (gene) for a 48.48 plate (and the same for a 96.96 plate). This is the order I supplied the sample IDs in the groupID file above. How do you map the raw csv output to gene/sample id? Is there a way of checking that the sample/gene/group ID is correct? as always, thanks in advance for your help best s On Jun 27, 2012, at 3:27 PM, Heidi Dvinge wrote: >> Hi Heidi, >> you are correct, yes 48.48. >> The example you provide below is exactly what I needed for clarification >> for groups. I was trying to reverse engineer what you had done with the >> original expression set package for microarrays, but from below, I can get >> this to work now. >> > Glad it works. Hopefully by the next BioConductor release I'll remember to > clarify the plotCtOverview help file. > >> Just to be clear, I have 5 48.48 plates. Should I normalize each >> individually prior to combining, or should I reformat to a 2304x1 each, >> combine, then normalize (not sure if you can do that or not) >> > Hm, that's one of the questions I've also been asking myself, so I would > be curious to hear what your results from this are. > > If you suspect that there are some major factors influencing the 5 plates > systematically, then normalising them in a 2304 x 5 object should > (hopefully) correct for that. For example, they may have been run on > different days, by different people, or perhaps there was a short power > cut during the processing of one of them. This might be visible if you > have for example a boxplot of Ct from all 48*5 samples, and you see blocks > of them shifted up or down. > > Obviously, this doesn't take care of normalisation between your samples > within each plate though. If you suspect your samples to have some > systematic variation that you need to account for, then you can normalise > each plate individually (as a 48 x 48) object. Alternatively, you can try > to combine within- and between-sample normalisation by taking all 48 x 240 > values at once. > > In principle, you can split, reformat and the recombine the data in > however many ways you like. Personally, with any sort of data I prefer to > go with as little preprocessing as possible, since each additional step > can potentially introduce its own biases into the data. So unless there > are some obvious variation between your 5 plates, I'd probably stick with > just normalisation between the samples, e.. using a 48 x 240 object. > > Of course, you may have different preferences, or find out that a > completely different approach is required for this particular data set. > > \Heidi > >> thanks again for your prompt responses! >> >> best >> >> s >> >> On Jun 27, 2012, at 2:26 PM, Heidi Dvinge wrote: >> >>> Hi Simon, >>> >>>> Thanks for the help Heidi, >>>> but I'm still having troubles, your comments on the plotting helped me >>>> solve the outputs. But if I want to just display some groups (for >>>> example >>>> the LO group in the example below), how do I associate a group with >>>> multiple samples (ie biological reps)? >>>> >>>> Currently I'm associating genes with samples by reading in the file as >>>> below >>>> plate6=read.delim("plate6Sample.txt", header=FALSE) >>>> #this is a file to associate sample ID with the genes in the biomark >>>> data, >>>> as currently HTqPCR does not seem to associate the sample info in the >>>> Biomark output to the gene IDs >>>> >>> Erm, no, it doesn't :-/ >>> >>>> samples=as.vector(t(plate6)) >>>> raw6=readCtData(files="Chip6.csv", format="BioMark", n.features=48, >>>> n.data=48, samples=samples) >>>> #now I have samples and genes similar to your example in the guide, but >>>> I >>>> want to associate samples to groups now. In the guide, you have an >>>> example >>>> where you have entire files as distinct samples, but in our runs, we >>>> never >>>> have that situation. I have a file which associates samples to groups, >>>> which I read in... >>>> >>>> groupID=read.csv("plate6key.csv") >>>> >>>> but how do I associate the samples with their appropriate groups for >>>> biological replicates with any of the functions in HtQPCR? >>> >>> I'm afraid I'm slightly confused here (sorry, long day). Exactly how is >>> your data formatted? I.e. are the columns either individual samples, or >>> from files containing multiple samples? So for example for a single >>> 48.48 >>> arrays, is your qPCRset object 2304 x 1 or 48 x 48? >>> >>> From your readCtData command I'm guessing you have 48 x 48, i.e. all 48 >>> samples from your 1 array are in columns. In that case the 'groups' >>> parameter in plotCtOverview will need to be a vector of length 48, >>> indicating how you want the 48 columns in your qPCRset object to be >>> grouped together. >>> >>> Below is an example of (admittedly ugly) plots. I don't know if that's >>> similar to your situation at all. >>> >>> \Heidi >>> >>>> # Reading in data >>>> exPath <- system.file("exData", package = "HTqPCR") >>>> raw1 <- readCtData(files = "BioMark_sample.csv", path = exPath, format >>>> = >>> "BioMark", n.features = 48, n.data = 48) >>>> # Check sample names >>>> head(sampleNames(raw1)) >>> [1] "Sample1" "Sample2" "Sample3" "Sample4" "Sample5" "Sample6" >>>> # Associate samples with (randomly chosen) groups >>>> anno <- data.frame(sampleID=sampleNames(raw1), Group=rep(c("A", "B", >>> "C", "D"), times=c(4,24,5,15))) >>>> head(anno) >>> sampleID Group >>> 1 Sample1 A >>> 2 Sample2 A >>> 3 Sample3 A >>> 4 Sample4 A >>> 5 Sample5 B >>> 6 Sample6 B >>>> # Plot the first gene - for each sample individually >>>> plotCtOverview(raw1, genes=featureNames(raw1)[1], legend=FALSE, >>> col=1:nrow(anno)) >>>> # Plot the first gene - for each group >>>> plotCtOverview(raw1, genes=featureNames(raw1)[1], group=anno$Group, >>> legend=FALSE, col=1:length(unique(anno$Group))) >>>> # Plot the first gene, using group "A" as a control >>>> plotCtOverview(raw1, genes=featureNames(raw1)[1], group=anno$Group, >>> legend=FALSE, col=1:length(unique(anno$Group)), calibrator="A") >>> >>> >>> >>>> You recommend below using a vector, but I dont see how that helps me >>>> associate the samples in the Expression set. >>>> >>>> thanks again! >>>> >>>> s >>>> >>>> On Jun 26, 2012, at 12:48 PM, Heidi Dvinge wrote: >>>> >>>>>> Hi, >>>>>> I'm having some troubles selectively sub-setting, and graphing up >>>>>> QPCR >>>>>> data within HTqPCR for Biomark plates (both 48.48 and 96.96 plates). >>>>>> I'd >>>>>> like to be able to visualize specific genes, with specific groups we >>>>>> run >>>>>> routinely on our Biomark system. Typical runs are across multiple >>>>>> plates, >>>>>> and have multiple biological replicates, and usually 2 or more >>>>>> technical >>>>>> replicates (although we are moving away from technical reps, as the >>>>>> CVs >>>>>> are so tight). >>>>>> >>>>>> Can anyone help with this? Heidi? >>>>>> >>>>>> raw6=readCtData(files="Chip6.csv", format="BioMark", n.features=48, >>>>>> n.data=48, samples=samples) >>>>>> #Ive read the samples in from a separate file, as when you read it >>>>>> in, >>>>>> it >>>>>> doesnt take the sample names supplied in the biomark output# >>>>>> #Next, I want to plot genes of interest, with samples of interest, >>>>>> and >>>>>> I'm >>>>>> having trouble getting an appropriate output# >>>>>> >>>>>> g=featureNames(raw6)[1:2] >>>>>> plotCtOverview(raw6, genes=g, groups=groupID$Treatment, >>>>>> col=rainbow(5)) >>>>>> >>>>>> #This plots 1 gene across all 48 samples# >>>>>> #but the legend doesnt behave, its placed on top of the plot, and I >>>>>> cant >>>>>> get it to display in a non-overlapping fashion# >>>>>> #I've tried all sorts of things in par, but nothing seems to shift >>>>>> the >>>>>> legend's position# >>>>>> >>>>> As the old saying goes, whenever you want a job done well, you'll have >>>>> to >>>>> do it yourself ;). In this case, the easiest thing is probably to use >>>>> legend=FALSE in plotCtOverview, and then afterwards add it yourself in >>>>> the >>>>> desired location using legend(). That way, if you have a lot of >>>>> different >>>>> features or groups to display, you can also use the ncol parameter in >>>>> legend to make several columns within the legend, such as 3x4 instead >>>>> of >>>>> the default 12x1. >>>>> >>>>> Alternatively, you can use either xlim or ylim in plotCtOverview to >>>>> add >>>>> some empty space on the side where there's then room for the legend. >>>>> >>>>>> #I now want to plot a subset of the samples for specific genes# >>>>>>> LOY=subset(groupID,groupID$Treatment=="LO" | groupID$Treatment== >>>>>>> "LFY") >>>>>>> LOY >>>>>> Sample Treatment >>>>>> 2 L20 LFY >>>>>> 5 L30 LFY >>>>>> 7 L45 LO >>>>>> 20 L40 LO >>>>>> 27 L43 LO >>>>>> 33 L29 LFY >>>>>> 36 L38 LO >>>>>> 40 L39 LO >>>>>> 43 L23 LFY >>>>>> >>>>>> >>>>>>> plotCtOverview(raw6, genes=g, groups=LOY, col=rainbow(5)) >>>>>> Warning messages: >>>>>> 1: In split.default(t(x), sample.split) : >>>>>> data length is not a multiple of split variable >>>>>> 2: In qt(p, df, lower.tail, log.p) : NaNs produced >>>>>>> >>>>> >>>>> Does it make sense if you split by groups=LOY$Treatment? It looks like >>>>> the >>>>> object LOY itself is a data frame, rather than the expected vector. >>>>> >>>>> Also, you may have to 'repeat' the col=rainbow() argument to fit your >>>>> number of features. >>>>> >>>>>> >>>>>> #it displays the two groups defined by treatment, but doesnt do so >>>>>> nicely, >>>>>> very skinny bars, and the legend doesnt reflect what its displaying# >>>>>> #again, I've tried monkeying around with par, but not sure what >>>>>> HTqPCR >>>>>> is >>>>>> calling to make the plots# >>>>>> >>>>> If the bars are very skinny, it's probably because you're displaying a >>>>> lot >>>>> of features. Nothing much to do about that, except increasing the >>>>> width >>>>> or >>>>> your plot :(. >>>>> >>>>> \Heidi >>>>> >>>>>> please help! >>>>>> >>>>>> thanks >>>>>> >>>>>> Simon. >>>>>> >>>>>> _______________________________________________ >>>>>> Bioconductor mailing list >>>>>> Bioconductor at r-project.org >>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>> Search the archives: >>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> > >
ADD REPLY
0
Entering edit mode
Hi Heidi, I think I've identified the problem. Currently it appears as though HTqPCR reads the sample ID's and genes in from top to bottom of the CSV output from the Biomark. This is not the sample order we load in. As long as thats made clear in the vignette, it will prevent any confusion. We typically have a loading list, in which we associate samples with groups (numbered 1-48, or 1-96 for both formats). I was getting confusing results (laid out below) as I assumed HTqPCR associated sample IDs in the loading order, not the CSV format top to bottom. Am I correct here in how HTqPCR reads in the data from the CSV file? thanks again, best Simon On Jun 28, 2012, at 2:11 PM, Simon Melov wrote: > Hi Heidi, > getting there, hopefully if you can clarify the following issue, all will be well and good. > > After yesterdays correspondence, I'm now producing nice plots, when I check the actual values being plotted, they dont match up > to the sample ID's. In fact, if I dont bother assigning groups, the sample ID's dont match to their respective gene CT values. I'm > worried there is some underlying problem with the data structure I'm not understanding. > > I understand the code, its just the samples dont match the reported gene values in the csv file. > > for example > >> head(groupID) > Sample Treatment > 1 S28 SMY > 2 L20 LFY > 3 M26 MMY > 4 L1 LFR > 5 L30 LFY > 6 K13 KMO > >> plotCtOverview(raw6, genes=featureNames(raw6)[1], group=groupID$Treatment,legend=FALSE, col=1:length(unique(groupID$Treatment))) > > produces a nice plot of a tubulin gene across the groups, as you suggested yesterday . Yet if I look at the values, they dont match > the CSV values for specific genes/samples I used. If I turn off groups, and look at samples without merging by group, I can see that the values dont match the appropriate gene being > displayed. My question is, where is the sample order being drawn from in the CSV file? Is there a simple check I can use to see that what is being plotted, > is what I think is being plotted? The group ID sample-Treatment is correct, and all the samples in the original CSV file are correct. > > Is it possible that the package is assigning gene/sample ID in some other order than that I've supplied? > I just want to be sure that when HTqPCR pulls the sample ID and maps it to the appropriate gene/Group, some transformation is not happening. > > Fluidigm suggests a particular order in loading samples and genes. These are numbered 1-48 (sample), and 1-48 (gene) for a 48.48 plate (and the same for a 96.96 plate). > This is the order I supplied the sample IDs in the groupID file above. How do you map the raw csv output to gene/sample id? > > Is there a way of checking that the sample/gene/group ID is correct? > > as always, thanks in advance for your help > > best > > s > On Jun 27, 2012, at 3:27 PM, Heidi Dvinge wrote: > >>> Hi Heidi, >>> you are correct, yes 48.48. >>> The example you provide below is exactly what I needed for clarification >>> for groups. I was trying to reverse engineer what you had done with the >>> original expression set package for microarrays, but from below, I can get >>> this to work now. >>> >> Glad it works. Hopefully by the next BioConductor release I'll remember to >> clarify the plotCtOverview help file. >> >>> Just to be clear, I have 5 48.48 plates. Should I normalize each >>> individually prior to combining, or should I reformat to a 2304x1 each, >>> combine, then normalize (not sure if you can do that or not) >>> >> Hm, that's one of the questions I've also been asking myself, so I would >> be curious to hear what your results from this are. >> >> If you suspect that there are some major factors influencing the 5 plates >> systematically, then normalising them in a 2304 x 5 object should >> (hopefully) correct for that. For example, they may have been run on >> different days, by different people, or perhaps there was a short power >> cut during the processing of one of them. This might be visible if you >> have for example a boxplot of Ct from all 48*5 samples, and you see blocks >> of them shifted up or down. >> >> Obviously, this doesn't take care of normalisation between your samples >> within each plate though. If you suspect your samples to have some >> systematic variation that you need to account for, then you can normalise >> each plate individually (as a 48 x 48) object. Alternatively, you can try >> to combine within- and between-sample normalisation by taking all 48 x 240 >> values at once. >> >> In principle, you can split, reformat and the recombine the data in >> however many ways you like. Personally, with any sort of data I prefer to >> go with as little preprocessing as possible, since each additional step >> can potentially introduce its own biases into the data. So unless there >> are some obvious variation between your 5 plates, I'd probably stick with >> just normalisation between the samples, e.. using a 48 x 240 object. >> >> Of course, you may have different preferences, or find out that a >> completely different approach is required for this particular data set. >> >> \Heidi >> >>> thanks again for your prompt responses! >>> >>> best >>> >>> s >>> >>> On Jun 27, 2012, at 2:26 PM, Heidi Dvinge wrote: >>> >>>> Hi Simon, >>>> >>>>> Thanks for the help Heidi, >>>>> but I'm still having troubles, your comments on the plotting helped me >>>>> solve the outputs. But if I want to just display some groups (for >>>>> example >>>>> the LO group in the example below), how do I associate a group with >>>>> multiple samples (ie biological reps)? >>>>> >>>>> Currently I'm associating genes with samples by reading in the file as >>>>> below >>>>> plate6=read.delim("plate6Sample.txt", header=FALSE) >>>>> #this is a file to associate sample ID with the genes in the biomark >>>>> data, >>>>> as currently HTqPCR does not seem to associate the sample info in the >>>>> Biomark output to the gene IDs >>>>> >>>> Erm, no, it doesn't :-/ >>>> >>>>> samples=as.vector(t(plate6)) >>>>> raw6=readCtData(files="Chip6.csv", format="BioMark", n.features=48, >>>>> n.data=48, samples=samples) >>>>> #now I have samples and genes similar to your example in the guide, but >>>>> I >>>>> want to associate samples to groups now. In the guide, you have an >>>>> example >>>>> where you have entire files as distinct samples, but in our runs, we >>>>> never >>>>> have that situation. I have a file which associates samples to groups, >>>>> which I read in... >>>>> >>>>> groupID=read.csv("plate6key.csv") >>>>> >>>>> but how do I associate the samples with their appropriate groups for >>>>> biological replicates with any of the functions in HtQPCR? >>>> >>>> I'm afraid I'm slightly confused here (sorry, long day). Exactly how is >>>> your data formatted? I.e. are the columns either individual samples, or >>>> from files containing multiple samples? So for example for a single >>>> 48.48 >>>> arrays, is your qPCRset object 2304 x 1 or 48 x 48? >>>> >>>> From your readCtData command I'm guessing you have 48 x 48, i.e. all 48 >>>> samples from your 1 array are in columns. In that case the 'groups' >>>> parameter in plotCtOverview will need to be a vector of length 48, >>>> indicating how you want the 48 columns in your qPCRset object to be >>>> grouped together. >>>> >>>> Below is an example of (admittedly ugly) plots. I don't know if that's >>>> similar to your situation at all. >>>> >>>> \Heidi >>>> >>>>> # Reading in data >>>>> exPath <- system.file("exData", package = "HTqPCR") >>>>> raw1 <- readCtData(files = "BioMark_sample.csv", path = exPath, format >>>>> = >>>> "BioMark", n.features = 48, n.data = 48) >>>>> # Check sample names >>>>> head(sampleNames(raw1)) >>>> [1] "Sample1" "Sample2" "Sample3" "Sample4" "Sample5" "Sample6" >>>>> # Associate samples with (randomly chosen) groups >>>>> anno <- data.frame(sampleID=sampleNames(raw1), Group=rep(c("A", "B", >>>> "C", "D"), times=c(4,24,5,15))) >>>>> head(anno) >>>> sampleID Group >>>> 1 Sample1 A >>>> 2 Sample2 A >>>> 3 Sample3 A >>>> 4 Sample4 A >>>> 5 Sample5 B >>>> 6 Sample6 B >>>>> # Plot the first gene - for each sample individually >>>>> plotCtOverview(raw1, genes=featureNames(raw1)[1], legend=FALSE, >>>> col=1:nrow(anno)) >>>>> # Plot the first gene - for each group >>>>> plotCtOverview(raw1, genes=featureNames(raw1)[1], group=anno$Group, >>>> legend=FALSE, col=1:length(unique(anno$Group))) >>>>> # Plot the first gene, using group "A" as a control >>>>> plotCtOverview(raw1, genes=featureNames(raw1)[1], group=anno$Group, >>>> legend=FALSE, col=1:length(unique(anno$Group)), calibrator="A") >>>> >>>> >>>> >>>>> You recommend below using a vector, but I dont see how that helps me >>>>> associate the samples in the Expression set. >>>>> >>>>> thanks again! >>>>> >>>>> s >>>>> >>>>> On Jun 26, 2012, at 12:48 PM, Heidi Dvinge wrote: >>>>> >>>>>>> Hi, >>>>>>> I'm having some troubles selectively sub-setting, and graphing up >>>>>>> QPCR >>>>>>> data within HTqPCR for Biomark plates (both 48.48 and 96.96 plates). >>>>>>> I'd >>>>>>> like to be able to visualize specific genes, with specific groups we >>>>>>> run >>>>>>> routinely on our Biomark system. Typical runs are across multiple >>>>>>> plates, >>>>>>> and have multiple biological replicates, and usually 2 or more >>>>>>> technical >>>>>>> replicates (although we are moving away from technical reps, as the >>>>>>> CVs >>>>>>> are so tight). >>>>>>> >>>>>>> Can anyone help with this? Heidi? >>>>>>> >>>>>>> raw6=readCtData(files="Chip6.csv", format="BioMark", n.features=48, >>>>>>> n.data=48, samples=samples) >>>>>>> #Ive read the samples in from a separate file, as when you read it >>>>>>> in, >>>>>>> it >>>>>>> doesnt take the sample names supplied in the biomark output# >>>>>>> #Next, I want to plot genes of interest, with samples of interest, >>>>>>> and >>>>>>> I'm >>>>>>> having trouble getting an appropriate output# >>>>>>> >>>>>>> g=featureNames(raw6)[1:2] >>>>>>> plotCtOverview(raw6, genes=g, groups=groupID$Treatment, >>>>>>> col=rainbow(5)) >>>>>>> >>>>>>> #This plots 1 gene across all 48 samples# >>>>>>> #but the legend doesnt behave, its placed on top of the plot, and I >>>>>>> cant >>>>>>> get it to display in a non-overlapping fashion# >>>>>>> #I've tried all sorts of things in par, but nothing seems to shift >>>>>>> the >>>>>>> legend's position# >>>>>>> >>>>>> As the old saying goes, whenever you want a job done well, you'll have >>>>>> to >>>>>> do it yourself ;). In this case, the easiest thing is probably to use >>>>>> legend=FALSE in plotCtOverview, and then afterwards add it yourself in >>>>>> the >>>>>> desired location using legend(). That way, if you have a lot of >>>>>> different >>>>>> features or groups to display, you can also use the ncol parameter in >>>>>> legend to make several columns within the legend, such as 3x4 instead >>>>>> of >>>>>> the default 12x1. >>>>>> >>>>>> Alternatively, you can use either xlim or ylim in plotCtOverview to >>>>>> add >>>>>> some empty space on the side where there's then room for the legend. >>>>>> >>>>>>> #I now want to plot a subset of the samples for specific genes# >>>>>>>> LOY=subset(groupID,groupID$Treatment=="LO" | groupID$Treatment== >>>>>>>> "LFY") >>>>>>>> LOY >>>>>>> Sample Treatment >>>>>>> 2 L20 LFY >>>>>>> 5 L30 LFY >>>>>>> 7 L45 LO >>>>>>> 20 L40 LO >>>>>>> 27 L43 LO >>>>>>> 33 L29 LFY >>>>>>> 36 L38 LO >>>>>>> 40 L39 LO >>>>>>> 43 L23 LFY >>>>>>> >>>>>>> >>>>>>>> plotCtOverview(raw6, genes=g, groups=LOY, col=rainbow(5)) >>>>>>> Warning messages: >>>>>>> 1: In split.default(t(x), sample.split) : >>>>>>> data length is not a multiple of split variable >>>>>>> 2: In qt(p, df, lower.tail, log.p) : NaNs produced >>>>>>>> >>>>>> >>>>>> Does it make sense if you split by groups=LOY$Treatment? It looks like >>>>>> the >>>>>> object LOY itself is a data frame, rather than the expected vector. >>>>>> >>>>>> Also, you may have to 'repeat' the col=rainbow() argument to fit your >>>>>> number of features. >>>>>> >>>>>>> >>>>>>> #it displays the two groups defined by treatment, but doesnt do so >>>>>>> nicely, >>>>>>> very skinny bars, and the legend doesnt reflect what its displaying# >>>>>>> #again, I've tried monkeying around with par, but not sure what >>>>>>> HTqPCR >>>>>>> is >>>>>>> calling to make the plots# >>>>>>> >>>>>> If the bars are very skinny, it's probably because you're displaying a >>>>>> lot >>>>>> of features. Nothing much to do about that, except increasing the >>>>>> width >>>>>> or >>>>>> your plot :(. >>>>>> >>>>>> \Heidi >>>>>> >>>>>>> please help! >>>>>>> >>>>>>> thanks >>>>>>> >>>>>>> Simon. >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioconductor mailing list >>>>>>> Bioconductor at r-project.org >>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>>> Search the archives: >>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY

Login before adding your answer.

Traffic: 699 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6