HTqPCR to analyze Fluidigm 96.96 Dynamic Array data

V. Oostra ▴ 30

@v-oostra-4131

Last seen 9.6 years ago

Dear list,

I'm having some difficulties using HTqPCR to analyze qPCR data obtained with the Biomark Fluidigm 96.96 array. With the Fluidigm chips, one can measure expression of 96 genes in 96 samples on one plate, i.e. 9216 PCRs per plate (see http://www.fluidigm.com/products/biomark-chips.html for details).

In my experiment, I use 9 such plates. On each plate I have 88 different experimental samples, with different samples on each plate, totalling 792 unique experimental samples (associated with specific experimental conditions). On each plate, I also have 8 standard samples that are the same across all plates (1 NTC, 1 cDNA mix +RT, 1 cDNA mix -RT, 5 samples of a dilution series).

I use 32 different genes (features), each replicated 3 times, in the same order on each plate.

Each original data file (as exported by the Fluidigm software) has data on one plate, i.e. 9216 rows with one PCR per row, with columns for sample name, feature name, Ct, quality calls, etc. I managed to read the 9 data files (from 9 plates) into one qPCRset object:

    An object of class "qPCRset"
    Size: 96 features, 864 samples
    Feature types: Reference, Test
    Feature names: 1.BGRP 1.BGRP 1.BGRP ...
    Feature classes:
    Feature categories: OK, Undetermined
    Sample names: A1.1h A1.6a A1.11h ...

Is this a good way to structure my data? Or would it be better to create 9 qPCRset objects (1 for each plate)? Before spending more effort continuing this approach I'd appreciate your opinion on whether this is the way forward.

Among other things, I would like to do the following:

1. Check for spatial effects. When I use plotCtCard, it only plots one sample at a time, even though I have 96 samples on each plate. Is it possible to plot my 96 samples x 96 features? How can I specify this kind of layout?

2. Control for plate-specific effects. I have the same 8 standard samples on each plate (for all genes), and would like to use these repeated measurements to 'normalize' all other data across plates. However, I'm having a hard time even accessing and plotting the data.

3. Specify technical replicates. Each sample has been run on 32 genes in triplicate. Each feature name is represented 3 times (once for each technical rep). How can I specify that my 96 features are grouped per 3?

4. Add information about my experimental design. My 792 experimental samples were obtained in a full factorial design with several biological replicates per treatment. Is it possible to add extra data to my qPCRset object? E.g. a matrix containing, per sample, information on sample name, value for factor 1, value for factor 2, etc.?

I do understand that this package was not developed specifically for dealing with data from these Fluidigm chips, but I haven't found any such package and as far as I know HTqPCR is the best package around for analysis of high-throughput qPCR data.

I hope someone can help me out a little bit. I'm new to R, but I'm not asking you to do my work for me, just some directions to help me do it myself. Thanks!

Cheers,
Vicencio

qPCR HTqPCR • 2.4k views

ADD COMMENT • link updated 13.9 years ago by Heidi Dvinge ★ 2.0k • written 13.9 years ago by V. Oostra ▴ 30

Heidi Dvinge ★ 2.0k

@heidi-dvinge-2195

Last seen 9.6 years ago

Hello Vicencio,

> Dear list,
>
> I'm having some difficulties in using HTqPCR to analyze qPCR data obtained using the Biomark Fluidigm 96.96 array.

Interesting question. For a while I was toying with the idea of incorporating functions specifically for Fluidigm data into HTqPCR. I never went through with it though, since each individual Fluidigm array can have its own design, so it's not necessarily common across samples the way it is for e.g. ABI and Roche cards. Nevertheless, it should be possible to use HTqPCR for Fluidigm data.

> With the Fluidigm chips, one can measure expression of 96 genes in 96 samples on one plate, i.e. 9216 PCRs per plate (see http://www.fluidigm.com/products/biomark-chips.html for details).
>
> In my experiment, I use 9 such plates. On each plate I have 88 different experimental samples, with different samples on each plate, totalling 792 unique experimental samples (associated with specific experimental conditions). On each plate, I also have 8 standard samples that are the same across all plates (1 NTC, 1 cDNA mix +RT, 1 cDNA mix -RT, 5 samples of a dilution series).
>
> I use 32 different genes (features), each replicated 3 times, in the same order on each plate.
>
> Each original data file (as exported by the Fluidigm software) has data on one plate, i.e. 9216 rows with one PCR per row, with columns for sample name, feature name, Ct, quality calls, etc.
> I managed to read in the 9 data files (from 9 plates) into one qPCRset object:
>
>     An object of class "qPCRset"
>     Size: 96 features, 864 samples
>     Feature types: Reference, Test
>     Feature names: 1.BGRP 1.BGRP 1.BGRP ...
>     Feature classes:
>     Feature categories: OK, Undetermined
>     Sample names: A1.1h A1.6a A1.11h ...
>
> Is this a good way to structure my data? Or would it be better to create 9 qPCRset objects (1 for each plate)? Before spending more effort continuing this approach I'd appreciate your opinion on whether this is the way forward.

It depends a bit on how clean your data is, and how you want to preprocess it. If you suspect there are any array-specific effects at all, you'll probably want to normalise your 9 plates separately, i.e. have them in a qPCRset object with 96x96 rows and 9 columns.

Do you have your data in a single file or in 9 files? Either way, you can create such a qPCRset. Or possibly, if you want to use the object you already have loaded into R, you can split it up using something like this (untested, and inelegant):

    q <- your_qPCRset
    # First column index of each of the 9 arrays (96 samples per array)
    start <- seq(1, 9*96, 96)
    # Make a list of 9 individual 96x96 qPCRset objects
    q.list <- list()
    for (i in seq_along(start)) {
        q.list[[i]] <- q[, start[i]:(start[i]+95)]
    }
    # Convert each list entry from a 96x96 to a 9216x1 qPCRset
    for (i in seq_along(q.list)) {
        temp <- list()
        for (j in 1:96)
            temp[[j]] <- q.list[[i]][, j]
        q.list[[i]] <- do.call("rbind", temp)
    }
    # Join them all together into 9216x9
    q.new <- do.call("cbind", q.list)

A bit of data exploration is probably required to check whether you have any particular biases that need correcting in your data. Based on the qPCRset object you have now, you can e.g. try clustering your data using clusterCt(), and see if the samples, especially the controls, cluster together by sample type or based on what array they were run on. Also, what's the correlation between samples like (plotCtCor)?

By the time you get to the actual statistical testing you'd want your data in a format like the one you have now, i.e. 1 row per gene (3 rows per gene in your case due to your replicates) and 1 column per sample. If you start with 9216 rows x 9 columns for doing the normalisation, you can reformat the data afterwards using the changeCtLayout function.

> Among other things, I would like to do the following:
> 1. Check for spatial effects. When I use plotCtCard, it only plots one sample at a time, even though I have 96 samples on each plate. Is it possible to plot my 96 samples x 96 features? How can I specify this kind of layout?

To plot each array separately, you'd need to have each array in a single column! Note though, that plotCtCard is optimised for the standard size rectangular well plate. Aesthetically speaking it might not look so nice for a 96x96 square array. I started making a plotCtArray function for Fluidigm data at some point; let me know if you're keen to be a guinea pig.

> 2. Control for plate-specific effects. I have the same 8 standard samples on each plate (for all genes), and would like to use these repeated measurements to 'normalize' all other data across plates. However, I'm having a hard time even accessing and plotting the data.

For using these 8 control genes for normalisation you can use the function normalizeCtData(q, norm = "deltaCt", deltaCt.genes) where deltaCt.genes is a vector of the gene names you want to use as standard. Note that these 8 gene names must appear exactly as they are in featureNames(q).

> 3. Specify technical replicates. Each sample has been run on 32 genes in triplicate. Each feature name is represented 3 times (once for each technical rep). How can I specify that my 96 features are grouped per 3?

You don't have to specify technical replicates directly anywhere within your qPCRset objects. Several functions, such as ttestCtData, have a parameter "replicates" which can be set to TRUE if you want to consider replicates. If so, the function(s) combine data across genes that have identical featureNames.

A small note here: featureNames don't have to be unique. In fact it's often easier for downstream analysis if identical genes are named the same, and not e.g. gene1_rep1, gene1_rep2 etc. The way to tell them apart is then using the featurePos information. This corresponds to the location of each gene on the array, or pos1...pos9216 if no positional information is supplied to readCtData. The output from e.g. ttestCtData will report both the featureNames and featurePos, so even for replicates you can always trace each result back to the original value.

> 4. Add information about my experimental design. My 792 experimental samples were obtained in a full factorial design with several biological replicates per treatment. Is it possible to add extra data to my qPCRset object? E.g. a matrix containing, per sample, information on sample name, value for factor 1, value for factor 2, etc.?

I'm afraid there's no "optional" slot in qPCRset objects where users can add additional data, whether that's data frames, matrices or lists.

HTH
\Heidi
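To make the point about identical featureNames concrete, here is a minimal base R sketch of what "combining data across rows that share a feature name" can look like. It is not HTqPCR's internal code; the toy matrix, the gene names (geneA, geneB), the sample names and the variable names are all invented for illustration, and averaging is only one possible way of combining replicates.

```r
# Toy Ct matrix: 2 genes x 3 technical replicates (rows), 2 samples (columns).
# Row names are deliberately repeated, like non-unique featureNames.
ct <- matrix(c(20, 21, 22, 30, 31, 32,
               24, 25, 26, 28, 29, 30),
             ncol = 2,
             dimnames = list(rep(c("geneA", "geneB"), each = 3),
                             c("sample1", "sample2")))

# Position labels keep the replicate rows distinguishable,
# in the same spirit as featurePos (hypothetical labels)
feature_pos <- paste0("pos", seq_len(nrow(ct)))

# Average the rows that share a feature name.
# rowsum() sums by group, so divide by the number of non-missing values.
n_obs   <- rowsum((!is.na(ct)) * 1, group = rownames(ct))
ct_sum  <- rowsum(replace(ct, is.na(ct), 0), group = rownames(ct))
ct_mean <- ct_sum / n_obs

print(ct_mean)
```

Because the grouping is done purely on the row names, no separate "replicate" annotation is needed, which is the design choice described above.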

ADD COMMENT • link 13.9 years ago Heidi Dvinge ★ 2.0k

Hello Heidi,

Thanks a lot for your helpful comments. I followed your suggestions regarding the restructuring of the data, and am now doing some basic data exploration. I will give an update once I know a bit more, but I already have some questions in relation to your comments.

> > Is this a good way to structure my data? Or would it be better to create 9 qPCRset objects (1 for each plate)? Before spending more effort continuing this approach I'd appreciate your opinion on whether this is the way forward.
>
> It depends a bit on how clean your data is, and how you want to preprocess it. If you suspect there are any array-specific effects at all, you'll probably want to normalise your 9 plates separately, i.e. have them in a qPCRset object with 96x96 rows and 9 columns.
>
> Do you have your data in a single or 9 files? Either way, you can create such a qPCRset. Or possibly, if you want to use the object you already have loaded into R, you can split it up using something like this (untested, and unelegant):

I had my original data in 9 files, but I imported them into one qPCRset object (96 features, 864 samples). Using your code I split it up again and found the array-specific effects, which I had indeed suspected: I had run the same dilution series on all arrays, so those samples should have similar Ct values across arrays. However, when I plotted them (for each gene separately) against array, I found some arrays that were consistently higher or lower than the others. My idea is to use this dilution series to correct for array-specific effects. Note that I'm not talking about control genes here, but 'control' samples: the same samples run for all genes on all 9 arrays.

It never occurred to me to use different types of data structuring for different parts of the analysis (data exploration, QC, etc.). A very good idea!

> A bit of data exploration is probably required to check whether you have any particular biases that needs correcting in your data. Based on the qPCRset object you have now, you can e.g. try clustering your data using clusterCt(), and see if the samples, especially the controls, cluster together by sample type or based on what array they were run on. Also, what's the correlation between samples like (plotCtCor)?

Actually, this didn't work for me, in either my old or my new data structure.

Where can I add information on types of samples? E.g. within 'Feature type' I can put info on each gene (reference, test). But where can I put info on each sample, e.g. positive control sample, negative control, etc.? In a data frame separate from the qPCRset?

Cheers,
Vicencio
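The idea of using the shared dilution series to correct for array-specific effects can be sketched in base R as follows. This is only one possible approach and not an HTqPCR function: it assumes the array effect is a constant additive shift on the Ct scale for each gene, and it uses only the dilution-series samples (not the NTC or -RT controls) to estimate that shift. The data frame, its column names and all values are invented for illustration.

```r
# Toy long-format data: one row per (array, sample, gene) with a Ct value.
# dil1/dil2 stand in for dilution-series samples present on every array;
# exp1/exp2 stand in for experimental samples unique to one array.
d <- data.frame(
  array  = rep(c("plate1", "plate2"), each = 6),
  sample = c(rep(c("dil1", "dil2", "exp1"), 2),
             rep(c("dil1", "dil2", "exp2"), 2)),
  gene   = rep(rep(c("geneA", "geneB"), each = 3), times = 2),
  Ct     = c(20, 22, 25,  30, 32, 27,   # plate1: geneA, then geneB
             21, 23, 24,  29, 31, 33),  # plate2: geneA, then geneB
  stringsAsFactors = FALSE
)

# Mean Ct of the shared control samples, per array and gene
is_ctrl <- d$sample %in% c("dil1", "dil2")
ctrl <- aggregate(Ct ~ array + gene, data = d[is_ctrl, ], FUN = mean)

# Offset of each array relative to the across-array mean for that gene
ctrl$offset <- ctrl$Ct - ave(ctrl$Ct, ctrl$gene)

# Subtract the array- and gene-specific offset from every sample on that array
d2 <- merge(d, ctrl[, c("array", "gene", "offset")], by = c("array", "gene"))
d2$Ct_adj <- d2$Ct - d2$offset

print(d2[order(d2$gene, d2$array, d2$sample), ])
```

After the adjustment the control samples have the same mean Ct on every array for a given gene, and the experimental samples are shifted by the same amount as the controls they shared an array with. Whether a single additive offset per gene is adequate is something the dilution series itself can help check, e.g. by plotting the per-dilution differences between arrays.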

ADD REPLY • link 13.9 years ago V. Oostra ▴ 30

Hello Vicencio,

sorry, your email seems to have slipped under my radar.

> Hello Heidi,
>
> Thanks a lot for your helpful comments. I followed your suggestions regarding the re-structuring of the data, and am now doing some basic data exploration. I will give an update once I know a bit more, but I do have some questions already in relation to your comments.
>
>> > Is this a good way to structure my data? Or would it be better to create 9 qPCRset objects (1 for each plate)? Before spending more effort continuing this approach I'd appreciate your opinion on whether this is the way forward.
>>
>> It depends a bit on how clean your data is, and how you want to preprocess it. If you suspect there are any array-specific effects at all, you'll probably want to normalise your 9 plates separately, i.e. have them in a qPCRset object with 96x96 rows and 9 columns.
>>
>> Do you have your data in a single or 9 files? Either way, you can create such a qPCRset. Or possibly, if you want to use the object you already have loaded into R, you can split it up using something like this (untested, and unelegant):
>
> I had my original data in 9 files, but I imported them into one qPCRset object (96 features, 864 samples). Using your code I split it up again and I found the array-specific effects, which I indeed suspected: I had run the same dilution series on all arrays, so those samples should have similar Ct values across arrays. However, when I plotted them (for each gene separately) against array I found some arrays that were consistently higher or lower than the others. My idea is to use this dilution series to correct for array-specific effects.

That makes sense. So if you have your data as e.g. 9216 x 9 (i.e. treat each individual plate as a "pseudo-sample"), then I guess you can e.g. rename your featureNames to be gene_sample to get all 96x96 combinations and then select the gene_controlsample to normalise against.

> Note that I'm not talking about control genes here, but 'control' samples: the same samples run for all genes on all 9 arrays.
>
> It never occurred to me to use different types of data structuring for different parts of the analysis (data exploration, QC, etc). A very good idea!
>
>> A bit of data exploration is probably required to check whether you have any particular biases that needs correcting in your data. Based on the qPCRset object you have now, you can e.g. try clustering your data using clusterCt(), and see if the samples, especially the controls, cluster together by sample type or based on what array they were run on. Also, what's the correlation between samples like (plotCtCor)?
>
> Actually, this didn't work for me, neither in my old or new data structure.

Exactly what didn't work here? clusterCt, plotCtCor, and/or other functions? If you can send me the exact code and error message I can try to chase down any bugs.

> Where can I add information on types of samples? E.g. within 'Feature type' I can put info on each gene (reference, test). But where can I put info on each sample? E.g. positive control sample, negative control, etc.? In a data frame separate from the qPCRset?

I'm afraid you'll probably have to do this externally, i.e. separate from the qPCRset object. Unless of course you use featureType, and instead of reference and test use the name of each individual sample. There shouldn't be any upper limit on how many different groups can be in featureType.

HTH
\Heidi

> Cheers,
> Vicencio

------------------------//------------------------
Heidi Dvinge
European Bioinformatics Institute
Wellcome Trust Genome Campus
Cambridge, CB10 1SD
heidi at ebi.ac.uk
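Keeping the sample information "externally" can be as simple as a data frame with one row per sample, kept in the same order as the columns of the Ct data. The following base R sketch shows the idea on a toy matrix; the matrix stands in for Ct values extracted from a qPCRset, and the annotation columns (type, factor1) and their values are invented for illustration.

```r
# Toy Ct matrix: rows are genes, columns are samples
ct <- matrix(c(20, 30, 21, 31, 35, 40), nrow = 2,
             dimnames = list(c("geneA", "geneB"),
                             c("A1.1h", "A1.6a", "NTC")))

# External annotation, one row per sample, in arbitrary order
annot <- data.frame(
  sample  = c("NTC", "A1.1h", "A1.6a"),
  type    = c("negative control", "experimental", "experimental"),
  factor1 = c(NA, "low", "high"),
  stringsAsFactors = FALSE
)

# Reorder the annotation rows to match the column order of the Ct matrix,
# and fail early if any sample is missing or misnamed
annot <- annot[match(colnames(ct), annot$sample), ]
stopifnot(identical(annot$sample, colnames(ct)))

# The annotation can now be used to subset or group the columns,
# e.g. keep only the experimental samples
ct_exp <- ct[, annot$type == "experimental", drop = FALSE]
print(ct_exp)
```

The explicit match() and stopifnot() steps are the important part of this design: since the annotation lives outside the data object, nothing else guarantees that the two stay in the same order.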

ADD REPLY • link 13.8 years ago Heidi Dvinge ★ 2.0k
