Is there an expected coefficient of variation for the hybridization probe intensities among technical replicates?

0

Entering edit mode

Matthew Thornton ▴ 350

@matthew-thornton-5564

Last seen 5 weeks ago

USA, Los Angeles, USC

Hello! I am digging through the control intensities for technical replicates obtained with an Affymetrix GeneChip Rat Gene 2.0 ST array. I have 3 technical replicates of the same sample. The kernel density plot overlay of the raw data shows differences in the intensity distributions. When I pull out the raw intensities with grepl("^AFFX", GeneName), I get the recorded intensities for each pixel mapped to a TranscriptClusterID corresponding to the particular AFFX control, some are 20 pixels others are 11. The groups of AFFX controls I get are the hybridization controls: AFFX-BioB, AFFX-BioC, AFFX-BioDn, AFFX-CreX; then the PolyA controls:AFFX-DapX, AFFX-LysX, AFFX-PheX, AFFX-ThrX, AFFX-TrpnX (?! not listed anywhere, is it a PolyA or hyb?), AFFX-r2-Bs-dap, AFFX-r2-Bs-lys, AFFX-r2-Bs-phe, and AFFX-r2-Bs-thr. Another set of PolyA controls: AFFX-r2-Ec-bioB, AFFX-r2-Ec-bioC, AFFX-r2-Ec-bioD, AFFX-r2-P1-cre. I also get the ERCC spike-in controls (which we did use). The maximum intensity from all controls is 25,069 grey levels the average is 1500 grey levels. What is a "good" average intensity? If I calculate the coefficient of variation across each row in my data set, which corresponds to each X,Y intensity measurement across my replicates for each probe. I see an average coefficient of variation at 22% just for the hybridization controls. Is this consistent with the experience of the list? What should be the expected coefficient of variation for the hybridization controls across technical replicates? the coefficient of variation is very similar for samples that show very low intensity values which are probably not expressed and for samples close to the average intensity. Even though I haven't done it, I don't think excluding probes based on a threshold with improve the average, because the numbers are uniformly similar. Here is my code: #!/usr/bin/Rscript --vanilla --slave # Load Libraries library(Biobase) library(xps) # Directory to store ROOT scheme files scmdir <- "/data/met/scmdir/" scheme_RaGene <- root.scheme(file.path(scmdir, "scheme_RaGene20stv1.root")) # Collect scheme information - Pulls out information from the Root scheme ann <- export(scheme_RaGene, treetype="ann", outfile="scheme_RaGene_ann.txt", as.dataframe=TRUE, verbose=FALSE) scm <- export(scheme_RaGene, treetype="scm", outfile="scheme_RaGene_scm.txt", as.dataframe=TRUE, verbose=FALSE) # more file locations datdir <- "/data/met/26Feb14_TR_CTRs/" celdir <- "/data/met/26Feb14_TR_CTRs/CEL/" celfiles <- c("CTR_TR1.CEL", "CTR_TR2.CEL", "CTR_TR3.CEL") celnames <- c("CTR_TR1", "CTR_TR2", "CTR_TR3") outdir <- "/data/met/26Feb14_TR_CTRs/" # Import CEL files and begin data_raw <- import.data(scheme_RaGene, "raw_data_CTR", filedir=outdir, celdir=celdir, celfiles=celfiles, celnames=celnames, verbose=TRUE) # Attach info data_raw <- attachMask(data_raw) data_raw <- attachInten(data_raw) tmp <- intensity(data_raw) write.csv(tmp, file="raw_data.csv", row.names=FALSE) # Save an image save.image(file="24Feb14_1.RData") ## ## Manipulation of "Raw Data" ## # plots png(file="Raw_Data_Density_Plot.png", width=600, height=600) par(mar=c(6,3,1,1)); hist(data_raw, add.legend=TRUE) dev.off() # Get AFFX controls for assessment of PolyA, ERCC, and Hyb annot_raw <- merge(tmp, scm, by=c("X", "Y"), all.x=TRUE) annot_raw2 <- merge(annot_raw, ann, all.x=TRUE) raw_data <- annot_raw2[ order(annot_raw2[,3], annot_raw2[,2]), ] write.csv(raw_data, file="raw_data_annotated.csv", row.names=FALSE) # get the only handle for ERCC set.affx <- subset(raw_data, grepl("^AFFX", GeneName)) write.csv(set.affx, file="affx_controls_raw.csv", row.names=FALSE) # I usually open the csv file with a spreadsheet program delete the columns that I don't need and calculate the coefficient of variation for each set. Any comments, advice, or information are greatly appreciated!! Matt

probe probe • 1.5k views

ADD COMMENT • link updated 10.2 years ago by cstrato ★ 3.9k • written 10.2 years ago by Matthew Thornton ▴ 350

0

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 1 minute ago

United States

Hi Matt, I don't use xps, so will have to go about looking at this from an oligo perspective. It just so happens I am working on a study using these same arrays. library(oligo) dat <- read.celfiles(filenames = list.files("../CEL", pattern = "[Cc][Ee][Ll]$", full.names = TRUE)) eset <- rma(dat) con <- db(dat) > dbGetQuery(con, "select * from type_dict;") type type_id 1 1 main 2 2 control->affx 3 3 control->chip 4 4 control->bgp->antigenomic 5 5 control->bgp->genomic 6 6 normgene->exon 7 7 normgene->intron 8 8 rescue->FLmRNA->unmapped 9 9 control->affx->bac_spike 10 10 oligo_spike_in 11 11 r1_bac_spike_at > table(dbGetQuery(con, "select type from featureSet inner join core_mps using(fsetid);")[,1]) 1 2 4 7 9 215001 18 23 5083 18 So I see control->affx and control->affx->bac_spike controls. affx <- exprs(eset)[as.character(dbGetQuery(con, "select featureSet.transcript_cluster_id from featureSet inner join core_mps using(fsetid) where type='2';")[,1]),] affx2 <- exprs(eset)[as.character(dbGetQuery(con, "select featureSet.transcript_cluster_id from featureSet inner join core_mps using(fsetid) where type='9';")[,1]),] > z <-round(cbind(apply(affx, 1, function(x) sd(x)/mean(x)),apply(affx2, 1, function(x) sd(x)/mean(x)), deparse.level=0), 2) > row.names(z) <- NULL > z [,1] [,2] [1,] 0.02 0.02 [2,] 0.03 0.03 [3,] 0.02 0.02 [4,] 0.03 0.04 [5,] 0.02 0.02 [6,] 0.03 0.04 [7,] 0.01 0.01 [8,] 0.03 0.03 [9,] 0.01 0.01 [10,] 0.03 0.03 [11,] 0.01 0.01 [12,] 0.01 0.04 [13,] 0.01 0.01 [14,] 0.02 0.03 [15,] 0.01 0.00 [16,] 0.02 0.04 [17,] 0.01 0.00 [18,] 0.03 0.03 So much lower CVs than what you see. However, you should note that you are not using logged data, which is fairly nonsensical in this situation (e.g., computing a CV on highly skewed data is a recipe for biased results). Best, Jim On 2/26/2014 6:02 PM, Thornton, Matthew wrote: > Hello! > > I am digging through the control intensities for technical replicates obtained with an Affymetrix GeneChip Rat Gene 2.0 ST array. > > I have 3 technical replicates of the same sample. The kernel density plot overlay of the raw data shows differences in the intensity distributions. > > When I pull out the raw intensities with grepl("^AFFX", GeneName), I get the recorded intensities for each pixel mapped to a TranscriptClusterID corresponding to the particular AFFX control, some are 20 pixels others are 11. The groups of AFFX controls I get are the hybridization controls: AFFX-BioB, AFFX-BioC, AFFX-BioDn, > AFFX-CreX; then the PolyA controls:AFFX-DapX, AFFX-LysX, AFFX-PheX, AFFX-ThrX, AFFX-TrpnX (?! not listed anywhere, is it a PolyA or hyb?), AFFX-r2-Bs-dap, AFFX-r2-Bs-lys, AFFX-r2-Bs-phe, and AFFX-r2-Bs-thr. Another set of PolyA controls: AFFX-r2-Ec-bioB, AFFX-r2-Ec-bioC, AFFX-r2-Ec-bioD, AFFX-r2-P1-cre. I also get the ERCC spike-in controls (which we did use). The maximum intensity from all controls is 25,069 grey levels the average is 1500 grey levels. What is a "good" average intensity? > > If I calculate the coefficient of variation across each row in my data set, which corresponds to each X,Y intensity measurement across my replicates for each probe. I see an average coefficient of variation at 22% just for the hybridization controls. Is this consistent with the experience of the list? What should be the expected coefficient of variation for the hybridization controls across technical replicates? the coefficient of variation is very similar for samples that show very low intensity values which are probably not expressed and for samples close to the average intensity. Even though I haven't done it, I don't think excluding probes based on a threshold with improve the average, because the numbers are uniformly similar. > > Here is my code: > #!/usr/bin/Rscript --vanilla --slave > > # Load Libraries > library(Biobase) > library(xps) > > # Directory to store ROOT scheme files > scmdir <- "/data/met/scmdir/" > scheme_RaGene <- root.scheme(file.path(scmdir, "scheme_RaGene20stv1.root")) > > # Collect scheme information - Pulls out information from the Root scheme > ann <- export(scheme_RaGene, treetype="ann", outfile="scheme_RaGene_ann.txt", as.dataframe=TRUE, verbose=FALSE) > scm <- export(scheme_RaGene, treetype="scm", outfile="scheme_RaGene_scm.txt", as.dataframe=TRUE, verbose=FALSE) > > # more file locations > datdir <- "/data/met/26Feb14_TR_CTRs/" > celdir <- "/data/met/26Feb14_TR_CTRs/CEL/" > celfiles <- c("CTR_TR1.CEL", "CTR_TR2.CEL", "CTR_TR3.CEL") > celnames <- c("CTR_TR1", "CTR_TR2", "CTR_TR3") > outdir <- "/data/met/26Feb14_TR_CTRs/" > > # Import CEL files and begin > data_raw <- import.data(scheme_RaGene, "raw_data_CTR", filedir=outdir, celdir=celdir, celfiles=celfiles, celnames=celnames, verbose=TRUE) > > # Attach info > data_raw <- attachMask(data_raw) > data_raw <- attachInten(data_raw) > > tmp <- intensity(data_raw) > > write.csv(tmp, file="raw_data.csv", row.names=FALSE) > > # Save an image > save.image(file="24Feb14_1.RData") > > ## > ## Manipulation of "Raw Data" > ## > > # plots > png(file="Raw_Data_Density_Plot.png", width=600, height=600) > par(mar=c(6,3,1,1)); > hist(data_raw, add.legend=TRUE) > dev.off() > > # Get AFFX controls for assessment of PolyA, ERCC, and Hyb > annot_raw <- merge(tmp, scm, by=c("X", "Y"), all.x=TRUE) > annot_raw2 <- merge(annot_raw, ann, all.x=TRUE) > raw_data <- annot_raw2[ order(annot_raw2[,3], annot_raw2[,2]), ] > write.csv(raw_data, file="raw_data_annotated.csv", row.names=FALSE) > > # get the only handle for ERCC > set.affx <- subset(raw_data, grepl("^AFFX", GeneName)) > write.csv(set.affx, file="affx_controls_raw.csv", row.names=FALSE) > > # I usually open the csv file with a spreadsheet program delete the columns that I don't need and calculate the coefficient of variation for each set. > > Any comments, advice, or information are greatly appreciated!! > > Matt > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099

ADD COMMENT • link 10.2 years ago James W. MacDonald 65k

0

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 1 minute ago

United States

Hi Matt, Please don't take conversations off list. On 2/27/2014 12:37 PM, Thornton, Matthew wrote: > Hi Jim! > > Thank you for your reply, you have helped me so many times.. > > So this data that I am pulling out is raw, no background correction or normalization. If we were to process the data with RMA or VSN, then that wouldn't give us the same information. I am trying to see how similar technical replicates as raw data should be before processing. This probably more of an Affymetrix technical question. I figured that people who use Bioconductor to process these chips in large numbers might have a better real world consensus. I am thinking that because the hybridization cocktail is a master-mix added exactly the same to each sample before hybridization that the coeffcient of variation should be very similar and that the intensity distribution should be practically identical. How realistic is it to have practically identical technical replicates? Without background correction and normalization? Not very. You have to think about what we are doing here. We have this silicone wafer with a bunch of little short oligos attached, and we are splashing some labeled cDNA (of various lengths) on, and letting it swish around for a while, hoping that the cDNA will bind to complementary sequences on the array. We then wash that off, and splash on some SAPE, let that bind for a while and then wash off the excess. We then put it in a scanner that takes a picture and digitizes the intensities. You then want to take these digitized intensities and see how consistent they are between arrays. But those intensities arise from a combination of actual binding of cDNA to the oligo as well as biotin-streptavidin binding, non-specific binding, quality of the various washes, pipetting acumen of the tech, manufacturing inconsistencies, batch effects, differences in how 'warmed up' the scanner is, hobgoblins of the mind (ok, well maybe not that one), etc, etc. And of all those things, only one has to do with what you are trying to measure. We can't really quantify the contribution(s) of the other things, but instead just try to make some assumptions and strip out the technical variation as best we can. But if you don't even try to strip out the technical variation, then you are looking at signal+noise and expecting consistency. Best, Jim > > Thank you!! > > Matt > > ________________________________________ > From: James W. MacDonald [jmacdon at uw.edu] > Sent: Thursday, February 27, 2014 9:00 AM > To: Thornton, Matthew > Cc: bioconductor at r-project.org > Subject: Re: [BioC] Is there an expected coefficient of variation for the hybridization probe intensities among technical replicates? > > Hi Matt, > > I don't use xps, so will have to go about looking at this from an oligo > perspective. It just so happens I am working on a study using these same > arrays. > > library(oligo) > dat <- read.celfiles(filenames = list.files("../CEL", pattern = > "[Cc][Ee][Ll]$", full.names = TRUE)) > eset <- rma(dat) > con <- db(dat) > > dbGetQuery(con, "select * from type_dict;") > type type_id > 1 1 main > 2 2 control->affx > 3 3 control->chip > 4 4 control->bgp->antigenomic > 5 5 control->bgp->genomic > 6 6 normgene->exon > 7 7 normgene->intron > 8 8 rescue->FLmRNA->unmapped > 9 9 control->affx->bac_spike > 10 10 oligo_spike_in > 11 11 r1_bac_spike_at > > table(dbGetQuery(con, "select type from featureSet inner join > core_mps using(fsetid);")[,1]) > > 1 2 4 7 9 > 215001 18 23 5083 18 > > So I see control->affx and control->affx->bac_spike controls. > > affx <- exprs(eset)[as.character(dbGetQuery(con, "select > featureSet.transcript_cluster_id from featureSet inner join core_mps > using(fsetid) where type='2';")[,1]),] > affx2 <- exprs(eset)[as.character(dbGetQuery(con, "select > featureSet.transcript_cluster_id from featureSet inner join core_mps > using(fsetid) where type='9';")[,1]),] > > > z <-round(cbind(apply(affx, 1, function(x) > sd(x)/mean(x)),apply(affx2, 1, function(x) sd(x)/mean(x)), > deparse.level=0), 2) > > row.names(z) <- NULL > > z > [,1] [,2] > [1,] 0.02 0.02 > [2,] 0.03 0.03 > [3,] 0.02 0.02 > [4,] 0.03 0.04 > [5,] 0.02 0.02 > [6,] 0.03 0.04 > [7,] 0.01 0.01 > [8,] 0.03 0.03 > [9,] 0.01 0.01 > [10,] 0.03 0.03 > [11,] 0.01 0.01 > [12,] 0.01 0.04 > [13,] 0.01 0.01 > [14,] 0.02 0.03 > [15,] 0.01 0.00 > [16,] 0.02 0.04 > [17,] 0.01 0.00 > [18,] 0.03 0.03 > > So much lower CVs than what you see. However, you should note that you > are not using logged data, which is fairly nonsensical in this situation > (e.g., computing a CV on highly skewed data is a recipe for biased results). > > Best, > > Jim > > > > > > On 2/26/2014 6:02 PM, Thornton, Matthew wrote: >> Hello! >> >> I am digging through the control intensities for technical replicates obtained with an Affymetrix GeneChip Rat Gene 2.0 ST array. >> >> I have 3 technical replicates of the same sample. The kernel density plot overlay of the raw data shows differences in the intensity distributions. >> >> When I pull out the raw intensities with grepl("^AFFX", GeneName), I get the recorded intensities for each pixel mapped to a TranscriptClusterID corresponding to the particular AFFX control, some are 20 pixels others are 11. The groups of AFFX controls I get are the hybridization controls: AFFX-BioB, AFFX-BioC, AFFX-BioDn, >> AFFX-CreX; then the PolyA controls:AFFX-DapX, AFFX-LysX, AFFX-PheX, AFFX-ThrX, AFFX-TrpnX (?! not listed anywhere, is it a PolyA or hyb?), AFFX-r2-Bs-dap, AFFX-r2-Bs-lys, AFFX-r2-Bs-phe, and AFFX-r2-Bs-thr. Another set of PolyA controls: AFFX-r2-Ec-bioB, AFFX-r2-Ec-bioC, AFFX-r2-Ec-bioD, AFFX-r2-P1-cre. I also get the ERCC spike-in controls (which we did use). The maximum intensity from all controls is 25,069 grey levels the average is 1500 grey levels. What is a "good" average intensity? >> >> If I calculate the coefficient of variation across each row in my data set, which corresponds to each X,Y intensity measurement across my replicates for each probe. I see an average coefficient of variation at 22% just for the hybridization controls. Is this consistent with the experience of the list? What should be the expected coefficient of variation for the hybridization controls across technical replicates? the coefficient of variation is very similar for samples that show very low intensity values which are probably not expressed and for samples close to the average intensity. Even though I haven't done it, I don't think excluding probes based on a threshold with improve the average, because the numbers are uniformly similar. >> >> Here is my code: >> #!/usr/bin/Rscript --vanilla --slave >> >> # Load Libraries >> library(Biobase) >> library(xps) >> >> # Directory to store ROOT scheme files >> scmdir <- "/data/met/scmdir/" >> scheme_RaGene <- root.scheme(file.path(scmdir, "scheme_RaGene20stv1.root")) >> >> # Collect scheme information - Pulls out information from the Root scheme >> ann <- export(scheme_RaGene, treetype="ann", outfile="scheme_RaGene_ann.txt", as.dataframe=TRUE, verbose=FALSE) >> scm <- export(scheme_RaGene, treetype="scm", outfile="scheme_RaGene_scm.txt", as.dataframe=TRUE, verbose=FALSE) >> >> # more file locations >> datdir <- "/data/met/26Feb14_TR_CTRs/" >> celdir <- "/data/met/26Feb14_TR_CTRs/CEL/" >> celfiles <- c("CTR_TR1.CEL", "CTR_TR2.CEL", "CTR_TR3.CEL") >> celnames <- c("CTR_TR1", "CTR_TR2", "CTR_TR3") >> outdir <- "/data/met/26Feb14_TR_CTRs/" >> >> # Import CEL files and begin >> data_raw <- import.data(scheme_RaGene, "raw_data_CTR", filedir=outdir, celdir=celdir, celfiles=celfiles, celnames=celnames, verbose=TRUE) >> >> # Attach info >> data_raw <- attachMask(data_raw) >> data_raw <- attachInten(data_raw) >> >> tmp <- intensity(data_raw) >> >> write.csv(tmp, file="raw_data.csv", row.names=FALSE) >> >> # Save an image >> save.image(file="24Feb14_1.RData") >> >> ## >> ## Manipulation of "Raw Data" >> ## >> >> # plots >> png(file="Raw_Data_Density_Plot.png", width=600, height=600) >> par(mar=c(6,3,1,1)); >> hist(data_raw, add.legend=TRUE) >> dev.off() >> >> # Get AFFX controls for assessment of PolyA, ERCC, and Hyb >> annot_raw <- merge(tmp, scm, by=c("X", "Y"), all.x=TRUE) >> annot_raw2 <- merge(annot_raw, ann, all.x=TRUE) >> raw_data <- annot_raw2[ order(annot_raw2[,3], annot_raw2[,2]), ] >> write.csv(raw_data, file="raw_data_annotated.csv", row.names=FALSE) >> >> # get the only handle for ERCC >> set.affx <- subset(raw_data, grepl("^AFFX", GeneName)) >> write.csv(set.affx, file="affx_controls_raw.csv", row.names=FALSE) >> >> # I usually open the csv file with a spreadsheet program delete the columns that I don't need and calculate the coefficient of variation for each set. >> >> Any comments, advice, or information are greatly appreciated!! >> >> Matt >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099

ADD COMMENT • link 10.2 years ago James W. MacDonald 65k

0

Entering edit mode

cstrato ★ 3.9k

@cstrato-908

Last seen 5.6 years ago

Austria

Dear Matt, Jim was already so kind to show you how he calculates the CoV, and just while I was trying to reply, Jim did also answer your question regarding using the raw data from the CEL-files, even before background correction and quantile normalization. Maybe I can add some comments on the evaluation of the quality of technical replicates: First note that in an ideal world technical replicates should give identical intensities for ALL probes, not only the AFFX controls. Thus the most simple way to evaluate the quality of the replicates is to draw scatter plots of the raw data. You can find an example on page 35 of vignette xps.pdf. Furthermore you could use NUSE and RLE plots to see the effect of background correction and normalization compared to the raw data. Once again you can find examples on pages 26-27 of vignette xps.pdf. Other possibilities are borderplot() and coiplot() (see pages 23-24). If you want to know how good your replicates are I would suggest to download data examples from Affymetrix and sample sets from GEO which contain replicates, and compare the variation in your replicates to the other replicates. Best regards, Christian On 2/27/14 12:02 AM, Thornton, Matthew wrote: > Hello! > > I am digging through the control intensities for technical replicates obtained with an Affymetrix GeneChip Rat Gene 2.0 ST array. > > I have 3 technical replicates of the same sample. The kernel density plot overlay of the raw data shows differences in the intensity distributions. > > When I pull out the raw intensities with grepl("^AFFX", GeneName), I get the recorded intensities for each pixel mapped to a TranscriptClusterID corresponding to the particular AFFX control, some are 20 pixels others are 11. The groups of AFFX controls I get are the hybridization controls: AFFX-BioB, AFFX-BioC, AFFX-BioDn, > AFFX-CreX; then the PolyA controls:AFFX-DapX, AFFX-LysX, AFFX-PheX, AFFX-ThrX, AFFX-TrpnX (?! not listed anywhere, is it a PolyA or hyb?), AFFX-r2-Bs-dap, AFFX-r2-Bs-lys, AFFX-r2-Bs-phe, and AFFX-r2-Bs-thr. Another set of PolyA controls: AFFX-r2-Ec-bioB, AFFX-r2-Ec-bioC, AFFX-r2-Ec-bioD, AFFX-r2-P1-cre. I also get the ERCC spike-in controls (which we did use). The maximum intensity from all controls is 25,069 grey levels the average is 1500 grey levels. What is a "good" average intensity? > > If I calculate the coefficient of variation across each row in my data set, which corresponds to each X,Y intensity measurement across my replicates for each probe. I see an average coefficient of variation at 22% just for the hybridization controls. Is this consistent with the experience of the list? What should be the expected coefficient of variation for the hybridization controls across technical replicates? the coefficient of variation is very similar for samples that show very low intensity values which are probably not expressed and for samples close to the average intensity. Even though I haven't done it, I don't think excluding probes based on a threshold with improve the average, because the numbers are uniformly similar. > > Here is my code: > #!/usr/bin/Rscript --vanilla --slave > > # Load Libraries > library(Biobase) > library(xps) > > # Directory to store ROOT scheme files > scmdir <- "/data/met/scmdir/" > scheme_RaGene <- root.scheme(file.path(scmdir, "scheme_RaGene20stv1.root")) > > # Collect scheme information - Pulls out information from the Root scheme > ann <- export(scheme_RaGene, treetype="ann", outfile="scheme_RaGene_ann.txt", as.dataframe=TRUE, verbose=FALSE) > scm <- export(scheme_RaGene, treetype="scm", outfile="scheme_RaGene_scm.txt", as.dataframe=TRUE, verbose=FALSE) > > # more file locations > datdir <- "/data/met/26Feb14_TR_CTRs/" > celdir <- "/data/met/26Feb14_TR_CTRs/CEL/" > celfiles <- c("CTR_TR1.CEL", "CTR_TR2.CEL", "CTR_TR3.CEL") > celnames <- c("CTR_TR1", "CTR_TR2", "CTR_TR3") > outdir <- "/data/met/26Feb14_TR_CTRs/" > > # Import CEL files and begin > data_raw <- import.data(scheme_RaGene, "raw_data_CTR", filedir=outdir, celdir=celdir, celfiles=celfiles, celnames=celnames, verbose=TRUE) > > # Attach info > data_raw <- attachMask(data_raw) > data_raw <- attachInten(data_raw) > > tmp <- intensity(data_raw) > > write.csv(tmp, file="raw_data.csv", row.names=FALSE) > > # Save an image > save.image(file="24Feb14_1.RData") > > ## > ## Manipulation of "Raw Data" > ## > > # plots > png(file="Raw_Data_Density_Plot.png", width=600, height=600) > par(mar=c(6,3,1,1)); > hist(data_raw, add.legend=TRUE) > dev.off() > > # Get AFFX controls for assessment of PolyA, ERCC, and Hyb > annot_raw <- merge(tmp, scm, by=c("X", "Y"), all.x=TRUE) > annot_raw2 <- merge(annot_raw, ann, all.x=TRUE) > raw_data <- annot_raw2[ order(annot_raw2[,3], annot_raw2[,2]), ] > write.csv(raw_data, file="raw_data_annotated.csv", row.names=FALSE) > > # get the only handle for ERCC > set.affx <- subset(raw_data, grepl("^AFFX", GeneName)) > write.csv(set.affx, file="affx_controls_raw.csv", row.names=FALSE) > > # I usually open the csv file with a spreadsheet program delete the columns that I don't need and calculate the coefficient of variation for each set. > > Any comments, advice, or information are greatly appreciated!! > > Matt > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >

ADD COMMENT • link 10.2 years ago cstrato ★ 3.9k

Login before adding your answer.