Probe-level analysis of exon arrays using xps


Lavorgna Giovanni ▴ 80

@lavorgna-giovanni-4817

Last seen 5.4 years ago

As you know, there is growing evidence that a probe-level analysis (as opposed to a probeset-level or gene-level analysis) can be useful when analyzing exon arrays. I am currently analyzing a large number of human exon chips (185!) on a 4 GB machine using the xps package. I would like to stick with this software, since it lets me handle that many chips with the resources at hand, and I was wondering whether anyone has ever tried to perform a probe-level analysis using xps. I would also be grateful if anyone could point out alternative resources for this job.

Thanks in advance.

Giovanni

probe xps • 1.1k views

ADD COMMENT • link updated 12.9 years ago by cstrato ★ 3.9k • written 12.9 years ago by Lavorgna Giovanni ▴ 80


cstrato ★ 3.9k

@cstrato-908

Last seen 5.8 years ago

Austria

Dear Giovanni,

In principle you could do the background correction and normalization steps separately, e.g.:

    > data.bg.rma <- bgcorrect.rma(data.exon, ...)
    > data.qu.rma <- normalize.quantiles(data.bg.rma, ...)

Now you have the normalized probes; however, you cannot do any summarization such as median-polish or mean.

It is not quite clear to me how you want to proceed with the normalized probe intensities.

Best regards
Christian
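To make the normalization step concrete, here is a minimal base-R sketch of what quantile normalization does to a probes-by-samples matrix. It is only an illustration of the general technique, not the xps implementation: the function name `quantile_normalize` and the toy matrix are hypothetical, and ties are simply broken by order of appearance, which may differ from what `normalize.quantiles` does internally.

```r
# Minimal quantile normalization sketch (illustration only, not xps code).
# Rows are probes, columns are samples.
quantile_normalize <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x), nrow(x) > 1)
  # rank of every value within its own column (ties broken by position)
  ranks  <- apply(x, 2, rank, ties.method = "first")
  # sort each column, then average across columns at each rank
  sorted <- apply(x, 2, sort)
  ref    <- rowMeans(sorted)
  # replace every value by the reference value of its rank
  out <- apply(ranks, 2, function(r) ref[r])
  dimnames(out) <- dimnames(x)
  out
}

# hypothetical toy data: 4 probes, 3 samples
x  <- cbind(A = c(5, 2, 3, 4), B = c(4, 1, 4, 2), C = c(3, 4, 6, 8))
qn <- quantile_normalize(x)
print(qn)
```

After this step every sample has exactly the same set of values, only in a different probe order, which is what makes probe intensities comparable across chips.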

ADD COMMENT • link 12.9 years ago cstrato ★ 3.9k


Dear Christian,

many thanks for your prompt answer. In my case, once I have the probe intensities, I would like to perform the following two preliminary steps:

a) Remove probes that hybridize to multiple loci in the genome.
b) Remove probes that show a correlation coefficient below a certain threshold.

Then I would like to calculate the differential expression in diseased vs. healthy samples for a few transcripts of interest by averaging the ratios of the probe intensities. I hope that the increased granularity and the reduced background of this method will give my analysis better resolution. As I said before, similar probe selection methods have already been described (see for example Xing Y, Kapur K, Wong WH. PLoS ONE. 2006 20;1:e88) and applied to genome-wide studies.

In my case, after step a), I would simply like to dump the probes to a text file, select those of interest, and read them into a spreadsheet in order to calculate the correlation coefficient and the fold change for my transcripts. You have already started to sketch the beginning of the pipeline I should use; if I am not asking too much, I would be grateful if you could elaborate on it a little to include these final two steps as well.

Thanks again for your assistance and keep up the good work.

Giovanni
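Step b) and the fold-change calculation can be sketched in base R on a small matrix of normalized intensities for one transcript. This is only one simple way to read the description above, not the procedure of Xing et al.: each probe is correlated with the median profile of the other probes of the transcript, probes below a threshold are dropped, and the per-probe log2 ratios (diseased minus healthy) are averaged. The function name `probe_fold_change`, the threshold of 0.7, and the toy data are all hypothetical.

```r
# Sketch: correlation-based probe filter followed by a mean log2 ratio.
# inten: probes x samples matrix of normalized intensities for one transcript
# group: character vector, "healthy" or "diseased", one entry per sample
probe_fold_change <- function(inten, group, min_cor = 0.7) {
  stopifnot(is.matrix(inten), ncol(inten) == length(group), nrow(inten) > 2)
  lg <- log2(inten)
  # correlate each probe with the median profile of all other probes
  r <- vapply(seq_len(nrow(lg)), function(i) {
    ref <- apply(lg[-i, , drop = FALSE], 2, median)
    cor(lg[i, ], ref)
  }, numeric(1))
  names(r) <- rownames(lg)
  keep <- !is.na(r) & r >= min_cor
  # per-probe log2 ratio: mean(diseased) - mean(healthy)
  ratio <- rowMeans(lg[keep, group == "diseased", drop = FALSE]) -
           rowMeans(lg[keep, group == "healthy",  drop = FALSE])
  list(cor = r, kept = rownames(lg)[keep], log2fc = mean(ratio))
}

# hypothetical toy data: p1-p3 rise 4-fold in disease, p4 is a noisy probe
inten <- rbind(
  p1 = c(100, 110,  90, 400, 440, 360),
  p2 = c( 50,  55,  45, 200, 220, 180),
  p3 = c( 80,  88,  72, 320, 352, 288),
  p4 = c(300, 100, 250, 120, 310,  90)
)
group <- rep(c("healthy", "diseased"), each = 3)
res <- probe_fold_change(inten, group)
print(res$kept)
print(res$log2fc)
```

In this toy example the noisy probe is removed and the remaining three probes give a mean log2 ratio of 2, i.e. a 4-fold change. Working on the log2 scale keeps up- and down-regulation symmetric when ratios are averaged.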

ADD REPLY • link 12.9 years ago Lavorgna Giovanni ▴ 80


Dear Giovanni,

I will try to sketch the beginning of your pipeline using a subset of the Affymetrix exon dataset.

First, let me suggest using the new annotation files, version na32, which Affymetrix has just released, to create the scheme file:

    ### import library and annotation files
    > libdir <- "/Volumes/GigaDrive/Affy/libraryfiles"
    > anndir <- "/Volumes/GigaDrive/Affy/Annotation"
    > scmdir <- "/Volumes/GigaDrive/CRAN/Workspaces/Schemes/na32"
    > scheme.exon <- import.exon.scheme("huex10stv2", filedir = scmdir,
    +    file.path(libdir, "HuEx-1_0-st-v2_libraryfile", "HuEx-1_0-st-r2", "HuEx-1_0-st-v2.r2.clf"),
    +    file.path(libdir, "HuEx-1_0-st-v2_libraryfile", "HuEx-1_0-st-r2", "HuEx-1_0-st-v2.r2.pgf"),
    +    file.path(anndir, "Version11Jul", "HuEx-1_0-st-v2.na32.hg19.probeset.csv"),
    +    file.path(anndir, "Version11Jul", "HuEx-1_0-st-v2.na32.hg19.transcript.csv"))

For this example I import the breast and prostate data from the Affymetrix exon dataset:

    ### import CEL-files
    > celdir <- "/Volumes/GigaDrive/ChipData/Exon/HuMixture"
    > datdir <- getwd()
    > celfiles <- c("huex_wta_breast_A.CEL","huex_wta_breast_B.CEL","huex_wta_breast_C.CEL",
    +    "huex_wta_prostate_A.CEL","huex_wta_prostate_B.CEL","huex_wta_prostate_C.CEL")
    > celnames <- c("BreastA","BreastB","BreastC","ProstateA","ProstateB","ProstateC")
    > data.exon <- import.data(scheme.exon, "HuMixtureExon", filedir=datdir,
    +    celdir=celdir, celfiles=celfiles, celnames=celnames)

Now I suggest starting a new R session for preprocessing:

    ### first, load ROOT scheme file and ROOT data file
    > scmdir <- "/Volumes/GigaDrive/CRAN/Workspaces/Schemes/na32"
    > scheme.exon <- root.scheme(file.path(scmdir,"huex10stv2.root"))
    > datdir <- getwd()
    > data.exon <- root.data(scheme.exon, file.path(datdir,"HuMixtureExon_cel.root"))
    > str(data.exon)

    ### 1.step: background - rma
    > data.bg.rma <- bgcorrect.rma(data.exon, "HuExonRMABgrd", filedir=datdir,
    +    select="antigenomic", exonlevel="core+affx")

    ### 2.step: normalization - quantile
    > data.qu.rma <- normalize.quantiles(data.bg.rma, "HuExonRMANorm", filedir=datdir,
    +    exonlevel="core+affx")

    ### 3.step: summarization - medpol (not necessary in your case)
    #data.mp.rma <- summarize.rma(data.qu.rma, "HuExonRMASum", filedir=datdir,
    #    exonlevel="core+affx")

To dump the probes to a text file you simply do:

    ### export normalized probe intensities
    > export(data.qu.rma, treetype="cqu", varlist = "fInten",
    +    outfile=paste(datdir, "BreastProstate_cqu.txt", sep="/"))

Now you have the table "BreastProstate_cqu.txt" containing the (X,Y)-coordinates followed by the normalized intensities for each sample. However, please note that for 6 samples this file already has a size of 180 MB, so for your 185 samples it will probably have a size of more than 4 GB. Thus it is not quite clear to me how you want to proceed.

Nevertheless, since you are interested in selected transcripts only, you need to get the (X,Y)-coordinates for these transcripts in order to extract the intensities from the table. Let us assume that you want to get the (X,Y)-coordinates for the CD44 gene. For this purpose you have three options:

1, If you are using the development version xps_1.13.6, which will be available for download from the BioC development site in a few days, you can do the following:

    ### get internal UNIT_ID for CD44
    > id <- symbol2unitID(scheme.exon, symbol="CD44", unittype="transcript", as.list=TRUE)
    > id
    $`3326635`
    [1] "185195"

    $`3326730`
    [1] "185196"

    ### attach (x,y)-coordinates for all UNIT_IDs
    > data.qu.rma <- attachDataXY(data.qu.rma)

    ### get (x,y)-coordinates for CD44
    > xy <- indexUnits(data.qu.rma, which="core+affx", unitID=unlist(id))
    Error in .local(object, ...) : only 1 of 2 UNIT_ID are valid

The reason for this error is that only transcript_cluster_id "3326635" belongs to level "core", while "3326730" belongs to level "extended". Thus you need to do:

    > xy <- indexUnits(data.qu.rma, which="core+affx", unitID=id[[1]])
    > dim(xy)
    [1] 103   4
    > head(xy)
            UNIT_ID    X    Y      XY
    3151152  185195 1638  388  994919
    3151153  185195 2407   79  204648
    3151154  185195 2393  882 2260314
    3151155  185195 1728 1223 3132609
    3151156  185195 1843 1404 3596084
    3151157  185195 1369 2167 5548890

Now you have the (X,Y)-coordinates for the CD44 gene, which you can use to get the normalized intensities from the table "BreastProstate_cqu.txt".

2, If you had enough RAM and the new version xps_1.13.6, then the simplest way would be to attach the data. In the following I attach only the first two trees; even this already causes a lot of swapping on my Mac with 2 GB RAM:

    ### attach data for 2 trees
    > treenames <- unlist(treeNames(data.qu.rma))
    > data.qu.rma <- attachInten(data.qu.rma, treenames=treenames[1:2])

    ### get the normalized intensities for CD44
    > data <- validData(data.qu.rma, which="core+affx", unitID=185195)
    > dim(data)
    [1] 103   2
    > head(data)
            BreastA.cqu_MEAN BreastB.cqu_MEAN
    994919          29.51080          3.60245
    204648           4.94509         58.01030
    2260314         25.84620          2.86042
    3132609          2.91305          3.93297
    3596084        123.34400        147.01000
    5548890        434.51000        367.34400

    ### remove the data
    > data.qu.rma <- removeInten(data.qu.rma)

3, With the current version of xps, extracting the (X,Y)-coordinates for the CD44 gene is more complicated.

First you need to get the transcript_cluster_id for CD44:

    > ann <- export(scheme.exon, treetype="ann", varlist="fTranscriptID:fSymbol",
    +    as.dataframe=TRUE, outfile="tmp_ann.txt")
    > id <- split(ann[,"GeneSymbol"], ann[,"TranscriptClusterID"]);
    > id <- lapply("CD44", function(x) names(which(id == x)));
    > id
    [[1]]
    [1] "3326635" "3326730"

Then you need to get the internal UNIT_ID:

    > idx <- export(scheme.exon, treetype="idx", varlist="fUnitName:fTranscriptID",
    +    as.dataframe=TRUE, outfile="tmp_idx.txt")
    > unitid <- split(idx[,"UnitName"], idx[,"UNIT_ID"]);
    > unitid <- lapply(unlist(id), function(x) names(which(unitid == x)));
    > unitid
    [[1]]
    [1] "185195"

    [[2]]
    [1] "185196"

Finally you need to get the (X,Y)-coordinates for "3326635" only:

    > scm <- export(scheme.exon, treetype="scm", varlist="fUnitID:fX:fY:fMask",
    +    as.dataframe=TRUE, outfile="tmp_scm.txt")
    > xy <- scm[which(scm[,"UNIT_ID"] == unlist(unitid)[1]),]
    > dim(xy)
    [1] 365   4

As you can see, xy has more rows than above, so you need to extract the "core" subset only (see ?exonLevel):

    > unique(xy[,"Mask"])
    [1] 8192 1024 2048  256  512 4096
    > xy <- rbind(xy[which(xy[,"Mask"] == 8192),], xy[which(xy[,"Mask"] == 1024),])
    > dim(xy)
    [1] 103   4
    > head(xy)
            UNIT_ID    X    Y Mask
    3151152  185195 1638  388 8192
    3151153  185195 2407   79 8192
    3151154  185195 2393  882 8192
    3151155  185195 1728 1223 8192
    3151216  185195 1781   15 8192
    3151217  185195 1442  762 8192

Now you have the (X,Y)-coordinates for the CD44 gene as above.

ad a) Remove probes that hybridize to multiple loci in the genome:

I do not know how you want to remove these probes; however, you can get the sequences of all probes from the following table:

    > export(scheme.exon, treetype="prb")

Furthermore, when exporting the probeset annotation by:

    > export(scheme.exon, treetype="anp")

you will see that the table contains the column "CrossHybType" (see the Affy README file for exon arrays). Only probesets with crosshyb_type = 1 (unique) are unique.

Please let me know if this is the info you were looking for.

Best regards
Christian
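The XY column in the `indexUnits` output and the row names in the `validData` output look like a linear index derived from the (X,Y)-coordinates. For the six rows printed above, the values are reproduced by XY = Y * 2560 + X + 1. The following base-R check is a sketch under two assumptions: that 2560 is the number of columns of the HuEx-1_0-st-v2 array, and that the relation inferred from these six rows holds for all probes. Neither is a documented guarantee, and the helper name `xy_index` is hypothetical.

```r
# Values copied from the head(xy) output of indexUnits() shown above
xy <- data.frame(
  X  = c(1638, 2407, 2393, 1728, 1843, 1369),
  Y  = c( 388,   79,  882, 1223, 1404, 2167),
  XY = c(994919, 204648, 2260314, 3132609, 3596084, 5548890)
)

# Assumption: the array has 2560 columns and XY is a 1-based linear index
ncols <- 2560
xy_index <- function(x, y, ncols) y * ncols + x + 1

# compare the computed index with the XY column reported by xps
computed <- xy_index(xy$X, xy$Y, ncols)
print(cbind(xy, computed, match = computed == xy$XY))
```

If the relation holds, it gives a way to match rows of the exported text table, which is keyed by X and Y, against the intensities returned by `validData`, which are keyed by XY.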

ADD REPLY • link 12.9 years ago cstrato ★ 3.9k


Dear Christian,

many thanks for your detailed explanations and for the new xps development version. I could not have hoped for better support, and this is exactly what I was looking for. I am currently away from the lab with a slow internet connection, so I have not been able to test the whole pipeline you designed. However, I did successfully export the processed probe intensities to a text file by following your sample script. Yes, it is true that I end up with a 4 GB text file, but once I grep out the probes of interest I can easily manage the data, even in a spreadsheet.

Thanks again for your help.

Giovanni
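As an alternative to grep, the large exported table can be filtered from within R by reading it in chunks, so that only the rows for the wanted (X,Y)-coordinates are ever kept in memory. The sketch below uses base R only and rests on assumptions about the export format that should be checked against the real file: a tab-delimited table with a header line and coordinate columns named "X" and "Y". The function name `filter_probe_table` and the demo file are hypothetical.

```r
# Sketch: keep only rows whose (X, Y) pair is in `wanted`, reading in chunks.
# Assumes a tab-delimited file with a header and columns named "X" and "Y".
filter_probe_table <- function(path, wanted, chunk_rows = 100000L, sep = "\t") {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- strsplit(readLines(con, n = 1L), sep, fixed = TRUE)[[1]]
  key  <- paste(as.integer(wanted$X), as.integer(wanted$Y))
  hits <- list()
  repeat {
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0L) break          # end of file reached
    chunk <- read.table(text = lines, sep = sep, header = FALSE,
                        col.names = header, check.names = FALSE,
                        quote = "", comment.char = "",
                        stringsAsFactors = FALSE)
    sel <- paste(as.integer(chunk[["X"]]), as.integer(chunk[["Y"]])) %in% key
    hits[[length(hits) + 1L]] <- chunk[sel, , drop = FALSE]
  }
  do.call(rbind, hits)
}

# small demo file standing in for the exported table (hypothetical values)
tmp  <- tempfile(fileext = ".txt")
demo <- data.frame(X = c(1638, 10, 2407, 11), Y = c(388, 20, 79, 21),
                   BreastA = c(29.5, 1, 4.9, 2), BreastB = c(3.6, 1, 58, 2))
write.table(demo, tmp, sep = "\t", quote = FALSE, row.names = FALSE)

wanted <- data.frame(X = c(1638, 2407), Y = c(388, 79))
sel <- filter_probe_table(tmp, wanted, chunk_rows = 2L)
print(sel)
```

The chunk size trades memory for speed; with 185 sample columns a chunk of 100000 rows should still fit comfortably in 4 GB, and the result for a handful of transcripts is small enough to write out with `write.table` and open in a spreadsheet.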

ADD REPLY • link 12.9 years ago Lavorgna Giovanni ▴ 80
