Sc03b_MR_v04 CDF package
Ludo Muller ▴ 50
@ludo-muller-2991
Last seen 9.6 years ago
Hi all,

I have data from hybridizations onto the Affymetrix yeast tiling array (Sc03b_MR_v04) which I would like to analyze using Bioconductor. However the CDF package (sc03bmrv04cdf) for this array doesn't seem to be available from the bioconductor website. Can anybody tell me if it is available elsewhere or whom I could contact about creating this package?

Cheers,

Ludo.

---
Ludo A.H. Muller, Ph.D.
Dept. of Molecular Genetics & Microbiology
Box 3020, Duke University Medical Center
Durham, NC 27710, USA

Phone: +1 (919) 681-6781 or 681-6778
Fax: +1 (919) 684-8735
E-mail: ludo.muller at duke.edu
Homepage: http://www.duke.edu/~mulle019
Genetics Yeast cdf • 1.3k views
@james-w-macdonald-5106
Last seen 20 minutes ago
United States
Hi Ludo,

Ludo Muller wrote:
> Hi all,
>
> I have data from hybridizations onto the Affymetrix yeast tiling array
> (Sc03b_MR_v04) which I would like to analyze using Bioconductor. However
> the CDF package (sc03bmrv04cdf) for this array doesn't seem to be available
> from the bioconductor website. Can anybody tell me if it is available
> elsewhere or whom I could contact about creating this package?

There will probably never be such a beast, as that implies the usage of the affy package. Instead, you should be using the oligo package and pdInfoBuilder. For that you will need the CIF and BPMAP files from Affy, and something like

    pkg <- new("AffyTilingPDInfoPkgSeed", version = "0.0.1", author = "You",
               email = "you at yours.com", biocViews = "AnnotationData",
               genomebuild = "thegenomebuild",
               bpmapFile = <name of bpmap file>, cifFile = <name of cif file>)

    makePdInfoPackage(pkg)

And then install using R CMD INSTALL <the package name>. You don't mention your OS, so that might simply entail running the above at a terminal prompt (if on Linux), or if you are on Windows or MacOS, you will need to get set up to build packages. See the R FAQ for either OS for further info about that.

Best,

Jim

> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
Hi James,

Thank you for your help. I installed the pdInfoBuilder package locally, on my Win XP computer (I also requested it to be installed on a Linux server, in case my computer doesn't have enough memory), and I ran the following (I downloaded the appropriate .bpmap and .cif files):

    > pkg <- new("AffyTilingPDInfoPkgSeed", version="0.0.1", author="Ludo Muller",
                 email="ludo.muller at ...", biocViews="AnnotationData",
                 genomebuild="Stanford Yeast Genome Database, October 2003",
                 bpmapFile="Sc03b_MR_v04.bpmap", cifFile="Sc03b_MR_v04.cif")
    > makePdInfoPackage(pkg, destDir=".")

However, I get the following message:

    Creating package in ./pd.sc03b.mr.v04
    Error in sqliteExecStatement(conn, statement, bind.data, ...) :
      RS-DBI driver: (RS_SQLite_exec: could not execute: PRIMARY KEY must be unique)
    Timing stopped at: 6.35 0.17 6.59

I found an earlier report dealing with a similar error message: https://stat.ethz.ch/pipermail/bioconductor/2008-June/023080.html

Is it likely that the information files for this array are also inaccurate?

Cheers,

Ludo.

-----"James W. MacDonald" <jmacdon at med.umich.edu> wrote: -----

To: Ludo Muller <ludo.muller at duke.edu>
From: "James W. MacDonald" <jmacdon at med.umich.edu>
Date: 08/19/2008 09:05AM
cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Sc03b_MR_v04 CDF package

> [...] Instead, you should be using the oligo package and pdInfoBuilder. For
> that you will need the CIF and BPMAP files from Affy [...]
So I tracked this down, and it _doesn't_ appear to be a bug in pdInfoBuilder. It appears to be either an error or an inconsistency in the Affy bpmap files. I didn't use the yeast chip, as I am not sure I can get the one you used, but I get the same error with the Arabidopsis bpmap, so I used that one.

It all boils down to this:

    > library(affxparser)
    > tmp <- readBpmap("At35b_MF_v04-2_TIGRv5.bpmap", 26)
    > head(do.call("cbind",tmp[[1]][2:5]), n=20)
           pmx  pmy  mmx  mmy
     [1,] 2197 2194 2198 2194
     [2,] 2379  997 2380  997
     [3,] 2443 1111 2444 1111
     [4,] 2485  826 2486  826
     [5,] 2491 2374 2492 2374
     [6,] 2497  826 2498  826
     [7,] 2503 1602 2504 1602
     [8,] 2505 1129 2506 1129
     [9,] 2507 2022 2508 2022
    [10,] 1687  169 1688  169
    [11,] 1687  501 1688  501
    [12,] 1687  638 1688  638
    [13,] 1687  871 1688  871
    [14,] 1687  873 1688  873
    [15,] 1687 1007 1688 1007
    [16,] 1687 1371 1688 1371
    [17,] 1687 1492 1688 1492
    [18,] 1687 2346 1688 2346 <-
    [19,] 1687 2347 1688 2346 <-
    [20,] 1689  287 1690  287

As you can see, for this QC probeset there are two MM probes that appear to be right on top of each other. I assume the second should really have a (1688, 2347) coordinate, but the bpmap is in error. Since this produces two identical indices, and those indices are used as the primary key for the table these data are being fed into, we get an error because the primary key must be unique.

For the A. thaliana chip there are 36 such errors in the bpmap file for just the QC probes.

Best,

Jim
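The duplicate check described in this message can be reproduced without the bpmap file. The following is a minimal sketch in base R, assuming only rows 17 to 20 of the matrix printed in the thread, typed in by hand. The helper name `find_duplicate_xy` is hypothetical and is not part of affxparser or pdInfoBuilder.

```r
# Rows 17-20 of the matrix printed in the thread (At35b_MF_v04-2_TIGRv5.bpmap,
# sequence 26). Entered by hand so the example runs without the file.
coords <- matrix(
  c(1687, 1492, 1688, 1492,
    1687, 2346, 1688, 2346,
    1687, 2347, 1688, 2346,
    1689,  287, 1690,  287),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("pmx", "pmy", "mmx", "mmy"))
)

# Hypothetical helper: return the row numbers whose (x, y) pair occurs more
# than once. Such pairs collide when (x, y) is used to build a unique key.
find_duplicate_xy <- function(x, y) {
  key <- paste(x, y, sep = ",")
  which(duplicated(key) | duplicated(key, fromLast = TRUE))
}

dup_mm <- find_duplicate_xy(coords[, "mmx"], coords[, "mmy"])
dup_pm <- find_duplicate_xy(coords[, "pmx"], coords[, "pmy"])

print(coords[dup_mm, , drop = FALSE])  # the rows sharing MM cell (1688, 2346)
print(length(dup_pm))                  # PM coordinates are all distinct here
```

With a real file, the same helper could presumably be applied to the `mmx` and `mmy` elements of one entry of the list returned by `readBpmap()`, assuming those element names match the column headers shown in the printed output.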
James, you are right about the erroneous bpmap files. I contacted Affymetrix about this problem and after looking into it, they confirmed that a lot of the mm probes have the same coordinates as the pm probes in the bpmap files. They are supposedly working to fix the problem, but couldn't give me an estimate of when they would have a corrected bpmap file.

Do you know if it is possible to extract just the pm hybridization intensity values from the cel files without a corrected bpmap file? I don't necessarily need the mm probe intensity values since there are alternative ways for background correction that don't rely on them.

Thanks again,

Ludo.

-----"James W. MacDonald" <jmacdon at med.umich.edu> wrote: -----

To: Ludo Muller <ludo.muller at duke.edu>
From: "James W. MacDonald" <jmacdon at med.umich.edu>
Date: 08/22/2008 08:00AM
cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Sc03b_MR_v04 CDF package

> So I tracked this down, and it _doesn't_ appear to be a bug in
> pdInfoBuilder. It appears to be either an error or an inconsistency in
> the Affy bpmap files. [...]
Hi Ludo,

Ludo Muller wrote:
> James, you are right about the erroneous bpmap files. I contacted
> Affymetrix about this problem and after looking into it, they confirmed
> that a lot of the mm probes have the same coordinates as the pm probes in
> the bpmap files. They are supposedly working to fix the problem, but
> couldn't give me an estimate of when they would have a corrected bpmap
> file.

Interesting. I didn't look into the problem any further than to see that at least one MM probe shares the same (x, y) coordinates (in the cdf) with another MM probe. Although by definition the PM probes would also be identical, in this case they have their own coordinates.

So there are two really good (IMO) possibilities here. First, Affy just blundered here, and the bpmap files are wrong. Apparently whoever you talked to at Affy thinks this is the case.

Second, since there are supposed to be probes with on average 10 bases in between, I would have to imagine that there are a lot of probes that would be identical. I don't know if they just blithely tile down probes regardless of the fact that there are many replicated probes, or if they just put one down for each replicate and then point to that one probe for the duplicates in the bpmap file, or if they don't use the probes that aren't unique. It has been argued in offline communications that some permutation of the above is the case, and that we just need to change the code in pdInfoBuilder to account for that.

It has also been recommended that we contact a higher-up at Affy to get confirmation, which I haven't yet done, but will do now.

> Do you know if it is possible to extract just the pm hybridization
> intensity values from the cel files without a corrected bpmap file? I don't
> necessarily need the mm probe intensity values since there are alternative
> ways for background correction that don't rely on them.

Well, not within the pdInfoBuilder/oligo structure. If you just want to roll your own functions, then the affxparser package is your friend. It will quickly and easily get out just about any information you might want from just about any file type that Affy supplies, including the CEL files.

Best,

Jim
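The "roll your own" route suggested above might look roughly like the following sketch, which reads only the PM cells named in a bpmap sequence from a CEL file using affxparser. Several things here are assumptions rather than facts from the thread: the helper names `xy_to_cell_index` and `read_pm_intensities` are hypothetical, the bpmap (x, y) coordinates are assumed to be zero-based like CEL coordinates, and `readCel()` is assumed to return intensities in the order of the requested indices. The index formula follows the cell-index convention documented in affxparser (one-based index = y * ncol + x + 1).

```r
# Convert zero-based (x, y) cell coordinates to one-based cell indices,
# following the convention documented in affxparser: index = y * ncol + x + 1.
xy_to_cell_index <- function(x, y, ncol) {
  as.integer(y * ncol + x + 1)
}

# Hypothetical helper: read PM intensities for one bpmap sequence from a CEL
# file. Requires the affxparser package and real files; defining the function
# reads nothing, the files are only touched when it is called.
read_pm_intensities <- function(bpmap_file, cel_file, seq_index = 1) {
  if (!requireNamespace("affxparser", quietly = TRUE)) {
    stop("the affxparser package is required")
  }
  # Read a single sequence (probe group) from the bpmap file.
  bpmap <- affxparser::readBpmap(bpmap_file, seqIndices = seq_index)
  probes <- bpmap[[1]]
  # Number of columns on the chip, taken from the CEL header.
  ncol <- affxparser::readCelHeader(cel_file)$cols
  idx <- xy_to_cell_index(probes$pmx, probes$pmy, ncol)
  # Read intensities for the PM cells only; MM coordinates are never used.
  cel <- affxparser::readCel(cel_file, indices = idx, readIntensities = TRUE)
  data.frame(x = probes$pmx, y = probes$pmy, intensity = cel$intensities)
}
```

Because the MM coordinates are never consulted, duplicated MM entries in the bpmap file should not matter for this approach. A call might look like `read_pm_intensities("Sc03b_MR_v04.bpmap", "sample.CEL")`, where `sample.CEL` is a placeholder file name.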
Hi On Wed, Aug 27, 2008 at 1:08 PM, James W. MacDonald <jmacdon at="" med.umich.edu=""> wrote: > Hi Ludo, > > Ludo Muller wrote: >> >> James, you are right about the erroneous bpmap files. I contacted >> Affymetrix about this problem and after looking into it, they confirmed >> that a lot of the mm probes have the same coordinates as the pm probes in >> the bpmap files. They are supposedly working to fix the problem, but >> couldn't give me an estimate of when they would have a corrected bpmap >> file. Thanks for reporting back. > > Interesting. I didn't look into the problem any further than to see that at > least one MM probe shares the same (x, y) coordinates (in the cdf) with > another MM probe. Although by definition the PM probes would also be > identical, in this case they have their own coordinates. > > So there are two really good (IMO) possibilities here. First, Affy just > blundered here, and the bpmap files are wrong. Apparently whomever you > talked to at Affy thinks this is the case. > > Second, since there are supposed to be probes with on average 10 bases in > between, I would have to imagine that there are a lot of probes that would > be identical. I don't know if they just blithely tile down probes regardless > of the fact that there are many replicated probes, or if they just put one > down for each replicate and then point to that one probe for the duplicates > in the bpmap file, or if they don't use the probes that aren't unique. It > has been argued in offline communications that some permutation of the above > is the case, and that we just need to change the code in pdInfoBuilder to > account for that. > > It has also been recommended that we contact a higher-up at Affy to get > confirmation which I haven't yet done, but will do now. I would suggest that you use the 'Affymetrix Scientific Community Forums': https://www.affymetrix.com/community/forums/index.jspa for this. I quite sure that it will then be read by the "right" persons. 
Posting there will also "document" the issue and is much easier to refer to and follow up on later. When I've found mistakes in the NetAffx annotation files, I've been told to report to the 'NetAffx SDK' forum, so I suggest you use that too. Cheers Henrik > > >> >> Do you know if it is possible to extract just the pm hybridization >> intensity values from the cel files without a corrected bpmap file? I >> don't >> necessarily need the mm probe intensity values since there are alternative >> ways for background correction that don't rely on them. > > Well, not within the pdInfoBuilder/oligo structure. If you just want to roll > your own functions, then the affxparser package is your friend. It will > quickly and easily get out just about any information you might want from > just about any file type that Affy supplies, including the celfiles. > > Best, > > Jim > > >> >> Thanks again, >> >> Ludo. >> >> --- >> Ludo A.H. Muller, Ph.D. >> Dept. of Molecular Genetics & Microbiology >> Box 3020, Duke University Medical Center >> Durham, NC 27710, USA >> >> Phone: +1 (919) 681-6781 or 681-6778 >> Fax: +1 (919) 684-8735 >> E-mail: ludo.muller at duke.edu >> Homepage: http://www.duke.edu/~mulle019 >> >> -----"James W. MacDonald" <jmacdon at="" med.umich.edu=""> wrote: ----- >> >> To: Ludo Muller <ludo.muller at="" duke.edu=""> >> From: "James W. MacDonald" <jmacdon at="" med.umich.edu=""> >> Date: 08/22/2008 08:00AM >> cc: bioconductor at stat.math.ethz.ch >> Subject: Re: [BioC] Sc03b_MR_v04 CDF package >> >> So I tracked this down, and it _doesn't_ appear to be a bug in >> pdInfoBuilder. It appears to be either an error or an inconsistency in >> the Affy bpmap files. I didn't use the yeast chip, as I am not sure I >> can get the one you used but I get the same error with the Arabidopsis >> bpmap so I used that one. 
>> >> It all boils down to this: >> >> > library(affxparser) >> > tmp <- readBpmap("At35b_MF_v04-2_TIGRv5.bpmap", 26) >> > head(do.call("cbind",tmp[[1]][2:5]), n=20) >> pmx pmy mmx mmy >> [1,] 2197 2194 2198 2194 >> [2,] 2379 997 2380 997 >> [3,] 2443 1111 2444 1111 >> [4,] 2485 826 2486 826 >> [5,] 2491 2374 2492 2374 >> [6,] 2497 826 2498 826 >> [7,] 2503 1602 2504 1602 >> [8,] 2505 1129 2506 1129 >> [9,] 2507 2022 2508 2022 >> [10,] 1687 169 1688 169 >> [11,] 1687 501 1688 501 >> [12,] 1687 638 1688 638 >> [13,] 1687 871 1688 871 >> [14,] 1687 873 1688 873 >> [15,] 1687 1007 1688 1007 >> [16,] 1687 1371 1688 1371 >> [17,] 1687 1492 1688 1492 >> [18,] 1687 2346 1688 2346 <- >> [19,] 1687 2347 1688 2346 <- >> [20,] 1689 287 1690 287 >> >> As you can see, for this QC probeset there are two MM probes that appear >> to be right on top of each other. I assume the second should really have >> an (1688, 2347) coordinate, but the bpmap is in error. Since this will >> make two identical indices which are used as the primary key for the >> table these data are being fed into we get an error as the primary key >> must be unique. >> >> For the A. thaliana chip there are 36 such errors in the bpmap file for >> just the QC probes. >> >> Best, >> >> Jim >> >> >> >> Ludo Muller wrote: >>> >>> Hi James, >>> >>> Thank you for your help. 
>>> I installed the pdInfoBuilder package locally, on my Win XP computer
>>> (I also requested it to be installed on a Linux server, in case my
>>> computer doesn't have enough memory), and I ran the following (I
>>> downloaded the appropriate .bpmap and .cif files):
>>>
>>> > pkg <- new("AffyTilingPDInfoPkgSeed", version="0.0.1", author="Ludo Muller",
>>> +     email="ludo.muller at ...", biocViews="AnnotationData",
>>> +     genomebuild="Stanford Yeast Genome Database, October 2003",
>>> +     bpmapFile="Sc03b_MR_v04.bpmap", cifFile="Sc03b_MR_v04.cif")
>>> > makePdInfoPackage(pkg, destDir=".")
>>>
>>> However, I get the following message:
>>>
>>> Creating package in ./pd.sc03b.mr.v04
>>> Error in sqliteExecStatement(conn, statement, bind.data, ...) :
>>>   RS-DBI driver: (RS_SQLite_exec: could not execute: PRIMARY KEY must be unique)
>>> Timing stopped at: 6.35 0.17 6.59
>>>
>>> I found an earlier report dealing with a similar error message:
>>> https://stat.ethz.ch/pipermail/bioconductor/2008-June/023080.html
>>>
>>> Is it likely that the information files for this array are also inaccurate?
>>>
>>> Cheers,
>>>
>>> Ludo.
>>>
>>> -----"James W. MacDonald" <jmacdon at med.umich.edu> wrote: -----
>>>
>>> To: Ludo Muller <ludo.muller at duke.edu>
>>> From: "James W. MacDonald" <jmacdon at med.umich.edu>
>>> Date: 08/19/2008 09:05AM
>>> cc: bioconductor at stat.math.ethz.ch
>>> Subject: Re: [BioC] Sc03b_MR_v04 CDF package
>>>
>>> Hi Ludo,
>>>
>>> Ludo Muller wrote:
>>>> Hi all,
>>>>
>>>> I have data from hybridizations onto the Affymetrix yeast tiling array
>>>> (Sc03b_MR_v04) which I would like to analyze using Bioconductor. However
>>>> the CDF package (sc03bmrv04cdf) for this array doesn't seem to be available
>>>> from the bioconductor website. Can anybody tell me if it is available
>>>> elsewhere or whom I could contact about creating this package?
>>>
>>> There will probably never be such a beast, as that implies the usage of
>>> the affy package. Instead, you should be using the oligo package and
>>> pdInfoBuilder. For that you will need the CIF and BPMAP files from Affy,
>>> and something like
>>>
>>> pkg <- new("AffyTilingPDInfoPkgSeed", version = "0.0.1", author = "You",
>>>     email = "you at yours.com", biocViews = "AnnotationData",
>>>     genomebuild = "thegenomebuild", bpmapFile = <name of bpmap file>,
>>>     cifFile = <name of cif file>)
>>>
>>> makePdInfoPackage(pkg)
>>>
>>> And then install using R CMD INSTALL <the package name>. You don't
>>> mention your OS, so that might simply entail running the above at a
>>> terminal prompt (if on Linux), or if you are on Windows or MacOS, you
>>> will need to get set up to build packages. See the R FAQ for either OS
>>> for further info about that.
>>>
>>> Best,
>>>
>>> Jim
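
The duplicate-index problem Jim describes in the quoted message can be checked for directly. What follows is only a minimal base R sketch, using rows 17 to 20 of the table he printed. The helper name cell_index, the chip width of 2560 columns, and the one-based "y * ncol + x + 1" index convention are assumptions made for illustration; the real column count would have to come from the CEL header, and the real coordinates from readBpmap() as in Jim's example.

```r
# Rows 17-20 of the probe table quoted above (zero-based x/y coordinates).
probes <- data.frame(
  pmx = c(1687, 1687, 1687, 1689),
  pmy = c(1492, 2346, 2347, 287),
  mmx = c(1688, 1688, 1688, 1690),
  mmy = c(1492, 2346, 2346, 287)
)

# ASSUMPTION: the chip has 2560 columns; the true value is in the CEL header.
ncol_chip <- 2560

# Hypothetical helper: map zero-based (x, y) to a one-based cell index.
cell_index <- function(x, y, ncol) y * ncol + x + 1

pm_idx <- cell_index(probes$pmx, probes$pmy, ncol_chip)
mm_idx <- cell_index(probes$mmx, probes$mmy, ncol_chip)

# Flag every row whose MM index also occurs in some other row.
dup_mm <- duplicated(mm_idx) | duplicated(mm_idx, fromLast = TRUE)
print(probes[dup_mm, ])

# Report whether the PM indices of these rows collide as well.
print(any(duplicated(pm_idx)))
```

In these four rows only the MM indices collide, which fits Ludo's idea of working from the PM probes alone; whether that holds for a whole bpmap file would need the same check over every sequence. If it does, the PM indices could in principle be handed to the indices argument of affxparser's readCel() to pull out just those intensities.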