problem read.maimage("Agilent") -limma


Gordon Smyth 50k

@gordon-smyth


WEHI, Melbourne, Australia

> Date: Mon, 25 Jul 2005 12:22:22 -0400
> From: Naomi Altman <naomi at stat.psu.edu>
> Subject: [BioC] problem read.maimage("Agilent") -limma
> To: bioconductor at stat.math.ethz.ch
>
> I am having trouble reading the Agilent arabidopsis 22575 gene array using
> read.maimage in Limma under R 2.1.1 (I don't know the limma version, but I
> just downloaded using the R packages interface, and also used the update,
> so I presume this is the most recent).

You should have limma 2.0.2.

> Under R 2.0.1, there was no problem reading all the data in the arrays using:
>
> RGf=read.maimages(c("2792.txt","2793.txt","2796.txt","4507.txt","4508.txt","4509.txt"),source="agilent")
>
> dim(RGf$R)
> 22575 6
>
> But under R 2.1.1, I get:
>
> RGf=read.maimages(c("2792.txt","2793.txt","2796.txt","4507.txt","4508.txt","4509.txt"),source="agilent")
>
> dim(RGf$R)
> 12956 6
>
> The last line of RGf$R is all NA.
>
> The problem might be in RGf$genes. When I try to print any row up to the
> last one, everything looks normal. Trying to print the last row kills
> R. The annotation for this gene appears to be exceptionally long.

I've just tried reading in some AgilentFE data and didn't have any problems, so I wasn't able to reproduce the error that you describe. Try isolating which input file is causing the problem. If you don't find a solution, you could zip up an example data file which causes the error and send it to me.

Gordon

> Naomi S. Altman                     814-865-3791 (voice)
> Associate Professor
> Bioinformatics Consulting Center
> Dept. of Statistics                 814-863-7114 (fax)
> Penn State University               814-865-1348 (Statistics)
> University Park, PA 16802-2111

Annotation limma • 1.0k views

ADD COMMENT • link updated 18.8 years ago by Sean Davis 21k • written 18.8 years ago by Gordon Smyth 50k
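
Gordon's suggestion to isolate the problem file can be sketched in base R by comparing, for each file, the number of data lines on disk with the number of rows the parser actually returns. This is only a sketch under stated assumptions: `count_rows()` is a hypothetical helper, and it assumes a plain tab-delimited file with exactly one header line and no blank lines, which is simpler than a real AgilentFE export. For real arrays the same idea would be to call `read.maimages()` on one file at a time and compare `dim(RG$R)` across files.

```r
# Hypothetical helper (not part of limma): compare raw line counts with
# the number of rows returned by the parser, one file at a time.
# Assumes tab-delimited text with a single header line and no blank lines.
count_rows <- function(files) {
  do.call(rbind, lapply(files, function(f) {
    raw_lines <- length(readLines(f, warn = FALSE)) - 1L  # minus the header
    parsed <- nrow(read.delim(f, quote = "", comment.char = "",
                              stringsAsFactors = FALSE))
    data.frame(file = basename(f),
               raw_lines = raw_lines,
               parsed_rows = parsed,
               truncated = parsed < raw_lines,  # TRUE if rows went missing
               stringsAsFactors = FALSE)
  }))
}

# Usage (file names taken from the thread):
# count_rows(c("2792.txt", "2793.txt", "2796.txt"))
```

A file whose `parsed_rows` falls short of `raw_lines` would be the one to inspect, or to zip up and send as a reproducible example.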


Sean Davis 21k

@sean-davis-490


United States

On Jul 26, 2005, at 8:13 AM, Gordon K Smyth wrote:

>> The problem might be in RGf$genes. When I try to print any row up to
>> the last one, everything looks normal. Trying to print the last row
>> kills R. The annotation for this gene appears to be exceptionally long.

I have had problems with Agilent annotation files containing "special" characters that cause similar "termination" of file reading. I would look at the annotation for quotation marks, single quotes, # symbols (no idea why this seems to affect things), and backslashes. I typically write a little perl script to "clean" the files. I'm not sure why this should vary from one version to the next, though.

Sean

ADD COMMENT • link 18.8 years ago Sean Davis 21k
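
Sean's check for special characters can also be done from within R rather than perl. The following is a minimal sketch, not Sean's script: `find_suspect_lines()` is a hypothetical helper that reads the file as raw text with `readLines()` and reports which line numbers contain each of the characters he lists.

```r
# Hypothetical helper: report the line numbers that contain each
# potentially troublesome character. fixed = TRUE makes every character
# match literally, so no regular-expression escaping is involved.
find_suspect_lines <- function(path, chars = c("\\", "#", "\"", "'")) {
  lines <- readLines(path, warn = FALSE)
  flagged <- lapply(chars, function(ch) which(grepl(ch, lines, fixed = TRUE)))
  names(flagged) <- chars
  flagged
}

# Usage (file name taken from the thread):
# hits <- find_suspect_lines("2792.txt")
# hits[["\\"]]   # line numbers containing a backslash
```

Comparing the flagged line numbers with the row at which reading stops would show whether one of these characters sits at or just before the point of failure.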


There are "\" and "#" before the offending line. I could not find any other unusual characters in the offending line.

--Naomi

At 09:59 AM 7/26/2005, Sean Davis wrote:

>I have had problems with Agilent annotation files containing "special"
>characters that cause similar "termination" of file reading. I would
>look at the annotation for quotation marks, single quotes, # symbols
>(no idea why this seems to affect things), and backslashes.

ADD REPLY • link 18.8 years ago Naomi Altman ★ 6.0k


Recently we detected some problems with internal regexpr libraries in R v2.1.1. One of the symptoms was that R would crash on Windows, and another was that the regular expression became corrupt in memory. This was partly fixed in R v2.1.1 patched (2005-07-20). Note that the problem was introduced when R went from v2.1.0 to v2.1.1, so it might be related to your problem.

Cheers

Henrik

Naomi Altman wrote:

> There are "\" and "#" before the offending line. I could not find any
> other unusual characters in the offending line.
>
> --Naomi
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor

ADD REPLY • link 18.8 years ago Henrik Bengtsson ★ 2.4k


This is caused by an R bug introduced in R 2.1.1, which persists in R 2.1.1 patched. The function read.table() is now interpreting backslashes as C-style special characters. This change was supposed to affect scan() only, but apparently has spilled over into read.table() as well.

The gene names in the AgilentFE export files contain strings such as \0, which is being matched as the null character. This not only causes the file read to terminate prematurely, it also causes a crash of R itself when the string is printed.

At this moment, I can see no good workaround apart from going back to an earlier version of R. I will take up the problem with R core for a fix. Martin?

Gordon

At 01:14 AM 27/07/2005, Henrik Bengtsson wrote:

>Recently we detected some problems with internal regexpr libraries in R
>v2.1.1. One of the symptoms was that R would crash on Windows, but also
>that the regular expression became corrupt in memory. This was partly
>fixed in the R v2.1.1 patched (2005-07-20).

ADD REPLY • link 18.8 years ago Gordon Smyth 50k
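
For readers on a current R installation: `read.table()` now has an `allowEscapes` argument, `FALSE` by default, which controls whether C-style escapes are processed or read verbatim. The thread does not say when this argument appeared, so treat the following as a sketch for present-day R, not as the fix Gordon requested. It uses a small made-up tab-delimited file (an assumption, far simpler than an AgilentFE export) to show a `\0` in a gene name being read as two ordinary characters.

```r
# Made-up example file: the Name field holds a literal backslash followed
# by the digit 0, like the strings described in the thread.
tmp <- tempfile(fileext = ".txt")
writeLines(c("ID\tName", "1\tgene\\0 like"), tmp)

# allowEscapes = FALSE (the default in current R) reads the backslash
# verbatim instead of interpreting \0 as a C-style escape.
verbatim <- read.table(tmp, header = TRUE, sep = "\t", quote = "",
                       comment.char = "", allowEscapes = FALSE,
                       stringsAsFactors = FALSE)

print(verbatim$Name)  # R displays the single backslash as "\\"
```

Setting `quote = ""` and `comment.char = ""` also keeps quotation marks and `#` symbols in annotation fields from being treated specially, which addresses the other characters raised earlier in the thread.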


On Wed, 2005-27-07 at 10:29 +1000, Gordon Smyth wrote:

> The gene names in the AgilentFE export files contain strings such as \0,
> which is being matched as the null character. This is not only causing the
> file read to terminate prematurely, it is also causing a crash of R itself
> when the string is printed.

A quick and dirty solution would be to replace the \ with \\ using any text editor and load that into R.

It's not very nice, but it should keep things working until a real solution is found.

Francois

ADD REPLY • link 18.8 years ago Francois Pepin ▴ 60
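
Francois's replacement can be scripted instead of done by hand in a text editor. This is a minimal sketch of that idea: `escape_backslashes()` is a hypothetical helper, and it writes a cleaned copy, leaving the original export untouched.

```r
# Hypothetical helper: write a copy of the file in which every single
# backslash is doubled. With fixed = TRUE the pattern "\\" is one literal
# backslash and the replacement "\\\\" is two literal backslashes.
escape_backslashes <- function(infile, outfile) {
  lines <- readLines(infile, warn = FALSE)
  writeLines(gsub("\\", "\\\\", lines, fixed = TRUE), outfile)
  invisible(outfile)
}

# Usage (file name taken from the thread; the output name is made up):
# escape_backslashes("2792.txt", "2792_clean.txt")
```

Note that the doubled backslashes only help on an R version that interprets escapes while reading. On a version that reads backslashes verbatim, the cleaned copy would carry the extra backslash into the gene names.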



Another quick and dirty solution is to keep R 2.0.x around, read the files in there, and then bring the saved .RData up in 2.1.1. Works fine for me.

--Naomi

At 08:51 PM 7/26/2005, Francois Pepin wrote:

>A quick and dirty solution would be to replace the \ with \\ using any
>text editor and load that into R.
>
>It's not very nice, but it should keep things working until a real
>solution is found.

ADD REPLY • link 18.8 years ago Naomi Altman ★ 6.0k
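
Naomi's workaround amounts to a `save()` in the older R session followed by a `load()` in the newer one. The sketch below runs both steps in a single session with a stand-in object so that it is self-contained. The file name `RGf.RData` and the dummy list are assumptions, and in practice `RGf` would come from `read.maimages()` in the R 2.0.x session.

```r
# Step 1, in the older R session: read the arrays, then save the object.
# RGf <- read.maimages(files, source = "agilent")  # requires limma
RGf <- list(R = matrix(1:6, nrow = 3))             # stand-in for the real data
rdata_file <- file.path(tempdir(), "RGf.RData")    # made-up file name
save(RGf, file = rdata_file)

# Step 2, in the newer R session: restore the object from the file.
rm(RGf)
load(rdata_file)
dim(RGf$R)
```

Because the object is restored from R's binary format, no text parsing takes place in the newer session, which is presumably why the backslash problem does not recur.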
