Question: scan date information
10.2 years ago, Robert Dunne wrote:
Hi List,

I apologise for what may be a very simple question. How can I retrieve the scan date information from CEL files?

I can find the information using some editors; kate under Linux shows
"a f f y m e t r i x - s c a n - d a t e ( 2 0 0 8 - 0 4 - 0 3"
but I can't find it at all using vi or emacs. I suppose this has something to do with encoding. Also, "string file.cel | grep "d a t e"" does not work.

I have tried the affxparser library, but readCelHeader("file.cel") does not pick up the date.

Unfortunately, in many experiments the scan date turns out to be the major effect.

Bye
Rob
10.2 years ago, Saroj K Mohapatra wrote:
Hi Rob:

I have a file called _16.CEL and wanted to find the date information in its header. The following gives me:

    $ strings _16.CEL | grep DatHeader
    DatHeader=[2..65534] _16:CLS=7365 RWS=7365 XIN=1 YIN=1 VE=30 2.0 10/27/06 10:57:45 50207590 M10

I find a date, 10/27/06. Is this what you are looking for?

Best,
Saroj

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

Patrick Aboyoun wrote:

Robert,

The answer depends on which versions of R and BioC you are using. If you are using R <= 2.9 and BioC <= 2.4, you will need to devise your own method, one of which was given by Saroj. If you are using R-devel and BioC 2.5 (devel), the eSet abstract class and its derived classes, such as ExpressionSet, contain a new slot called protocolData that holds an AnnotatedDataFrame object. This slot is to be populated with metadata contained in microarray data files. In BioC 2.5 (devel), read.affybatch from affy and read.celfile from affyio add a ScanDate column to the protocolData slot with the metadata you are looking for.

Cheers,
Patrick
On 22/09/2009, Rob Dunne wrote:

Thanks for that.

I am not using the development version yet, but I will look out for the new slot.

Saroj, your method doesn't work for me; perhaps your CEL file is ASCII? This finds nothing in my file:

    $ strings SB_20D.CEL | grep DatHeader

However, I have found that this works:

    $ grep --text d.a.t.e. SB_20D.CEL
    text/plainaffymetrix-scan-date(2008-04-03T04:45:53Z

The "--text" option makes grep read a binary file as though it were text. I am not sure why I need the dots in "date".

Bye
Rob
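One likely reason the dots are needed (an assumption based on the spaced-out text that kate displays, not something confirmed in this thread): the newer binary CEL files appear to store header text as 16-bit characters, so each visible letter is accompanied by a NUL byte. A literal pattern such as "date" then never matches, whereas each "." can match the extra byte. The sketch below reproduces this on a synthetic file (not a real CEL file) and shows that deleting the NUL bytes with tr lets a plain grep pattern match again. The variable names are illustrative.

```shell
#!/bin/sh
# Synthetic stand-in for the binary header: every character of the tag is
# preceded by a NUL byte, as in 16-bit big-endian text. Not a real CEL file.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
text='affymetrix-scan-date(2008-04-03T04:45:53Z'

# '#' is only a placeholder (it does not occur in the text); tr turns it
# into the NUL byte that sits in front of each character.
printf '%s' "$text" | sed 's/./#&/g' | tr '#' '\000' > "$sample"

# The literal pattern cannot match: "d", "a", "t", "e" are never adjacent.
plain_hits=$(grep -a -c 'date' "$sample")

# Deleting the NUL bytes turns the 16-bit text back into plain ASCII.
clean_hits=$(tr -d '\000' < "$sample" | grep -a -c 'date')

echo "raw file:      $plain_hits matching line(s)"
echo "NULs stripped: $clean_hits matching line(s)"
```

If the assumption holds, something like `tr -d '\000' < SB_20D.CEL | grep -a scan-date` should find the tag in a real file without the dots, although that has not been checked here.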
Mark Cowley wrote:

Hi,

These scripts work OK for me on OSX (only on TXT CEL files, not the latest binary ones). I haven't gotten around to writing a version that uses the Fusion SDK.

Mark

celDate.sh

    #!/bin/bash
    #
    # Determine the date that the CEL file was created, from the CEL file header
    # eg "06/05/08 12:05:36"
    #
    # Mark Cowley, 2008-07-28
    #
    # -m1 stops at the first DatHeader line; -a treats the file as text.
    grep -m1 -a '^DatHeader' "$@" | grep -E -o '[0-9]{2}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}'

-- or --

celDate.R

    # Extract the CEL file creation date stamp from within the CEL file header.
    #
    # Mark Cowley, 2008-07-29
    celDate <- function(files) {
        stopifnot(all(file.exists(files)))
        # shQuote() protects file names that contain spaces or shell metacharacters
        files <- paste(shQuote(files), collapse = " ")
        cmd <- paste("grep -m1 -a '^DatHeader'", files,
                     "| grep -E -o '[0-9]{2}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}'")
        dates <- system(cmd, intern = TRUE)
        dates
    }

-----------------------------------------------------
Mark Cowley, PhD
Peter Wills Bioinformatics Centre
Garvan Institute of Medical Research, Sydney, Australia
-----------------------------------------------------
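The two approaches from this thread could be combined into one helper that tries the DatHeader line first and falls back to the affymetrix-scan-date tag. This is only a sketch: the function name cel_date is hypothetical, the fallback assumes the binary files store the tag as 16-bit text whose NUL bytes can simply be deleted, and it has been exercised only on the synthetic files built in the example, not on real CEL files.

```shell
#!/bin/sh
# cel_date FILE: print a scan timestamp found in FILE, or return non-zero.
# Hypothetical helper; the two formats are returned as found, not normalised.
cel_date() {
    # 1) Text CEL files: the DatHeader line carries "MM/DD/YY HH:MM:SS".
    stamp=$(grep -a -m1 'DatHeader' "$1" |
        grep -E -o '[0-9]{2}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}' |
        head -n 1)

    # 2) Fallback: delete NUL bytes so 16-bit text becomes greppable, then
    #    look for the affymetrix-scan-date tag quoted in this thread.
    if [ -z "$stamp" ]; then
        stamp=$(tr -d '\000' < "$1" |
            grep -a -E -o 'affymetrix-scan-date.[0-9-]+T[0-9:]+Z' |
            head -n 1 |
            sed 's/^affymetrix-scan-date.//')
    fi

    [ -n "$stamp" ] && printf '%s\n' "$stamp"
}

# Demo on two synthetic files (stand-ins, not real CEL files).
txt_cel=$(mktemp)
bin_cel=$(mktemp)
trap 'rm -f "$txt_cel" "$bin_cel"' EXIT

printf '%s\n' 'DatHeader=[2..65534] _16:CLS=7365 RWS=7365 XIN=1 YIN=1 VE=30 2.0 10/27/06 10:57:45 50207590 M10' > "$txt_cel"
# '#' is a placeholder that tr replaces with a NUL before each character.
printf '%s' 'affymetrix-scan-date(2008-04-03T04:45:53Z' |
    sed 's/./#&/g' | tr '#' '\000' > "$bin_cel"

txt_date=$(cel_date "$txt_cel")
bin_date=$(cel_date "$bin_cel")
echo "text file:   $txt_date"
echo "binary file: $bin_date"
```

Because the two branches return different date formats (MM/DD/YY versus ISO 8601), anything that compares scan dates across files would still need to normalise them.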