is it possible to find sample batch # in CEL files?
Brian Tsai ▴ 40
@brian-tsai-4504
Last seen 10.2 years ago
Hi, I've been downloading raw CEL files from the Gene Expression Omnibus and have been trying to process them. I'd like to account for batch effects when computing differential expression, but the authors didn't provide the batch information explicitly in their annotations. Is this information stored in the CEL files, and can it be retrieved through Bioconductor?
@sean-davis-490
Last seen 3 months ago
United States
Not a direct answer, but you might look at the sva package, which does not rely on externally defined batch effects.

Sean
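As a rough illustration of the sva suggestion, here is a minimal sketch. It assumes the Bioconductor package sva is installed; the function name `add_surrogate_variables` and the arguments `expr` (a probes-by-samples matrix of normalized expression values) and `group` (a factor giving the condition of each sample) are hypothetical names chosen for this example, not something from the thread.

```r
# Sketch only: assumes the Bioconductor package 'sva' is installed.
# 'expr'  : numeric matrix, probes in rows and samples in columns (hypothetical name)
# 'group' : factor giving the biological condition of each sample (hypothetical name)
add_surrogate_variables <- function(expr, group) {
  pheno <- data.frame(group = group)
  mod  <- model.matrix(~ group, data = pheno)  # full model with the variable of interest
  mod0 <- model.matrix(~ 1, data = pheno)      # null model without it

  # Estimate how many surrogate variables the data support
  n_sv <- sva::num.sv(expr, mod)
  if (n_sv == 0) {
    return(mod)  # nothing to add
  }

  # Estimate the surrogate variables and append them to the design matrix
  sv_fit <- sva::sva(expr, mod, mod0, n.sv = n_sv)
  sv <- as.matrix(sv_fit$sv)
  colnames(sv) <- paste0("SV", seq_len(ncol(sv)))
  cbind(mod, sv)
}
```

The returned matrix could then be used as the design matrix in a linear-model fit so that the estimated surrogate variables are adjusted for alongside the condition of interest.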
Guido Hooiveld ★ 4.1k
@guido-hooiveld-2020
Last seen 1 day ago
Wageningen University, Wageningen, the …
Hi,

Some time ago I came across these lines of code that could be of help:
http://bios.ucdenver.edu/images/a/a1/Affy_headerinfo.txt

I have never used it myself, though.

HTH,
Guido
James F. Reid ▴ 120
@james-f-reid-2808
Last seen 10.2 years ago
Hi Brian,

You should be able to access the date the chip was scanned using the readCelHeader function provided in the affxparser package. Look for the 'datheader' entry.

James.
Hi,

I haven't tried this in a while, but as far as I can see, the 'ReadAffy' function in the 'affy' package automatically populates the 'ScanDate' field in the resulting AffyBatch object, which you can access with syntax like

protocolData(a)$ScanDate

where I have assumed that 'a' is an AffyBatch.

Best wishes
Wolfgang
Rob Dunne ▴ 230
@rob-dunne-292
Last seen 10.2 years ago
Hi Brian,

affxparser has a function called readCelHeader:

library(affxparser)
# 'files' is assumed to be a character vector of paths to the CEL files
dates <- character(length(files))
for (i in seq_along(files)) {
  # the DatHeader entry contains the scan date as MM/DD/YY
  datheader <- readCelHeader(files[i])$datheader
  dates[i] <- gsub(".*([0-9]{2}/[0-9]{2}/[0-9]{2}).*", "\\1", datheader)
}

Bye
Rob
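Once the dates have been extracted as in the loop above, they still need to be turned into a batch variable. The following is a minimal sketch in base R, using made-up MM/DD/YY strings in place of real header values. Treating each scan day as a batch, or merging arrays scanned within `max_gap` days of each other, are both assumptions for illustration; whether scan date really reflects the processing batch depends on the experiment.

```r
# Hypothetical dates in the MM/DD/YY form extracted from the DatHeader field
dates <- c("03/14/08", "03/14/08", "04/02/08", "04/02/08", "04/03/08")

# Parse into Date objects so that they sort and compare correctly
scan_date <- as.Date(dates, format = "%m/%d/%y")

# Option 1: every distinct scan day is its own batch
batch_by_day <- factor(scan_date)

# Option 2: arrays scanned within 'max_gap' days of the previous array share a batch
max_gap <- 1
ord <- order(scan_date)
gap_days <- c(0, diff(as.numeric(scan_date[ord])))
batch_id <- integer(length(scan_date))
batch_id[ord] <- cumsum(gap_days > max_gap) + 1L
batch_by_gap <- factor(batch_id)

table(batch_by_day)
table(batch_by_gap)
```

Either factor can then be included as a covariate in the model, provided the batches are not completely confounded with the condition of interest.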
@suprunmaria-7729
Last seen 2.0 years ago
United States

We get the scan date, which we use as the batch date, with the following code:

pData(protocolData(a)[sampleNames(a),])$ScanDate

Or use this code to store the date part (the first 10 characters of ScanDate) as a batch label for every sample:

a$Batch <- sapply(pData(protocolData(a)[sampleNames(a),])$ScanDate, function(x){substr(x,1,10)})

If you use only protocolData(a)$ScanDate, the result might depend on how the samples are sorted, and some dates may be assigned to the wrong samples.
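To illustrate the ordering concern, here is a small base R sketch with a made-up table standing in for the protocol data. The sample names and timestamps are hypothetical, and the table is deliberately stored in a different order from the samples so that the difference between positional and name-based lookup is visible.

```r
# Hypothetical scan-date table whose rows are NOT in the same order as the samples
scan_info <- data.frame(
  ScanDate = c("2008-04-02T10:15:00Z", "2008-03-14T09:30:00Z", "2008-03-14T11:05:00Z"),
  row.names = c("GSM3.CEL", "GSM1.CEL", "GSM2.CEL"),
  stringsAsFactors = FALSE
)
sample_names <- c("GSM1.CEL", "GSM2.CEL", "GSM3.CEL")

# Positional use: takes the dates in table order, so GSM1 gets GSM3's date
by_position <- substr(scan_info$ScanDate, 1, 10)

# Name-based use: index the rows by sample name first, then take the date part
by_name <- substr(scan_info[sample_names, "ScanDate"], 1, 10)

data.frame(sample = sample_names, by_position, by_name)
```

Indexing by sample name costs nothing when the order already matches and protects against a silent mismatch when it does not.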
