example of working from a database dump

Vincent J. Carey, Jr.

On Wed, 24 Sep 2003, Kaushik, Narendra K wrote:

> I have file in this format:
>
> Sequence ID  Gene_info  Avag_diff1  Avg_diff2 .............. Avg_diif95
> fffffffff    Gene1      45          60                       56
> --------     ------     ------      ------- ............... -----
> 9000         Gene9000   34          45                       56
>
> Avg_diff1--- Avg_diff95 etc values of gene expression data from N=95 chips.
> This is not Affy hgu95av2 Chip. we have custom made chip oligo synthesize
> in situ and are processed by PM-MM method.
>
> Narendra

There are certain conventions that need to be obeyed. Suppose I edit your data snippet as follows:

    SequenceID Gene_info Avag_diff1 Avg_diff2 Avg_diif95
    fffffffff  Gene1     45         60        56
    9000       Gene9000  34         45        56

and those three lines are now the contents of the file "tab". In R, the command

    > ERtab <- read.table("tab", header=TRUE)

assigns the data to the object ERtab, which is a data.frame. Now you can do some statistics:

    > apply(ERtab[,3:5], 2, mean)
    Avag.diff1  Avg.diff2 Avg.diif95
          39.5       52.5       56.0

So with a little massaging of a non-standard data snippet and two R commands, I have learned something about the data. You will need to learn some R, including these conventions:

1) no embedded blanks in variable (column) names
2) embedded underscores are translated to "." (as in the output above; R 1.9.0 and later keep underscores)
3) read.table is powerful and will distinguish between numeric and character data

You may have trouble reading in all 95 columns unless you have lots of RAM.

Once you have the matrix of numbers, you can consider structuring the data in the exprSet class. Clearly there are some interesting a priori distinctions among the 95 chips. These should be encoded in the phenoData component of the exprSet. Read the Biobase and affy vignettes to learn more about this. There may also be facilities in limma and the marray* tools that can help you with your custom chips.

Suppose you simply don't have enough RAM to read in the data. You will have to divide and conquer in an appropriate way. It may be that you can cut the data up into chunks of genes, with all 95 chips represented for every gene in each chunk. Or you can cut it up into chunks of chips, with all 9000 genes on every chip in the chunk. You need to think about filtering to make this manageable if your computing resources are insufficient to deal with the whole dataset. R has all the tools you need to do this; you can work with scan, e.g., to get subsets of records in the file. Or you may have operating system facilities that help with file decomposition.
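To make the divide-and-conquer idea concrete, here is a minimal sketch in R of both ways of cutting the file. Everything in it is an assumption for illustration: the toy file (6 genes, 4 chips, with made-up values and tidied column names), the chunk size, and the choice of read.table on an open connection rather than scan. It is one possible approach, not the only one.

```r
# Hypothetical stand-in for the real dump: 6 genes, 4 chips, whitespace-delimited.
tab <- tempfile()
writeLines(c(
  "SequenceID Gene_info Avg_diff1 Avg_diff2 Avg_diff3 Avg_diff4",
  "s1 Gene1 45 60 56 50",
  "s2 Gene2 34 45 56 40",
  "s3 Gene3 10 20 30 40",
  "s4 Gene4 12 14 16 18",
  "s5 Gene5 100 90 80 70",
  "s6 Gene6 5 5 5 5"
), tab)

# (a) Chunks of genes: keep the connection open and read a few rows at a time.
con <- file(tab, open = "r")
cols <- strsplit(readLines(con, n = 1), "[[:space:]]+")[[1]]  # header line
chunk_size <- 4          # assumed value; choose what fits in memory
gene_means <- numeric(0)
repeat {
  chunk <- tryCatch(
    read.table(con, nrows = chunk_size, col.names = cols,
               stringsAsFactors = FALSE),
    error = function(e) NULL)          # treat "nothing left to read" as the end
  if (is.null(chunk) || nrow(chunk) == 0) break
  m <- rowMeans(chunk[, -(1:2)])       # per-gene mean over all chips
  names(m) <- chunk[[2]]               # second column holds the gene label
  gene_means <- c(gene_means, m)
  if (nrow(chunk) < chunk_size) break  # short chunk: end of file reached
}
close(con)

# (b) Chunks of chips: a colClasses entry of "NULL" makes read.table skip
# that column entirely, so only the first two chips are loaded here.
keep <- c("character", "character", "numeric", "numeric", "NULL", "NULL")
first_two <- read.table(tab, header = TRUE, colClasses = keep)
chip_means <- colMeans(first_two[, -(1:2)])  # per-chip mean over all genes

print(gene_means)
print(chip_means)
```

In (a) only chunk_size rows are held as a data.frame at any one time, and only the per-gene summaries accumulate. In (b) the whole file is still scanned, but the skipped columns are never stored. For the real file, the colClasses vector would need one entry per column (2 identifier columns plus 95 chips).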

hgu95av2 Biobase affy limma

20.6 years ago • Vincent J. Carey, Jr.