I'm relatively new to coding, and especially new to R and to conducting RNA-seq. I found a dataset (GSE97562) that I wanted to practice RNA-seq with. I've already tested my code on other data and was able to convert CEL files successfully with the affy package. With this dataset, however, affy returns NA values in the count data, and the warning message suggests using oligo or xps instead:
    Warning messages:
    1: The affy package can process data from the Gene ST 1.x series of arrays, but you should consider using either the oligo or xps packages, which are specifically designed for these arrays.

Example of data:

    7892502 NA
    7892503 NA
    7892504 NA
    7892505 NA
    7892506 NA
    7892507 NA
    7892508 NA
    7892509 NA
    7892510 NA
    7892511 NA
    7892512 NA
    7892513 NA
    7892514 NA
I've been reading up on and looking for ways to convert the CEL files to text, but I've resorted to asking on this forum. Below is as far as I can get before getting stumped:
    > library(oligo)
    > setwd("C:example")
    > grab<-getwd()
    > CELfile<-list.celfiles(path=grab)
    > rawdata<-read.celfiles(filenames=CELfile)
    Platform design info loaded.
    Reading in : GSM2572161_LMTR1414T0.CEL
    > help = rma(rawdata)
    Background correcting
    Normalizing
    Calculating Expression
This is where I'm stuck: I don't know what to do with the resulting expression set. It may be obvious, but I genuinely couldn't find out how, and the documentation confused me more. What I want is the same kind of table the affy package gives me: gene IDs down the rows and one column of values per sample.
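For reference, the object returned by oligo's `rma()` is an `ExpressionSet`, and the matrix inside it is accessed with `exprs()` from Biobase (which oligo depends on). The sketch below assumes `help` is the object from the session above; the helper name `export_expression_matrix` and the output file name are hypothetical. Note that these are log2-scale, background-corrected, quantile-normalized intensities rather than counts, and the row names are probe set (transcript cluster) IDs such as 7892502.

```r
library(Biobase)  # provides exprs(); oligo already depends on it

# Hypothetical helper: pull the probe-set-by-sample matrix out of an
# ExpressionSet and write it to a tab-delimited text file.
export_expression_matrix <- function(eset, file) {
  # numeric matrix: rows = probe set IDs, columns = samples (CEL files)
  mat <- exprs(eset)
  # col.names = NA writes an empty header cell above the row names,
  # so each sample name sits over its own column in the output file
  write.table(mat, file = file, sep = "\t", quote = FALSE, col.names = NA)
  invisible(mat)
}
```

Usage, with a hypothetical file name: `expr <- export_expression_matrix(help, "GSE97562_rma.txt")`, then `head(expr)` to check it. For a CSV instead, `write.csv(exprs(help), "GSE97562_rma.csv")` writes the same matrix with row names included.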
Example of affy conversion:

                  GSM397303.CEL GSM397304.CEL GSM397307.CEL GSM397308.CEL GSM397311.CEL GSM397312.CEL
    1007_s_at   149.9375298   129.9978191   122.7255732   140.9597272   111.5337299   118.2913423
    1053_at     2382.145666   2514.313855   2543.690337    2176.61948   2664.961451   2276.169195
    117_at      106.0720674   113.4188407   107.8063683   201.4040755    116.231378   145.2329151
    121_at      444.0947983   467.1769612   378.3758104   380.3314096   422.2909195   355.5245813
But I can't figure out how to export this data properly. Writing it directly to CSV splits the data across the columns in a chaotic way, and what comes out is processed data. Is there a way to keep the raw counts from the CEL files and just have the package write them to a TXT or CSV file?
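As background for the "raw" part: CEL files from these arrays hold probe-level fluorescence intensities, not per-gene counts, so there is no raw count table to extract. The closest equivalents are the probe-level matrix from `exprs(rawdata)` (one row per probe feature, not per gene) or probe set summaries computed without background correction or normalization, which oligo's `rma()` supports through its `background` and `normalize` arguments. The sketch below assumes `rawdata` is the object returned by `read.celfiles()` above; the helper name `export_unnormalized` is hypothetical. Summarization (median polish) still runs, and the result is log2-scale, so it is exponentiated before writing.

```r
library(oligo)  # rma(); also attaches Biobase, which provides exprs()

# Hypothetical helper: write gene-level (transcript cluster) values that
# skip RMA's background correction and quantile normalization steps.
export_unnormalized <- function(rawdata, file) {
  # target = "core" requests transcript-cluster summaries for Gene ST arrays;
  # median-polish summarization is still applied
  eset <- rma(rawdata, background = FALSE, normalize = FALSE, target = "core")
  # rma() reports log2 values; 2^ returns them to the intensity scale
  mat <- 2^exprs(eset)
  write.csv(mat, file = file)
  invisible(mat)
}
```

Usage, with a hypothetical file name: `export_unnormalized(rawdata, "GSE97562_unnormalized.csv")`. Keep in mind that unnormalized values are usually not comparable across arrays, which is why RMA normalizes by default.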