Hi,
I'm relatively new to coding, and especially new to R and RNA-seq analysis. I found a dataset (GSE97562) that I want to practice on. I've already tested my code on other data and was able to convert CEL files successfully using the affy package. With this dataset, however, affy returns NA values in the count data, and the warning message recommends using oligo or xps instead.
Warning messages:
1:
The affy package can process data from the Gene ST 1.x series of arrays,
but you should consider using either the oligo or xps packages, which are specifically
designed for these arrays.
Example of data:
7892502 NA
7892503 NA
7892504 NA
7892505 NA
7892506 NA
7892507 NA
7892508 NA
7892509 NA
7892510 NA
7892511 NA
7892512 NA
7892513 NA
7892514 NA
I've been reading and looking for ways to convert the CEL files to txt, but I've had to resort to asking on this forum. Below is about as far as I can get before getting stuck.
> library(oligo)
> setwd("C:example")
> grab<-getwd()
> CELfile<-list.celfiles(path=grab)
> rawdata<-read.celfiles(filenames=CELfile)
Platform design info loaded.
Reading in : GSM2572161_LMTR1414T0.CEL
> help = rma(rawdata)
Background correcting
Normalizing
Calculating Expression
This is where I'm not sure what to do with the expression set. It may be obvious, but I genuinely could not find the next step, and the documentation confused me more. The result I want is the same as what I get with the affy package: a raw count matrix with one gene ID per row and one sample per column.
Example of affy conversion:
GSM397303.CEL GSM397304.CEL GSM397307.CEL GSM397308.CEL GSM397311.CEL GSM397312.CEL
1007_s_at 149.9375298 129.9978191 122.7255732 140.9597272 111.5337299 118.2913423
1053_at 2382.145666 2514.313855 2543.690337 2176.61948 2664.961451 2276.169195
117_at 106.0720674 113.4188407 107.8063683 201.4040755 116.231378 145.2329151
121_at 444.0947983 467.1769612 378.3758104 380.3314096 422.2909195 355.5245813
But I can't figure out how to convert this data properly. Writing it directly to csv splits the data across the columns in a chaotic manner and returns processed data. Is there a way to keep the raw counts from the CEL files and just have the package convert them to txt or csv?
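To make the target layout concrete, here is a minimal sketch in base R of the file I am trying to end up with. It uses a small stand-in matrix built from the affy values above rather than real oligo output, and the names expr_matrix and out_file are just placeholders. My assumption, which I have not confirmed for this dataset, is that Biobase's exprs() would give a matrix of this shape from the expression set.

```r
# Stand-in for the matrix I would hope to pull out of the expression set:
# rows are probeset IDs, columns are samples. Values are copied from the
# affy example above; nothing here comes from GSE97562.
expr_matrix <- matrix(
  c(149.9375298, 2382.145666, 106.0720674,   # GSM397303.CEL
    129.9978191, 2514.313855, 113.4188407),  # GSM397304.CEL
  nrow = 3,
  dimnames = list(
    c("1007_s_at", "1053_at", "117_at"),
    c("GSM397303.CEL", "GSM397304.CEL")
  )
)

# Assumption, not run here: with a real oligo result this might instead be
#   expr_matrix <- Biobase::exprs(eset)

out_file <- file.path(tempdir(), "expression_matrix.txt")

# col.names = NA writes an empty header cell above the row names, so the
# sample names stay aligned with their data columns.
write.table(expr_matrix, file = out_file, sep = "\t",
            quote = FALSE, col.names = NA)

# Read the file back to check that IDs and sample names survived.
round_trip <- as.matrix(
  read.delim(out_file, row.names = 1, check.names = FALSE)
)
```

If the header ends up shifted one column to the left when the file is opened, the col.names = NA argument is the part I would check first.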
Thanks!