Question

Using oligo or xps to read and convert RNA-seq data CEL file to txt file.

rodj5201 • 0

Hi,

I'm relatively new to coding and especially new to R and conducting RNA-seq. I've found a dataset (GSE97562) that I wanted to practice RNA-seq with. I've already tested code with other data and was able to successfully convert CEL files using the affy package. However, with this dataset, affy returns NA values in the count data and the warning message said to use oligo or xps.


Warning messages:
1: 
The affy package can process data from the Gene ST 1.x series of arrays,
but you should consider using either the oligo or xps packages, which are specifically
designed for these arrays.

Example of data: 
7892502 NA
7892503 NA
7892504 NA
7892505 NA
7892506 NA
7892507 NA
7892508 NA
7892509 NA
7892510 NA
7892511 NA
7892512 NA
7892513 NA
7892514 NA

I've been reading and looking for possible methods to convert the CEL file to txt, but have resorted to asking on this forum. I will show about as far as I can get before getting stumped.

> library(oligo)
> setwd("C:example")
> grab<-getwd()
> CELfile<-list.celfiles(path=grab)
> rawdata<-read.celfiles(filenames=CELfile)
Platform design info loaded.
Reading in : GSM2572161_LMTR1414T0.CEL
> help = rma(rawdata)
Background correcting
Normalizing
Calculating Expression

It is here that I'm not sure what to do with the expression set. I feel that it might be obvious, but I genuinely could not find what to do, and the documentation confused me more. The result I want is the same as what I get with the affy package: a raw count matrix with gene IDs as rows and samples as columns.

example of affy conversion:
GSM397303.CEL   GSM397304.CEL   GSM397307.CEL   GSM397308.CEL   GSM397311.CEL   GSM397312.CEL
1007_s_at   149.9375298 129.9978191 122.7255732 140.9597272 111.5337299 118.2913423
1053_at 2382.145666 2514.313855 2543.690337 2176.61948  2664.961451 2276.169195
117_at  106.0720674 113.4188407 107.8063683 201.4040755 116.231378  145.2329151
121_at  444.0947983 467.1769612 378.3758104 380.3314096 422.2909195 355.5245813

But I cannot find out how to convert this data properly. Writing it to csv directly splits the data along the columns in a chaotic manner and returns processed data. Is there a way to keep the raw counts from the CEL file and just have the package convert it to txt or csv?

Thanks!

oligoData DataImport oligo RNASeqData convert • 1.5k views

updated 20 months ago by James W. MacDonald 68k • written 20 months ago by rodj5201 • 0

score 2 · Accepted Answer · 2023-07-06

When you do

help = rma(rawdata)

(which, by the way, is a poor choice for a variable name), you are doing the same thing as you would have done with the affy package. You can get the data matrix using the exprs function, just as you would have with the affy package.
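As a rough sketch of the exprs step, and of writing the result to a text file if you really do want one: the snippet below uses a tiny simulated ExpressionSet as a stand-in for the rma() output, so it runs without any CEL files. The object names (eset, expr_mat, out_file), the sample names, and the values are all made up for illustration; with real data you would pass your own rma() result to exprs() instead.

```r
library(Biobase)  # provides ExpressionSet() and exprs(); oligo loads it as well

# Stand-in for the object returned by rma(rawdata): a tiny simulated
# ExpressionSet (values are invented, probeset IDs copied from the question).
set.seed(1)
mat <- matrix(rnorm(6, mean = 7), nrow = 3,
              dimnames = list(c("7892502", "7892503", "7892504"),
                              c("sample1.CEL", "sample2.CEL")))
eset <- ExpressionSet(assayData = mat)

# exprs() returns a plain numeric matrix:
# one row per probeset, one column per array
expr_mat <- exprs(eset)

# Write a tab-delimited text file. col.names = NA puts an empty header
# cell above the row names, so the columns line up when the file is re-read.
out_file <- file.path(tempdir(), "rma_expression.txt")
write.table(expr_mat, file = out_file, sep = "\t",
            quote = FALSE, col.names = NA)
```

Reading the file back with read.delim(out_file, row.names = 1, check.names = FALSE) should give the same probeset-by-sample layout. Writing the ExpressionSet object itself, rather than the matrix from exprs(), is the likely cause of the scrambled csv.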

But do note that this matrix is not a 'raw count matrix'! This is not RNA-Seq, it's a microarray, and the rma function is background correcting, normalizing, and summarizing the raw data. What you have there are summarized data that you can then use to make comparisons.

There is also no profit in exporting these data, as the ExpressionSet you have created (your 'help' object) is ready for analysis using the limma package, which has its own very extensive vignette that you should peruse.
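To give an idea of what "ready for limma" looks like, here is a minimal sketch of a two-group comparison. Everything about the experiment is an assumption for illustration: the data are simulated, and the group labels ("control", "treated") and the three-versus-three layout are hypothetical. For GSE97562 the real sample annotation would have to come from GEO, and the limma vignette covers how to set up designs properly.

```r
library(Biobase)
library(limma)

# Simulated stand-in for the rma() output: 100 probesets x 6 arrays
set.seed(42)
mat <- matrix(rnorm(600, mean = 7), nrow = 100,
              dimnames = list(paste0("probeset", 1:100),
                              paste0("array", 1:6)))
eset <- ExpressionSet(assayData = mat)

# Hypothetical two-group design; real labels would come from the
# sample annotation of the dataset, not from this example
group <- factor(rep(c("control", "treated"), each = 3))
design <- model.matrix(~ group)

# Fit a linear model per probeset, then moderate the variances
fit <- lmFit(eset, design)
fit <- eBayes(fit)

# Top-ranked probesets for the treated-vs-control coefficient
top <- topTable(fit, coef = 2, number = 10)
```

The ExpressionSet goes straight into lmFit, so no export step is needed along the way.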