Affy loess normalization of RNA-seq data
JW • 0
@jw-10913

I am trying to perform loess normalization on a collection of RNA-seq datasets. Loven et al. (Cell. 2012 October 26; 151(3): 476–482) claim to have used the affy package in R. I have the affy package installed but can't figure out how to specify my data files for 'mat' or 'subset', and I have no experience working in R. The affy manual provides instructions on how to load CEL files from microarray data, but my RNA-seq data is in the form of a tab-delimited file of gene names and RPKM values.

Running the software without specifying the files gives the following:

> normalize.loess(mat, subset = sample(1:(dim(mat)[1]), min(c(5000,
+                      nrow(mat)))), epsilon = 10^-2, maxit = 1, log.it =
+                      TRUE, verbose = TRUE, span = 2/3, family.loess =
+ "symmetric")
Error in normalize.loess(mat, subset = sample(1:(dim(mat)[1]), min(c(5000,  :
  object 'mat' not found

 

How do I input my data/specify my data files?  Is there information on the format of these two files (mat and subset)? Is there a better software package for this?

Any help would be appreciated!

-J

 

affy RNAseq loess • 1.2k views
Aaron Lun ★ 28k
@alun

Well, if you don't specify the files or the input data, how can you expect the software to do the right thing? R can do many things, but it can't read minds. Here's a three-step guide to help you on your way:

  1. Learn how to use R. By the sounds of it, you're trying to run before you can walk. A quick web search will reveal a number of useful tutorials, so get your foundations right (hint: read.table). Or even better, find a local bioinformatician or computational biologist and ask them to help you out.
  2. Get the count matrix rather than the RPKM matrix. For loess normalization, I suspect that the RPKM matrix mightn't be the best to use. Any abundance-dependent trend will be distorted by the division by gene length, which will effectively shuffle genes along the x-axis. In any case, many downstream analyses of RNA-seq data will require the raw counts, so you'll need them sooner or later.
  3. Once you've got all that down, have a look at ?voom with normalize.method="cyclicloess" in the limma package.
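
To make the steps above a bit more concrete, here is a minimal sketch of what they might look like in R. It is not a definitive recipe. The file layout (a header row, gene names in the first column, one column of raw counts per sample), the toy data, the sample names and the two-group design are all assumptions made up for illustration, so substitute your own file and experimental design. The voom step only runs if the limma package is installed.

```r
# Hypothetical input: a tab-delimited file with a header row, gene names in
# the first column and one column of raw counts per sample. A small toy file
# is written here only so that the sketch is self-contained.
set.seed(1)
example_file <- tempfile(fileext = ".txt")
toy <- data.frame(
  gene    = paste0("gene", 1:200),
  sampleA = rnbinom(200, mu = 50, size = 5),
  sampleB = rnbinom(200, mu = 60, size = 5),
  sampleC = rnbinom(200, mu = 40, size = 5),
  sampleD = rnbinom(200, mu = 55, size = 5)
)
write.table(toy, example_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Step 1: read the file. row.names = 1 turns the gene-name column into row
# names, so that only numeric columns remain.
counts <- as.matrix(read.table(example_file, header = TRUE, sep = "\t",
                               row.names = 1, check.names = FALSE))

# 'counts' is now a numeric matrix with genes in rows and samples in columns.
# An object like this, not a file name, is what gets passed to a function.
dim(counts)

# Step 3: voom with cyclic loess normalization (requires limma).
if (requireNamespace("limma", quietly = TRUE)) {
  # Assumed design: two groups of two samples each.
  group  <- factor(c("ctrl", "ctrl", "treat", "treat"))
  design <- model.matrix(~ group)
  v <- limma::voom(counts, design, normalize.method = "cyclicloess")
  # v$E holds the normalized log2-counts-per-million values.
  head(v$E)
}
```

The point to take away is that R functions work on objects in memory rather than on files, which is why the call in the question failed with "object 'mat' not found": nothing called mat had been created yet.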