Applying rma to an ExpressionFeatureSet gives error: 'package' must be of length 1
salamandra ▴ 20

I am trying to apply rma from the oligo package to the data in the file GSE22247_non-normalized_data.txt from this study: GSE22247. This is what I did:

library(oligo)
# read txt file into dataframe:
rawtxt <- read.delim(txtpath, sep = "\t", skip = 4, header = TRUE)
# convert dataframe into expression set:
rawData <- new("ExpressionFeatureSet", exprs = as.matrix(rawtxt))
# apply rma:
normData <- rma(rawData)

This gives the error:

Error in library(pdn, character.only = TRUE) : 
  'package' must be of length 1

Also, printing rawData gives this error:

 ExpressionFeatureSet (storageMode: lockedEnvironment)
assayData: 48803 features, 13 samples 
  element names: exprs 
protocolData: none
phenoData: none
featureData: none
experimentData: use 'experimentData(object)'
Failed with error:  ‘'package' must be of length 1’
Attempting to obtain '' from BioConductor website.
Checking to see if your internet connection works...
Error in if (!pkgname %in% biocPkgs[, "Package"]) { : 
  argument is of length zero

How can I solve this?

oligo biobase GEOquery • 796 views

The array you are trying to read in is an Illumina HT-12 array, which has one reporter molecule per gene. On the other hand, the oligo rma function is intended for (primarily) Affymetrix arrays, which tend to have multiple reporter molecules per gene, and for which the rma function normalizes and then summarizes the multiple reporter molecules into one measure per gene. These are not the same thing!
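As for the message itself, here is a minimal base R sketch of where it most likely comes from. I am assuming (not quoting the oligo source) that `pdn` in the traceback holds the array's annotation package name, and that it is empty because an object built by hand with `new()` has no annotation set. Passing a zero-length character vector to `library()` reproduces the same message:

```r
# Assumption: oligo looks up a platform design package by name; for an object
# created with new("ExpressionFeatureSet", ...) that name is empty.
pdn <- character(0)  # hypothetical stand-in for the empty annotation slot

# library() insists on exactly one package name, so this fails
msg <- tryCatch(
  library(pdn, character.only = TRUE),
  error = function(e) conditionMessage(e)
)
print(msg)
```

So the error says that no platform design package could be identified, which fits with these not being Affymetrix data in the first place.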

You could hypothetically import those data into a structure that is useful for analyzing the data, but are you sure you will be able to do something more awesome than the quantile normalization that the original authors did? If not, you can use getGEO to get the normalized data directly, and then go forward with whatever analysis you plan on doing.
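A minimal sketch of the getGEO route, assuming the GEOquery and Biobase packages are installed, you have an internet connection, and the series matrix for GSE22247 holds the authors' normalized values (check the sample annotation to confirm what processing was done):

```r
library(GEOquery)
library(Biobase)

# getGEO() on a GSE accession returns a list of ExpressionSet objects,
# one per platform in the series
gse <- getGEO("GSE22247", GSEMatrix = TRUE)
eset <- gse[[1]]

# Expression matrix as deposited by the submitters (features x samples)
normalized <- exprs(eset)
dim(normalized)

# Sample annotation, including the submitters' data processing description
head(pData(eset))
```

The resulting ExpressionSet can be passed on to downstream tools such as limma.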

Back in the old days (like the mid-2000s), how to normalize microarray data was a hot topic, and people spent any number of hours trying to figure out what the best method was. But you know what? It's not possible to know that! Without knowing what the exact underlying expression of the genes was, all you can do is point to some other measures like proportion of detected genes at a given FDR or whatever, and claim that method A is clearly superior to method B, based on whatever ad hoc method you just dreamt up (presumably because it shows that your method A obviously wins over your competitor's method B).

In the end there isn't much profit in doing a bunch of fancy stuff, particularly for some study with two replicates at each time point. That's not enough replicates (IMO) for you to get any reasonable insight anyway, so the normalization is the least of your worries.


Thank you for your answer. The idea is to apply the same normalization method to raw microarray data from several studies. I wanted to do it on Affymetrix, Illumina and Agilent data. Do you know a normalization method that can be applied to these three platforms, or at least to both Affymetrix and Illumina? If not, could you please tell me what kind of normalization/package is used for Illumina and Agilent, and whether limma (differential gene expression) can be applied to all three?


I think you are going about this all wrong. You seem to want to do stuff without having the requisite background knowledge to know if what you are doing is sensible or not. For example, you want to do 'the same normalization method' on three different array platforms without knowing what methods are available for those platforms!

It's not clear why you would want to do such a thing, unless you think that by using the same normalization method on the different platforms you could then combine the data or something? But that's not a thing. Normalizing data doesn't make data from completely different platforms comparable. It's intended to remove technical biases between arrays of the same type.
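To make that concrete, here is a simplified base R sketch of quantile normalization on simulated data. The function `quantile_normalize` and the toy matrix are made up for illustration; this is not the implementation used by limma, oligo, or the original authors. All it does is force every array to share the same intensity distribution:

```r
# Simplified quantile normalization: each column (array) is mapped onto the
# mean of the sorted columns, using within-column ranks.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank of each probe per array
  sorted <- apply(x, 2, sort)                         # sorted intensities per array
  ref    <- rowMeans(sorted)                          # reference distribution
  out    <- apply(ranks, 2, function(r) ref[r])       # substitute reference values
  dimnames(out) <- dimnames(x)
  out
}

set.seed(1)
# Three simulated arrays of the same type with different technical offsets
arrays <- cbind(a = rnorm(100, mean = 8),
                b = rnorm(100, mean = 9, sd = 2),
                c = rnorm(100, mean = 7.5))
normed <- quantile_normalize(arrays)

# After normalization every column has the same quantiles
apply(normed, 2, quantile)
```

That is sensible when the rows are the same probes measured with the same technology. It does nothing to make a probe on one platform measure the same thing as a differently designed probe on another platform.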

Anyway, you are getting tripped up on technical issues because you don't know enough about the arrays and how best to analyze them. That's not something you get fixed by posting questions on this site. You fix that by doing your homework, learning about the different array platforms and how they are analyzed. You could start with the affy, oligo, and limma vignettes (or User's Guide in the case of limma). Somehow lumi doesn't have a vignette, so you could read the citations for that package.

