Question

When applying rma to featureset get error: 'package' must be of length 1

salamandra ▴ 20

I am trying to apply rma from the oligo package to the data in the file GSE22247_non-normalized_data.txt from the GEO series GSE22247. This is what I did:

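library(oligo)
# read the tab-delimited txt file into a data frame (txtpath holds the path to the file):
rawtxt <- read.delim(txtpath, sep = "\t", skip = 4, header = TRUE)
# wrap the data in an ExpressionFeatureSet (note that no annotation is set):
rawData <- new("ExpressionFeatureSet", exprs = as.matrix(rawtxt))
# apply rma (this is the call that fails):
normData <- rma(rawData)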

This gives the error:

Error in library(pdn, character.only = TRUE) : 
  'package' must be of length 1

Printing rawData also ends with an error:

 ExpressionFeatureSet (storageMode: lockedEnvironment)
assayData: 48803 features, 13 samples 
  element names: exprs 
protocolData: none
phenoData: none
featureData: none
experimentData: use 'experimentData(object)'
Annotation:  
character(0)
Failed with error:  ‘'package' must be of length 1’
Attempting to obtain '' from BioConductor website.
Checking to see if your internet connection works...
Error in if (!pkgname %in% biocPkgs[, "Package"]) { : 
  argument is of length zero

How to solve this?

oligo biobase GEOquery • 1.7k views

updated 4.5 years ago by James W. MacDonald • written 4.5 years ago by salamandra

score 0 · Answer 1 · 2019-10-28

The array you are trying to read in is an Illumina HT-12 array, which has one reporter molecule per gene. The oligo rma function, on the other hand, is intended (primarily) for Affymetrix arrays, which tend to have multiple reporter molecules per gene; for those, rma normalizes and then summarizes the multiple reporter molecules into one measure per gene. These are not the same thing! The error itself appears to come from the missing annotation: your object prints Annotation: character(0), so oligo has no platform design package name to pass to library().
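To make the difference concrete, here is a toy sketch in base R of the "normalize, then summarize" idea. It is not oligo's implementation: real RMA also does background correction and summarizes with median polish, while this sketch uses a plain per-gene median. The function names `quantile_normalize` and `summarize_by_gene`, and the simulated data, are made up for illustration.

```r
# Toy quantile normalization: force every sample (column) to share the same
# distribution. Assumes a numeric matrix with no missing values.
quantile_normalize <- function(x) {
  sorted <- apply(x, 2, sort)                        # sort each column
  target <- rowMeans(sorted)                         # mean of each rank across samples
  ranks  <- apply(x, 2, rank, ties.method = "first") # rank of each value in its column
  out    <- apply(ranks, 2, function(r) target[r])   # replace values by the rank means
  dimnames(out) <- dimnames(x)
  out
}

# Toy summarization: collapse all probes of a gene to one value per sample.
# A plain median is used here as a stand-in for RMA's median polish.
# Assumes at least two samples (columns).
summarize_by_gene <- function(x, gene) {
  genes <- unique(gene)
  out <- t(vapply(
    genes,
    function(g) apply(x[gene == g, , drop = FALSE], 2, median),
    numeric(ncol(x))
  ))
  rownames(out) <- genes
  out
}

# Simulated "Affymetrix-like" data: 6 probes, 3 probes per gene, 3 samples.
set.seed(1)
affy_like <- matrix(2^rnorm(18, mean = 8), nrow = 6,
                    dimnames = list(paste0("probe", 1:6), paste0("s", 1:3)))
probe_gene <- rep(c("geneA", "geneB"), each = 3)

norm       <- quantile_normalize(log2(affy_like))
gene_level <- summarize_by_gene(norm, probe_gene)    # 2 genes x 3 samples

# "Illumina-like" case: one probe per gene, so summarization changes nothing.
one_per_gene <- summarize_by_gene(norm, rownames(norm))
```

With several probes per gene the summarization step collapses six rows into two; with one probe per gene it returns the normalized matrix unchanged, which is why the summarization half of rma has nothing to do on this kind of array.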

You could hypothetically import those data into a structure that is useful for analyzing the data, but are you sure you will be able to do something more awesome than the quantile normalization that the original authors did? If not, you can use getGEO to get the normalized data directly, and then go forward with whatever analysis you plan on doing.
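A minimal sketch of the getGEO route, assuming the GEOquery and Biobase packages are installed and that the series matrix for this accession holds the authors' normalized values. The wrapper name `get_normalized_exprs` is hypothetical; `getGEO` and `exprs` are the actual package functions.

```r
# Hypothetical helper: download a GEO series matrix and return its expression
# matrix. Requires the GEOquery and Biobase packages and an internet connection.
get_normalized_exprs <- function(accession) {
  stopifnot(is.character(accession), length(accession) == 1)
  # With GSEMatrix = TRUE, getGEO returns a list of ExpressionSet objects,
  # one per platform in the series.
  gse <- GEOquery::getGEO(accession, GSEMatrix = TRUE)
  eset <- gse[[1]]        # assumption: the series uses a single platform
  Biobase::exprs(eset)    # features x samples matrix as deposited by the authors
}

# Usage (downloads data, so it is left commented out here):
# norm <- get_normalized_exprs("GSE22247")
# dim(norm)
```

It is worth checking the value range of the returned matrix before modelling, since whether deposited data are already log-transformed varies between submissions.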

Back in the old days (the mid-2000s), how to normalize microarray data was a hot topic, and people spent any number of hours trying to figure out which method was best. But you know what? It's not possible to know that! Without knowing the true underlying expression of the genes, all you can do is point to some other measure, like the proportion of detected genes at a given FDR, and claim that method A is clearly superior to method B based on whatever ad hoc criterion you just dreamt up (presumably because it shows that your method A obviously wins over your competitor's method B).

In the end there isn't much profit in doing a bunch of fancy stuff, particularly for some study with two replicates at each time point. That's not enough replicates (IMO) for you to get any reasonable insight anyway, so the normalization is the least of your worries.