Dear Bioconductors,
I recently started using Bioconductor (sorry to be a total newbie, but i'm just finding my feet with R...) and I would like to perform a geneset variation analysis (GSVA).
I have the following data:
RNA-expression-file with Gene Symbols in the first column and the corresponding expression values for the individual samples in the following columns (simplified sample):
NAME | sample 1 | sample 2 | sample 3 | sample 4 | sample 5 |
A1BG | 5.045152135 | 3.675982862 | 4.034678849 | 1.586550605 | 1.416436753 |
A2BP1 | 1.216242033 | 26.73442081 | 19.98573476 | 2.379825908 | 6.041536357 |
A2LD1 | 0.720736019 | 7.351965723 | 2.796126272 | 0.741539957 | 1.040647411 |
A2ML1 | 7.83800421 | 26.5761249 | 9.458037859 | 9.467568285 | 9.134571716 |
A2M | 793.6429722 | 914.6689604 | 1045.075652 | 619.8239331 | 624.4462602 |
A4GALT | 3.941525105 | 8.143445287 | 2.402041361 | 1.879717565 | 6.590766935 |
AAAS | 31.39706284 | 18.34473744 | 30.41960192 | 65.66939896 | 30.612378 |
Phenotype file (simplified sample):
xy_mutation | |
sample 1 | 1 |
sample 2 | 0 |
sample 3 | 1 |
sample 4 | 0 |
sample 5 | 1 |
Geneset (gmt) file (simplified sample):
geneset_1 | geneset_1 | A1BG | A2BP1 | A2LD1 | A2ML1 | A2M | A4GALT | AAAS |
geneset_2 | geneset_2 | A2LD1 | A2ML1 | A2M | A4GALT | AAAS | ||
geneset_3 | geneset_3 | A2BP1 | A2LD1 | A2ML1 |
The following code initially works fine for processing the above shown data:
#load mRNA expression data
dataDirectory <- system.file("extdata", package = "Biobase") exprsFile <- "C:/expression.txt" exprs <- as.matrix(read.table(exprsFile, header = TRUE, sep = "\t", row.names = 1, as.is = TRUE))
#load phenotype data pDataFile <- "C:/phenotype.txt" pData <- read.table(pDataFile, row.names=1, header=TRUE, sep="\t") phenoData <- new("AnnotatedDataFrame",data=pData)
#check all(rownames(pData)==colnames(exprs))
#load genesets from gmt file gsc <- getGmt("C:/GS.txt", geneIdType=SymbolIdentifier(), collectionType=NullCollection(), sep="\t" )
#build expressionset exampleSet <- ExpressionSet(assayData=exprs, phenoData=phenoData)
#perform gsva gsva(exampleSet, gsc, min.sz=4, max.sz=500, verbose=TRUE)
However here I get the following error:
Mapping identifiers between gene sets and feature names
Fehler in GSEABase::mapIdentifiers(gset.idx.list, GSEABase::AnnoOrEntrezIdentifier(annotation(expr))) :
Fehler bei der Auswertung des Argumentes 'to' bei der Methodenauswahl
für Funktion 'mapIdentifiers': Fehler in validObject(.Object) :
invalid class “ScalarCharacter” object: ScalarCharacter must have length one
So the problem starts when mapping the identifiers between gene sets and feature names - I guess the problem may be that I built my ExpressionSet object without the annotation argument. I thought this is the right way, since my expression file already contains gene symbols (and not a chip-ID (like Affy-ID or Illumina-ID)) which already corresponds to the gene symbols in my geneset. How can i fix this problem?
Thank you
Philipp