Good afternoon,
The aim of my analysis is to perform Limma in order to generate p values and fold changes for each protein between alive and dead patients. I have created 3 csv files with information on these proteins:
1) ExpressionMatrix: protein abundance values, with proteins as rows (1552) and samples (alive and dead patients) as columns (81).
2) FeatureData: proteins as rows (1552), plus one other column (Description: a description of each protein).
3) PhenotypeData: samples as rows (81), plus two other columns with the headings Patient ID and State.
I understand that I need to create a class to hold these different objects in order to make the Limma analysis more straightforward (I know this from the Limma tutorial on DataCamp). I ran the boxplot function, which uses all 3 objects, to confirm that they are in the correct data format, and the boxplot produced the expected result. But when I try to create an object of that class from all 3 objects, I keep getting the error:
Error in validObject(.Object) :
invalid class “ExpressionSet” object: 1: sampleNames differ between assayData and phenoData
invalid class “ExpressionSet” object: 2: sampleNames differ between phenoData and protocolData
I figure the issue is with the way I have structured objects "e" and "p" (ExpressionMatrix and PhenotypeData), but I am not sure how to fix it. Below is the code I have used from start to finish. Thank you for your help in advance!
Shimon
> setwd("D:/sa825/Using Limma")
> x<-read.csv("ExpressionMatrix.csv", stringsAsFactors = FALSE)
> f<-read.csv("FeatureData.csv")
> p<-read.csv("PhenotypeData.csv")
> e<-as.matrix(x)
> typeof(e)
[1] "double"
> class(e)
[1] "matrix"
> boxplot(e[1 , ] ~ p$State, main = f[1 , "Description"])
> source("https://bioconductor.org/biocLite.R")
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of
Rtools before proceeding:
https://cran.rstudio.com/bin/windows/Rtools/
Installing package into ‘C:/Users/User/Documents/R/win-library/3.5’
(as ‘lib’ is unspecified)
trying URL 'https://bioconductor.org/packages/3.7/bioc/bin/windows/contrib/3.5/BiocInstaller_1.30.0.zip'
Content type 'application/zip' length 102191 bytes (99 KB)
downloaded 99 KB
package ‘BiocInstaller’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\User\AppData\Local\Temp\RtmpkLuJuJ\downloaded_packages
Bioconductor version 3.7 (BiocInstaller 1.30.0),
?biocLite for help
A newer version of Bioconductor is available for this
version of R, ?BiocUpgrade for help
> biocLite("Biobase")
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.7 (BiocInstaller 1.30.0), R 3.5.3
(2019-03-11).
Installing package(s) ‘Biobase’
also installing the dependency ‘BiocGenerics’
trying URL 'https://bioconductor.org/packages/3.7/bioc/bin/windows/contrib/3.5/BiocGenerics_0.26.0.zip'
Content type 'application/zip' length 745077 bytes (727 KB)
downloaded 727 KB
trying URL 'https://bioconductor.org/packages/3.7/bioc/bin/windows/contrib/3.5/Biobase_2.40.0.zip'
Content type 'application/zip' length 2413751 bytes (2.3 MB)
downloaded 2.3 MB
package ‘BiocGenerics’ successfully unpacked and MD5 sums checked
package ‘Biobase’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\User\AppData\Local\Temp\RtmpkLuJuJ\downloaded_packages
installation path not writeable, unable to update
packages: boot, class, cluster, KernSmooth, lattice,
MASS, Matrix, mgcv, nlme, nnet, rpart, spatial,
survival
> library(Biobase)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, basename, cbind,
colMeans, colnames, colSums, dirname, do.call, duplicated,
eval, evalq, Filter, Find, get, grep, grepl, intersect,
is.unsorted, lapply, lengths, Map, mapply, match, mget,
order, paste, pmax, pmax.int, pmin, pmin.int, Position,
rank, rbind, Reduce, rowMeans, rownames, rowSums, sapply,
setdiff, sort, table, tapply, union, unique, unsplit,
which, which.max, which.min
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages
'citation("pkgname")'.
> eset <- ExpressionSet(assayData = e,
+ phenoData = AnnotatedDataFrame(p),
+ featureData = AnnotatedDataFrame(f))
Error in validObject(.Object) :
invalid class “ExpressionSet” object: 1: sampleNames differ between assayData and phenoData
invalid class “ExpressionSet” object: 2: sampleNames differ between phenoData and protocolData
> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets
[7] methods base
other attached packages:
[1] Biobase_2.40.0 BiocGenerics_0.26.0
loaded via a namespace (and not attached):
[1] compiler_3.5.3 tools_3.5.3
Firstly:
Since you are using R > 3.5.0, could you try using BiocManager for updating packages instead of biocLite()? biocLite() was deprecated in favor of BiocManager::install(). Once this is installed, please check that all your packages are up-to-date and a valid version of Bioconductor. From R > 3.5.0 you should be using BiocManager.
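A minimal sketch of what this switch might look like, assuming an internet connection and a writable library path. It uses only BiocManager::install(), BiocManager::version() and BiocManager::valid(); what they report will depend on your setup.

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install Biobase through BiocManager instead of biocLite()
BiocManager::install("Biobase")

# Report the Bioconductor version currently in use
BiocManager::version()

# Check whether installed packages are out of date or
# inconsistent with that Bioconductor version
BiocManager::valid()
```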
Since we don't have access to your files, you could check the rownames and colnames of the objects e, AnnotatedDataFrame(p) and AnnotatedDataFrame(f). As the error indicates, the sampleNames need to be consistent, so you can make sure none are missing/excluded or that a transpose might be necessary.

Hi Shepherl,
Thank you so much for your response. I have used BiocManager as you instructed:
Then I re-tried the ExpressionSet function but it didn't work:
I used colnames and rownames as you suggested, and I can see that the colnames and rownames are different. So I transposed the csv files that correspond to AnnotatedDataFrame(p) and AnnotatedDataFrame(f) to match "e". But with these new transposed files (p3 and f3), the boxplot function does not work, and neither does the ExpressionSet function:

I'm not really sure what to do. Can I send you access to the files I am working on? Or screenshots of how the data is structured?
Thanks, Shimon
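
The name check suggested earlier in the thread could be sketched roughly as follows. This is only an illustration on toy data, not a diagnosis of the actual files. The small objects stand in for the three CSV files, and names_match() is a hypothetical helper, not a Biobase function. One possible cause of the error (an assumption, since the files are not available) is that read.csv() without row.names gives the phenotype table automatic row names "1", "2", ..., which cannot match the sample names in the columns of the expression matrix.

```r
# Toy stand-ins for the three CSV files (hypothetical values):
# 3 proteins (rows) x 4 samples (columns)
e <- matrix(seq(0.5, 6, by = 0.5), nrow = 3,
            dimnames = list(c("P1", "P2", "P3"),
                            c("S1", "S2", "S3", "S4")))
p <- data.frame(State = c("alive", "dead", "alive", "dead"),
                row.names = c("S1", "S2", "S3", "S4"))
f <- data.frame(Description = c("protein one", "protein two", "protein three"),
                row.names = c("P1", "P2", "P3"))

# Hypothetical helper: TRUE only if sample and feature names agree
# in both content and order
names_match <- function(e, p, f) {
  identical(colnames(e), rownames(p)) &&
    identical(rownames(e), rownames(f))
}

names_match(e, p, f)

# A phenotype table built without explicit row names gets "1", "2", ...
# as row names, so the sample names no longer match the columns of e
p_default <- data.frame(PatientID = colnames(e), State = p$State)
names_match(e, p_default, f)

# Build the ExpressionSet only when Biobase is installed and names agree
if (requireNamespace("Biobase", quietly = TRUE) && names_match(e, p, f)) {
  eset <- Biobase::ExpressionSet(
    assayData   = e,
    phenoData   = Biobase::AnnotatedDataFrame(p),
    featureData = Biobase::AnnotatedDataFrame(f)
  )
  print(Biobase::sampleNames(eset))
}
```

With real files, and assuming the first column of each CSV holds the identifiers, reading them with read.csv(..., row.names = 1) may be enough to make the names line up without transposing anything.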