Hi all,
I am not very experienced with R and am running into the following problem:
I am trying to extract the table of beta values from the Illumina 450k methylation data set GSE61380 on the Gene Expression Omnibus (GEO) using the code below. It worked well for other data sets, but in this one five samples (GSM1503509, GSM1503517, GSM1503522, GSM1503524, GSM1503525) return only NAs. Interestingly, these are all the female samples in the data set, so I suspect this is not a coincidence, but I don't see any difference in how the data for male and female samples are organized. When I look at the data table within the GSMList, the beta values are there, so I am "losing" them somewhere while building the data matrix from the VALUE column of each data table. I hope someone can help me further.
Thanks, Carolin
library(BiocManager)
library(GEOquery)
library(dplyr)
gse <- getGEO("GSE61380", GSEMatrix = FALSE)  # GSEMatrix had to be set to FALSE for the following steps to work
##make sure that all of the GSMs are from the same platform:
gsmplatforms <- lapply(GSMList(gse),function(x) {Meta(x)$platform_id})
table(unlist(gsmplatforms))  # should list a single platform ID covering all samples
## If they are, proceed with:
gsmlist <- GSMList(gse) #to get the list of all GSM
# get the probeset ordering
probesets <- Table(GPLList(gse)[[1]])$ID
# make the data matrix from the VALUE columns from each GSM
# being careful to match the order of the probesets in the platform with those in the GSMs
data.matrix <- do.call("cbind", lapply(gsmlist, function(x) {
  tab <- Table(x)
  mymatch <- match(probesets, tab$ID_REF)
  return(tab$VALUE[mymatch])
}))
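A minimal, self-contained illustration of what the `match()` step does, using two made-up tables (`tab_ok` and `tab_renamed` and the probe IDs are hypothetical, not taken from GSE61380). It assumes, without having confirmed it for this data set, that the NAs could come from the ID column not being found: `tab$ID_REF` on a data frame without such a column is `NULL`, `match()` against `NULL` returns `NA` for every probe, and indexing `VALUE` with those `NA`s yields a full column of `NA` instead of an error.

```r
# Hypothetical probe IDs in platform (GPL) order
probesets <- c("cg00000029", "cg00000108", "cg00000109")

# A sample table with the expected column names, rows in a different order
tab_ok <- data.frame(
  ID_REF = c("cg00000109", "cg00000029", "cg00000108"),
  VALUE  = c(0.30, 0.10, 0.20),
  stringsAsFactors = FALSE
)

# Same data, but the ID column carries a different (hypothetical) name
tab_renamed <- data.frame(
  TargetID = tab_ok$ID_REF,
  VALUE    = tab_ok$VALUE,
  stringsAsFactors = FALSE
)

# match() gives, for each platform probe, its row position in the sample table
idx_ok <- match(probesets, tab_ok$ID_REF)
values_ok <- tab_ok$VALUE[idx_ok]
print(values_ok)       # reordered to platform order: 0.1 0.2 0.3

# No ID_REF column: tab_renamed$ID_REF is NULL, so every match is NA
idx_renamed <- match(probesets, tab_renamed$ID_REF)
values_renamed <- tab_renamed$VALUE[idx_renamed]
print(values_renamed)  # NA NA NA, returned silently without an error
```

The same silent all-NA result would also appear if the IDs were present but written differently (for example with extra whitespace), so comparing `colnames(Table(gsmlist[["GSM1503509"]]))` with those of a working sample seems a reasonable first check.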
Results:
session info:
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server x64 (build 17763)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] lumi_2.50.0
[2] IlluminaHumanMethylation450kmanifest_0.4.0
[3] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.1
[4] minfi_1.42.0
[5] bumphunter_1.38.0
[6] locfit_1.5-9.6
[7] iterators_1.0.14
[8] foreach_1.5.2
[9] Biostrings_2.64.0
[10] XVector_0.36.0
[11] SummarizedExperiment_1.26.0
[12] MatrixGenerics_1.8.0
[13] matrixStats_0.62.0
[14] GenomicRanges_1.48.0
[15] GenomeInfoDb_1.34.2
[16] IRanges_2.30.0
[17] S4Vectors_0.34.0
[18] dplyr_1.0.10
[19] GEOquery_2.66.0
[20] Biobase_2.56.0
[21] BiocGenerics_0.44.0
[22] BiocManager_1.30.19
>head(data.matrix)
GSM1503499 GSM1503500 GSM1503501 GSM1503502 GSM1503503 GSM1503504 GSM1503505 GSM1503506 GSM1503507
[1,] 0.41355043 0.56418932 0.54025471 0.40691577 0.46972631 0.45607333 0.4643009 0.4931707 0.35833699
[2,] 0.84803068 0.81087147 0.85702235 0.82328461 0.81902802 0.86374916 0.8314105 0.8864755 0.82498740
[3,] 0.50388175 0.42545290 0.37409633 0.39843678 0.51132252 0.46674150 0.4676011 0.5228090 0.44926351
[4,] 0.86577685 0.83432248 0.85594714 0.88428414 0.84440467 0.86997741 0.8616151 0.8819407 0.84325473
[5,] 0.37064915 0.32756148 0.40436536 0.38826454 0.40114029 0.38934984 0.4060475 0.3258828 0.28145007
[6,] 0.09154394 0.06799166 0.09573691 0.05955089 0.06101147 0.08034284 0.1089532 0.0780580 0.08195993
GSM1503508 GSM1503509 GSM1503510 GSM1503511 GSM1503512 GSM1503513 GSM1503514 GSM1503515 GSM1503516
[1,] 0.44322170 NA 0.41800920 0.3976947 0.40166818 0.41311807 0.41223355 0.46943568 0.43233183
[2,] 0.86218374 NA 0.79261189 0.7885919 0.86425598 0.87240860 0.85175773 0.84363113 0.86930969
[3,] 0.43261736 NA 0.47306780 0.4288925 0.37944154 0.47241740 0.44480832 0.46066579 0.40216191
[4,] 0.80441305 NA 0.90317678 0.8395202 0.84382313 0.84923787 0.86118794 0.81881974 0.89227705
[5,] 0.27235286 NA 0.40867223 0.4015340 0.34978985 0.37300237 0.43426375 0.31597128 0.35896960
[6,] 0.07556178 NA 0.08236327 0.1906769 0.06753515 0.08765025 0.05392182 0.07266497 0.07478928
GSM1503517 GSM1503518 GSM1503519 GSM1503520 GSM1503521 GSM1503522 GSM1503523 GSM1503524 GSM1503525
[1,] NA 0.47431720 0.45687939 0.46646230 0.44409902 NA 0.4894551 NA NA
[2,] NA 0.89878732 0.84939845 0.87269076 0.87357208 NA 0.8854879 NA NA
[3,] NA 0.40335692 0.43550064 0.47369855 0.44548559 NA 0.4082833 NA NA
[4,] NA 0.86456979 0.87486284 0.84615118 0.85770545 NA 0.8189913 NA NA
[5,] NA 0.46743924 0.48145035 0.28816066 0.47723922 NA 0.3614606 NA NA
[6,] NA 0.06609402 0.05622663 0.07099919 0.08569245 NA 0.1019801 NA NA
GSM1503526 GSM1503527 GSM1503528 GSM1503529 GSM1503530 GSM1503531
[1,] 0.4508956 0.38191045 0.40482146 0.37511230 0.35584502 0.3536273
[2,] 0.8127178 0.83116632 0.88256683 0.80537954 0.88485828 0.8126514
[3,] 0.4346078 0.37101632 0.52398558 0.42709756 0.42487552 0.4842869
[4,] 0.8435249 0.83192070 0.90559256 0.85480326 0.88494896 0.9127295
[5,] 0.4263578 0.31819248 0.49592397 0.34929792 0.31813783 0.2563915
[6,] 0.2193310 0.08670669 0.04609017 0.07623453 0.07488773 0.1145267
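To see which samples differ before building the matrix, a per-sample summary of the tables could help. The sketch below is a hypothetical helper (`summarise_gsm_tables` is not part of GEOquery), demonstrated on made-up tables; with the objects from the post it would be called as `summarise_gsm_tables(lapply(gsmlist, Table), probesets)`, assuming each `Table()` result is a data frame.

```r
# Hypothetical helper: for each sample table, report its column names,
# whether the ID column exists, and how many platform probe IDs it contains.
summarise_gsm_tables <- function(tables, probesets, id_col = "ID_REF") {
  data.frame(
    sample  = names(tables),
    columns = vapply(tables, function(tab) paste(colnames(tab), collapse = ","),
                     character(1)),
    has_id  = vapply(tables, function(tab) id_col %in% colnames(tab),
                     logical(1)),
    # Uses the same matching rule as match(), so 0 here means an all-NA column
    n_matched = vapply(tables, function(tab) {
      if (!id_col %in% colnames(tab)) return(0L)
      sum(probesets %in% tab[[id_col]])
    }, integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Made-up example: one table with the expected columns, one without ID_REF
tables <- list(
  GSM_A = data.frame(ID_REF = c("cg1", "cg2"), VALUE = c(0.1, 0.2)),
  GSM_B = data.frame(TargetID = c("cg1", "cg2"), VALUE = c(0.3, 0.4))
)
summary_df <- summarise_gsm_tables(tables, probesets = c("cg1", "cg2", "cg3"))
print(summary_df)
```

Samples with `has_id` FALSE or `n_matched` of 0 would be the ones to inspect directly, e.g. with `head(Table(gsmlist[["GSM1503509"]]))`.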