The 'targets file' is just a file that contains relevant phenotypic data about your subjects that you might use to fit a linear model. If you are getting the data from GEO, you have to rely on what the submitter(s) gave you, rather than having your own file.
As a pedantic aside, you don't have to specify an argument to a function if you are planning to use its default value. For example:
getGEO("GSE52919", GSEMatrix = TRUE, getGPL = TRUE)
## is identical to
getGEO("GSE52919")
## because you are specifying existing default values
There's nothing wrong with spelling them out, except that it implies you are doing something different from the usual when in fact you aren't. Anyway...
> library(GEOquery)
Setting options('download.file.method.GEOquery'='auto')
Setting options('GEOquery.inmemory.gpl'=FALSE)
Warning message:
package 'GEOquery' was built under R version 4.0.3
> z <- getGEO("GSE52919")[[1]]
Found 1 file(s)
GSE52919_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE52nnn/GSE52919/matrix/GSE52919_series_matrix.txt.gz'
Content type 'application/x-gzip' length 3758846 bytes (3.6 MB)
downloaded 3.6 MB
-- Column specification --------------------------------------------------------
cols(
ID_REF = col_character(),
GSM1278195 = col_double(),
GSM1278196 = col_double(),
GSM1278197 = col_double(),
GSM1278198 = col_double(),
GSM1278199 = col_double(),
GSM1278200 = col_double(),
GSM1278201 = col_double(),
GSM1278202 = col_double(),
GSM1278203 = col_double(),
GSM1278204 = col_double(),
GSM1278205 = col_double(),
GSM1278206 = col_double(),
GSM1278207 = col_double(),
GSM1278208 = col_double(),
GSM1278209 = col_double()
)
File stored at:
C:\Users\Public\Documents\Wondershare\CreatorTemp\RtmpIr34Gz/GPL13252.soft
> z
ExpressionSet (storageMode: lockedEnvironment)
assayData: 50238 features, 15 samples
element names: exprs
protocolData: none
phenoData
sampleNames: GSM1278195 GSM1278196 ... GSM1278209 (15 total)
varLabels: title geo_accession ... gender:ch1 (35 total)
varMetadata: labelDescription
featureData
featureNames: GT_44k_23_P100001 GT_44k_23_P100011 ...
GT_u92_snmRNA_Homo_00007431 (50238 total)
fvarLabels: ID GeneName ... SPOT_ID (6 total)
fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
pubMedIds: 26083014
Annotation: GPL13252
## the important part here is to note that the phenoData slot contains the phenotypic data, and can
## be accessed using the pData function
## you don't just want to print all that out however, so let's proceed cautiously
> names(pData(z))
[1] "title" "geo_accession"
[3] "status" "submission_date"
[5] "last_update_date" "type"
[7] "channel_count" "source_name_ch1"
[9] "organism_ch1" "characteristics_ch1"
[11] "characteristics_ch1.1" "molecule_ch1"
[13] "extract_protocol_ch1" "label_ch1"
[15] "label_protocol_ch1" "taxid_ch1"
[17] "hyb_protocol" "scan_protocol"
[19] "description" "data_processing"
[21] "platform_id" "contact_name"
[23] "contact_email" "contact_phone"
[25] "contact_department" "contact_institute"
[27] "contact_address" "contact_city"
[29] "contact_state" "contact_zip/postal_code"
[31] "contact_country" "supplementary_file"
[33] "data_row_count" "age:ch1"
[35] "gender:ch1"
## most of that information is boring and unimportant for our uses, so let's look at a subset.
> pData(z)[,c(1,34,35)]
title age:ch1 gender:ch1
GSM1278195 drug resistant_Group 2 [A-06] 43y male
GSM1278196 drug resistant_Group 2 [A-07] 61y female
GSM1278197 drug resistant_Group 2 [A-08] 32y female
GSM1278198 drug resistant_Group 2 [A-09] 43y female
GSM1278199 drug resistant_Group 2 [A-10] 20y male
GSM1278200 sensitive to AraC_Group 1 [A-01] 43y female
GSM1278201 sensitive to AraC_Group 1 [A-02] 50y male
GSM1278202 sensitive to AraC_Group 1 [A-03] 52y male
GSM1278203 sensitive to AraC_Group 1 [A-04] 33y female
GSM1278204 sensitive to AraC_Group 1 [A-05] 44y female
GSM1278205 sensitive to Dnr_Group 3 [A-11] 50y male
GSM1278206 sensitive to Dnr_Group 3 [A-12] 18y female
GSM1278207 sensitive to Dnr_Group 3 [A-13] 44y female
GSM1278208 sensitive to Dnr_Group 3 [A-14] 44y male
GSM1278209 sensitive to Dnr_Group 3 [A-15] 21y male
That looks like the extent of the data supplied. If you want to use that information you will have to clean up the first and second columns: the title is unique for every sample but should be a single repeated label for each group, and the ages should be numeric instead of character.
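A minimal sketch of that clean-up, in case it helps. It runs on a three-row toy copy of the table printed above so that it works without downloading anything; the object names `pd`, `targets` and `design` are just for illustration, and with the real data you would start from pData(z)[, c("title", "age:ch1", "gender:ch1")] instead.

```r
# Toy copy of three rows of the pData subset shown above (one per group).
# With the real object: pd <- pData(z)[, c("title", "age:ch1", "gender:ch1")]
pd <- data.frame(
  title = c("drug resistant_Group 2 [A-06]",
            "sensitive to AraC_Group 1 [A-01]",
            "sensitive to Dnr_Group 3 [A-11]"),
  `age:ch1` = c("43y", "43y", "50y"),
  `gender:ch1` = c("male", "female", "male"),
  row.names = c("GSM1278195", "GSM1278200", "GSM1278205"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Drop everything from the underscore onward, so each sample carries
# only its group label instead of a unique title
group <- sub("_.*$", "", pd$title)

targets <- data.frame(
  # make.names() turns the labels into syntactically valid names
  group = factor(make.names(group)),
  # strip the trailing "y" and convert the ages to numeric
  age = as.numeric(sub("y$", "", pd[["age:ch1"]])),
  gender = factor(pd[["gender:ch1"]]),
  row.names = rownames(pd)
)

# One possible design matrix: one column per group, no intercept
design <- model.matrix(~ 0 + group, data = targets)
```

Whether you also want age or gender in the model is up to you; the group-only design here is just one option.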
Hi, I think for this dataset you will have to create the targets file yourself. The information you need is contained in the series matrix file: the sample-specific information, such as age, is stored in the rows starting with "!Sample_".
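If you want to go that route, something along these lines should pull those rows into a data frame. This is only a sketch: it writes a small made-up file with two samples so it can be run as-is, and the exact content of the "!Sample_characteristics_ch1" rows ("age: 43y", "gender: male") is my assumption about how the real file looks, so check it against your download.

```r
# Toy stand-in for the series matrix file, two samples only. The layout
# of the "!Sample_" rows is assumed, not copied from the real file.
toy <- c(
  '!Series_title\t"toy series"',
  '!Sample_title\t"drug resistant_Group 2 [A-06]"\t"sensitive to AraC_Group 1 [A-01]"',
  '!Sample_geo_accession\t"GSM1278195"\t"GSM1278200"',
  '!Sample_characteristics_ch1\t"age: 43y"\t"age: 43y"',
  '!Sample_characteristics_ch1\t"gender: male"\t"gender: female"',
  '!series_matrix_table_begin'
)
f <- tempfile(fileext = ".txt")
writeLines(toy, f)

# Keep only the sample-level header rows
hdr <- grep("^!Sample_", readLines(f), value = TRUE)

# Each row is tab-separated: the field name first, then one value per sample
fields <- strsplit(hdr, "\t", fixed = TRUE)
key <- sub("^!Sample_", "", vapply(fields, `[`, character(1), 1))
vals <- lapply(fields, function(x) gsub('"', "", x[-1], fixed = TRUE))

# Samples become rows, fields become columns; repeated field names
# get a numeric suffix from make.unique()
samp <- as.data.frame(do.call(cbind, vals), stringsAsFactors = FALSE)
names(samp) <- make.unique(key)
rownames(samp) <- samp$geo_accession
```

For the real download you would replace `f` with the path to the series matrix file (wrapping it in gzfile() if it is still compressed) and then tidy the resulting columns as needed.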