Question

How to process Codelink UniSet Rat I Bioarray data

0

Entering edit mode

Agaz Hussain Wani ▴ 260

@agaz-hussain-wani-7620

Last seen 6.0 years ago

India

I was trying to process CodeLink UniSet Rat I Bioarray data from GSE59913. Previously I have processed UniSet Human 20K I Bioarray using Codelink R package by the following code.

 codset = readCodelinkSet(filename = codelinkpath[[i]])

 features <- readCodelink(codelinkpath[[i]])  ## An addition to get the ids

  #generate ids
ids <- features$name


codset = codCorrect(codset, method = "half", offset = 0)
codset = codNormalize(codset, method = "loess", weights = getWeight(codset),
                            loess.method = "fast")
exprs <- exprs(codset)

snr  <- getSNR(codset)

But the data from above mentioned data can't be processed by the shown code. I get the error saying

Warning: Error in .readCodelinkRaw: File /tmp/Rtmprqyiqc/6f094408eaf2e239e07be5ae/0.txt does not contain column Spot_mean (choose other type instead)

Which is due to the reason that the data does not have Spot_mean information present. Is it possible to generate ids, expression values and SNR values for such type of data.

EDIT:

Few lines shown from sample GSM1453524 available with [GSE59913](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE59913)

ICONIX Format Report for Slide (T00126807)
LAYOUT    EXP5280X2-584.22.ID
PROJECT    UNISET-RAT
EXPERIMENT    UNISET-RAT
Report( 1 ): Expression Report1
--------------------------------------------------------------------------------
#    Array No.    Probe Name    Probe Grid Row    Probe Grid Col    Probe Droplet Cx    Probe Droplet Cy    Probe Droplet Radius    Probe Value    Array Threshold    Array Sample Name    Spot QC
1    1    YJEK_PROBE3    173    38    1269.96354166666    4264.984375    16    0.0000    13.9830837630944    16274    2                                                  
2    1    LEUC_PROBE1    61    23    828    1566    16    0.0000    13.9830837630944    16274    2                                                  
3    1    U49235_PROBE1    106    34    1133.5    2647.5    17    0.0602    13.9830837630944    16274    2                                                  
4    1    ARAA_PROBE2    191    23    817.96875    4695.97395833333    16    0.1146    13.9830837630944    16274    2                                                  
5    1    RAFB_PROBE4    166    8    352    4095    18    0.4357    13.9830837630944    16274    2                                                  
6    1    RAFD_PROBE2    90    23    806.5    2261.5    19    0.6377    13.9830837630944    16274    2                                                  
7    1    FIXA_PROBE1    137    20    729.5    3399.5    19    0.6486    13.9830837630944    16274

I tried with type = "Raw"

codset = readCodelinkSet(filename = codelinkpath[[1]],
                         file.path("/media/hussain/A006D86A06D84348/data/GSE59907_RAW"),
                         type = "Raw")

Error in .readCodelinkRaw(files = filename, ...) :
  File /media/hussain/A006D86A06D84348/data/GSE59907_RAW/GSM1453331_T00141653_2002-10-01.txt does not contain column Raw_intensity (choose other type instead)

And with type = "Norm"

Error in .readCodelinkRaw(files = filename, ...) :
  File /media/hussain/A006D86A06D84348/data/GSE59907_RAW/GSM1453331_T00141653_2002-10-01.txt does not contain column Normalized_intensity (choose other type instead)

codelink • 1.4k views

ADD COMMENT • link updated 6.1 years ago by Diego Diez ▴ 760 • written 6.1 years ago by Agaz Hussain Wani ▴ 260

0

Entering edit mode

In addition to considering what James points out in his response, it is not clear why do you need to call readCodelinkSet() and readCodelink() in your code. Is this a publicly accessible dataset (e.g. GEO) for which you could share with us the URL to check it?

ADD REPLY • link 6.1 years ago Diego Diez ▴ 760

0

Entering edit mode

I use readCodelink() to get feature names. I have updated my question, please check the edit.

ADD REPLY • link 6.1 years ago Agaz Hussain Wani ▴ 260

score 1 · Answer 1 · 2018-03-21

1

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 22 hours ago

United States

Is there something particularly confusing about the warning message? It says

Warning: Error in .readCodelinkRaw: File /tmp/Rtmprqyiqc/6f094408eaf2e239e07be5ae/0.txt does not contain column Spot_mean (choose other type instead)

Did you look to see what column names there are? Did you try choosing a different column?

ADD COMMENT • link 6.1 years ago James W. MacDonald 65k

0

Entering edit mode

Confusing thing is that Spot_mean is missing and I think Codelink package needs Spot_mean, raw or normalized data to generate expression values and SNR, though not sure. I check with other types also such as "Raw" and "Norm", but no benefit. I have shown some lines of the sample above.

ADD REPLY • link 6.1 years ago Agaz Hussain Wani ▴ 260

score 1 · Answer 2 · 2018-03-22

The file format of the expression data is not in Codelink format. In particular, none of the columns expected to contain signal information are there. This may seem confusing but there is a distinction between the microarray platform (Codelink) and the software used to process the microarrays (?). This is more clear when looking at the header of the mentioned files:

ICONIX Format Report

This suggest the ICONIX software, which I am unfamiliar with, was used. Whereas for files formatted for use with the codelink package they usually have something along the lines:

CodeLink Expression Analysis 4.1.0.29054

What options do you have? I would use the GEOquery package. For example:

library(GEOquery)

x <- getGEO("GSE59913")
x

$`GSE59913-GPL5424_series_matrix.txt.gz`
ExpressionSet (storageMode: lockedEnvironment)
assayData: 8565 features, 82 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM1453524 GSM1453525 ... GSM1453605 (82 total)
  varLabels: title geo_accession ... vehicle:ch1 (50 total)
  varMetadata: labelDescription
featureData
  featureNames: AA799301_PROBE1 AA799313_PROBE1 ... Z96106_PROBE1 (8565 total)
  fvarLabels: ID GB_ACC ... SPOT_ID (8 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL5424 

$`GSE59913-GPL5425_series_matrix.txt.gz`
ExpressionSet (storageMode: lockedEnvironment)
assayData: 8565 features, 1049 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM1453606 GSM1453607 ... GSM1454654 (1049 total)
  varLabels: title geo_accession ... vehicle:ch1 (50 total)
  varMetadata: labelDescription
featureData
  featureNames: AA799301_PROBE1 AA799313_PROBE1 ... Z96106_PROBE1 (8565 total)
  fvarLabels: ID GB_ACC ... SPOT_ID (8 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL5425 

$`GSE59913-GPL5426_series_matrix.txt.gz`
ExpressionSet (storageMode: lockedEnvironment)
assayData: 8565 features, 1731 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM1454655 GSM1454656 ... GSM1456385 (1731 total)
  varLabels: title geo_accession ... vehicle:ch1 (50 total)
  varMetadata: labelDescription
featureData
  featureNames: AA799301_PROBE1 AA799313_PROBE1 ... Z96106_PROBE1 (8565 total)
  fvarLabels: ID GB_ACC ... SPOT_ID (8 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL5426

From this you obtain a list with three different ExpressionSet objects due to the fact that the GEO dataset has associated three different microarray layouts, as you can see in the GEO page. You have to figure out whether you can combine them together (depending on whether the number of probes, identity and order of them are equivalent). Then maybe perform normalization and if necessary log2 transformation. At least, most of the time, this is what I do nowadays for microarray data. I do not bother downloading the RAW data anymore unless there is some compelling reason to do so.

Check the ExpressionSet class documentation (?ExpressionSet). You can use the information about probes returned by featureData() to combine the datasets. For example:

head(pData(featureData(x[[1]])))
                             ID       GB_ACC GENE_SYMBOL                                                       GENE_TITLE X_COORDINATE
AA799301_PROBE1 AA799301_PROBE1     AA799301   LOC498225                                                   ligatin (DBSS)           20
AA799313_PROBE1 AA799313_PROBE1     AA799313      Siat10               ST3 beta-galactoside alpha-2,3-sialyltransferase 6           29
AA799329_PROBE1 AA799329_PROBE1     AA799329     Col10a1                       5'-nucleotidase domain containing 1 (DBSS)           12
AA799331_PROBE1 AA799331_PROBE1 NM_001007634        Pelo                                                   pelota homolog           35
AA799358_PROBE1 AA799358_PROBE1     BE110170       Cip98                                   CASK-interacting protein CIP98           18
AA799400_PROBE1 AA799400_PROBE1     BC079206     B3galt3 UDP-Gal:betaGlcNAc beta 1,3-galactosyltransferase, polypeptide 3           41
                Y_COORDINATE      TYPE SPOT_ID
AA799301_PROBE1          131 DISCOVERY        
AA799313_PROBE1           22 DISCOVERY        
AA799329_PROBE1           31 DISCOVERY        
AA799331_PROBE1           22 DISCOVERY        
AA799358_PROBE1           31 DISCOVERY        
AA799400_PROBE1           22 DISCOVERY