See http://www.bioconductor.org/docs/postingGuide.html.
Note that attachments are not permitted.
On Thu, 30 Oct 2008, Ming YI [Contr] wrote:
> Dear Gordon:
>
> Thanks a lot for your comments and suggestions. Following your suggestion,
> I have already read all the data into limma objects successfully with the
> generic method, using the attached targets file that I edited from the
> annotation file I sent you earlier. I assumed that the Cy3 channel is the
> common reference, as you guessed.
>
> But the question you raised remains: how did they actually do the
> experiment? Based on their E-NCMF-8.idf.txt file from ArrayExpress, it
> appears to be a dye_swap_design, which is exactly what you guessed. So the
> data appear to have been collated by ArrayExpress into data matrices with
> the Cy3 and Cy5 intensities in the same file for each sample. My concern is
> the "Label" column in the E-NCMF-8_sdrf.txt file I sent you in my last
> email: what do Cy3 and Cy5 mean for each sample? It looks as though this
> column gives, for each sample (and corresponding raw data file), the dye
> used for that sample, and the other dye would then have been used for the
> common reference, which is not mentioned in their annotation file. What do
> you think? If this is true, I may need to change my targets file
> accordingly to accommodate this information. This assumption makes more
> sense, at least in explaining the repeated samples in the dataset, which
> should be the dye-swap data.
>
> I have tried to contact them for details of the experimental design, which
> should help to sort this out.
>
> By the way, I am not sure why my post did not go to the mailing list. I
> changed the address a bit this time; I hope it works.
>
> Thanks again for your help. Any additional suggestions would be appreciated
> as well.
>
> Best regards,
>
> Ming
>
>
> At 09:25 PM 10/29/2008, Gordon K Smyth wrote:
>> Dear Ming,
>>
>> Thank you for mailing me example data sets and the annotation spreadsheet
>> from ArrayExpress.
>>
>> You are assuming that the data from ArrayExpress are in ImaGene format.
>> This is incorrect. The reason that limma gives a special treatment to
>> ImaGene files is that, unlike other image analysis software, ImaGene writes
>> the Cy3 and Cy5 channels into separate files. However ArrayExpress has
>> collated the original data into data matrices with the Cy3 and Cy5
>> intensities in the same file for each sample. Therefore you should ignore
>> all references to ImaGene in the limma manual, and instead use the
>> instructions for generic two-color platforms.
>>
>> The data sets you sent me can easily be read into limma using the
>> instructions in the limma User's Guide starting page 14 "What should you do
>> if your image analysis program is not in the above list?" I demonstrate
>> this below.
>>
>> Your emails suggest that you have not yet read any two-color data into
>> limma. It is essential that you try some simple examples before trying a
>> large dataset from ArrayExpress, which will have a complex structure you
>> might not fully understand.
>>
>> I don't fully understand the sample annotation file from ArrayExpress that
>> you sent me, but I doubt that you are interpreting it correctly. It is
>> not in the format you need for a limma targets file. My guess is that each
>> row of the file corresponds to one array, and that each array has been
>> hybridized with a common reference that is not mentioned in the annotation
>> file. This means that the repeated sample names you have noted do not
>> represent matched Cy3 and Cy5 channels, but rather represent dye-swap
>> technical replicates. That is, they are separate arrays.
>>
>> If my guess is correct, then a targets file would be something like below.
>>
>> Let me emphasize that I do not offer a plug-in service to read experimental
>> data posted to ArrayExpress. It is your responsibility to figure out the
>> experimental design and the ArrayExpress data formats. I am just
>> guessing.
>>
>> Best wishes
>> Gordon
>>
>>
>> READING YOUR DATA FILES
>>
>>> f
>> [1] "E-NCMF-8-raw-data-1363346838.txt" "E-NCMF-8-raw-data-1363346856.txt"
>>
>>> ann <- c("Database NCMF:DB:omadhuman",
>>          "Database ebi.ac.uk:Database:ens_trscrpt_id",
>>          "Feature coordinates: metaColumn", "metaRow", "column", "row",
>>          "Reporter identifier", "Reporter sequence type")
>>
>>> columns <- list(Rf="ImaGene:Signal Mean_Cy5",
>>                 Rb="ImaGene:Background Median_Cy5",
>>                 Gf="ImaGene:Signal Mean_Cy3",
>>                 Gb="ImaGene:Background Median_Cy3")
>>
>>> RG <- read.maimages(files=f,annotation=ann,columns=columns)
>> Read E-NCMF-8-raw-data-1363346838.txt
>> Read E-NCMF-8-raw-data-1363346856.txt
>>
>>> dim(RG)
>> [1] 37632 2
>>
>>
>> A POSSIBLE TARGETS FILE
>>
>>> targets <- readTargets()
>>> targets
>>   Source                     DiseaseState            ArrayDataMatrixFile              Cy3       Cy5
>> 1 3560                       Squamous Cell Carcinoma E-NCMF-8-raw-data-1363346838.txt Reference SCC3560
>> 2 reference pool of 61 HNSCC Squamous Cell Carcinoma E-NCMF-8-raw-data-1363346856.txt Reference PoolHNSCC
>>
>>
>> On Wed, 29 Oct 2008, Ming YI [Contr] wrote:
>>
>>> Hi, Dear Gordon:
>>>
>>> I am trying to use limma on an ImaGene dataset that I downloaded from
>>> ArrayExpress. I have never dealt with ImaGene data before and am not
>>> familiar with the ImaGene data format, except for knowing that the Cy5
>>> and Cy3 signals are stored in two separate files for the same sample. I
>>> tried to read the data into limma and normalize them there, but I keep
>>> running into issues and errors. I hope you can help me in this regard.
>>>
>>> I have attached a file (E-NCMF-8_sdrf.txt), downloaded from ArrayExpress,
>>> that could potentially be used to make the targets file, and I have also
>>> attached two raw data files from the ImaGene dataset as examples. The
>>> thing bothering me is as follows:
>>>
>>> Extract 3538 and Extract 3526 (see the "Extract Name" column of the
>>> E-NCMF-8_sdrf.txt file) each have one Cy5 file and one matched Cy3 file,
>>> so that's fine with me. But for "Extract reference pool of 61 HNSCC" in
>>> particular (see the E-NCMF-8_sdrf.txt file), there are multiple Cy3 and
>>> Cy5 files. How should we incorporate that into the targets file?
>>>
>>> I intended to use the following code to deal with this ImaGene data:
>>>
>>> targets<-readTargets()
>>> files<-targets[,c("FileNameCy3", "FileNameCy5")]
>>> RG<-read.maimages(files, source="imagene")
>>>
>>> But I need the right targets file to start with, particularly given the
>>> issue I mentioned above.
>>>
>>> Also, for normalization, is
>>> RG<-backgroundCorrect(RG, method="normexp", offset=50) still appropriate
>>> for ImaGene data?
>>>
>>> Thanks so much for your help!
>>>
>>> Ming Yi
>>> ABCC
>>> P.O.Box B, Bldg 430
>>> National Cancer Institute/SAIC-Frederick, Inc
>>> Frederick,Maryland
>>> USA
>
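
For readers following the dye-swap discussion in this thread, here is a minimal sketch of how a common-reference design with a dye-swap replicate might be expressed in a limma targets frame. The thread never confirms the E-NCMF-8 layout, so the dye assignments are assumptions, and the file names and the third array are purely hypothetical. Only the sample labels (Reference, SCC3560, PoolHNSCC) are taken from the targets file suggested above.

```r
# Sketch only: dye assignments and file names are assumptions for
# illustration, not the confirmed E-NCMF-8 experimental design.
library(limma)

targets <- data.frame(
  FileName = c("array1.txt", "array2.txt", "array3.txt"),  # hypothetical names
  Cy3      = c("Reference", "SCC3560",   "Reference"),
  Cy5      = c("SCC3560",   "Reference", "PoolHNSCC"),
  stringsAsFactors = FALSE
)

# modelMatrix() reads the Cy3 and Cy5 columns and builds a design matrix
# for log-ratios (Cy5 over Cy3) relative to the named reference.
design <- modelMatrix(targets, ref = "Reference", verbose = FALSE)
print(design)
```

In this sketch the second array is the dye-swapped replicate of the first, so its coefficient for SCC3560 has the opposite sign. If the "Label" column of the sdrf file does record which dye carries the sample, the Cy3 and Cy5 columns of the real targets file would be filled in from it in the same way.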