Problem reading CEL files with the oligo package

Bade

Delaware

Hi All,

I am trying to read four CEL files into oligo and am getting this error:

    > celFiles <- list.celfiles()
    > celFiles
    [1] "Iris.CEL" "Liv1.CEL" "Liv2.CEL" "Liv3.CEL"
    > AF_data = read.celfiles(celFiles)
    All the CEL files must be of the same type.
    Error: checkChipTypes(filenames, verbose, "affymetrix", TRUE) is not TRUE

I then read the files one by one and found that one sample (Iris.CEL) reports its annotation package as 'pd.huex.1.0.st.v1', while the rest (Liv1, Liv2, Liv3) report 'pd.huex.1.0.st.v2'. I checked GEO: although the samples come from different studies, they were all generated on the same chip, the Human Exon 1.0 ST Array, and the data processing description of the sample that gives the error (Iris.CEL) mentions 'HuEx-1_0-st-v2.r2.dt1.hg18.core.ps', so it should also be version 2 of HuEx 1.0 ST.

So I explicitly specified the annotation package 'pd.huex.1.0.st.v2' instead of the one oligo detected ('pd.huex.1.0.st.v1'), and the file was read without any problem:

    > celFiles <- list.celfiles()
    > celFiles
    [1] "Iris.CEL"
    > AF_data = read.celfiles(celFiles, pkgname='pd.huex.1.0.st.v2')
    Platform design info loaded.
    Reading in : Iris.CEL

But if I add the other files and try the same thing, the error is back:

    > celFiles <- list.celfiles()
    > celFiles
    [1] "Iris.CEL" "Liv1.CEL" "Liv2.CEL" "Liv3.CEL"
    > AF_data = read.celfiles(celFiles, pkgname='pd.huex.1.0.st.v2')
    All the CEL files must be of the same type.
    Error: checkChipTypes(filenames, verbose, "affymetrix", TRUE) is not TRUE

Can anybody tell me why the annotation package for Iris.CEL, which according to its NCBI GEO description comes from HuEx 1.0 ST v2, is detected as 'pd.huex.1.0.st.v1'? If I explicitly specify pd.huex.1.0.st.v2 and read Iris.CEL alone, it works. Why does it fail when I read it together with the other CEL files, which use the same annotation package (pd.huex.1.0.st.v2)?

NCBI GEO IDs:
Iris.CEL - GSM1008547
Liv1/2/3 - GSM486433/GSM486434/GSM486435

Awaiting help.
AK

Session info:

    > sessionInfo()
    R version 3.0.1 (2013-05-16)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
     [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=C LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] parallel stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] pd.huex.1.0.st.v2_3.8.0 RSQLite_0.11.4 DBI_0.2-7 oligo_1.24.2 Biobase_2.20.1 oligoClasses_1.22.0
    [7] BiocGenerics_0.6.0

    loaded via a namespace (and not attached):
     [1] affxparser_1.32.3 affyio_1.28.0 BiocInstaller_1.10.1 Biostrings_2.28.0 bit_1.1-10 codetools_0.2-8
     [7] ff_2.2-11 foreach_1.4.0 GenomicRanges_1.12.4 IRanges_1.18.1 iterators_1.0.6 preprocessCore_1.22.0
    [13] splines_3.0.1 stats4_3.0.1 tools_3.0.1 zlibbioc_1.6.0

Tags: Annotation, oligo

Posted 10.7 years ago by Bade (last updated by James W. MacDonald)
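Because read.celfiles() refuses to mix chip types in a single call, it can help to look at the chip type recorded in each CEL header before reading anything. The sketch below assumes the affyio package is installed and that affyio::read.celfile.header() returns the header chip type in its cdfName element. The helper names (getChipTypes, groupByChipType, chipTypeToPkg) are hypothetical, and chipTypeToPkg only reproduces the naming pattern visible in this thread ('HuEx-1_0-st-v1' to 'pd.huex.1.0.st.v1'); it is not oligo's own lookup.

```r
# Hypothetical helpers for inspecting CEL headers before read.celfiles().
# Assumes the affyio package is installed.

# Chip type stored in each CEL header (the cdfName element of the header list)
getChipTypes <- function(files) {
  vapply(files,
         function(f) affyio::read.celfile.header(f)$cdfName,
         character(1))
}

# Split file names into one group per chip type
groupByChipType <- function(chipTypes) {
  split(names(chipTypes), unname(chipTypes))
}

# Mirror the naming pattern seen in this thread:
# "HuEx-1_0-st-v1" -> "pd.huex.1.0.st.v1"
chipTypeToPkg <- function(chipType) {
  paste0("pd.", gsub("[_-]", ".", tolower(chipType)))
}

# Usage (needs the CEL files in the working directory):
# types <- getChipTypes(list.files(pattern = "\\.CEL(\\.gz)?$", ignore.case = TRUE))
# groupByChipType(types)
```

Files that end up in different groups are the ones read.celfiles() will not read together, whatever pkgname is passed.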

James W. MacDonald

United States

Hi Atul,

On 8/27/2013 11:18 PM, Atul wrote:
> Can anybody please tell me why annotation package for Iris.cel which
> is from HuEx 1.0ST v2 (from NCBI GEO description) is recognized as
> 'pd.huex.1.0.st.v1'?
The Iris.cel file is a HuEx-1_0-st-v1, according to the header in that file:

    > sapply(fls, oligo:::getCelChipType, useAffyio = TRUE)
    GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz
    "HuEx-1_0-st-v1"
    GSM486433.CEL.gz
    "HuEx-1_0-st-v2"

And the others you are trying to read are version 2. It doesn't really matter what GEO says, as the information on GEO comes from the submitter, and they evidently made a mistake.

I don't know what, if any, differences there are between the two versions. In addition, I can't find anything on the Affy website that says what the differences might be. Certainly they have the same number of probes, and the probe IDs are all the same. So you can combine:

    > fls <- dir(pattern = "CEL.gz")
    > dat1 <- read.celfiles(fls[1], pkgname = "pd.huex.1.0.st.v2")
    Loading required package: pd.huex.1.0.st.v2
    Loading required package: RSQLite
    Loading required package: DBI
    Platform design info loaded.
    Reading in : GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz
    > dat2 <- read.celfiles(fls[2]) ## note that you would use all three of the other celfiles for this step
    Platform design info loaded.
    Reading in : GSM486433.CEL.gz
    > dat <- combine(dat1, dat2)
    Warning messages:
    1: In alleq(levels(x[[nm]]), levels(y[[nm]])) : 1 string mismatch
    2: data frame column 'exprs' levels not all.equal
    3: In alleq(levels(x[[nm]]), levels(y[[nm]])) : 1 string mismatch
    4: data frame column 'dates' levels not all.equal
    > all.equal(featureNames(dat1), featureNames(dat2))
    [1] TRUE
    > dat
    ExonFeatureSet (storageMode: lockedEnvironment)
    assayData: 6553600 features, 2 samples
      element names: exprs
    protocolData
      rowNames: GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz GSM486433.CEL.gz
      varLabels: exprs dates
      varMetadata: labelDescription channel
    phenoData
      rowNames: GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz GSM486433.CEL.gz
      varLabels: index
      varMetadata: labelDescription channel
    featureData: none
    experimentData: use 'experimentData(object)'
    Annotation: pd.huex.1.0.st.v2

You should note, however, that this isn't a recommendation on my part that you should do this. I don't know what these data are, nor what you are planning to do with them. In general, combining data from two or more completely different experiments is a very tricky endeavor. Using something like fRMA (if there are frozen estimates for this chip type) might be a better way to go.

Best,

Jim
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099

Posted 10.7 years ago by James W. MacDonald
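Jim's transcript combines one file of each version and notes that the v2 step would take all three remaining files. A minimal sketch wrapping those same steps for the four files in the question follows. readMixedHuEx is a hypothetical name; the sketch assumes oligo, Biobase, BiocGenerics and pd.huex.1.0.st.v2 are installed and that the v1/v2 split of the files is already known (for example from the CEL headers). Like Jim's example, it does not make merging data from unrelated studies a good idea.

```r
# Hypothetical wrapper around the steps in Jim's transcript.
# Assumes oligo, Biobase, BiocGenerics and pd.huex.1.0.st.v2 are installed.
readMixedHuEx <- function(v1Files, v2Files, pkgname = "pd.huex.1.0.st.v2") {
  # Override the detected design package for the v1-labelled files
  dat1 <- oligo::read.celfiles(v1Files, pkgname = pkgname)
  # The v2 files are detected as pd.huex.1.0.st.v2 without an override
  dat2 <- oligo::read.celfiles(v2Files)
  # Refuse to combine unless both objects share the same probe layout
  if (!identical(Biobase::featureNames(dat1), Biobase::featureNames(dat2))) {
    stop("feature names differ between the two groups; not combining")
  }
  # combine() may warn about mismatched protocolData levels, as in the transcript
  BiocGenerics::combine(dat1, dat2)
}

# Usage, with the file names from the question:
# AF_data <- readMixedHuEx("Iris.CEL", c("Liv1.CEL", "Liv2.CEL", "Liv3.CEL"))
```

Reading the two groups in separate calls is what sidesteps the chip-type check, since read.celfiles() only compares the files passed to it in one call.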


On Wed, Aug 28, 2013 at 7:44 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
> I don't know what, if any, differences there are between the two versions.
> In addition, there isn't anything I can see on the Affy website that says
> what differences there may be. Certainly they have the same number of probes
> and the probe IDs are all the same.

I have some old notes on this at http://aroma-project.org/chipTypes/HuEx-1_0-st-v2:

"Note II: Older CEL files for this chip type may be reported to have chip type 'HuEx-1_0-st-v1'. This chip is slightly different from the 'HuEx-1_0-st-v2' chip. According to Affymetrix support, the difference is only in the control probes: 'There is only a minor difference between the v1 and the v2 library files and it has to do with the manufacturing controls on the array. There is no difference with the probes interrogating the exons between v1 and v2.', cf. thread 'Discussion on affymetrix-defined-transcript-clusters' (Nov 25-Dec 2, 2008). We don't have details on the exact differences and we don't have access to the HuEx-1_0-st.v1.CDF (please fwd if you have it), but from Affymetrix' feedback it sounds like one could use the new HuEx-1_0-st-v2.CDF."

I guess one could compare the probe sequences for the two to find out exactly how they differ.
/Henrik

Posted 10.7 years ago by Henrik Bengtsson
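Henrik suggests comparing the two designs directly. The sketch below compares the probe-to-probeset assignments (fid, fsetid pairs) in the pmfeature tables of two pd design packages; note that this is a related but different check from comparing probe sequences. It assumes DBI, oligoClasses and both design packages are installed (I am not certain a pd.huex.1.0.st.v1 package is readily available, so it may have to be built), that each package exports a design object named after the package, and that oligoClasses::db() returns that object's SQLite connection. pmTable and compareFeatureTables are hypothetical helpers.

```r
# Hypothetical helpers for comparing two pd.* design packages.
# Assumes DBI and oligoClasses are installed, plus both design packages.

# fid/fsetid pairs from the pmfeature table of one design package
pmTable <- function(pkg) {
  pd <- getExportedValue(pkg, pkg)  # assumed: design object named after the package
  DBI::dbGetQuery(oligoClasses::db(pd), "SELECT fid, fsetid FROM pmfeature")
}

# Pairs present in one table but not the other. Whole (fid, fsetid) pairs
# are compared, so a probe listed under several probesets is handled too.
compareFeatureTables <- function(a, b) {
  keyA <- paste(a$fid, a$fsetid, sep = ":")
  keyB <- paste(b$fid, b$fsetid, sep = ":")
  list(onlyInA = setdiff(keyA, keyB),
       onlyInB = setdiff(keyB, keyA))
}

# Usage (requires both packages):
# d <- compareFeatureTables(pmTable("pd.huex.1.0.st.v1"), pmTable("pd.huex.1.0.st.v2"))
# sapply(d, length)
```

If Affymetrix' statement holds, any differences should be confined to control probesets rather than the exon-level probes.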


Hi Jim, sure.

/Henrik

Posted 10.7 years ago by Henrik Bengtsson


Hi James,

Many thanks for the suggestion. It worked perfectly.

Best,
AK

Posted 10.7 years ago by Bade
