Dear all,
I need to analyze a dataset from the GEO database. The data were generated
on the platform GPL1528
<http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL1528>:
NCI/ATC Hs-OperonV2. If the data had been generated on an Affymetrix
platform, I would use hgu133plus2.db.
Can somebody advise me which R annotation package I should use in this
case?
Many thanks,
Jing
Oops, pasted the wrong link before. You want this one:
http://www.bioconductor.org/packages/2.8/bioc/vignettes/AnnotationDbi/inst/doc/SQLForge.pdf
Marc
On 08/22/2011 04:55 PM, Marc Carlson wrote:
> Hi Jing,
>
> If you need a chip package that is not presently hosted, you can 1)
> retrieve the probe to gene mappings from the people who made the
> platform and then 2) follow the instructions in this vignette to
> generate a custom package:
>
> http://www.bioconductor.org/packages/2.8/bioc/vignettes/AnnotationDbi/inst/doc/SQLForge.R
>
> Marc
Hi, Jing.
You could try:
http://bioconductor.org/packages/release/data/annotation/html/OperonHumanV3.db.html
Note that this might not be the right one, but the Operon set was in
common use a few years ago.
If this isn't what you need, note that GEOquery automatically
grabs the annotation data from NCBI GEO. For an example using a GSE from
GPL1528, see below. You can use the AnnotationDbi package to make
your own annotation packages based on these annotations. In
particular, for GPL1528, the UniGene IDs are included.
Hope that helps.
Sean
> library(GEOquery)
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material. To view, type
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")' and for packages 'citation("pkgname")'.
Setting options('download.file.method.GEOquery'='curl')
> gse = getGEO("GSE2020")
Found 1 file(s)
GSE2020_series_matrix.txt.gz
trying URL 'ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE2020/GSE2020_series_matrix.txt.gz'
ftp data connection made, file length 518963 bytes
opened URL
==================================================
downloaded 506 Kb
File stored at:
/tmp/Rtmpdgx7wJ/GPL1528.soft
> gse
$GSE2020_series_matrix.txt.gz
ExpressionSet (storageMode: lockedEnvironment)
assayData: 21794 features, 10 samples
element names: exprs
protocolData: none
phenoData
sampleNames: GSM36482 GSM36483 ... GSM36491 (10 total)
varLabels: title geo_accession ... data_row_count (31 total)
varMetadata: labelDescription
featureData
featureNames: 1140849_1 1140850_1 ... 1298880_1 (21794 total)
fvarLabels: ID MADB_WELL_ID ... SPOT_ID (8 total)
fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL1528
> head(fData(gse[[1]]))
ID MADB_WELL_ID OLIGO_ID GENE UNIGENE
1140849_1 1140849_1 1140849 SptRpt-2a1
1140850_1 1140850_1 1140850 SptRpt-2a2
1140851_1 1140851_1 1140851 SptRpt-2a3
1140852_1 1140852_1 1140852 SptRpt-2a4
1140853_1 1140853_1 1140853 SptRpt-2a5
1140854_1 1140854_1 1140854 SptRpt-2a6
DESCRIPTION
1140849_1 Human Beta-Actin PCR Product Human Beta-Actin 100ng/ul
1140850_1 PCR Product 1 (Cab) A. thaliana photosystem 1 chlorophyll a/b-binding protein
1140851_1 PCR Product 5 (LTP6) A. thaliana lipid transfer protien 6
1140852_1 3XSSC
1140853_1 Oligonucleotide 1 (Cab) A. thaliana photosystem 1 chlorophyll a/b-binding protein
1140854_1 Oligonucleotide 5 (LTP6) A. thaliana lipid transfer protien 6
GB_LIST
1140849_1
1140850_1
1140851_1
1140852_1
1140853_1
1140854_1
SPOT_ID
1140849_1 Human Beta-Actin PCR Product Human Beta-Actin 100ng/ul
1140850_1 PCR Product 1 (Cab) A. thaliana photosystem 1 chlorophyll a/b-binding protein
1140851_1 PCR Product 5 (LTP6) A. thaliana lipid transfer protien 6
1140852_1 3XSSC
1140853_1 Oligonucleotide 1 (Cab) A. thaliana photosystem 1 chlorophyll a/b-binding protein
1140854_1 Oligonucleotide 5 (LTP6) A. thaliana lipid transfer protien 6
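As a follow-up to the output above, here is a minimal sketch of reducing the feature data to a probe-to-UniGene table. It assumes the ID and UNIGENE columns listed in the fData header. The toy data frame stands in for fData(gse[[1]]) so that the snippet runs without a download; the first two IDs are control spots from the output above, and the rows example_1/example_2 with Hs.111/Hs.222 are invented for illustration. The two-column, tab-delimited, header-less layout is what I understand the SQLForge tools to take as input, so please check the vignette before relying on it.

```r
# Toy stand-in for fData(gse[[1]]); only the ID and UNIGENE columns are mimicked.
# Rows 3 and 4 are made-up examples, not real GPL1528 probes.
fd <- data.frame(
  ID      = c("1140849_1", "1140850_1", "example_1", "example_2"),
  UNIGENE = c("", "", "Hs.111", "Hs.222"),
  stringsAsFactors = FALSE
)

# Keep only probes that actually carry a UniGene ID (control spots have none)
has_ug   <- !is.na(fd$UNIGENE) & nzchar(fd$UNIGENE)
probe2ug <- fd[has_ug, c("ID", "UNIGENE")]

# Write a two-column, tab-delimited file without a header
map_file <- file.path(tempdir(), "GPL1528_probe2unigene.txt")
write.table(probe2ug, map_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

With the real data, replacing the toy data frame by fd <- fData(gse[[1]]) should be the only change needed, provided the column names match.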
Hi Jing,
If you need a chip package that is not presently hosted, you can 1)
retrieve the probe-to-gene mappings from the people who made the
platform and then 2) follow the instructions in this vignette to
generate a custom package:
http://www.bioconductor.org/packages/2.8/bioc/vignettes/AnnotationDbi/inst/doc/SQLForge.R
Marc
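Step 2 could look roughly like the sketch below. It wraps makeDBPackage(), which at the time of this thread shipped with AnnotationDbi and in current Bioconductor releases is provided by AnnotationForge. The wrapper name build_chip_package, the prefix "gpl1528", the version string, the manufacturer label, and the choice of baseMapType = "ug" (UniGene IDs) are assumptions made for illustration, not something stated in the thread. The HUMANCHIP_DB schema also needs the human.db0 package installed, and since UniGene has been retired by NCBI this route may not work with current data packages.

```r
# Hypothetical wrapper around makeDBPackage(); nothing is built until it is called.
# map_file: two-column, tab-delimited probe -> UniGene file without a header.
build_chip_package <- function(map_file, out_dir = tempdir()) {
  # makeDBPackage() lives in AnnotationForge in current Bioconductor releases
  if (!requireNamespace("AnnotationForge", quietly = TRUE)) {
    stop("Please install AnnotationForge (and human.db0) from Bioconductor first.")
  }
  AnnotationForge::makeDBPackage(
    schema       = "HUMANCHIP_DB",        # human chip schema
    affy         = FALSE,                 # not an Affymetrix annotation file
    prefix       = "gpl1528",             # assumed package prefix
    fileName     = map_file,
    baseMapType  = "ug",                  # assumed: file maps probes to UniGene IDs
    outputDir    = out_dir,
    version      = "0.0.1",               # assumed version string
    manufacturer = "Operon",              # assumed label
    chipName     = "NCI/ATC Hs-OperonV2"
  )
}
```

Calling build_chip_package("path/to/probe2unigene.txt") with a real mapping file should then write a source package into out_dir that can be installed like any other local package.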
Thank you, Sean. It really helps!
Jing
-----Original Message-----
From: seandavi@gmail.com [mailto:seandavi@gmail.com] On Behalf Of Sean
Davis
Sent: Monday, August 22, 2011 4:07 PM
To: Jing Huang
Cc: bioconductor at r-project.org
Subject: Re: [BioC] annotation package ?
_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor