How can I obtain gene name from chromosome location?


Yoo, Seungyeul ▴ 110

@yoo-seungyeul-5323

Last seen 9.6 years ago

Hi all,

I'm working on DNA methylation data from lung genomes, and I'm using the CHARM package to analyze differentially methylated regions.

I have a list of chromosomal locations indicating genes, but I don't know how to map these locations to specific gene names.

> head(pns)
[1] "chr19:4205395-4220723" "chr16:73793547-73835933"
[3] "chr22:18115791-18146966" "chr19:60540822-60563218"
[5] "chr16:14630202-14638324" "chr19:49197954-49200178"

I also have a gene expression dataset that I want to integrate with the DNA methylation data, so obtaining gene names is critical.

Any advice would be appreciated.

Best regards,
Seungyeul

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base

other attached packages:
 [1] gplots_2.11.0 MASS_7.3-17
 [3] KernSmooth_2.23-7 caTools_1.13
 [5] bitops_1.0-4.1 gdata_2.11.0
 [7] gtools_2.7.0 BSgenome.Hsapiens.UCSC.hg18_1.3.17
 [9] BSgenome_1.24.0 Biostrings_2.24.1
[11] GenomicRanges_1.8.7 IRanges_1.14.3
[13] pd.feinberg.hg18.me.hx1_0.99.2 oligo_1.20.3
[15] oligoClasses_1.18.0 RSQLite_0.11.1
[17] DBI_0.2-5 charm_2.2.0
[19] genefilter_1.38.0 RColorBrewer_1.0-5
[21] fields_6.6.3 spam_0.29-1
[23] SQN_1.0.4 nor1mix_1.1-3
[25] mclust_3.4.11 Biobase_2.16.0
[27] BiocGenerics_0.2.0

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.18.1 BiocInstaller_1.4.7 affxparser_1.28.0
 [4] affyio_1.24.0 annotate_1.34.0 bit_1.1-8
 [7] codetools_0.2-8 ff_2.2-7 foreach_1.4.0
[10] iterators_1.0.6 limma_3.12.1 multtest_2.12.0
[13] parallel_2.15.0 preprocessCore_1.18.0 siggenes_1.30.0
[16] splines_2.15.0 stats4_2.15.0 survival_2.36-12
[19] sva_3.2.1 xtable_1.7-0 zlibbioc_1.2.0

Lung charm genomes • 1.5k views

updated 11.8 years ago by Steve Lianoglou ★ 13k • written 11.8 years ago by Yoo, Seungyeul ▴ 110


Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 14 months ago

United States

Hi Seungyeul,

On Mon, Jul 9, 2012 at 3:28 PM, Yoo, Seungyeul <seungyeul.yoo at mssm.edu> wrote:
> I can have a list of chromosomal locations indicating genes but I don't know how I map this location into specific gene names.

I'll just point you towards the way, and leave the (important) task of learning how to use these packages up to you (or another poster who feels that giving you the exact commands is the best way to help you ;-)

(1) Use the GenomicFeatures package to build a TranscriptDb for your organism and annotation source of choice (refseq, ensembl, ucsc known genes):

http://bioconductor.org/packages/2.10/bioc/html/GenomicFeatures.html

(2) Represent your ranges (chr22:XXX-YYY) as a GenomicRanges object:

http://bioconductor.org/packages/2.10/bioc/html/GenomicRanges.html

(3) Extract the "transcripts" from your TranscriptDb object using the `transcripts` function

(4) Use findOverlaps and friends (e.g. subsetByOverlaps) to find which transcripts overlap which of your ranges.

The GenomicFeatures, GenomicRanges, and (if you really want to master your craft) IRanges packages each have pretty extensive documentation in terms of vignettes and API documentation that are worth your time to read. Once you do so, using those packages to perform the tasks outlined above will be rather straightforward.

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
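For later readers, steps (2) to (4) above could look roughly like the sketch below. It is only an illustration under stated assumptions: the `genes` object is a hand-made stand-in with placeholder names (`GENE_A`, ...) and made-up coordinates, not real annotation. In practice it would come from `transcripts()` or `genes()` on a TranscriptDb object (called TxDb in later Bioconductor releases) built for the same genome build as the array, which appears to be hg18 judging by the sessionInfo() in the question.

```r
library(GenomicRanges)

# Region strings in the "chr:start-end" form shown in the question
pns <- c("chr19:4205395-4220723",
         "chr16:73793547-73835933",
         "chr22:18115791-18146966")

# Step 2: split each string into chromosome, start and end, then build a GRanges
parts <- do.call(rbind, strsplit(pns, "[:-]"))
regions <- GRanges(seqnames = parts[, 1],
                   ranges = IRanges(start = as.integer(parts[, 2]),
                                    end = as.integer(parts[, 3])))

# Step 3 (stand-in): HYPOTHETICAL annotation with placeholder names and
# made-up coordinates; real code would extract this from a TranscriptDb/TxDb
genes <- GRanges(seqnames = c("chr19", "chr16", "chr22"),
                 ranges = IRanges(start = c(4210000, 73800000, 1000),
                                  end = c(4215000, 73810000, 2000)),
                 gene_name = c("GENE_A", "GENE_B", "GENE_C"))

# Step 4: find which annotated ranges overlap which query regions
hits <- findOverlaps(regions, genes)
mapping <- data.frame(region = pns[queryHits(hits)],
                      gene = genes$gene_name[subjectHits(hits)],
                      stringsAsFactors = FALSE)
mapping
```

A region can overlap zero, one, or several genes, so `mapping` may have fewer or more rows than `pns`; here the chr22 region has no overlapping toy gene and is simply absent from the result.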

written 11.8 years ago by Steve Lianoglou ★ 13k


The original poster did not specify whether these are promoter regions or genic regions; if they are the former, flank() will be useful.

--
A model is a lie that helps you see the truth.

Howard Skipper
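To illustrate the flank() suggestion, here is a small sketch, assuming the regions sit upstream of genes rather than inside them. The gene models, their names, and the 1000 bp window are all hypothetical choices for illustration; only `flank()` and `findOverlaps()` themselves are GenomicRanges functions.

```r
library(GenomicRanges)

# HYPOTHETICAL gene models on both strands (placeholder names and coordinates)
genes <- GRanges(seqnames = c("chr1", "chr1"),
                 ranges = IRanges(start = c(5000, 20000),
                                  end = c(6000, 21000)),
                 strand = c("+", "-"),
                 gene_name = c("GENE_A", "GENE_B"))

# flank() is strand-aware: with the default start = TRUE it returns the
# `width` bases immediately upstream of each range
upstream <- flank(genes, width = 1000)

# A made-up methylation region lying just upstream of the minus-strand gene
region <- GRanges("chr1", IRanges(21200, 21800))

# `upstream` is parallel to `genes`, so subject hits index the gene names
hits <- findOverlaps(region, upstream)
nearby_gene <- genes$gene_name[subjectHits(hits)]
nearby_gene
```

For the plus-strand gene the flank lies before its start (4000-4999), while for the minus-strand gene it lies after its end (21001-22000), which is why strand information in the annotation matters for promoter-style lookups.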

written 11.8 years ago by Tim Triche ★ 4.2k


Hi Tim,

Thank you for your advice. I'm sorry for another naive question, but how can I tell from the raw data whether the chromosome locations are promoter regions or not?

I'm reading raw DNA methylation data, which consists of pairs of untreated and methylated .xys files, as follows:

library(charm)
library(BSgenome.Hsapiens.UCSC.hg18)  # provides the Hsapiens object used below
# read and validate the sample description table
pd<-read.table("CTRL_sample.txt",header=TRUE,sep="\t")
res<-validatePd(pd)
# load the raw array data and estimate percentage methylation
rawData<-readCharm(pd$filename,path="/projects/zhuj05a/Lung_Dataset/LGRC/Raw/Charm/3_CTRL",sampleKey=pd)
ctrlind<-getControlIndex(rawData,subject=Hsapiens)
grp<-pData(rawData)$tissue
p<-methp(rawData,controlIndex=ctrlind,plotDensity="density_CTRL.pdf",plotDensityGroups=grp)
# label rows by region and columns by sample
rownames(p)<-pns(rawData)
colnames(p)<-unique(pd$sampleID)

I want the row names of the matrix p to be gene names rather than chromosome locations.

I will try to use flank() as you suggested and also try the other advice from Steve and Brian.

Thanks,

Seungyeul Yoo
Postdoctoral Fellow
Department of Genetics and Genomic Sciences
Institute of Genomics and Multiscale Biology
Mount Sinai School of Medicine
(office) 212-659-6877

On Jul 9, 2012, at 4:56 PM, Tim Triche, Jr. wrote:
> The original poster did not specify whether these are promoter regions
> or genic regions; if they are the former, flank() will be useful.
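Once a region-to-gene lookup exists (for example from an overlap step like the one Steve outlines), relabelling the rows of `p` needs only base R. The sketch below is an assumption-laden toy: the matrix values, the sample names `S1`/`S2`, and the `region2gene` lookup with placeholder gene names are all made up for illustration.

```r
# Toy stand-in for the methylation matrix `p` (rows = regions, columns = samples)
pns <- c("chr19:4205395-4220723",
         "chr16:73793547-73835933",
         "chr22:18115791-18146966")
p <- matrix(c(0.1, 0.8, 0.5, 0.2, 0.7, 0.4), nrow = 3,
            dimnames = list(pns, c("S1", "S2")))

# HYPOTHETICAL lookup from region string to gene name; the chr22 region is
# deliberately left without a gene
region2gene <- c("chr19:4205395-4220723" = "GENE_A",
                 "chr16:73793547-73835933" = "GENE_B")

# Look up each row; regions without a gene come back as NA
gene <- unname(region2gene[rownames(p)])

# Keep the original location where no gene was found, and make labels unique
# because several regions can map to the same gene
labels <- ifelse(is.na(gene), rownames(p), gene)
rownames(p) <- make.unique(labels)
rownames(p)
```

Keeping the location string as a fallback avoids NA row names, and `make.unique()` guards against duplicate labels when more than one region maps to the same gene.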

written 11.8 years ago by Yoo, Seungyeul ▴ 110


Hi Steve,

I really appreciate your comments. The pipeline is exactly what I wanted to know. I will try to follow the steps and post further questions if I get stuck somewhere. I'm a newbie in genomics but really enjoy learning all of this.

Thanks again.

Seungyeul

written 11.8 years ago by Yoo, Seungyeul ▴ 110
