How can I obtain gene name from chromosome location?
@yoo-seungyeul-5323
Last seen 10.2 years ago
Hi all,

I'm working on DNA methylation data from lung genomes, using the CHARM package to analyze differentially methylated regions.

I have a list of chromosomal locations that correspond to genes, but I don't know how to map these locations to specific gene names.

> head(pns)
[1] "chr19:4205395-4220723"   "chr16:73793547-73835933"
[3] "chr22:18115791-18146966" "chr19:60540822-60563218"
[5] "chr16:14630202-14638324" "chr19:49197954-49200178"

Because I also have a gene expression dataset that I want to integrate with the DNA methylation data, obtaining the gene names is critical.

Any advice would be appreciated.

Best regards,
Seungyeul

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] gplots_2.11.0                      MASS_7.3-17
 [3] KernSmooth_2.23-7                  caTools_1.13
 [5] bitops_1.0-4.1                     gdata_2.11.0
 [7] gtools_2.7.0                       BSgenome.Hsapiens.UCSC.hg18_1.3.17
 [9] BSgenome_1.24.0                    Biostrings_2.24.1
[11] GenomicRanges_1.8.7                IRanges_1.14.3
[13] pd.feinberg.hg18.me.hx1_0.99.2     oligo_1.20.3
[15] oligoClasses_1.18.0                RSQLite_0.11.1
[17] DBI_0.2-5                          charm_2.2.0
[19] genefilter_1.38.0                  RColorBrewer_1.0-5
[21] fields_6.6.3                       spam_0.29-1
[23] SQN_1.0.4                          nor1mix_1.1-3
[25] mclust_3.4.11                      Biobase_2.16.0
[27] BiocGenerics_0.2.0

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.18.1  BiocInstaller_1.4.7   affxparser_1.28.0
 [4] affyio_1.24.0         annotate_1.34.0       bit_1.1-8
 [7] codetools_0.2-8       ff_2.2-7              foreach_1.4.0
[10] iterators_1.0.6       limma_3.12.1          multtest_2.12.0
[13] parallel_2.15.0       preprocessCore_1.18.0 siggenes_1.30.0
[16] splines_2.15.0        stats4_2.15.0         survival_2.36-12
[19] sva_3.2.1             xtable_1.7-0          zlibbioc_1.2.0
Lung charm genomes • 1.8k views
@steve-lianoglou-2771
Last seen 21 months ago
United States
Hi Seungyeul,

I'll just point you towards the way, and leave the (important) task of learning how to use these packages up to you (or to another poster who feels that giving you the exact commands is the best way to help you ;-).

(1) Use the GenomicFeatures package to build a TranscriptDb for your organism and annotation source of choice (refseq, ensembl, ucsc known genes):

http://bioconductor.org/packages/2.10/bioc/html/GenomicFeatures.html

(2) Represent your ranges (chr22:XXX-YYY) as a GenomicRanges object:

http://bioconductor.org/packages/2.10/bioc/html/GenomicRanges.html

(3) Extract the "transcripts" from your TranscriptDb object using the `transcripts` function.

(4) Use findOverlaps and friends (e.g. subsetByOverlaps) to find which of your ranges overlap which transcripts.

The GenomicFeatures, GenomicRanges, and (if you really want to master your craft) IRanges packages each have pretty extensive documentation, in the form of vignettes and API documentation, that is worth your time to read. Once you have done so, using those packages to perform the tasks outlined above will be rather straightforward.

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
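For readers who want a starting point, the four steps above can be sketched roughly as follows. This is only one possible reading of the outline, under several assumptions that are not in the original post: it uses the prebuilt annotation packages `TxDb.Hsapiens.UCSC.hg18.knownGene` and `org.Hs.eg.db` (hg18 is chosen because the poster's session loads a BSgenome hg18 package), it works at the gene level with `genes()` rather than `transcripts()`, and it assumes a Bioconductor release newer than the one in the thread (where these objects were still called TranscriptDb).

```r
library(GenomicRanges)
library(TxDb.Hsapiens.UCSC.hg18.knownGene)  # assumption: hg18 UCSC known genes
library(org.Hs.eg.db)                       # assumption: used for Entrez ID -> symbol
library(AnnotationDbi)                      # provides mapIds()

# A few of the region strings shown in the question
pns <- c("chr19:4205395-4220723",
         "chr16:73793547-73835933",
         "chr22:18115791-18146966")

# Step 2: parse "chr:start-end" strings into a GRanges object
parts <- do.call(rbind, strsplit(pns, "[:-]"))
regions <- GRanges(seqnames = parts[, 1],
                   ranges = IRanges(start = as.integer(parts[, 2]),
                                    end = as.integer(parts[, 3])))
names(regions) <- pns

# Steps 1 and 3: take gene models from the annotation database
# (gene_id holds Entrez Gene IDs for the UCSC knownGene packages)
txdb <- TxDb.Hsapiens.UCSC.hg18.knownGene
gene_ranges <- genes(txdb)

# Step 4: find which regions overlap which genes, ignoring strand
hits <- findOverlaps(regions, gene_ranges, ignore.strand = TRUE)
entrez <- gene_ranges$gene_id[subjectHits(hits)]

# Translate Entrez IDs to gene symbols (skipped if nothing overlapped)
sym_map <- if (length(entrez) > 0) {
  mapIds(org.Hs.eg.db, keys = unique(entrez),
         column = "SYMBOL", keytype = "ENTREZID")
} else {
  character(0)
}

# One row per (region, gene) pair; a region may hit zero or several genes
result <- data.frame(region = names(regions)[queryHits(hits)],
                     entrez_id = entrez,
                     symbol = unname(sym_map[entrez]),
                     stringsAsFactors = FALSE)
result
```

Note that the result is a many-to-many table: a region can overlap several genes or none, so it is not yet a one-to-one relabelling of the regions.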
The original poster did not specify whether these are promoter regions or genic regions; if they are the former, flank() will be useful.

--
A model is a lie that helps you see the truth.

Howard Skipper
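To illustrate the flank() suggestion, here is a small sketch on toy data. The gene coordinates, the gene names `geneA` and `geneB`, and the 2000 bp upstream window are all made up for the example; a real analysis would take the gene ranges from an annotation database and pick a window that suits the array design.

```r
library(GenomicRanges)

# Toy gene models (hypothetical coordinates, not real annotation)
gene_ranges <- GRanges(
  seqnames = c("chr1", "chr1"),
  ranges   = IRanges(start = c(10000, 50000), end = c(20000, 60000)),
  strand   = c("+", "-"),
  gene_id  = c("geneA", "geneB"))

# flank() is strand-aware: with start = TRUE it returns the window
# upstream of each gene's 5' end (before the start on "+", after the end on "-")
upstream <- flank(gene_ranges, width = 2000, start = TRUE)

# A hypothetical methylation region lying just past the end of geneB,
# which is upstream of geneB because geneB is on the minus strand
region <- GRanges("chr1", IRanges(start = 60500, end = 61500))

# Match the region against the upstream windows instead of the gene bodies
hits <- findOverlaps(region, upstream, ignore.strand = TRUE)
promoter_genes <- upstream$gene_id[subjectHits(hits)]
promoter_genes
```

Overlapping the same region with `gene_ranges` directly would find nothing here, which is why the promoter-versus-genic distinction matters for the mapping.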
Hi Tim,

Thank you for your advice. I'm sorry for another naive question, but how can I tell from the raw data whether the chromosome locations are promoter regions or not?

I'm reading raw DNA methylation data, which consists of pairs of untreated and methylated .xys files, as follows:

    pd<-read.table("CTRL_sample.txt",header=TRUE,sep="\t")
    res<-validatePd(pd)
    rawData<-readCharm(pd$filename,path="/projects/zhuj05a/Lung_Dataset/LG RC/Raw/Charm/3_CTRL",sampleKey=pd)
    ctrlind<-getControlIndex(rawData,subject=Hsapiens)
    grp<-pData(rawData)$tissue
    p<-methp(rawData,controlIndex=ctrlind,plotDensity="density_CTRL.pdf",plotDensityGroups=grp)
    rownames(p)<-pns(rawData)
    colnames(p)<-unique(pd$sampleID)

I want the row names of the matrix p to be gene names rather than chromosome locations.

I will try flank() as you suggested, and also the other suggestions from Steve and Brian.

Thanks,

Seungyeul Yoo
Postdoctoral Fellow
Department of Genetics and Genomic Sciences
Institute of Genomics and Multiscale Biology
Mount Sinai School of Medicine
(office) 212-659-6877
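On the request to relabel the rows of p, one point worth a sketch is that a region-to-gene mapping is rarely one-to-one: some regions have no gene, and several regions can map to the same gene. The example below uses base R only, a toy matrix with made-up values, and a hypothetical lookup vector `region2symbol` (with the placeholder symbol `GENE_A`) standing in for whatever an overlap step produces.

```r
# Toy methylation matrix with region strings as row names (values are made up)
p <- matrix(c(0.1, 0.8, 0.5, 0.2, 0.7, 0.4), nrow = 3,
            dimnames = list(c("chr19:4205395-4220723",
                              "chr16:73793547-73835933",
                              "chr22:18115791-18146966"),
                            c("S1", "S2")))

# Hypothetical lookup from region string to gene symbol; the second
# region is deliberately missing and two regions share one symbol
region2symbol <- c("chr19:4205395-4220723"   = "GENE_A",
                   "chr22:18115791-18146966" = "GENE_A")

# Look up each row; regions without an entry come back as NA
symbols <- unname(region2symbol[rownames(p)])

# Keep the coordinate string where no gene was found
labels <- ifelse(is.na(symbols), rownames(p), symbols)

# Disambiguate repeated symbols so that row names stay unique
rownames(p) <- make.unique(labels)
rownames(p)
```

Whether to keep several rows per gene or summarise them into one before integrating with expression data is an analysis decision this sketch does not make.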
Hi Steve,

I really appreciate your comments. The pipeline is exactly what I wanted to know. I will try to follow the steps and post further questions if I get stuck.

I'm a newbie in genomics but really enjoy learning all of this.

Thanks again.

Seungyeul Yoo