Annotating reads according to their mapping position
@andreia-fonseca-3796
Last seen 7.8 years ago
Dear List,

I have a file with the hits of my small RNA sequences (18-30 bp) in the human genome, and I have downloaded all the annotation of the human genome from UCSC. What I want is to annotate my sequences by finding overlaps between the positions of my sequences and the information available in the tables I downloaded from UCSC. The file that maps my sequences to the human genome (produced using MicroRazerS) has the following structure:

sequence, sequence length, strand, chromosome, start, end, score, alignment length

I don't want to do this with biomart, because making all the queries would be too slow. I have found the package IRanges, which has an overlap function, but I do not understand how the two tables (the query and the target) should be stored or how to compute the overlaps. Can someone give me a hint?

With kind regards,
Andreia

--
Andreia J. Amaral
Unidade de Imunologia Clínica
Instituto de Medicina Molecular
Universidade de Lisboa
email: andreiaamaral@fm.ul.pt
andreia.fonseca@gmail.com
Annotation annotate biomaRt IRanges • 979 views
Marc Carlson ★ 7.2k
@marc-carlson-2264
Last seen 8.3 years ago
United States
Hi Andreia,

You might want to have a look at the GenomicFeatures and GenomicRanges packages. If you read the corresponding vignettes, you should find examples that I think do a lot of what you are describing here.

http://www.bioconductor.org/packages/devel/bioc/html/GenomicFeatures.html
http://www.bioconductor.org/packages/devel/bioc/html/GenomicRanges.html

Marc

On 05/14/2010 05:43 AM, Andreia Fonseca wrote:
> However I have found the package IRanges, which has the overlap
> function, but I am not understanding how the two tables - the query and the
> target tables - should be stored and how to make the overlapping. Can
> someone give me a hint?
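A minimal sketch of how the GenomicRanges route could look, in case it helps to see the two tables side by side. This is not taken from the vignettes. The data frames, their column names, and all coordinates are invented for illustration; a real hit table and UCSC table would be read with read.table. UCSC database tables usually store 0-based starts, so a real import would probably need start + 1; the toy data here is assumed to be 1-based already.

```r
# Requires Bioconductor: BiocManager::install("GenomicRanges")
library(GenomicRanges)

# Toy stand-in for the read hit table (hypothetical column names and values).
hits <- data.frame(
  sequence   = c("read1", "read2", "read3"),
  chromosome = c("chr1", "chr1", "chr2"),
  strand     = c("+", "-", "+"),
  start      = c(100, 500, 100),
  end        = c(121, 522, 119),
  stringsAsFactors = FALSE
)

# Toy stand-in for a downloaded annotation table (also hypothetical).
annot <- data.frame(
  name   = c("geneA", "geneB", "geneC"),
  chrom  = c("chr1", "chr1", "chr2"),
  strand = c("+", "+", "+"),
  start  = c(90, 480, 110),
  end    = c(200, 600, 300),
  stringsAsFactors = FALSE
)

# Both tables become GRanges objects: chromosome, interval, strand,
# plus any extra columns carried along as metadata.
reads <- GRanges(seqnames = hits$chromosome,
                 ranges   = IRanges(start = hits$start, end = hits$end),
                 strand   = hits$strand,
                 sequence = hits$sequence)
features <- GRanges(seqnames = annot$chrom,
                    ranges   = IRanges(start = annot$start, end = annot$end),
                    strand   = annot$strand,
                    name     = annot$name)

# Query = reads, subject = annotation. Strand is respected by default;
# pass ignore.strand = TRUE to match features on either strand.
ov <- findOverlaps(reads, features)

# One row per (read, feature) pair that overlaps.
annotated <- data.frame(
  sequence = reads$sequence[queryHits(ov)],
  feature  = features$name[subjectHits(ov)],
  stringsAsFactors = FALSE
)
print(annotated)
```

In this toy data read2 lies inside geneB's coordinates but on the opposite strand, so it is not reported unless ignore.strand = TRUE is used. Everything runs locally, so no remote queries are involved.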
@joern-toedling-3465
Last seen 10.2 years ago
Hello,

I leave it to the IRanges developers to point out the quickest way to find such overlaps using IRanges, but my guess is that you need to create 'RangedData' objects and then use the function findOverlaps.

However, sorry for the shameless plug, the package 'girafe' from the latest Bioconductor release can also be used to answer this kind of question. Have a look at the vignette for some use cases. Basically you need to create two objects:

1. An object of class 'AlignedGenomeIntervals' from your aligned sequences. The manual page of that class and the vignette show how to do this, but it is easy given the data.frame that you already have once you read your table into R using read.table.
2. An object of class 'Genome_intervals_stranded' for your genomic annotation. For example, the function 'readGff3' from package 'genomeIntervals' can be used to create such an object from a GFF (version 3) file containing the annotation.

When you have those two objects, the function 'interval_overlap' will give you overlaps of any kind (>= 1 nt) between them, and 'fracOverlap' can be used to get overlaps based on additional restrictions that you specify. How to use 'girafe' for finding overlaps is also shown in the vignette. There is also a coercion method between AlignedGenomeIntervals objects and RangedData, for using IRanges methods later on.

Hope that helps,
Joern

PS: There is an additional mailing list, 'bioc-sig-sequencing', which may be more appropriate for this kind of question.

On Fri, 14 May 2010 13:43:15 +0100, Andreia Fonseca wrote
> However I have found the package IRanges,
> which has the overlap function, but I am not understanding how the
> two tables - the query and the target tables - should be stored and
> how to make the overlapping. Can someone give me a hint?

---
Joern Toedling
Institut Curie -- U900
26 rue d'Ulm, 75005 Paris, FRANCE
Tel. +33 (0)156246927
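To make the distinction between "any overlap" and "overlap under additional restrictions" concrete, here is a rough sketch using plain IRanges rather than girafe. It does not call 'interval_overlap' or 'fracOverlap' and is not meant to reproduce their exact semantics. The coordinates are invented, a single chromosome and strand are assumed, and the 0.5 threshold is an arbitrary choice for illustration.

```r
# Requires Bioconductor: BiocManager::install("IRanges")
library(IRanges)

# Hypothetical read and feature coordinates on one chromosome and strand.
reads    <- IRanges(start = c(100, 195, 400), end = c(121, 216, 415))
features <- IRanges(start = c(90, 300),       end = c(200, 410))

# Step 1: every pair sharing at least one position.
ov <- findOverlaps(reads, features)

# Step 2: width of the shared part of each overlapping pair,
# expressed as a fraction of the read length.
shared <- width(pintersect(reads[queryHits(ov)], features[subjectHits(ov)]))
frac   <- shared / width(reads[queryHits(ov)])

# Step 3: keep only pairs where at least half of the read is covered.
keep <- ov[frac >= 0.5]
print(keep)
```

With these toy values the second read touches the first feature by only 6 of its 22 positions, so it passes step 1 but is dropped in step 3. For data spanning several chromosomes, the same steps would have to be applied per chromosome, or with an object type that tracks chromosome and strand.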