Dear List,

I have a file with the hits of my small RNA sequences (18-30 bp) in the
human genome, and I have downloaded all the annotation of the human
genome from UCSC. I want to annotate my sequences by finding overlaps
between the positions of my sequences and the information available in
the tables I downloaded from UCSC. The file that maps my sequences to
the human genome (produced using MicroRazerS) has the following
structure:

sequence, sequence length, strand, chromosome, start, end, score, alignment length

I don't want to do this with biomart, because making all the queries
would be too slow. I have found the IRanges package, which has an
overlap function, but I don't understand how the two tables (the query
and the target) should be stored or how to compute the overlaps. Can
someone give me a hint?

With kind regards,
Andreia
--
--------------------------------------------
Andreia J. Amaral
Unidade de Imunologia Clínica
Instituto de Medicina Molecular
Universidade de Lisboa
email: andreiaamaral@fm.ul.pt
andreia.fonseca@gmail.com
Hi Andreia,

You might want to have a look at the GenomicFeatures and GenomicRanges
packages. If you read the corresponding vignettes, you should find
examples that do much of what you are describing here.

http://www.bioconductor.org/packages/devel/bioc/html/GenomicFeatures.html
http://www.bioconductor.org/packages/devel/bioc/html/GenomicRanges.html
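
For readers following the GenomicRanges suggestion, a minimal sketch of the idea might look like the one below. It is only an illustration under stated assumptions: the two data frames are made-up toy data, not real MicroRazerS output or UCSC tables, and the column names read_id and name are hypothetical. Each table becomes a GRanges object, and findOverlaps() pairs every read with the annotation features it overlaps.

```r
library(GenomicRanges)

# Made-up toy hits standing in for the mapped small RNA reads
# (in practice this table would come from read.table()).
hits <- data.frame(
  read_id    = c("read1", "read2", "read3"),
  strand     = c("+", "-", "+"),
  chromosome = c("chr1", "chr1", "chr2"),
  start      = c(100, 500, 40),
  end        = c(121, 522, 60),
  stringsAsFactors = FALSE
)

# Made-up toy annotation standing in for a downloaded UCSC table.
annot <- data.frame(
  name   = c("geneA", "geneB", "geneC"),
  chrom  = c("chr1", "chr1", "chr2"),
  strand = c("+", "+", "+"),
  start  = c(90, 480, 1000),
  end    = c(200, 600, 2000),
  stringsAsFactors = FALSE
)

# Query: one range per read; extra arguments become metadata columns.
query <- GRanges(seqnames = hits$chromosome,
                 ranges   = IRanges(start = hits$start, end = hits$end),
                 strand   = hits$strand,
                 read_id  = hits$read_id)

# Subject: one range per annotated feature.
subject <- GRanges(seqnames = annot$chrom,
                   ranges   = IRanges(start = annot$start, end = annot$end),
                   strand   = annot$strand,
                   name     = annot$name)

# Chromosome-aware overlap search; strand is ignored in this sketch.
ov <- findOverlaps(query, subject, ignore.strand = TRUE)

# One row per (read, feature) pair that overlaps.
annotated <- data.frame(
  read_id = query$read_id[queryHits(ov)],
  feature = subject$name[subjectHits(ov)],
  stringsAsFactors = FALSE
)
print(annotated)
```

Whether to ignore strand depends on the annotation in question; dropping ignore.strand = TRUE restricts matches to features on the same strand. With real UCSC tables, keep in mind that they typically store 0-based start coordinates, so the starts may need + 1 before the ranges are built.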
Marc
On 05/14/2010 05:43 AM, Andreia Fonseca wrote:
> Dear List,
>
> I have a file with the hits of my small RNA sequences (18-30 bp) in the
> human genome, and I have downloaded all the annotation of the human
> genome from UCSC. I want to annotate my sequences by finding overlaps
> between the positions of my sequences and the information available in
> the tables I downloaded from UCSC. The file that maps my sequences to
> the human genome (produced using MicroRazerS) has the following
> structure:
>
> sequence, sequence length, strand, chromosome, start, end, score, alignment length
>
> I don't want to do this with biomart, because making all the queries
> would be too slow. I have found the IRanges package, which has an
> overlap function, but I don't understand how the two tables (the query
> and the target) should be stored or how to compute the overlaps. Can
> someone give me a hint?
>
> With kind regards,
> Andreia
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
Hello,

I leave it to the IRanges developers to point out the quickest way to
find such overlaps using IRanges, but my guess is that you need to
create 'RangedData' objects and then use the function findOverlaps.
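
As a rough illustration of the IRanges route, the sketch below uses plain IRanges objects rather than RangedData and handles one chromosome at a time, since bare integer ranges carry no chromosome information. The toy tables and their column names (read_id, name, chrom) are assumptions made up for the example.

```r
library(IRanges)

# Made-up toy tables; real data would be read in with read.table().
hits <- data.frame(
  read_id    = c("read1", "read2", "read3"),
  chromosome = c("chr1", "chr2", "chr2"),
  start      = c(100, 40, 900),
  end        = c(121, 60, 925),
  stringsAsFactors = FALSE
)
annot <- data.frame(
  name  = c("featA", "featB"),
  chrom = c("chr1", "chr2"),
  start = c(90, 50),
  end   = c(200, 80),
  stringsAsFactors = FALSE
)

# Only chromosomes present in both tables can produce overlaps.
shared_chroms <- intersect(hits$chromosome, annot$chrom)

overlaps_by_chrom <- lapply(shared_chroms, function(chr) {
  h <- hits[hits$chromosome == chr, ]
  a <- annot[annot$chrom == chr, ]
  # Query = reads, subject = annotation, both restricted to this chromosome.
  ov <- findOverlaps(IRanges(start = h$start, end = h$end),
                     IRanges(start = a$start, end = a$end))
  data.frame(read_id = h$read_id[queryHits(ov)],
             feature = a$name[subjectHits(ov)],
             stringsAsFactors = FALSE)
})

# Stack the per-chromosome results into one table.
result <- do.call(rbind, overlaps_by_chrom)
print(result)
```

Splitting by chromosome first keeps ranges on different chromosomes from ever being compared with each other.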

However, sorry for the shameless plug, the package 'girafe' from the
latest Bioconductor release can also be used to answer this kind of
question. Have a look at the vignette for some use cases. Basically,
you need to create two objects:

1. An object of class 'AlignedGenomeIntervals' from your aligned
   sequences. The manual page of that class and the vignette show how
   to do this, but it's easy given the data.frame that you already have
   when you read your table into R using read.table.
2. An object of class 'Genome_intervals_stranded' for your genomic
   annotation. For example, the function 'readGff3' from the package
   'genomeIntervals' can be used to create such an object from a GFF
   (version 3) file containing the annotation.

When you have those two objects, the function 'interval_overlap' will
give you overlaps of any kind (>= 1 nt) between the two, and
'fracOverlap' can be used to get overlaps subject to additional
restrictions that you specify. How to use 'girafe' for finding overlaps
is also shown in the vignette.
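
The idea of keeping only overlaps that meet an extra restriction can be sketched as follows. This is not girafe's 'fracOverlap' (its arguments are not reproduced here); it is the same kind of filter written with GenomicRanges, on made-up toy ranges, with a 50% threshold chosen purely for illustration.

```r
library(GenomicRanges)

# Made-up toy reads and one toy annotation feature on the same chromosome.
query <- GRanges("chr1",
                 IRanges(start = c(100, 195), end = c(121, 216)),
                 read_id = c("read1", "read2"))
subject <- GRanges("chr1", IRanges(start = 90, end = 200), name = "geneA")

# All overlaps of at least one base.
ov <- findOverlaps(query, subject)

# Width of the shared region for each overlapping (read, feature) pair.
shared_width <- width(pintersect(query[queryHits(ov)],
                                 subject[subjectHits(ov)]))

# Fraction of each read that lies inside the feature.
frac_of_read <- shared_width / width(query[queryHits(ov)])

# Keep only pairs where at least half of the read is covered
# (the 0.5 cutoff is an arbitrary choice for this example).
keep <- ov[frac_of_read >= 0.5]
print(query$read_id[queryHits(keep)])
```

For an absolute rather than fractional threshold, findOverlaps() also accepts a minoverlap argument giving the minimum number of overlapping bases.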

And there is also a coercion method between AlignedGenomeIntervals
objects and RangedData for using IRanges methods later on.

Hope that helps,
Joern

PS: There is an additional mailing list, 'bioc-sig-sequencing', which
may be more appropriate for this kind of question.
---
Joern Toedling
Institut Curie -- U900
26 rue d'Ulm, 75005 Paris, FRANCE
Tel. +33 (0)156246927