Andreia Fonseca
Dear Joern,
I decided to follow your suggestion, and I am trying to use girafe. As the
alignment file that I have has one line for each hit, I have prepared another
file which has an extra column with the number of matches and only one row
for each sequence.
  V1                          V2 V3 V4        V5        V6  V7 V8 V9
1 TTTCTAATGAGCCCAGGGAGGGCTAGA 27  +  5  43076512  43076539 100 27  1
2 ATAACTGTAGAGGCAAGC          18  -  8 141533705 141533723 100 18  1
3 CTGAAGGGTGGATAAATTGG        20  - 22  40929788  40929808 100 20  1
4 ACTGATTGGGCTAGG             15  - 16  28567043  28567058 100 15  1
5 GCGCGTCGCCATGGAGCCCGACG     23  - 19  36605745  36605768 100 23  1
6 TGTTCTGGAACGGGCCGAGC        20  + 15  83680515  83680535 100 20  1
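As an illustration of the collapsing step described above, here is a minimal base R sketch. The `hits` data frame, its column names, and its values are made up for the example, and keeping only the first hit of each sequence is an assumption about how the one-row-per-sequence file was built.

```r
# Hypothetical per-hit table: one row per alignment hit (toy values, not real data).
hits <- data.frame(
  sequence   = c("ACGTACGTACGTACGTAC", "ACGTACGTACGTACGTAC", "TTGACCATGGCATTGACCA"),
  strand     = c("+", "-", "+"),
  chromosome = c("1", "7", "3"),
  start      = c(100L, 2500L, 900L),
  end        = c(117L, 2517L, 918L),
  stringsAsFactors = FALSE
)

# Count how many hits each distinct sequence has.
n_hits <- table(hits$sequence)

# Keep the first hit of each sequence (an assumption) and attach its hit count.
collapsed <- hits[!duplicated(hits$sequence), ]
collapsed$matches <- as.integer(n_hits[collapsed$sequence])
rownames(collapsed) <- NULL

print(collapsed)
```

Note that this drops the coordinates of every hit after the first one, so it only suits cases where a single representative position per sequence is enough.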
I have created an object called data using read.table and then converted it
into an AlignedGenomeIntervals object using the following command:
A <- AlignedGenomeIntervals(start=data$V5, end=data$V6, chromosome=data$V4,
                            strand=data$V3, sequence=as.character(data$V1),
                            matches=data$V9)
organism(A) <- "Hs"
and then I called reduce:
B <- reduce(A)
and then I got the following error:
Error in .local(x, ...) :
  invalid consensus matrix 'x' (some columns do not sum to 1).
  Please make sure 'x' was obtained by a call to consensusMatrix(..., as.prob=TRUE)
Can you explain to me what this error means?
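A possible sanity check on the input, offered as a sketch and not as a confirmed diagnosis of the error: in the table above, V6 - V5 equals the sequence length in every row. If the intervals are treated as closed (both end positions included, which is an assumption here), each interval is one base wider than its sequence. The base R snippet below makes that comparison on the first three rows.

```r
# First three rows of the table above, with only the columns needed here.
data <- data.frame(
  V1 = c("TTTCTAATGAGCCCAGGGAGGGCTAGA", "ATAACTGTAGAGGCAAGC", "CTGAAGGGTGGATAAATTGG"),
  V5 = c(43076512, 141533705, 40929788),
  V6 = c(43076539, 141533723, 40929808),
  stringsAsFactors = FALSE
)

# Length of each read sequence in nucleotides.
seq_len_nt <- nchar(data$V1)

# Width of each interval if both start and end are included (assumption).
closed_width <- data$V6 - data$V5 + 1

# Flag rows where the interval width and the sequence length disagree.
check <- data.frame(seq_len_nt, closed_width, differs = closed_width != seq_len_nt)
print(check)
```

If such a mismatch matters for reduce, using end = data$V6 - 1 in the constructor would be one way to make the widths agree, but whether that removes the error is untested here.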
With kind regards,
Andreia
On Fri, May 14, 2010 at 4:31 PM, Joern Toedling <joern.toedling@curie.fr> wrote:
> Hello,
>
> I leave it to the IRanges developers to point out the quickest way to find
> such overlaps using IRanges, but my guess is that you need to create
> 'RangedData' objects and then use the function findOverlaps.
>
> However, sorry for the shameless plug, the package 'girafe' from the latest
> Bioconductor release can also be used to answer such kinds of questions. Have
> a look at the vignette for some use cases. Basically you need to create two
> objects:
> 1. an object of class 'AlignedGenomeIntervals' from your aligned sequences.
> The manual page of that class and the vignette show how to do this, but it's
> easy given the data.frame that you already have when you read your table into
> R using read.table.
> 2. an object of class 'Genome_intervals_stranded' of your genomic annotation.
> For example, the function 'readGff3' from package 'genomeIntervals' can be
> used to create such an object from a gff (version 3) file containing such
> annotation.
> When you have those two objects, the function 'interval_overlap' will give you
> overlaps of any kind (>= 1nt) between those two, and 'fracOverlap' can be used
> to get overlaps based on additional restrictions that you specify.
> How to use 'girafe' for finding overlaps is also shown in the vignette.
> And there is also a coercion method between AlignedGenomeIntervals objects and
> RangedData for using IRanges methods later on.
>
> Hope that helps,
> Joern
>
> PS: There is an additional mailing list 'bioc-sig-sequencing' which may be
> more appropriate for this kind of question.
>
> On Fri, 14 May 2010 13:43:15 +0100, Andreia Fonseca wrote
> > Dear List,
> >
> > I have a file with the hits of my small RNA sequences (18-30bp) in the
> > human genome, and I have downloaded all the annotation of the human
> > genome from UCSC. What I want is to annotate my sequences by finding
> > overlaps between the positions of my sequences and the information
> > available in the tables I have downloaded from UCSC. The file which maps
> > my sequences to the human genome (produced using microRazers) has the
> > following structure:
> >
> > sequence, sequence length, strand, chromosome, start, end, score,
> > alignment length
> >
> > I don't want to do this with biomart, because making all the queries
> > would be too slow. However, I have found the package IRanges, which has
> > the overlap function, but I do not understand how the two tables - the
> > query and the target tables - should be stored and how to compute the
> > overlaps. Can someone give me a hint?
> >
> > With kind regards,
> > Andreia
> >
> > --
> > --------------------------------------------
> > Andreia J. Amaral
> > Unidade de Imunologia Clínica
> > Instituto de Medicina Molecular
> > Universidade de Lisboa
> > email: andreiaamaral@fm.ul.pt
> > andreia.fonseca@gmail.com
> >
>
>
> ---
> Joern Toedling
> Institut Curie -- U900
> 26 rue d'Ulm, 75005 Paris, FRANCE
> Tel. +33 (0)156246927
>
>
--
--------------------------------------------
Andreia J. Amaral
Unidade de Imunologia Clínica
Instituto de Medicina Molecular
Universidade de Lisboa
email: andreiaamaral@fm.ul.pt
andreia.fonseca@gmail.com