Hi Yuval,
Sorry for the delay.
On 01/19/2012 07:33 AM, Yuval Itan wrote:
> Dear Herve,
>
> My name is Yuval, I am a postdoc at the Rockefeller University. I am
> trying to use Bioconductor for analyzing my RNA-seq data, and I would
> be grateful for your advice as my R level is a bit basic and I got
> stuck. I need to count the number of reads per gene and my fastq data
> was aligned to chromosomes named "1", "2" etc. while
> makeTranscriptDbFromUCSC provided "chr1" etc. names, which made the
> overlap check impossible. Is there a way to convert the chromosome
> names returned from makeTranscriptDbFromUCSC so that the transcript
> (or, if available, the full gene) chromosome names do not include
> "chr" (or any way that they would match)?
>
Please have a look at ?seqlevels

I would suggest that you use seqlevels() on your reads (you probably
have them in a GappedAlignments object), not on your transcripts
(stored in a TranscriptDb object). Also it's important to realize that
it's not enough to fix the chromosome names so that they match: the
reference genome used to align your fastq data must be the same as the
reference genome your annotations are based on (i.e. the genome used
when makeTranscriptDbFromUCSC() was called to make the TranscriptDb
object). Otherwise, even though technically you'll be able to do
findOverlaps()/countOverlaps(), the result you'll get could be
partially or totally meaningless.
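
As a rough sketch of the renaming step (not your actual data): the toy
ranges, gene names and read positions below are made up for
illustration, and I'm assuming the reads sit in a GRanges object here;
a GappedAlignments object should accept seqlevels() in the same way.

```r
library(GenomicRanges)

# Hypothetical reads aligned to a reference whose chromosomes are
# named "1", "2", "X" (no "chr" prefix).
reads <- GRanges(seqnames = c("1", "1", "2", "X"),
                 ranges = IRanges(start = c(100, 150, 500, 40), width = 36))

# Hypothetical gene ranges carrying UCSC-style names ("chr1", ...),
# standing in for ranges extracted from a TranscriptDb object.
genes <- GRanges(seqnames = c("chr1", "chr2", "chrX"),
                 ranges = IRanges(start = c(50, 450, 1000),
                                  end = c(400, 900, 2000)))
names(genes) <- c("geneA", "geneB", "geneC")

# Rename the sequence levels on the reads so they match the annotations.
seqlevels(reads) <- paste("chr", seqlevels(reads), sep = "")

# Count the reads overlapping each gene.
counts <- countOverlaps(genes, reads)
```

This only makes the names agree; it does nothing to check that both
objects refer to the same genome assembly.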

Finally, note that the Bioconductor mailing list (cc'ed) is a better
place to ask this kind of question, as many subscribers there might
be able to help.

Cheers,
H.
> Many thanks,
>
> Yuval
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319