Question: strand consistency with readAligned, coverage and transcriptsBy
0
8.2 years ago by
Tefina Paloma220
Tefina Paloma220 wrote:
Hi, I have a question regarding consistency of strand. Although reading the documentation and several tutorials I am not totally sure if I have understood everything right. Lets say you read in a fastq file from bowtie with readAligned. x = readAligned("fastq-file", type = "Bowtie"). For each read you have the corresponding strand info. If you then compute the coverage, cov = coverage(x) Is this the coverage on the + strand? Reads that aligned to the - strand are reverse complemented and counted for the + strand? Then you load a transcript database: yeastDb = makeTranscriptDbFromBiomart(biomart = "ensembl", dataset = "scerevisiae_gene_ensembl") tx = transcriptsBy(yeastDb) Here again, tx provides some strand info. If I now want to extract the per base coverage of a transcript that lies on the - strand. How do I do that? I have the coordinates of the transcript via coord(tx). But how do I assure that get the correct info? I hope I have made myself clear and I would really appreciate any comments. Probably is working fine out of the box, I just want to make sure that I dont make a mistake in the very first steps. Best, tefina
coverage • 478 views
modified 8.2 years ago by Steve Lianoglou12k • written 8.2 years ago by Tefina Paloma220
0
8.2 years ago by
Denali
Steve Lianoglou12k wrote:
Hi Tefina, On Tue, Sep 13, 2011 at 10:16 AM, tefina <tefina.paloma at="" gmail.com=""> wrote: > Hi, > > I have a question regarding consistency of strand. > Although reading the documentation and several tutorials I am not totally sure > if I have understood everything right. > > Lets say you read in a fastq file from bowtie with readAligned. > > x = readAligned("fastq-file", type = "Bowtie"). > > For each read you have the corresponding strand info. > > If you then compute the coverage, > > cov = coverage(x) > > Is this the coverage on the + strand? Reads that aligned to the - strand are > reverse complemented and counted for the + strand? I can't seem to get the data included in ShortRead to use an example correctly, but by looking at the code for coverage,alignedRead it looks as if the coverage is computed from all of the reads together, which is to say that the strand of the reads is ignored here. > Then you load a transcript database: > > yeastDb = makeTranscriptDbFromBiomart(biomart = "ensembl", dataset = > "scerevisiae_gene_ensembl") > tx = transcriptsBy(yeastDb) > > Here again, tx provides some strand info. If I now want to extract the per base > coverage of a transcript that lies on the - strand. > How do I do that? I have the coordinates of the transcript via coord(tx). But > how do I assure that get the correct info? For some reason I can't find this coord method of which you speak. I've got GenomicFeatures, GenomicRanges, IRanges, ShortRead (and friends) all loaded up, but trying to find a function called coord isn't working for me (and I don't see it in the latest (from svn) version of these packages). What type of object does coord return? It sounds like you are doing some RNA-seq analysis? One thing you should confirm is whether or not you should be taking strand information (of where your reads align) into consideration at all. If I'm not mistaken, most of the "standard" rna-seq protocols in use today do not keep strand information of the their reads. You should check with the people who did your experiments, but here's an easy way to see if yours does, perhaps: Try coercing your alignedRead object to a GRanges object (sorry, I know how to work with these better -- maybe this step isn't necessary). Assuming your alignedRead object is in a var called aln: R> gr <- as(aln, "GRanges") Now pick one of your favorite genes that you know has a good amount of coverage in your experiment out from your tx GRangesList ... let's say it's at position 100 R> gene <- tx[[100]] R> gene.reads <- subsetByOverlaps(gr, gene, ignore.strand=TRUE) Then look at the distro of the strands from the GRanges in gene.reads ... is it a 50/50 split, ie:? R> table(strand(gene.reads)) + - 59 57 In the toy example I whipped up for this exercise, it is ... how about yours? You should try that with a few genes to see if it's the same (also, avoid genes that have other genes annotated on the opposite strand, obviously). HTH, -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact