R for normalizing counts by gene length in next-generation sequencing data
Andrew Wang
@andrew-wang-4438
Hello, everyone,

I am wondering how to use R packages to generate a count table with samples as columns and tags as rows. In addition, how can I normalize the counts to the length of each gene? That is, all gene counts should be normalized from 0 to 1 in gene length, and then I would like to draw a distribution of the counts. Finally, how do I access the R objects that store these data and manipulate them using R commands/scripts?

Thanks.

Best wishes,
Andrew
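A minimal sketch of the first two steps in the question, written in Python rather than R purely to show the arithmetic. It assumes each read has already been assigned to exactly one gene; the sample names, gene names, and gene lengths are hypothetical. It builds a genes-by-samples count table and divides each count by gene length in kilobases, which is one reading of "normalize to gene length" (RPKM additionally divides by each sample's total mapped reads in millions).

```python
from collections import Counter

def count_table(assignments):
    """Return (samples, genes, table) where table[gene] lists one count per sample."""
    samples = sorted(assignments)
    genes = sorted({g for hits in assignments.values() for g in hits})
    per_sample = {s: Counter(assignments[s]) for s in samples}
    # Counter returns 0 for genes a sample never hit, so every row is complete.
    table = {g: [per_sample[s][g] for s in samples] for g in genes}
    return samples, genes, table

def per_kilobase(table, gene_lengths):
    """Divide each gene's counts by the gene's length in kilobases."""
    return {g: [c / (gene_lengths[g] / 1000) for c in row] for g, row in table.items()}

# Hypothetical input: the gene each read was assigned to, per sample.
assignments = {
    "sampleA": ["geneX", "geneX", "geneY"],
    "sampleB": ["geneY", "geneY", "geneY", "geneX"],
}
gene_lengths = {"geneX": 2000, "geneY": 500}  # hypothetical lengths in bases

samples, genes, table = count_table(assignments)
normalized = per_kilobase(table, gene_lengths)

# Print genes as rows and samples as columns.
print("gene", *samples, sep="\t")
for g in genes:
    print(g, *normalized[g], sep="\t")
```

In R the same table is usually an ordinary matrix with genes as row names and samples as column names; dividing it by a vector of per-gene lengths (one per row) recycles that vector down the rows, so each row is divided by its own gene's length.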
@steve-lianoglou-2771
Hi,

On Fri, Apr 22, 2011 at 7:49 PM, Andrew Wang <andrew.wang.2010.2011 at gmail.com> wrote:
> Hello, everyone
>
> I am wondering how to use R packages to generate a count table with
> samples as columns and tags as rows. In addition, how to normalize
> the counts to the length of each gene. That is, all gene counts
> should be normalized from 0 to 1 in gene length and then draw a
> distribution of counts. Finally, how to access these R objects that
> store these data and to manipulate them using R commands/scripts. Thanks.

You will want to get very comfortable with the following packages:

* IRanges and GenomicRanges
  Use the data structures in these packages (IRanges or GRanges) to store and manipulate your reads.

* GenomicFeatures
  Provides functionality to access gene/transcript info from different annotation sources (RefSeq, UCSC, etc.) and exposes them as GRanges objects. This makes it easy to quantify which reads overlap which genes/exons/etc. (assuming you are storing your reads in IRanges/GRanges objects; use GRanges).

* Maybe Rsamtools, to query your BAM files and load them into appropriate data structures.

Read through the vignettes in these packages. You will be able to do all the things you are asking for once you get comfortable with the three packages above.

Also:

* The Biostrings and BSgenome.* packages will be your friends.

Read through this stuff, too:
http://www.bioconductor.org/help/workflows/high-throughput-sequencing/

Tutorial/course material here:
http://www.bioconductor.org/help/course-materials/2010/EMBL2010/

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
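The overlap step described in the reply (quantifying which reads overlap which genes, which `countOverlaps` does for ranges objects in IRanges/GenomicRanges) can be illustrated with a small brute-force Python sketch. It also covers a second reading of the question's "from 0 to 1 in gene length": scaling each read's position to a fraction of the gene body and binning those fractions into a distribution. Assumptions, all hypothetical: intervals are `(chrom, start, end, strand)` tuples with 1-based inclusive coordinates (the GRanges convention), coordinates are made up, strand is ignored when counting (unlike GRanges, which by default only matches compatible strands), and the quadratic loop is meant for toy data, not genome-scale input.

```python
# Toy intervals: (chrom, start, end, strand), 1-based and inclusive.

def overlaps(a, b):
    """True if two closed intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]

def count_overlaps(genes, reads):
    """Number of reads overlapping each gene (strand is ignored here)."""
    return {name: sum(overlaps(g, r) for r in reads) for name, g in genes.items()}

def relative_positions(gene, reads):
    """Midpoints of overlapping reads scaled to [0, 1] along the gene, 5' = 0, 3' = 1."""
    chrom, start, end, strand = gene
    span = end - start  # assumes the gene is longer than one base
    positions = []
    for r in reads:
        if not overlaps(gene, r):
            continue
        mid = (r[1] + r[2]) / 2
        mid = min(max(mid, start), end)  # clamp reads hanging off either end
        frac = (mid - start) / span
        # On the minus strand the 5' end is at the larger coordinate.
        positions.append(1 - frac if strand == "-" else frac)
    return positions

def histogram(values, bins=10):
    """Count values in equal-width bins over [0, 1]."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    return counts

genes = {"geneX": ("chr1", 1001, 3000, "+"), "geneY": ("chr1", 5001, 5500, "-")}
reads = [
    ("chr1", 1001, 1050, "+"),
    ("chr1", 2951, 3000, "-"),
    ("chr1", 5001, 5050, "-"),
    ("chr2", 1001, 1050, "+"),
]

print(count_overlaps(genes, reads))  # {'geneX': 2, 'geneY': 1}
pooled = [p for g in genes.values() for p in relative_positions(g, reads)]
print(histogram(pooled))
```

Pooling the scaled positions across all genes gives the gene-body distribution; the per-gene counts are what would fill one column of the count table for a single sample.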