Regarding multiple hits of same read
@deepika-lakhwani-5470
Last seen 9.6 years ago
Hello, I have been trying to identify differentially expressed genes using different R packages. I have paired-end Illumina sequencing data from rice with 100 bp reads. I mapped the data to the rice genome using TopHat and now have an accepted_hits.bam file that contains the mapping details. I am confused because a single read can align to multiple positions. When we count reads for differential analysis, the same read can then be present in two different genes. Is that correct or not? I am also reading about the GenomicFeatures R package for counting the reads in libraries. Can anyone explain the summarizeOverlaps function? I read the manual, but what is its basic function? Thank you. Regards, Deepika
Sequencing Sequencing • 1.4k views
@steve-lianoglou-2771
Last seen 13 months ago
United States
Hi,

On Tue, Sep 24, 2013 at 4:25 AM, deepika lakhwani <lakhwanideepika at gmail.com> wrote:

> Hello,
>
> i have been trying to find out differential expression of gene using
> different R packages. I have rice illumina sequencing data (pair end) with
> 100 bp. i mapped the data n rice genome using tophat now i got
> accepted_hit. bam file in which details of mapping is available.
>
> Now i am confused because it can be possible that a single read can align
> on multiple position.

One way to deal with reads that align to multiple (genomic) positions is to not deal with them at all. Many people only use reads that align uniquely to the genome.

> When we count the reads for differential analysis
> then same read is present in two different genes.

This is different than what you mention above. It is possible that:

(1) One read aligns to multiple places in the genome. These reads are often called "multimapped" (multimappers, etc.) and as I mentioned above, it is rather common to ignore these and to only count reads that align to a unique position in the genome.

(2) It is possible for two different genes to share the same genomic locus as each other, so even though a read maps to one position in the genome, there is more than one gene that it can be assigned to.

> So i have a question that
> is correct or not?

Can you clarify in greater detail what you are asking "correctness" for?

> and i am reading genomic features R package for counting
> the reads in libraries. Can anyone explain the summarizeOverlaps function?

Please read through the copious documentation made available in the GenomicRanges package:

http://bioconductor.org/packages/2.12/bioc/html/GenomicRanges.html

There are five PDF files available there under the "Documentation" section and all of them are worth your close attention. If you still have more specific questions after reading through those, please ask those specific ones here.

A generic question like "explain the summarizeOverlaps function" isn't helpful, as it is explained in multiple places in the documentation -- if there is something specific about it that is confusing, we can help you to address that.

> i read manual but what is basic function of it.

So what part is unclear?

You'll likely also want to read through the vignette for the parathyroidSE package:

http://bioconductor.org/packages/release/data/experiment/vignettes/parathyroidSE/inst/doc/parathyroidSE.pdf

It shows in great detail how to go from aligned reads to "counted" genes and exons.

HTH,
-steve

--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech
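The two situations described in (1) and (2) can be told apart with a small toy example. The sketch below is not how summarizeOverlaps is implemented; it is a plain-Python illustration of the counting idea, assuming that the aligner records the number of reported alignments in the SAM `NH` tag and that a read overlapping more than one gene is discarded as ambiguous (the behaviour documented for the "Union" counting mode). The gene names, coordinates, reads, and the helper names `overlapping_genes` and `count_unique_union` are all invented for illustration, and paired-end handling is ignored.

```python
# Toy gene models: name -> (chromosome, start, end), 1-based inclusive.
# All coordinates are invented for illustration.
GENES = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 450, 900),  # shares the locus 450-500 with geneA
    "geneC": ("chr2", 100, 400),
}

# Toy alignment records: (read name, chromosome, start, end, NH).
# NH is assumed to hold the number of reported alignments for the read.
ALIGNMENTS = [
    ("r1", "chr1", 120, 219, 1),  # unique, overlaps geneA only
    ("r2", "chr1", 460, 559, 1),  # unique, but overlaps geneA and geneB
    ("r3", "chr1", 700, 799, 2),  # multimapped, first location
    ("r3", "chr2", 150, 249, 2),  # multimapped, second location
    ("r4", "chr2", 300, 399, 1),  # unique, overlaps geneC only
]


def overlapping_genes(chrom, start, end, genes=GENES):
    """Return the sorted names of genes whose interval overlaps the read."""
    return sorted(
        name
        for name, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and start <= g_end and g_start <= end
    )


def count_unique_union(alignments, genes=GENES):
    """Count uniquely mapped reads per gene, discarding ambiguous reads."""
    counts = {name: 0 for name in genes}
    skipped = {"multimapped": set(), "ambiguous": set(), "no_feature": set()}
    for read, chrom, start, end, nh in alignments:
        if nh > 1:
            # Case (1): the read aligns to several genomic positions.
            skipped["multimapped"].add(read)
            continue
        hits = overlapping_genes(chrom, start, end, genes)
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif len(hits) > 1:
            # Case (2): one genomic position, but several genes share it.
            skipped["ambiguous"].add(read)
        else:
            skipped["no_feature"].add(read)
    return counts, skipped


counts, skipped = count_unique_union(ALIGNMENTS)
print(counts)
print(skipped)
```

With this policy no read is ever counted for two genes: `r3` is dropped because it is multimapped, and `r2` is dropped because it cannot be assigned to a single gene.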
Thank you, Steve, for answering.

I am reading all the PDFs very carefully, and I am satisfied with your answer. But please clarify one basic question: which type of reads are selected for differential expression analysis? I understand that only uniquely mapped reads are used for differential expression analysis, but I think multimapped reads also have an important role because the genome has so many duplicated/paralogous genes. If I am wrong, please tell me.

Regards,
Deepika
Hi,

On Tue, Sep 24, 2013 at 11:04 AM, deepika lakhwani <lakhwanideepika at gmail.com> wrote:

> Thank You...Steve for answering.
>
> Ok...I am reading all the pdfs very carefully and I am satisfy with your
> answer.
> But please clear my some basic question...
> which type of reads are selected in differential expression analysis?
> I understand that only unique mapped reads use for differential expression
> analysis but I think multimapped reads also have an important role
> in differential expression analysis because genome has so many
> duplicated/paralogous genes. if I am wrong then please tell me.

You are not wrong, multimapped reads are obviously important since they are "real" -- i.e. they are transcribed from somewhere, and if you could know exactly where, it would be a good thing.

It is still an open question as to how to use them best. Although I haven't used it myself, RSEM comes up often enough that it seems like many people think its use is a good idea, so you can start there:

http://www.biomedcentral.com/1471-2105/12/323

Using the references there as a seed to do a more thorough literature search and finding papers that cite RSEM should be helpful, as well as your "normal" google searching mojo.

HTH,
-steve

--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech
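To give a feel for how multimapped reads can be used rather than discarded, here is a heavily simplified sketch of the expectation-maximization idea that tools in this family build on: each ambiguous read is split among its compatible genes in proportion to the current abundance estimates, and the estimates are then updated from those fractional counts. This is not RSEM's actual model (it ignores gene length, read quality, fragment size, and everything else a real tool accounts for); the function name `em_assign` and the read data are hypothetical.

```python
def em_assign(read_compat, n_iter=50):
    """Toy EM: distribute reads fractionally among compatible genes.

    read_compat is a list with one entry per read; each entry is the
    list of gene names that the read is compatible with.
    Returns the expected (fractional) read count per gene.
    """
    genes = sorted({gene for compat in read_compat for gene in compat})
    n_reads = len(read_compat)
    # Start from equal relative abundances.
    theta = {gene: 1.0 / len(genes) for gene in genes}
    expected = {gene: 0.0 for gene in genes}
    for _ in range(n_iter):
        # E-step: split each read according to the current abundances.
        expected = {gene: 0.0 for gene in genes}
        for compat in read_compat:
            total = sum(theta[gene] for gene in compat)
            for gene in compat:
                expected[gene] += theta[gene] / total
        # M-step: re-estimate abundances from the fractional counts.
        theta = {gene: expected[gene] / n_reads for gene in genes}
    return expected


# Invented example: two paralogous genes that share many reads.
reads = (
    [["geneA"]] * 8             # 8 reads unique to geneA
    + [["geneB"]] * 2           # 2 reads unique to geneB
    + [["geneA", "geneB"]] * 10 # 10 reads compatible with both
)

expected = em_assign(reads)
print({gene: round(count, 3) for gene, count in expected.items()})
```

In this toy data the uniquely mapped reads suggest a 4:1 ratio between the two genes, so the shared reads end up split in roughly the same ratio instead of being thrown away or counted twice.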
Thank you, Steve.