Coverage by base
9
0
Entering edit mode
rohan bareja ▴ 200
@rohan-bareja-4905
Last seen 7.3 years ago
Hi, I am doing RNA seq data analysis and I want to count the coverage by base,so that I could get the read counts and the intervals and the genes to which they belong. My results are below and I want to do it at gene level SimpleRleViewsList of length 1 $chr17_gl000204_random Views on a 81310-length Rle subject views: start end width [1] 1128 1205 78 [2 3 4 4 4 5 7 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 ...] [2] 6608 6748 141 [ 1 2 2 3 4 5 9 11 16 16 17 18 18 19 19 19 ...] [3] 7117 7193 77 [1 1 1 1 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 ...] [4] 7608 7678 71 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...] [5] 7739 7809 71 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...] [6] 8366 8436 71 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...] [7] 8584 8654 71 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...] [8] 8780 8892 113 [1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 ...] [9] 11565 11635 71 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...] I would really appreciate any kind of help. Thanks,Rohan [[alternative HTML version deleted]]
Coverage Coverage • 2.5k views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States
Are you saying you want to sum the coverage on a per-gene basis? Or count the number of reads mapping to each gene? On Fri, Oct 7, 2011 at 3:00 PM, rohan bareja <rohan_1925@yahoo.co.in> wrote: > > Hi, > I am doing RNA seq data analysis and I want to count the coverage by > base,so that I could get the read counts and the intervals and the genes to > which they belong. > My results are below and I want to do it at gene level > SimpleRleViewsList of length 1 > $chr17_gl000204_random > Views on a 81310-length Rle subject > > views: > start end width > [1] 1128 1205 78 [2 3 4 4 4 5 7 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 > ...] > [2] 6608 6748 141 [ 1 2 2 3 4 5 9 11 16 16 17 18 18 19 19 19 > ...] > [3] 7117 7193 77 [1 1 1 1 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 > ...] > [4] 7608 7678 71 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 > ...] > [5] 7739 7809 71 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 > ...] > [6] 8366 8436 71 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 > ...] > [7] 8584 8654 71 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 > ...] > [8] 8780 8892 113 [1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 > ...] > [9] 11565 11635 71 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 > ...] > > > I would really appreciate any kind of help. > Thanks,Rohan > > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
rohan bareja ▴ 200
@rohan-bareja-4905
Last seen 7.3 years ago
Hi, I want to sum the coverage on a per-gene basis, so that I could get the total number of reads in those intervals and the genes to which they belong.I have used viewSums which gives me the total number of reads,but I would like to get the information about genes. Thanks, Rohan [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Hi Rohan, On Mon, Oct 10, 2011 at 10:28 AM, rohan bareja <rohan_1925 at="" yahoo.co.in=""> wrote: > Hi, > > > > I want to sum the coverage on a per-gene basis, so that I > could get the total number of reads in those intervals and the genes to which > they belong.I have used viewSums which gives me the total number of reads,but I > would like to get the information about genes. You should familiarize yourself with the GenomicFeatures package and build yourself a TranscriptDb for your organism & gene annotation combination of interest. You can then get all of the annotated genes/transcripts/etc. for your organism into different flavors of GenomicRanges objects (easiest, I guess, is a GRangesList). If your reads are stored in a GRanges, IRanges, or similar data structure, you can use the "countOverlaps" function with your transcript GRangesList obect and your reads object to get what you are after. I'm deliberately being a bit vague (as in, not giving you exact code) in order to encourage you to get more familiar with these packages yourself, so you can better answer different flavors of these types of questions as they continue to pop up in your analysis. Hope that helps, -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
ADD REPLY
0
Entering edit mode
rohan bareja ▴ 200
@rohan-bareja-4905
Last seen 7.3 years ago
Hi, I have used the countOverlaps to count the reads on per gene basis(by exons,transcripts )But thats not what I want .I want to get the coverage base by base. reads <- readBamGappedAlignments("Galaxy Ctrl.bam")cover <- coverage(reads)islands <- slice(cover, lower = 1) I have got the islands now  which look like this views: start end width [1] 1128 1205 78 [2 3 4 4 4 5 7 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 ...] [2] 6608 6748 141 [ 1 2 2 3 4 5 9 11 16 16 17 18 18 19 19 19 ...] [3] 7117 7193 77 [1 1 1 1 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 ...] [4] 7608 7678 71 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...] [5] 7739 7809 71 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...] [6] 8366 8436 71 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...] [7] 8584 8654 71 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...] [8] 8780 8892 113 [1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 ...] [9] 11565 11635 71 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...] Now I would want these intervals grouped by gene and sum up the reads interval-wiseAs countOverlaps would not give the intervals of the reads and not even the reads base by base. Thanks,Rohan [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
So, just to be clear You want, say, this row: > ? ? start ? end width > ?[1] ?1128 ?1205 ? ?78 [2 3 4 4 4 5 7 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 ...] To be something like: start end sum 1128 1205 (2 + 3 + 4 + 4 + ...) = sum(this vector here)? type of thing? and maybe tack on a "gene-id" column so you know which row belongs to which gene? -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
ADD REPLY
0
Entering edit mode
rohan bareja ▴ 200
@rohan-bareja-4905
Last seen 7.3 years ago
Hi, Ya thats correct..I have summed up the reads using viewSums but I want to get the gene information . Thanks,Rohan [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
rohan bareja ▴ 200
@rohan-bareja-4905
Last seen 7.3 years ago
Hi, I found this archive from bioconductor-sig-sequencing mailing list where they have discussed about cov_by_gene <- Views(cov, genes) viewSums(cov_by_gene)https://stat.ethz.ch/pipermail/bioc-sig- sequencing/attachments/20110723/b9f2c69c/attachment.pl But I am not sure what is the object "genes" that has been used along with the coverage.Thats exactly what I would like to do. Rohan Hi, Ya thats correct..I have summed up the reads using viewSums but I want to get the gene information . Thanks,Rohan [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
rohan bareja ▴ 200
@rohan-bareja-4905
Last seen 7.3 years ago
Hi, Ya thats correct..I have summed up the reads using viewSums but I want to get the gene information .Hi, I found this archive from bioconductor-sig-sequencing mailing list where they have discussed about cov_by_gene <- Views(cov, genes) viewSums(cov_by_gene)https://stat.ethz.ch/pipermail/bioc-sig- sequencing/attachments/20110723/b9f2c69c/attachment.pl But I am not sure what is the object "genes" that has been used along with the coverage.Thats exactly what I would like to do. Thanks,Rohan [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Hi Rohan, On 11-10-11 11:49 AM, rohan bareja wrote: > Hi, > Ya thats correct..I have summed up the > reads using viewSums but I want to get the gene information .Hi, > I found this archive from bioconductor-sig-sequencing mailing list where they have discussed about cov_by_gene<- Views(cov, genes) > viewSums(cov_by_gene)https://stat.ethz.ch/pipermail/bioc-sig- sequencing/attachments/20110723/b9f2c69c/attachment.pl > But I am not sure what is the object "genes" that has been used along with the coverage.Thats exactly what I would like to do. I think 'genes' is a GRanges object object that was obtained with something like (pseudo-code): library(GenomicFeatures) txdb <- makeTranscriptDbFromUCSC(...) gene_ranges <- range(transcriptsBy(txdb, by="gene")) genes <- unlist(gene_ranges) Before the unlist() it's always good to make sure that there is exactly one range per gene e.g. with table(elementLengths(gene_ranges)). If that's the case then unlisting will preserve the nb of elements. Cheers, H. > > Thanks,Rohan > [[alternative HTML version deleted]] > > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLY
0
Entering edit mode
Hi, 2011/10/12 Hervé Pagès <hpages at="" fhcrc.org="">: > Hi Rohan, > > On 11-10-11 11:49 AM, rohan bareja wrote: >> >> Hi, >> Ya thats correct..I have summed up the >> ?reads using viewSums but I want to get the gene information .Hi, >> I found this archive from bioconductor-sig-sequencing mailing list where >> they have discussed about cov_by_gene<- Views(cov, genes) >> >> viewSums(cov_by_gene)https://stat.ethz.ch/pipermail/bioc-sig- sequencing/attachments/20110723/b9f2c69c/attachment.pl >> But I am not sure what is the object "genes" that has been used along with >> the coverage.Thats exactly what I would like to do. > > I think 'genes' is a GRanges object object that was obtained with > something like (pseudo-code): > > ?library(GenomicFeatures) > ?txdb <- makeTranscriptDbFromUCSC(...) > ?gene_ranges <- range(transcriptsBy(txdb, by="gene")) > ?genes <- unlist(gene_ranges) I'm not so sure? I can't find any Views(*) methods defined for GRanges objects, but maybe I'm missing something. If that's the case, I'd just go the manual route. Rohan: let's assume you have a SimpleRleList named `cvr` which represents the coverage vectors for each chromosome in your bam file, something like: R> cvr SimpleRleList of length 25 $chr1 'integer' Rle of length 249250621 with 249520 runs Lengths: 14362 4 23 1 5 ... 25 974 26 12758 Values : 0 4 5 6 5 ... 1 0 1 0 $chr2 'integer' Rle of length 243199373 with 179635 runs Lengths: 10103 31 3803 43 21102 ... 27 4543 23 19174 Values : 0 1 0 1 0 ... 1 0 1 0 And say that you've done what Herve has suggested where you have a GRanges object names `genes` that has all of your gene info: R> genes seqnames ranges strand | <rle> <iranges> <rle> | 1.1 chr19 [ 58858172, 58864865] - | 10.2 chr8 [ 18248755, 18258723] + | 100.3 chr20 [ 43248163, 43280376] - | 1000.4 chr18 [ 25530930, 25757445] - | 10000.5 chr1 [243651535, 244006886] - | 100008586.6 chrX [ 49217771, 49342266] + | ... I'd then loop over the `names(cvr)` and do your counting as you like, maybe: R> info <- lapply(names(cvr), function(chr) { g <- genes[seqnames(genes) == chr] v <- Views(cvr[[chr]], ranges(g)) viewSums(v) }) I'd imagine you will want to do more book-keeping than what I'm showing, but here's a start. -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
ADD REPLY
0
Entering edit mode
2011/10/12 Steve Lianoglou <mailinglist.honeypot@gmail.com> > Hi, > > 2011/10/12 Hervé Pagès <hpages@fhcrc.org>: > > Hi Rohan, > > > > On 11-10-11 11:49 AM, rohan bareja wrote: > >> > >> Hi, > >> Ya thats correct..I have summed up the > >> reads using viewSums but I want to get the gene information .Hi, > >> I found this archive from bioconductor-sig-sequencing mailing list where > >> they have discussed about cov_by_gene<- Views(cov, genes) > >> > >> viewSums(cov_by_gene) > https://stat.ethz.ch/pipermail/bioc-sig- sequencing/attachments/20110723/b9f2c69c/attachment.pl > >> But I am not sure what is the object "genes" that has been used along > with > >> the coverage.Thats exactly what I would like to do. > > > > I think 'genes' is a GRanges object object that was obtained with > > something like (pseudo-code): > > > > library(GenomicFeatures) > > txdb <- makeTranscriptDbFromUCSC(...) > > gene_ranges <- range(transcriptsBy(txdb, by="gene")) > > genes <- unlist(gene_ranges) > > I'm not so sure? > > I can't find any Views(*) methods defined for GRanges objects, but > maybe I'm missing something. > > If that's the case, I'd just go the manual route. > > Rohan: let's assume you have a SimpleRleList named `cvr` which > represents the coverage vectors for each chromosome in your bam file, > something like: > > R> cvr > SimpleRleList of length 25 > $chr1 > 'integer' Rle of length 249250621 with 249520 runs > Lengths: 14362 4 23 1 5 ... 25 974 26 12758 > Values : 0 4 5 6 5 ... 1 0 1 0 > > $chr2 > 'integer' Rle of length 243199373 with 179635 runs > Lengths: 10103 31 3803 43 21102 ... 27 4543 23 19174 > Values : 0 1 0 1 0 ... 1 0 1 0 > > And say that you've done what Herve has suggested where you have a > GRanges object names `genes` that has all of your gene info: > > R> genes > seqnames ranges strand | > <rle> <iranges> <rle> | > 1.1 chr19 [ 58858172, 58864865] - | > 10.2 chr8 [ 18248755, 18258723] + | > 100.3 chr20 [ 43248163, 43280376] - | > 1000.4 chr18 [ 25530930, 25757445] - | > 10000.5 chr1 [243651535, 244006886] - | > 100008586.6 chrX [ 49217771, 49342266] + | > ... > > I'd then loop over the `names(cvr)` and do your counting as you like, > maybe: > > R> info <- lapply(names(cvr), function(chr) { > g <- genes[seqnames(genes) == chr] > v <- Views(cvr[[chr]], ranges(g)) > viewSums(v) > }) > > I'd imagine you will want to do more book-keeping than what I'm > showing, but here's a start. > > Coerce the GRanges to a RangesList and use Views(cvr, genes_rl), which would give you an RleViewsList. > -steve > > -- > Steve Lianoglou > Graduate Student: Computational Systems Biology > | Memorial Sloan-Kettering Cancer Center > | Weill Medical College of Cornell University > Contact Info: http://cbio.mskcc.org/~lianos/contact > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hi Steve, On 11-10-12 09:35 AM, Steve Lianoglou wrote: > Hi, > > 2011/10/12 Hervé Pagès<hpages at="" fhcrc.org="">: >> Hi Rohan, >> >> On 11-10-11 11:49 AM, rohan bareja wrote: >>> >>> Hi, >>> Ya thats correct..I have summed up the >>> reads using viewSums but I want to get the gene information .Hi, >>> I found this archive from bioconductor-sig-sequencing mailing list where >>> they have discussed about cov_by_gene<- Views(cov, genes) >>> >>> viewSums(cov_by_gene)https://stat.ethz.ch/pipermail/bioc-sig- sequencing/attachments/20110723/b9f2c69c/attachment.pl >>> But I am not sure what is the object "genes" that has been used along with >>> the coverage.Thats exactly what I would like to do. >> >> I think 'genes' is a GRanges object object that was obtained with >> something like (pseudo-code): >> >> library(GenomicFeatures) >> txdb<- makeTranscriptDbFromUCSC(...) >> gene_ranges<- range(transcriptsBy(txdb, by="gene")) >> genes<- unlist(gene_ranges) > > I'm not so sure? > > I can't find any Views(*) methods defined for GRanges objects, but > maybe I'm missing something. Right. It needs to be coerced to RangesList first: > genes GRanges with 3 ranges and 0 elementMetadata values: seqnames ranges strand <rle> <iranges> <rle> geneA chr1 [3, 6] * geneB chr2 [4, 7] * geneC chr1 [5, 8] * --- seqlengths: chr1 chr2 NA NA > genes_rgl <- as(genes, "RangesList") > genes_rgl CompressedIRangesList of length 2 $chr1 IRanges of length 2 start end width names [1] 3 6 4 geneA [2] 5 8 4 geneC $chr2 IRanges of length 1 start end width names [1] 4 7 4 geneB Then you can use Views() on your 'cvr' object below: > vl <- Views(cvr, genes_rgl) > vl SimpleRleViewsList of length 2 names(2): chr1 chr2 viewSums() and the other summarization functions (see ?viewSums) work on this SimpleRleViewsList object: > viewSums(vl) SimpleIntegerList of length 2 [["chr1"]] 42 15 [["chr2"]] 18 and the names of the genes are still here: > viewSums(vl)[["chr1"]] geneA geneC 42 15 > > If that's the case, I'd just go the manual route. > > Rohan: let's assume you have a SimpleRleList named `cvr` which > represents the coverage vectors for each chromosome in your bam file, > something like: > > R> cvr > SimpleRleList of length 25 > $chr1 > 'integer' Rle of length 249250621 with 249520 runs > Lengths: 14362 4 23 1 5 ... 25 974 26 12758 > Values : 0 4 5 6 5 ... 1 0 1 0 > > $chr2 > 'integer' Rle of length 243199373 with 179635 runs > Lengths: 10103 31 3803 43 21102 ... 27 4543 23 19174 > Values : 0 1 0 1 0 ... 1 0 1 0 > > And say that you've done what Herve has suggested where you have a > GRanges object names `genes` that has all of your gene info: > > R> genes > seqnames ranges strand | > <rle> <iranges> <rle> | > 1.1 chr19 [ 58858172, 58864865] - | > 10.2 chr8 [ 18248755, 18258723] + | > 100.3 chr20 [ 43248163, 43280376] - | > 1000.4 chr18 [ 25530930, 25757445] - | > 10000.5 chr1 [243651535, 244006886] - | > 100008586.6 chrX [ 49217771, 49342266] + | > ... Forgot to say: get rid of those ugly/silly names with: genes <- unlist(gene_ranges, use.names=FALSE) names(genes) <- names(gene_ranges) This is one of the reasons why it's important to make sure that 'gene_ranges' contains exactly one range per gene. Cheers, H. > > I'd then loop over the `names(cvr)` and do your counting as you like, maybe: > > R> info<- lapply(names(cvr), function(chr) { > g<- genes[seqnames(genes) == chr] > v<- Views(cvr[[chr]], ranges(g)) > viewSums(v) > }) > > I'd imagine you will want to do more book-keeping than what I'm > showing, but here's a start. > > -steve > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLY
0
Entering edit mode
Hi Rohan, On 11-10-11 11:49 AM, rohan bareja wrote: > Hi, > Ya thats correct..I have summed up the > reads using viewSums but I want to get the gene information .Hi, > I found this archive from bioconductor-sig-sequencing mailing list where they have discussed about cov_by_gene<- Views(cov, genes) > viewSums(cov_by_gene)https://stat.ethz.ch/pipermail/bioc-sig- sequencing/attachments/20110723/b9f2c69c/attachment.pl > But I am not sure what is the object "genes" that has been used along with the coverage.Thats exactly what I would like to do. I think 'genes' is a GRanges object object that was obtained with something like (pseudo-code): library(GenomicFeatures) txdb <- makeTranscriptDbFromUCSC(...) gene_ranges <- range(transcriptsBy(txdb, by="gene")) genes <- unlist(gene_ranges) Before the unlist() it's always good to make sure that there is exactly one range per gene e.g. with table(elementLengths(gene_ranges)). If that's the case then unlisting will preserve the nb of elements. Cheers, H. > > Thanks,Rohan > [[alternative HTML version deleted]] > > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLY
0
Entering edit mode
rohan bareja ▴ 200
@rohan-bareja-4905
Last seen 7.3 years ago
Hi Herve, Thanks Herve,Steve and Michael..I really appreciate for the help.I have been able to get the summaries as explained by you. As you can see below my result >viewSums(vl) SimpleIntegerList of length 93 [["chr1"]] 556640 0 0 0 0 1085 3337 ... 209237 781 72987 154846 52256 168230[["chr10"]] 0 0 0 0 327 0 0 1065 0 0 0 ... 122 0 24495 29181 0 0 0 3337 0 14239  >viewSums(vl)[["chr1"]]    [1]  556640       0       0       0       0    1085    3337     355 0  [10]     497     445       0   17466    1065   34932    1065 27665    5251 However,while getting the names of the genes from gene_ranges,I am getting the following error.Do you have any idea about this?  > genes<- unlist(gene_ranges,use.names=FALSE) genesGRanges with 24774 ranges and 0 elementMetadata values seqnames                 ranges strand   |                 <rle> <iranges>  <rle>   |    [1]          chr19 [ 58858172,  58864865] -   |    [2]           chr8 [ 18248755,  18258723]      +   |    [3] chr20 [ 43248163,  43280376]      -   |    [4]          chr18 [ 25530930,  25757445]      -   |    [5]           chr1 [243651535, 244006886]      -   |   >names(genes) <- names(gene_ranges) Error in `rownames<-`(`*tmp*`, value = c("1", "10", "100", "1000", "10000",  :   missing values not allowed in rownames Thanks,Rohan [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
On 11-10-12 02:08 PM, rohan bareja wrote: > Hi Herve, > > > Thanks Herve,Steve and Michael..I really appreciate for the help.I have been able to get the summaries as explained by you. > As you can see below my result>viewSums(vl) > SimpleIntegerList of length 93 > [["chr1"]] 556640 0 0 0 0 1085 3337 ... 209237 781 72987 154846 52256 168230[["chr10"]] 0 0 0 0 327 0 0 1065 0 0 0 ... 122 0 24495 29181 0 0 0 3337 0 14239 > > > >viewSums(vl)[["chr1"]] > [1] 556640 0 0 0 0 1085 3337 355 0 [10] 497 445 0 17466 1065 34932 1065 27665 5251 > However,while getting the names of the genes from gene_ranges,I am getting the following error.Do you have any idea about this? > > > genes<- unlist(gene_ranges,use.names=FALSE) > genesGRanges with 24774 ranges and 0 elementMetadata values seqnames ranges strand |<rle> <iranges> <rle> | [1] chr19 [ 58858172, 58864865] - | [2] chr8 [ 18248755, 18258723] + | [3] chr20 [ 43248163, 43280376] - | [4] chr18 [ 25530930, 25757445] - | [5] chr1 [243651535, 244006886] - | > > >names(genes)<- names(gene_ranges) > Error in `rownames<-`(`*tmp*`, value = c("1", "10", "100", "1000", "10000", : missing values not allowed in rownames sessionInfo and reproducible example please? In particular, how did you generate gene_ranges? As I said previously, you need to make sure that every element in your GRangesList object 'gene_ranges' contains only 1 range. Otherwise you cannot assume 1-to-1 correspondence between its elements and the elements of the GRanges object you get after unlisting: > grl <- GRangesList(GRanges("chr1", IRanges(3, 9)), GRanges("chr2", IRanges(2:1, 5))) > names(grl) <- c("geneA", "geneB") > gr <- unlist(grl) > names(gr) <- names(grl) Error in `rownames<-`(`*tmp*`, value = c("geneA", "geneB", NA)) : missing values not allowed in rownames Yeah the error message is not helping a lot :-/ For example, if you use makeTranscriptDbFromUCSC() to get those genes, it's very likely that you will face this problem: > txdb <- makeTranscriptDbFromUCSC(genome="hg19", tablename="refGene") > txbygene <- transcriptsBy(txdb, by="gene") > gene_ranges <- range(txbygene) > table(elementLengths(gene_ranges)) 1 2 3 4 5 6 7 8 22721 311 15 7 18 40 73 66 > gene_ranges[which(elementLengths(gene_ranges) == 8L)[1L]] GRangesList of length 1 $100129636 GRanges with 8 ranges and 0 elementMetadata values seqnames ranges strand | <rle> <iranges> <rle> | [1] chr6 [29004000, 29044517] + | [2] chr6_ssto_hap7 [ 344880, 385401] + | [3] chr6_mcf_hap5 [ 307555, 348067] + | [4] chr6_cox_hap2 [ 522820, 563332] + | [5] chr6_mann_hap4 [ 307401, 347917] + | [6] chr6_apd_hap1 [ 307442, 315573] + | [7] chr6_qbl_hap6 [ 307397, 347916] + | [8] chr6_dbb_hap3 [ 307411, 347926] + | seqlengths chr1 chr2 ... chr18_gl000207_random 249250621 243199373 ... 4262 This just means that Entrez Gene ID 100129636 has transcripts on reference sequences that are not the main chromosomes. So you need to take extra steps to address this. One approach is to unlist anyway and propagate the names of the genes by repeating them: > genes <- unlist(gene_ranges, use.names=FALSE) > names(genes) <- rep.int(names(gene_ranges), elementLengths(gene_ranges)) (Note that the above will only work with the devel version of GenomicFeatures, which requires R 2.14, because the current release version of the package doesn't allow duplicate names in a GRanges or GRangesList object). Then subset 'genes' to get rid of the ranges that are not on the main chromosomes: > main_chroms <- paste("chr", c(1:22, "X", "Y", "M"), sep="") > genes0 <- genes[seqnames(genes) %in% main_chroms] Then check that the gene names are now unique (with any(duplicated(names(genes0)))). If they are not, then again you'll need to choose a reasonable strategy to get rid of the duplicates. Note that it's not a strict requirement to have 1 range per gene, it'll just be more convenient to summarize things at the gene level e.g. when you do viewSums() on your SimpleRleViewsList object. Cheers, H. > > > > Thanks,Rohan > > > > > [[alternative HTML version deleted]] > > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLY
0
Entering edit mode
rohan bareja ▴ 200
@rohan-bareja-4905
Last seen 7.3 years ago
Hi, I am trying to get the coverage in the regions like 3'UTR which is similar to doing the earlier analysis except for different annotation object.Below is my code: >reads <- readBamGappedAlignments("Galaxy Ctrl.bam") >cover=coverage(reads) >cov <- cover[1]#subsetting coverage for chromosome 1 >utr=range(threeUTRsByTranscript(txdb,use.names=FALSE)) >table(elementLengths(utr))##every element in  GRangesList object 'utr' contains only 1 range here ...thankfully as compared to previous case  1 33381 >tr<- unlist(utr,use.names=TRUE)  >names(tr) <- names(utr) >tx_rgl<- as(tr,"RangesList") #tx_rgl[1] subsetting ranges list for chromosome 1>vl <- Views(cov,tx_rgl[1]) ## So the Views object looks like this SimpleRleViewsList of length 1$chr1Views on a 249250621-length Rle subject views:           start       end  width   [1]  67208779  67210768 1990 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...]   [2]  33585784 33585995    212 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...]   [3] 48998527  48999844   1318 [ 32  44  52  72  79  92 104 115 129 ...] [4]  16785386  16786584   1199 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...]   [5]  16785492  16786584   1093 [2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 ...]   [6]   8404074   8404227    154 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...]   [7]  25167429  25170815   3387 [8 8 8 8 6 6 4 4 2 1 1 1 1 1 0 0 0 0 0 ...]   [8]  16785386  16786584   1199 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...]   [9]  92145900  92149295   3396 [  3   4 5   9  16  30  46  53  67 ...]   ...       ...       ...    ... ...[3276] 248202509 248202607     99 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...][3277] 248902717 248902911    195 [ 0  0  0  0  1  2  6 11 13 15 19 19 20 ...][3278] 249104651 249106098 1448 [ 65 100 139 174 206 243 277 294 308 ...][3279] 249142833 249143714    882 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...][3280] 249144203 249144408    206 [  1   1   2   3  14  38  68 109 164 ...][3281] 249144203 249144408    206 [  1   1   2   3  14  38  68 109 164 ...][3282] 249144203 249144408    206 [  1   1   2   3  14  38  68 109 164 ...][3283] 249212563 249213345    783 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...][3284] 249212563 249213345    783 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...] Now I would like to get how many islands fall in 3'UTR with continuous coverage of :mimimum coverage of 1:no of regionsmimimum coverage of 2:no. of regions... Also,we could go the alternate way of getting the islands which I previously wrote as well: >reads <- readBamGappedAlignments("Galaxy Ctrl.bam")>cover <- coverage(reads)>islands <- slice(cover, lower = 1) Then could we get many islands fall in 3'UTR?Similarly,for continuous coverage of 2..islands <- slice(cover, lower = 2) Any ideas on this??? [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
rohan bareja ▴ 200
@rohan-bareja-4905
Last seen 7.3 years ago
Hi again, I think the first method which I described from 3'UTR annotation object can't be used to get the islands with minimum continuous coverage of 1 ,2,3 and so on... The second method in which we can get islands would be much better suited to get the regions falling in 3'UTR #no of regions  with minimum coverage of 3islands <- slice(cov, lower = 3) but how can we get how many of those fall in 3'UTR??? Thanks,Rohan [[alternative HTML version deleted]]
ADD COMMENT

Login before adding your answer.

Traffic: 684 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6