Question: RNASeq:- getting Zero Count
0
gravatar for Reema Singh
6.3 years ago by
Reema Singh570
Reema Singh570 wrote:
Dear All, I am trying to extract the read count from three .bam files. But I am getting Zero count entries. I am using Mycobacterium Tuberculosis H37Rv gtf file ( ftp://ftp.ensemblgenomes.org/pub/release-19/bacteria//gtf/bacteria_1_c ollection/mycobacterium_tuberculosis_h37rv/Mycobacterium_tuberculosis_ h37rv.GCA_000277735.1.19.gtf.gz) and RNASeq data used here were downloaded from ( http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE40846) and aligned with bowtie2. library(GenomicFeatures) txdb <- makeTranscriptDbFromGFF(file="Mycobacterium_tuberculosis_h37rv.GCA_000 277735.1.19.gtf",format="gtf") saveDb(txdb,file="MycoTubeH37Rv.sqlite") load("MycoTubeH37Rv.sqlite") gnModel <- exonsBy(txdb,"gene") ### *also tried with "transcripts", "cds", but getting same * bamFiles <- list.files(".", "bam$", full=TRUE) names(bamFiles) <- sub("\\..*","",basename(bamFiles)) counter <- function(fl, gnModel){ aln <- GenomicRanges::readGappedAlignments(fl) strand(aln) hits <- countOverlaps(aln,gnModel) counts <- countOverlaps(gnModel,aln[hits==5]) names(counts) <- names(gnModel) counts } counts <- sapply(bamFiles,counter,gnModel) Note: method with signature ‘Vector#GRangesList’ chosen for function ‘countOverlaps’, target signature ‘GappedAlignments#GRangesList’. "GappedAlignments#Vector" would also be valid Note: method with signature ‘GRangesList#Vector’ chosen for function ‘countOverlaps’, target signature ‘GRangesList#GappedAlignments’. "Vector#GappedAlignments" would also be valid Warning messages: 1: In .Seqinfo.mergexy(x, y) : Each of the 2 combined objects has sequence levels not in the other: - in 'x': gi|448814763|ref|NC_000962.3| - in 'y': Chromosome Make sure to always combine/compare objects based on the same reference genome (use suppressWarnings() to suppress this warning). 2: In .Seqinfo.mergexy(x, y) : Each of the 2 combined objects has sequence levels not in the other: - in 'x': Chromosome - in 'y': gi|448814763|ref|NC_000962.3| Make sure to always combine/compare objects based on the same reference genome (use suppressWarnings() to suppress this warning). 3: In .Seqinfo.mergexy(x, y) : Each of the 2 combined objects has sequence levels not in the other: - in 'x': gi|448814763|ref|NC_000962.3| - in 'y': Chromosome Make sure to always combine/compare objects based on the same reference genome (use suppressWarnings() to suppress this warning). 4: In .Seqinfo.mergexy(x, y) : Each of the 2 combined objects has sequence levels not in the other: - in 'x': Chromosome - in 'y': gi|448814763|ref|NC_000962.3| Make sure to always combine/compare objects based on the same reference genome (use suppressWarnings() to suppress this warning). 5: In .Seqinfo.mergexy(x, y) : Each of the 2 combined objects has sequence levels not in the other: - in 'x': gi|448814763|ref|NC_000962.3| - in 'y': Chromosome Make sure to always combine/compare objects based on the same reference genome (use suppressWarnings() to suppress this warning). 6: In .Seqinfo.mergexy(x, y) : Each of the 2 combined objects has sequence levels not in the other: - in 'x': Chromosome - in 'y': gi|448814763|ref|NC_000962.3| Make sure to always combine/compare objects based on the same reference genome (use suppressWarnings() to suppress this warning). head(counts) SRR568038 SRR568039 SRR568040 RVBD_0001 0 0 0 RVBD_0002 0 0 0 RVBD_0003 0 0 0 RVBD_0004 0 0 0 RVBD_0005 0 0 0 RVBD_0006 0 0 0 > sessionInfo() R version 3.0.1 (2013-05-16) Platform: x86_64-redhat-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] Rsamtools_1.12.3 Biostrings_2.28.0 GenomicFeatures_1.12.3 [4] AnnotationDbi_1.22.6 Biobase_2.20.1 GenomicRanges_1.12.4 [7] IRanges_1.18.2 BiocGenerics_0.6.0 loaded via a namespace (and not attached): [1] biomaRt_2.16.0 bitops_1.0-5 BSgenome_1.28.0 DBI_0.2-6 [5] RCurl_1.95-4.1 RSQLite_0.11.3 rtracklayer_1.20.4 stats4_3.0.1 [9] tools_3.0.1 XML_3.96-1.1 zlibbioc_1.6.0 Kind regards -- Reema Singh PhD Scholar Computational Biology and Bioinformatics School of Computational and Integrative Sciences Jawaharlal Nehru University New Delhi-110067 INDIA [[alternative HTML version deleted]]
rnaseq • 2.2k views
ADD COMMENTlink modified 6.3 years ago by Valerie Obenchain6.7k • written 6.3 years ago by Reema Singh570
Answer: RNASeq:- getting Zero Count
0
gravatar for Valerie Obenchain
6.3 years ago by
United States
Valerie Obenchain6.7k wrote:
Hi Reema, To perform overlap or matching operations the seqlevels (chromosome names) of the objects must match. The error message is telling you that some of these do not match. It's reasonable that a few names may not match (maybe a chromosome is present in one object and not the other) but the majority should. Check the seqlevels: seqlevels(aln) seqlevels(gnModel) Which names are common to both: intersect(seqlevels(gnModel), seqlevels(aln)) You can rename seqlevels in several different ways. See ?renameSeqlevels or ?seqlevels for examples. Valerie On 08/04/2013 06:35 AM, Reema Singh wrote: > Dear All, > > I am trying to extract the read count from three .bam files. But I am > getting Zero count entries. > > I am using Mycobacterium Tuberculosis H37Rv gtf file ( > ftp://ftp.ensemblgenomes.org/pub/release-19/bacteria//gtf/bacteria_1 _collection/mycobacterium_tuberculosis_h37rv/Mycobacterium_tuberculosi s_h37rv.GCA_000277735.1.19.gtf.gz) > and RNASeq data used here were downloaded from ( > http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE40846) and aligned > with bowtie2. > > > library(GenomicFeatures) > txdb <- > makeTranscriptDbFromGFF(file="Mycobacterium_tuberculosis_h37rv.GCA_0 00277735.1.19.gtf",format="gtf") > saveDb(txdb,file="MycoTubeH37Rv.sqlite") > load("MycoTubeH37Rv.sqlite") > gnModel <- exonsBy(txdb,"gene") ### *also tried with "transcripts", "cds", > but getting same * > > bamFiles <- list.files(".", "bam$", full=TRUE) > names(bamFiles) <- sub("\\..*","",basename(bamFiles)) > counter <- function(fl, gnModel){ > aln <- GenomicRanges::readGappedAlignments(fl) > strand(aln) > hits <- countOverlaps(aln,gnModel) > counts <- countOverlaps(gnModel,aln[hits==5]) > names(counts) <- names(gnModel) > counts > } > > counts <- sapply(bamFiles,counter,gnModel) > > Note: method with signature ?Vector#GRangesList? chosen for function > ?countOverlaps?, > target signature ?GappedAlignments#GRangesList?. > "GappedAlignments#Vector" would also be valid > Note: method with signature ?GRangesList#Vector? chosen for function > ?countOverlaps?, > target signature ?GRangesList#GappedAlignments?. > "Vector#GappedAlignments" would also be valid > Warning messages: > 1: In .Seqinfo.mergexy(x, y) : > Each of the 2 combined objects has sequence levels not in the other: > - in 'x': gi|448814763|ref|NC_000962.3| > - in 'y': Chromosome > Make sure to always combine/compare objects based on the same reference > genome (use suppressWarnings() to suppress this warning). > 2: In .Seqinfo.mergexy(x, y) : > Each of the 2 combined objects has sequence levels not in the other: > - in 'x': Chromosome > - in 'y': gi|448814763|ref|NC_000962.3| > Make sure to always combine/compare objects based on the same reference > genome (use suppressWarnings() to suppress this warning). > 3: In .Seqinfo.mergexy(x, y) : > Each of the 2 combined objects has sequence levels not in the other: > - in 'x': gi|448814763|ref|NC_000962.3| > - in 'y': Chromosome > Make sure to always combine/compare objects based on the same reference > genome (use suppressWarnings() to suppress this warning). > 4: In .Seqinfo.mergexy(x, y) : > Each of the 2 combined objects has sequence levels not in the other: > - in 'x': Chromosome > - in 'y': gi|448814763|ref|NC_000962.3| > Make sure to always combine/compare objects based on the same reference > genome (use suppressWarnings() to suppress this warning). > 5: In .Seqinfo.mergexy(x, y) : > Each of the 2 combined objects has sequence levels not in the other: > - in 'x': gi|448814763|ref|NC_000962.3| > - in 'y': Chromosome > Make sure to always combine/compare objects based on the same reference > genome (use suppressWarnings() to suppress this warning). > 6: In .Seqinfo.mergexy(x, y) : > Each of the 2 combined objects has sequence levels not in the other: > - in 'x': Chromosome > - in 'y': gi|448814763|ref|NC_000962.3| > Make sure to always combine/compare objects based on the same reference > genome (use suppressWarnings() to suppress this warning). > > head(counts) > > SRR568038 SRR568039 SRR568040 > RVBD_0001 0 0 0 > RVBD_0002 0 0 0 > RVBD_0003 0 0 0 > RVBD_0004 0 0 0 > RVBD_0005 0 0 0 > RVBD_0006 0 0 0 > >> sessionInfo() > R version 3.0.1 (2013-05-16) > Platform: x86_64-redhat-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C > [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 > [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] Rsamtools_1.12.3 Biostrings_2.28.0 GenomicFeatures_1.12.3 > [4] AnnotationDbi_1.22.6 Biobase_2.20.1 GenomicRanges_1.12.4 > [7] IRanges_1.18.2 BiocGenerics_0.6.0 > > loaded via a namespace (and not attached): > [1] biomaRt_2.16.0 bitops_1.0-5 BSgenome_1.28.0 DBI_0.2-6 > > [5] RCurl_1.95-4.1 RSQLite_0.11.3 rtracklayer_1.20.4 stats4_3.0.1 > > [9] tools_3.0.1 XML_3.96-1.1 zlibbioc_1.6.0 > > > Kind regards > > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD COMMENTlink written 6.3 years ago by Valerie Obenchain6.7k
Hi Valerie, Thank you so much for the reply. After checking the seqlevels, I am able to get rid off the error, but still getting the zero count entries. Is there any another way of doing this? KInd Regards On Mon, Aug 5, 2013 at 9:21 PM, Valerie Obenchain <vobencha@fhcrc.org>wrote: > Hi Reema, > > To perform overlap or matching operations the seqlevels (chromosome names) > of the objects must match. The error message is telling you that some of > these do not match. It's reasonable that a few names may not match (maybe a > chromosome is present in one object and not the other) but the majority > should. > > Check the seqlevels: > seqlevels(aln) > seqlevels(gnModel) > > Which names are common to both: > intersect(seqlevels(gnModel), seqlevels(aln)) > > You can rename seqlevels in several different ways. See ?renameSeqlevels > or ?seqlevels for examples. > > Valerie > > > On 08/04/2013 06:35 AM, Reema Singh wrote: > >> Dear All, >> >> I am trying to extract the read count from three .bam files. But I am >> getting Zero count entries. >> >> I am using Mycobacterium Tuberculosis H37Rv gtf file ( >> ftp://ftp.ensemblgenomes.org/**pub/release-19/bacteria//gtf/** >> bacteria_1_collection/**mycobacterium_tuberculosis_**h37rv/Mycobact erium_ >> **tuberculosis_h37rv.GCA_**000277735.1.19.gtf.gz<ftp: ftp.ensemblg="" enomes.org="" pub="" release-19="" bacteria="" gtf="" bacteria_1_collection="" mycobact="" erium_tuberculosis_h37rv="" mycobacterium_tuberculosis_h37rv.gca_00027773="" 5.1.19.gtf.gz=""> >> ) >> and RNASeq data used here were downloaded from ( >> http://www.ncbi.nlm.nih.gov/**geo/query/acc.cgi?acc=GSE40846<http: www.ncbi.nlm.nih.gov="" geo="" query="" acc.cgi?acc="GSE40846"> >> **) and aligned >> with bowtie2. >> >> >> library(GenomicFeatures) >> txdb <- >> makeTranscriptDbFromGFF(file="**Mycobacterium_tuberculosis_** >> h37rv.GCA_000277735.1.19.gtf",**format="gtf") >> saveDb(txdb,file="**MycoTubeH37Rv.sqlite") >> load("MycoTubeH37Rv.sqlite") >> gnModel <- exonsBy(txdb,"gene") ### *also tried with "transcripts", "cds", >> but getting same * >> >> >> bamFiles <- list.files(".", "bam$", full=TRUE) >> names(bamFiles) <- sub("\\..*","",basename(**bamFiles)) >> counter <- function(fl, gnModel){ >> aln <- GenomicRanges::**readGappedAlignments(fl) >> strand(aln) >> hits <- countOverlaps(aln,gnModel) >> counts <- countOverlaps(gnModel,aln[**hits==5]) >> names(counts) <- names(gnModel) >> counts >> } >> >> counts <- sapply(bamFiles,counter,**gnModel) >> >> Note: method with signature ‘Vector#GRangesList’ chosen for function >> ‘countOverlaps’, >> target signature ‘GappedAlignments#GRangesList’**. >> "GappedAlignments#Vector" would also be valid >> Note: method with signature ‘GRangesList#Vector’ chosen for function >> ‘countOverlaps’, >> target signature ‘GRangesList#GappedAlignments’**. >> "Vector#GappedAlignments" would also be valid >> Warning messages: >> 1: In .Seqinfo.mergexy(x, y) : >> Each of the 2 combined objects has sequence levels not in the other: >> - in 'x': gi|448814763|ref|NC_000962.3| >> - in 'y': Chromosome >> Make sure to always combine/compare objects based on the same reference >> genome (use suppressWarnings() to suppress this warning). >> 2: In .Seqinfo.mergexy(x, y) : >> Each of the 2 combined objects has sequence levels not in the other: >> - in 'x': Chromosome >> - in 'y': gi|448814763|ref|NC_000962.3| >> Make sure to always combine/compare objects based on the same reference >> genome (use suppressWarnings() to suppress this warning). >> 3: In .Seqinfo.mergexy(x, y) : >> Each of the 2 combined objects has sequence levels not in the other: >> - in 'x': gi|448814763|ref|NC_000962.3| >> - in 'y': Chromosome >> Make sure to always combine/compare objects based on the same reference >> genome (use suppressWarnings() to suppress this warning). >> 4: In .Seqinfo.mergexy(x, y) : >> Each of the 2 combined objects has sequence levels not in the other: >> - in 'x': Chromosome >> - in 'y': gi|448814763|ref|NC_000962.3| >> Make sure to always combine/compare objects based on the same reference >> genome (use suppressWarnings() to suppress this warning). >> 5: In .Seqinfo.mergexy(x, y) : >> Each of the 2 combined objects has sequence levels not in the other: >> - in 'x': gi|448814763|ref|NC_000962.3| >> - in 'y': Chromosome >> Make sure to always combine/compare objects based on the same reference >> genome (use suppressWarnings() to suppress this warning). >> 6: In .Seqinfo.mergexy(x, y) : >> Each of the 2 combined objects has sequence levels not in the other: >> - in 'x': Chromosome >> - in 'y': gi|448814763|ref|NC_000962.3| >> Make sure to always combine/compare objects based on the same reference >> genome (use suppressWarnings() to suppress this warning). >> >> head(counts) >> >> SRR568038 SRR568039 SRR568040 >> RVBD_0001 0 0 0 >> RVBD_0002 0 0 0 >> RVBD_0003 0 0 0 >> RVBD_0004 0 0 0 >> RVBD_0005 0 0 0 >> RVBD_0006 0 0 0 >> >> sessionInfo() >>> >> R version 3.0.1 (2013-05-16) >> Platform: x86_64-redhat-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C >> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 >> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 >> [7] LC_PAPER=C LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods >> [8] base >> >> other attached packages: >> [1] Rsamtools_1.12.3 Biostrings_2.28.0 GenomicFeatures_1.12.3 >> [4] AnnotationDbi_1.22.6 Biobase_2.20.1 GenomicRanges_1.12.4 >> [7] IRanges_1.18.2 BiocGenerics_0.6.0 >> >> loaded via a namespace (and not attached): >> [1] biomaRt_2.16.0 bitops_1.0-5 BSgenome_1.28.0 DBI_0.2-6 >> >> [5] RCurl_1.95-4.1 RSQLite_0.11.3 rtracklayer_1.20.4 >> stats4_3.0.1 >> >> [9] tools_3.0.1 XML_3.96-1.1 zlibbioc_1.6.0 >> >> >> Kind regards >> >> >> >> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives: http://news.gmane.org/gmane.** >> science.biology.informatics.**conductor<http: news.gmane.org="" gmane="" .science.biology.informatics.conductor=""> >> >> -- Reema Singh PhD Scholar Computational Biology and Bioinformatics School of Computational and Integrative Sciences Jawaharlal Nehru University New Delhi-110067 INDIA [[alternative HTML version deleted]]
ADD REPLYlink written 6.3 years ago by Reema Singh570
I would do some investigating with a single bam file. Confirm 'gnModel' and 'aln' have some common seqlevels. This call should produce a result. intersect(seqlevels(aln), seqlevels(gnModel)) Call countOverlaps on a single file. co <- countOverlaps(aln, gnModel) Evidently you only want the bam records that have exactly 5 hits. This could be limiting. To see the distribution of hits make a table of the counts. table(co) Valerie On 08/05/2013 10:05 PM, Reema Singh wrote: > Hi Valerie, > > Thank you so much for the reply. > > After checking the seqlevels, I am able to get rid off the error, but > still getting the zero count entries. Is there any another way of doing > this? > > KInd Regards > > > On Mon, Aug 5, 2013 at 9:21 PM, Valerie Obenchain <vobencha at="" fhcrc.org=""> <mailto:vobencha at="" fhcrc.org="">> wrote: > > Hi Reema, > > To perform overlap or matching operations the seqlevels (chromosome > names) of the objects must match. The error message is telling you > that some of these do not match. It's reasonable that a few names > may not match (maybe a chromosome is present in one object and not > the other) but the majority should. > > Check the seqlevels: > seqlevels(aln) > seqlevels(gnModel) > > Which names are common to both: > intersect(seqlevels(gnModel), seqlevels(aln)) > > You can rename seqlevels in several different ways. See > ?renameSeqlevels or ?seqlevels for examples. > > Valerie > > > On 08/04/2013 06:35 AM, Reema Singh wrote: > > Dear All, > > I am trying to extract the read count from three .bam files. But > I am > getting Zero count entries. > > I am using Mycobacterium Tuberculosis H37Rv gtf file ( > ftp://ftp.ensemblgenomes.org/__pub/release-19/bacteria//gtf/ __bacteria_1_collection/__mycobacterium_tuberculosis___h37rv/Mycobacte rium___tuberculosis_h37rv.GCA___000277735.1.19.gtf.gz > <ftp: ftp.ensemblgenomes.org="" pub="" release-19="" bacteria="" gtf="" b="" acteria_1_collection="" mycobacterium_tuberculosis_h37rv="" mycobacterium_tu="" berculosis_h37rv.gca_000277735.1.19.gtf.gz="">) > and RNASeq data used here were downloaded from ( > http://www.ncbi.nlm.nih.gov/__geo/query/acc.cgi?acc=GSE40846 > <http: www.ncbi.nlm.nih.gov="" geo="" query="" acc.cgi?acc="GSE40846">__) > and aligned > with bowtie2. > > > library(GenomicFeatures) > txdb <- > makeTranscriptDbFromGFF(file="__Mycobacterium_tuberculosis__ _h37rv.GCA_000277735.1.19.gtf",__format="gtf") > saveDb(txdb,file="__MycoTubeH37Rv.sqlite") > load("MycoTubeH37Rv.sqlite") > gnModel <- exonsBy(txdb,"gene") ### *also tried with > "transcripts", "cds", > but getting same * > > > bamFiles <- list.files(".", "bam$", full=TRUE) > names(bamFiles) <- sub("\\..*","",basename(__bamFiles)) > counter <- function(fl, gnModel){ > aln <- GenomicRanges::__readGappedAlignments(fl) > strand(aln) > hits <- countOverlaps(aln,gnModel) > counts <- countOverlaps(gnModel,aln[__hits==5]) > names(counts) <- names(gnModel) > counts > } > > counts <- sapply(bamFiles,counter,__gnModel) > > Note: method with signature ?Vector#GRangesList? chosen for function > ?countOverlaps?, > target signature ?GappedAlignments#GRangesList?__. > "GappedAlignments#Vector" would also be valid > Note: method with signature ?GRangesList#Vector? chosen for function > ?countOverlaps?, > target signature ?GRangesList#GappedAlignments?__. > "Vector#GappedAlignments" would also be valid > Warning messages: > 1: In .Seqinfo.mergexy(x, y) : > Each of the 2 combined objects has sequence levels not in > the other: > - in 'x': gi|448814763|ref|NC_000962.3| > - in 'y': Chromosome > Make sure to always combine/compare objects based on the > same reference > genome (use suppressWarnings() to suppress this warning). > 2: In .Seqinfo.mergexy(x, y) : > Each of the 2 combined objects has sequence levels not in > the other: > - in 'x': Chromosome > - in 'y': gi|448814763|ref|NC_000962.3| > Make sure to always combine/compare objects based on the > same reference > genome (use suppressWarnings() to suppress this warning). > 3: In .Seqinfo.mergexy(x, y) : > Each of the 2 combined objects has sequence levels not in > the other: > - in 'x': gi|448814763|ref|NC_000962.3| > - in 'y': Chromosome > Make sure to always combine/compare objects based on the > same reference > genome (use suppressWarnings() to suppress this warning). > 4: In .Seqinfo.mergexy(x, y) : > Each of the 2 combined objects has sequence levels not in > the other: > - in 'x': Chromosome > - in 'y': gi|448814763|ref|NC_000962.3| > Make sure to always combine/compare objects based on the > same reference > genome (use suppressWarnings() to suppress this warning). > 5: In .Seqinfo.mergexy(x, y) : > Each of the 2 combined objects has sequence levels not in > the other: > - in 'x': gi|448814763|ref|NC_000962.3| > - in 'y': Chromosome > Make sure to always combine/compare objects based on the > same reference > genome (use suppressWarnings() to suppress this warning). > 6: In .Seqinfo.mergexy(x, y) : > Each of the 2 combined objects has sequence levels not in > the other: > - in 'x': Chromosome > - in 'y': gi|448814763|ref|NC_000962.3| > Make sure to always combine/compare objects based on the > same reference > genome (use suppressWarnings() to suppress this warning). > > head(counts) > > SRR568038 SRR568039 SRR568040 > RVBD_0001 0 0 0 > RVBD_0002 0 0 0 > RVBD_0003 0 0 0 > RVBD_0004 0 0 0 > RVBD_0005 0 0 0 > RVBD_0006 0 0 0 > > sessionInfo() > > R version 3.0.1 (2013-05-16) > Platform: x86_64-redhat-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C > [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 > [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets > methods > [8] base > > other attached packages: > [1] Rsamtools_1.12.3 Biostrings_2.28.0 > GenomicFeatures_1.12.3 > [4] AnnotationDbi_1.22.6 Biobase_2.20.1 > GenomicRanges_1.12.4 > [7] IRanges_1.18.2 BiocGenerics_0.6.0 > > loaded via a namespace (and not attached): > [1] biomaRt_2.16.0 bitops_1.0-5 BSgenome_1.28.0 > DBI_0.2-6 > > [5] RCurl_1.95-4.1 RSQLite_0.11.3 rtracklayer_1.20.4 > stats4_3.0.1 > > [9] tools_3.0.1 XML_3.96-1.1 zlibbioc_1.6.0 > > > Kind regards > > > > > _________________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> > https://stat.ethz.ch/mailman/__listinfo/bioconductor > <https: stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> > Search the archives: > http://news.gmane.org/gmane.__science.biology.informatics.__conductor > <http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> > > > > > -- > Reema Singh > PhD Scholar > Computational Biology and Bioinformatics > School of Computational and Integrative Sciences > Jawaharlal Nehru University > New Delhi-110067 > INDIA
ADD REPLYlink written 6.3 years ago by Valerie Obenchain6.7k
On Tue, Aug 6, 2013 at 9:13 PM, Valerie Obenchain <vobencha@fhcrc.org>wrote: > I would do some investigating with a single bam file. > > Confirm 'gnModel' and 'aln' have some common seqlevels. This call should > produce a result. > > intersect(seqlevels(aln), seqlevels(gnModel)) > SRR568038 SRR568039 SRR568040 "Chromosome" "Chromosome" "Chromosome" > > Call countOverlaps on a single file. > > co <- countOverlaps(aln, gnModel) > > aln <- GenomicRanges::readGappedAlignments(bamFiles[1]) > head(aln) GappedAlignments with 6 alignments and 0 metadata columns: seqnames strand cigar qwidth start end width <rle> <rle> <character> <integer> <integer> <integer> <integer> [1] Chromosome + 39M 39 2 40 39 [2] Chromosome + 39M 39 2 40 39 [3] Chromosome + 39M 39 4 42 39 [4] Chromosome + 39M 39 4 42 39 [5] Chromosome + 39M 39 5 43 39 [6] Chromosome + 39M 39 5 43 39 ngap <integer> [1] 0 [2] 0 [3] 0 [4] 0 [5] 0 [6] 0 --- seqlengths: Chromosome 4411708 > > Evidently you only want the bam records that have exactly 5 hits. This > could be limiting. To see the distribution of hits make a table of the > counts. > > table(co) > > co <- countOverlaps(aln, gnModel) > table(co) co 0 1 2 575536 33179936 17850 You are right, when I changed the exactly bam hit list from % to all. Then I got some count. Here's the table. > head(counts) SRR568038 SRR568039 SRR568040 RVBD_0001 456 1331 94 RVBD_0002 258 1714 64 RVBD_0003 270 2090 90 RVBD_0004 122 536 15 RVBD_0005 2487 23056 997 RVBD_0006 2343 23013 814 I hope this is right. Kindly correct me, if you something wrong here. Kind Regards > > Valerie > > > > On 08/05/2013 10:05 PM, Reema Singh wrote: > >> Hi Valerie, >> >> Thank you so much for the reply. >> >> After checking the seqlevels, I am able to get rid off the error, but >> still getting the zero count entries. Is there any another way of doing >> this? >> >> KInd Regards >> >> >> On Mon, Aug 5, 2013 at 9:21 PM, Valerie Obenchain <vobencha@fhcrc.org>> <mailto:vobencha@fhcrc.org>> wrote: >> >> Hi Reema, >> >> To perform overlap or matching operations the seqlevels (chromosome >> names) of the objects must match. The error message is telling you >> that some of these do not match. It's reasonable that a few names >> may not match (maybe a chromosome is present in one object and not >> the other) but the majority should. >> >> Check the seqlevels: >> seqlevels(aln) >> seqlevels(gnModel) >> >> Which names are common to both: >> intersect(seqlevels(gnModel), seqlevels(aln)) >> >> You can rename seqlevels in several different ways. See >> ?renameSeqlevels or ?seqlevels for examples. >> >> Valerie >> >> >> On 08/04/2013 06:35 AM, Reema Singh wrote: >> >> Dear All, >> >> I am trying to extract the read count from three .bam files. But >> I am >> getting Zero count entries. >> >> I am using Mycobacterium Tuberculosis H37Rv gtf file ( >> ftp://ftp.ensemblgenomes.org/_**_pub/release-19/bacteria//gtf/** >> __bacteria_1_collection/__**mycobacterium_tuberculosis___** >> h37rv/Mycobacterium___**tuberculosis_h37rv.GCA___**000277735.1.19.g tf.gz<ftp: ftp.ensemblgenomes.org="" __pub="" release-19="" bacteria="" gtf="" __ba="" cteria_1_collection="" __mycobacterium_tuberculosis___h37rv="" mycobacterium="" ___tuberculosis_h37rv.gca___000277735.1.19.gtf.gz=""> >> <ftp: ftp.ensemblgenomes.org="" **pub="" release-19="" bacteria="" gtf="" **="">> bacteria_1_collection/**mycobacterium_tuberculosis_**h37rv/Mycobact erium_ >> **tuberculosis_h37rv.GCA_**000277735.1.19.gtf.gz<ftp: ftp.ensemblg="" enomes.org="" pub="" release-19="" bacteria="" gtf="" bacteria_1_collection="" mycobact="" erium_tuberculosis_h37rv="" mycobacterium_tuberculosis_h37rv.gca_00027773="" 5.1.19.gtf.gz=""> >> >) >> >> and RNASeq data used here were downloaded from ( >> http://www.ncbi.nlm.nih.gov/__**geo/query/acc.cgi?acc=GSE40 846<http: www.ncbi.nlm.nih.gov="" __geo="" query="" acc.cgi?acc="GSE40846"> >> <http: www.ncbi.nlm.nih.gov="" **geo="" query="" acc.cgi?acc="GSE408" 46<http:="" www.ncbi.nlm.nih.gov="" geo="" query="" acc.cgi?acc="GSE40846"> >> **>__) >> >> and aligned >> with bowtie2. >> >> >> library(GenomicFeatures) >> txdb <- >> makeTranscriptDbFromGFF(file="**__Mycobacterium_tuberculosis__** >> _h37rv.GCA_000277735.1.19.gtf"**,__format="gtf") >> saveDb(txdb,file="__**MycoTubeH37Rv.sqlite") >> >> load("MycoTubeH37Rv.sqlite") >> gnModel <- exonsBy(txdb,"gene") ### *also tried with >> "transcripts", "cds", >> but getting same * >> >> >> bamFiles <- list.files(".", "bam$", full=TRUE) >> names(bamFiles) <- sub("\\..*","",basename(__**bamFiles)) >> counter <- function(fl, gnModel){ >> aln <- GenomicRanges::__**readGappedAlignments(fl) >> strand(aln) >> hits <- countOverlaps(aln,gnModel) >> counts <- countOverlaps(gnModel,aln[__**hits==5]) >> names(counts) <- names(gnModel) >> counts >> } >> >> counts <- sapply(bamFiles,counter,__**gnModel) >> >> >> Note: method with signature ‘Vector#GRangesList’ chosen for >> function >> ‘countOverlaps’, >> target signature ‘GappedAlignments#GRangesList’**__. >> >> "GappedAlignments#Vector" would also be valid >> Note: method with signature ‘GRangesList#Vector’ chosen for >> function >> ‘countOverlaps’, >> target signature ‘GRangesList#GappedAlignments’**__. >> >> "Vector#GappedAlignments" would also be valid >> Warning messages: >> 1: In .Seqinfo.mergexy(x, y) : >> Each of the 2 combined objects has sequence levels not in >> the other: >> - in 'x': gi|448814763|ref|NC_000962.3| >> - in 'y': Chromosome >> Make sure to always combine/compare objects based on the >> same reference >> genome (use suppressWarnings() to suppress this warning). >> 2: In .Seqinfo.mergexy(x, y) : >> Each of the 2 combined objects has sequence levels not in >> the other: >> - in 'x': Chromosome >> - in 'y': gi|448814763|ref|NC_000962.3| >> Make sure to always combine/compare objects based on the >> same reference >> genome (use suppressWarnings() to suppress this warning). >> 3: In .Seqinfo.mergexy(x, y) : >> Each of the 2 combined objects has sequence levels not in >> the other: >> - in 'x': gi|448814763|ref|NC_000962.3| >> - in 'y': Chromosome >> Make sure to always combine/compare objects based on the >> same reference >> genome (use suppressWarnings() to suppress this warning). >> 4: In .Seqinfo.mergexy(x, y) : >> Each of the 2 combined objects has sequence levels not in >> the other: >> - in 'x': Chromosome >> - in 'y': gi|448814763|ref|NC_000962.3| >> Make sure to always combine/compare objects based on the >> same reference >> genome (use suppressWarnings() to suppress this warning). >> 5: In .Seqinfo.mergexy(x, y) : >> Each of the 2 combined objects has sequence levels not in >> the other: >> - in 'x': gi|448814763|ref|NC_000962.3| >> - in 'y': Chromosome >> Make sure to always combine/compare objects based on the >> same reference >> genome (use suppressWarnings() to suppress this warning). >> 6: In .Seqinfo.mergexy(x, y) : >> Each of the 2 combined objects has sequence levels not in >> the other: >> - in 'x': Chromosome >> - in 'y': gi|448814763|ref|NC_000962.3| >> Make sure to always combine/compare objects based on the >> same reference >> genome (use suppressWarnings() to suppress this warning). >> >> head(counts) >> >> SRR568038 SRR568039 SRR568040 >> RVBD_0001 0 0 0 >> RVBD_0002 0 0 0 >> RVBD_0003 0 0 0 >> RVBD_0004 0 0 0 >> RVBD_0005 0 0 0 >> RVBD_0006 0 0 0 >> >> sessionInfo() >> >> R version 3.0.1 (2013-05-16) >> Platform: x86_64-redhat-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C >> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 >> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 >> [7] LC_PAPER=C LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets >> methods >> [8] base >> >> other attached packages: >> [1] Rsamtools_1.12.3 Biostrings_2.28.0 >> GenomicFeatures_1.12.3 >> [4] AnnotationDbi_1.22.6 Biobase_2.20.1 >> GenomicRanges_1.12.4 >> [7] IRanges_1.18.2 BiocGenerics_0.6.0 >> >> loaded via a namespace (and not attached): >> [1] biomaRt_2.16.0 bitops_1.0-5 BSgenome_1.28.0 >> DBI_0.2-6 >> >> [5] RCurl_1.95-4.1 RSQLite_0.11.3 rtracklayer_1.20.4 >> stats4_3.0.1 >> >> [9] tools_3.0.1 XML_3.96-1.1 zlibbioc_1.6.0 >> >> >> Kind regards >> >> >> >> >> ______________________________**___________________ >> Bioconductor mailing list >> Bioconductor@r-project.org <mailto:bioconductor@r-**project.org<bioconductor@r-project.org> >> > >> https://stat.ethz.ch/mailman/_**_listinfo/bioconductor<http s:="" stat.ethz.ch="" mailman="" __listinfo="" bioconductor=""> >> >> <https: stat.ethz.ch="" mailman="" **listinfo="" bioconductor<https="" :="" stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> >> > >> Search the archives: >> http://news.gmane.org/gmane.__**science.biology.informatics.__** >> conductor<http: news.gmane.org="" gmane.__science.biology.informatics="" .__conductor=""> >> >> <http: news.gmane.org="" gmane.**science.biology.informatics.**="">> conductor<http: news.gmane.org="" gmane.science.biology.informatics.c="" onductor=""> >> > >> >> >> >> >> -- >> Reema Singh >> PhD Scholar >> Computational Biology and Bioinformatics >> School of Computational and Integrative Sciences >> Jawaharlal Nehru University >> New Delhi-110067 >> INDIA >> > > -- Reema Singh PhD Scholar Computational Biology and Bioinformatics School of Computational and Integrative Sciences Jawaharlal Nehru University New Delhi-110067 INDIA [[alternative HTML version deleted]]
ADD REPLYlink written 6.3 years ago by Reema Singh570
Hi, I think you are on the right track. There are still a couple of things that don't look right. A bam file often has more than one chromosome. In your example I see only one chromosome named 'Chromosome'. Is there really only one chromosome in the file? The idea behind changing the names of the seqlevels is to make something like 'chr22' and '22' match by adding the 'chr' to the second. Then we have 'chr22' and 'chr22'. (Equivalently you could remove the 'chr' to get '22' and '22'). The chromosomes names must be preserved at some level so that chromosomes from the bam are being overlapped with the same chromosomes from the gff file. In the counter function below you check the strand of 'aln' but don't do anything with it. Did you want to change the strand? As an fyi, you can ignore strand when performing overlaps by setting the 'ignore.strand' argument to TRUE. Valerie On 08/08/2013 12:17 AM, Reema Singh wrote: > > > > On Tue, Aug 6, 2013 at 9:13 PM, Valerie Obenchain <vobencha at="" fhcrc.org=""> <mailto:vobencha at="" fhcrc.org="">> wrote: > > I would do some investigating with a single bam file. > > Confirm 'gnModel' and 'aln' have some common seqlevels. This call > should produce a result. > > intersect(seqlevels(aln), seqlevels(gnModel)) > > > SRR568038 SRR568039 SRR568040 > "Chromosome" "Chromosome" "Chromosome" > > > Call countOverlaps on a single file. > > co <- countOverlaps(aln, gnModel) > > > aln <- GenomicRanges::readGappedAlignments(bamFiles[1]) > > head(aln) > GappedAlignments with 6 alignments and 0 metadata columns: > seqnames strand cigar qwidth start end width > <rle> <rle> <character> <integer> <integer> <integer> <integer> > [1] Chromosome + 39M 39 2 40 39 > [2] Chromosome + 39M 39 2 40 39 > [3] Chromosome + 39M 39 4 42 39 > [4] Chromosome + 39M 39 4 42 39 > [5] Chromosome + 39M 39 5 43 39 > [6] Chromosome + 39M 39 5 43 39 > ngap > <integer> > [1] 0 > [2] 0 > [3] 0 > [4] 0 > [5] 0 > [6] 0 > --- > seqlengths: > Chromosome > 4411708 > > > Evidently you only want the bam records that have exactly 5 hits. > This could be limiting. To see the distribution of hits make a table > of the counts. > > table(co) > > > co <- countOverlaps(aln, gnModel) > > table(co) > co > 0 1 2 > 575536 33179936 17850 > > You are right, when I changed the exactly bam hit list from % to all. > Then I got some count. Here's the table. > > > head(counts) > SRR568038 SRR568039 SRR568040 > RVBD_0001 456 1331 94 > RVBD_0002 258 1714 64 > RVBD_0003 270 2090 90 > RVBD_0004 122 536 15 > RVBD_0005 2487 23056 997 > RVBD_0006 2343 23013 814 > > I hope this is right. Kindly correct me, if you something wrong here. > > > Kind Regards > > > > > Valerie > > > > On 08/05/2013 10:05 PM, Reema Singh wrote: > > Hi Valerie, > > Thank you so much for the reply. > > After checking the seqlevels, I am able to get rid off the > error, but > still getting the zero count entries. Is there any another way > of doing > this? > > KInd Regards > > > On Mon, Aug 5, 2013 at 9:21 PM, Valerie Obenchain > <vobencha at="" fhcrc.org="" <mailto:vobencha="" at="" fhcrc.org=""> > <mailto:vobencha at="" fhcrc.org="" <mailto:vobencha="" at="" fhcrc.org="">>> wrote: > > Hi Reema, > > To perform overlap or matching operations the seqlevels > (chromosome > names) of the objects must match. The error message is > telling you > that some of these do not match. It's reasonable that a few > names > may not match (maybe a chromosome is present in one object > and not > the other) but the majority should. > > Check the seqlevels: > seqlevels(aln) > seqlevels(gnModel) > > Which names are common to both: > intersect(seqlevels(gnModel), seqlevels(aln)) > > You can rename seqlevels in several different ways. See > ?renameSeqlevels or ?seqlevels for examples. > > Valerie > > > On 08/04/2013 06:35 AM, Reema Singh wrote: > > Dear All, > > I am trying to extract the read count from three .bam > files. But > I am > getting Zero count entries. > > I am using Mycobacterium Tuberculosis H37Rv gtf file ( > ftp://ftp.ensemblgenomes.org/____pub/release-19/bacteria//gt f/____bacteria_1_collection/____mycobacterium_tuberculosis_____h37rv/M ycobacterium_____tuberculosis_h37rv.GCA_____000277735.1.19.gtf.gz > <ftp: ftp.ensemblgenomes.org="" __pub="" release-19="" bacteria="" gtf="" __bacteria_1_collection="" __mycobacterium_tuberculosis___h37rv="" mycobact="" erium___tuberculosis_h37rv.gca___000277735.1.19.gtf.gz=""> > > <ftp: ftp.ensemblgenomes.org="" __pub="" release-19="" bacteria="" gtf="" __bacteria_1_collection="" __mycobacterium_tuberculosis___h37rv="" mycobact="" erium___tuberculosis_h37rv.gca___000277735.1.19.gtf.gz=""> <ftp: ftp.ensemblgenomes.org="" pub="" release-19="" bacteria="" gtf="" b="" acteria_1_collection="" mycobacterium_tuberculosis_h37rv="" mycobacterium_tu="" berculosis_h37rv.gca_000277735.1.19.gtf.gz="">>) > > and RNASeq data used here were downloaded from ( > http://www.ncbi.nlm.nih.gov/____geo/query/acc.cgi?acc=GSE40846 > <http: www.ncbi.nlm.nih.gov="" __geo="" query="" acc.cgi?acc="GSE40846"> > > <http: www.ncbi.nlm.nih.gov="" __geo="" query="" acc.cgi?acc="GSE40846"> <http: www.ncbi.nlm.nih.gov="" geo="" query="" acc.cgi?acc="GSE40846">__>__) > > and aligned > with bowtie2. > > > library(GenomicFeatures) > txdb <- > > makeTranscriptDbFromGFF(file="____Mycobacterium_tuberculosis _____h37rv.GCA_000277735.1.19.gtf"__,__format="gtf") > saveDb(txdb,file="____MycoTubeH37Rv.sqlite") > > load("MycoTubeH37Rv.sqlite") > gnModel <- exonsBy(txdb,"gene") ### *also tried with > "transcripts", "cds", > but getting same * > > > bamFiles <- list.files(".", "bam$", full=TRUE) > names(bamFiles) <- sub("\\..*","",basename(____bamFiles)) > counter <- function(fl, gnModel){ > aln <- GenomicRanges::____readGappedAlignments(fl) > strand(aln) > hits <- countOverlaps(aln,gnModel) > counts <- countOverlaps(gnModel,aln[____hits==5]) > names(counts) <- names(gnModel) > counts > } > > counts <- sapply(bamFiles,counter,____gnModel) > > > Note: method with signature ?Vector#GRangesList? chosen > for function > ?countOverlaps?, > target signature ?GappedAlignments#GRangesList?____. > > "GappedAlignments#Vector" would also be valid > Note: method with signature ?GRangesList#Vector? chosen > for function > ?countOverlaps?, > target signature ?GRangesList#GappedAlignments?____. > > "Vector#GappedAlignments" would also be valid > Warning messages: > 1: In .Seqinfo.mergexy(x, y) : > Each of the 2 combined objects has sequence levels > not in > the other: > - in 'x': gi|448814763|ref|NC_000962.3| > - in 'y': Chromosome > Make sure to always combine/compare objects based > on the > same reference > genome (use suppressWarnings() to suppress this > warning). > 2: In .Seqinfo.mergexy(x, y) : > Each of the 2 combined objects has sequence levels > not in > the other: > - in 'x': Chromosome > - in 'y': gi|448814763|ref|NC_000962.3| > Make sure to always combine/compare objects based > on the > same reference > genome (use suppressWarnings() to suppress this > warning). > 3: In .Seqinfo.mergexy(x, y) : > Each of the 2 combined objects has sequence levels > not in > the other: > - in 'x': gi|448814763|ref|NC_000962.3| > - in 'y': Chromosome > Make sure to always combine/compare objects based > on the > same reference > genome (use suppressWarnings() to suppress this > warning). > 4: In .Seqinfo.mergexy(x, y) : > Each of the 2 combined objects has sequence levels > not in > the other: > - in 'x': Chromosome > - in 'y': gi|448814763|ref|NC_000962.3| > Make sure to always combine/compare objects based > on the > same reference > genome (use suppressWarnings() to suppress this > warning). > 5: In .Seqinfo.mergexy(x, y) : > Each of the 2 combined objects has sequence levels > not in > the other: > - in 'x': gi|448814763|ref|NC_000962.3| > - in 'y': Chromosome > Make sure to always combine/compare objects based > on the > same reference > genome (use suppressWarnings() to suppress this > warning). > 6: In .Seqinfo.mergexy(x, y) : > Each of the 2 combined objects has sequence levels > not in > the other: > - in 'x': Chromosome > - in 'y': gi|448814763|ref|NC_000962.3| > Make sure to always combine/compare objects based > on the > same reference > genome (use suppressWarnings() to suppress this > warning). > > head(counts) > > SRR568038 SRR568039 SRR568040 > RVBD_0001 0 0 0 > RVBD_0002 0 0 0 > RVBD_0003 0 0 0 > RVBD_0004 0 0 0 > RVBD_0005 0 0 0 > RVBD_0006 0 0 0 > > sessionInfo() > > R version 3.0.1 (2013-05-16) > Platform: x86_64-redhat-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C > [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 > [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils > datasets > methods > [8] base > > other attached packages: > [1] Rsamtools_1.12.3 Biostrings_2.28.0 > GenomicFeatures_1.12.3 > [4] AnnotationDbi_1.22.6 Biobase_2.20.1 > GenomicRanges_1.12.4 > [7] IRanges_1.18.2 BiocGenerics_0.6.0 > > loaded via a namespace (and not attached): > [1] biomaRt_2.16.0 bitops_1.0-5 > BSgenome_1.28.0 > DBI_0.2-6 > > [5] RCurl_1.95-4.1 RSQLite_0.11.3 > rtracklayer_1.20.4 > stats4_3.0.1 > > [9] tools_3.0.1 XML_3.96-1.1 zlibbioc_1.6.0 > > > Kind regards > > > > > ___________________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> > <mailto:bioconductor at="" r-__project.org=""> <mailto:bioconductor at="" r-project.org="">> > https://stat.ethz.ch/mailman/____listinfo/bioconductor > <https: stat.ethz.ch="" mailman="" __listinfo="" bioconductor=""> > > <https: stat.ethz.ch="" mailman="" __listinfo="" bioconductor=""> <https: stat.ethz.ch="" mailman="" listinfo="" bioconductor="">> > Search the archives: > http://news.gmane.org/gmane.____science.biology.informatics. ____conductor > <http: news.gmane.org="" gmane.__science.biology.informatics._="" _conductor=""> > > > <http: news.gmane.org="" gmane.__science.biology.informatics.__conductor=""> <http: news.gmane.org="" gmane.science.biology.informatics.conductor="">> > > > > > -- > Reema Singh > PhD Scholar > Computational Biology and Bioinformatics > School of Computational and Integrative Sciences > Jawaharlal Nehru University > New Delhi-110067 > INDIA > > > > > > -- > Reema Singh > PhD Scholar > Computational Biology and Bioinformatics > School of Computational and Integrative Sciences > Jawaharlal Nehru University > New Delhi-110067 > INDIA
ADD REPLYlink written 6.3 years ago by Valerie Obenchain6.7k
Hi Valerie, There is only one chromosome entry. This is the genome of Mycobacterium tuberculosis H37Rv. There is a very strange thing, for the gene count analysis when I am used NC_000962.gff ( ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Mycobacterium_tuberculosis _H37Rv_uid57777/ ), I am getting error in makeTranscriptDbFromGFF. > txdb <- makeTranscriptDbFromGFF(file="NC_000962.gff") extracting transcript information Error in .prepareGFF3TXS(data) : No Transcript information present in gff file So I have tried another way to use gff file for the read count. But again getting zero count. library(rtracklayer) gff <- import.gff( "NC_000962.gff", asRangedData=FALSE) seqlengths(gff) <- end(ranges(gff[which(elementMetadata(gff)[,"type"]=="region"),]))[1] subgene_index <- which(elementMetadata(gff)[,"type"] == "gene") gffsub <- gff[subgene_index,] strand(gffsub) <- "*" ids <- as.character(elementMetadata(gffsub)[, "Dbxref"]) gffsub <- split(gffsub, ids) samples <- as.character(targets$Fastq) samplespath <- paste("./", samples, ".bam", sep="") names(samplespath) <- samples bfl <-Rsamtools:: BamFileList(samplespath, index=character()) countDF2 <- GenomicRanges::summarizeOverlaps(gffsub, bfl, mode="Union", ignore.strand=TRUE) countDF2<- as.matrix(assays(countDF2)$counts) rownames(countDF2) <- gsub(".*=", "", rownames(countDF2)) > head(countDF2) SRR568038.fastq SRR568039.fastq SRR568040.fastq GeneID:11117810 0 0 0 GeneID:11117811 0 0 0 GeneID:11117812 0 0 0 GeneID:14515843 0 0 0 GeneID:14515844 0 0 0 GeneID:14515845 0 0 0 I have tried this with cds, exon but still gets the same zero count. Kind regards On Thu, Aug 8, 2013 at 9:11 PM, Valerie Obenchain <vobencha@fhcrc.org>wrote: > Hi, > > I think you are on the right track. There are still a couple of things > that don't look right. > > A bam file often has more than one chromosome. In your example I see only > one chromosome named 'Chromosome'. Is there really only one chromosome in > the file? The idea behind changing the names of the seqlevels is to make > something like 'chr22' and '22' match by adding the 'chr' to the second. > Then we have 'chr22' and 'chr22'. (Equivalently you could remove the 'chr' > to get '22' and '22'). The chromosomes names must be preserved at some > level so that chromosomes from the bam are being overlapped with the same > chromosomes from the gff file. > > In the counter function below you check the strand of 'aln' but don't do > anything with it. Did you want to change the strand? As an fyi, you can > ignore strand when performing overlaps by setting the 'ignore.strand' > argument to TRUE. > > Valerie > > > > > On 08/08/2013 12:17 AM, Reema Singh wrote: > >> >> >> >> On Tue, Aug 6, 2013 at 9:13 PM, Valerie Obenchain <vobencha@fhcrc.org>> <mailto:vobencha@fhcrc.org>> wrote: >> >> I would do some investigating with a single bam file. >> >> Confirm 'gnModel' and 'aln' have some common seqlevels. This call >> should produce a result. >> >> intersect(seqlevels(aln), seqlevels(gnModel)) >> >> >> SRR568038 SRR568039 SRR568040 >> "Chromosome" "Chromosome" "Chromosome" >> >> >> Call countOverlaps on a single file. >> >> co <- countOverlaps(aln, gnModel) >> >> > aln <- GenomicRanges::**readGappedAlignments(bamFiles[**1]) >> > head(aln) >> GappedAlignments with 6 alignments and 0 metadata columns: >> seqnames strand cigar qwidth start end >> width >> <rle> <rle> <character> <integer> <integer> <integer> >> <integer> >> [1] Chromosome + 39M 39 2 40 >> 39 >> [2] Chromosome + 39M 39 2 40 >> 39 >> [3] Chromosome + 39M 39 4 42 >> 39 >> [4] Chromosome + 39M 39 4 42 >> 39 >> [5] Chromosome + 39M 39 5 43 >> 39 >> [6] Chromosome + 39M 39 5 43 >> 39 >> ngap >> <integer> >> [1] 0 >> [2] 0 >> [3] 0 >> [4] 0 >> [5] 0 >> [6] 0 >> --- >> seqlengths: >> Chromosome >> 4411708 >> >> >> Evidently you only want the bam records that have exactly 5 hits. >> This could be limiting. To see the distribution of hits make a table >> of the counts. >> >> table(co) >> >> > co <- countOverlaps(aln, gnModel) >> > table(co) >> co >> 0 1 2 >> 575536 33179936 17850 >> >> You are right, when I changed the exactly bam hit list from % to all. >> Then I got some count. Here's the table. >> >> > head(counts) >> SRR568038 SRR568039 SRR568040 >> RVBD_0001 456 1331 94 >> RVBD_0002 258 1714 64 >> RVBD_0003 270 2090 90 >> RVBD_0004 122 536 15 >> RVBD_0005 2487 23056 997 >> RVBD_0006 2343 23013 814 >> >> I hope this is right. Kindly correct me, if you something wrong here. >> >> >> Kind Regards >> >> >> >> >> Valerie >> >> >> >> On 08/05/2013 10:05 PM, Reema Singh wrote: >> >> Hi Valerie, >> >> Thank you so much for the reply. >> >> After checking the seqlevels, I am able to get rid off the >> error, but >> still getting the zero count entries. Is there any another way >> of doing >> this? >> >> KInd Regards >> >> >> On Mon, Aug 5, 2013 at 9:21 PM, Valerie Obenchain >> <vobencha@fhcrc.org <mailto:vobencha@fhcrc.org=""> >> <mailto:vobencha@fhcrc.org <mailto:vobencha@fhcrc.org="">>> wrote: >> >> Hi Reema, >> >> To perform overlap or matching operations the seqlevels >> (chromosome >> names) of the objects must match. The error message is >> telling you >> that some of these do not match. It's reasonable that a few >> names >> may not match (maybe a chromosome is present in one object >> and not >> the other) but the majority should. >> >> Check the seqlevels: >> seqlevels(aln) >> seqlevels(gnModel) >> >> Which names are common to both: >> intersect(seqlevels(gnModel), seqlevels(aln)) >> >> You can rename seqlevels in several different ways. See >> ?renameSeqlevels or ?seqlevels for examples. >> >> Valerie >> >> >> On 08/04/2013 06:35 AM, Reema Singh wrote: >> >> Dear All, >> >> I am trying to extract the read count from three .bam >> files. But >> I am >> getting Zero count entries. >> >> I am using Mycobacterium Tuberculosis H37Rv gtf file ( >> ftp://ftp.ensemblgenomes.org/_**___pub/release-19/bacteria//** >> gtf/____bacteria_1_collection/**____mycobacterium_** >> tuberculosis_____h37rv/**Mycobacterium_____**tuberculosis_h37rv.GCA _____* >> *000277735.1.19.gtf.gz<ftp: ftp.ensemblgenomes.org="" ____pub="" release="" -19="" bacteria="" gtf="" ____bacteria_1_collection="" ____mycobacterium_tubercul="" osis_____h37rv="" mycobacterium_____tuberculosis_h37rv.gca_____000277735.="" 1.19.gtf.gz=""> >> <ftp: ftp.ensemblgenomes.org="" **__pub="" release-19="" bacteria="" **="">> gtf/__bacteria_1_collection/__**mycobacterium_tuberculosis___** >> h37rv/Mycobacterium___**tuberculosis_h37rv.GCA___**000277735.1.19.g tf.gz<ftp: ftp.ensemblgenomes.org="" __pub="" release-19="" bacteria="" gtf="" __ba="" cteria_1_collection="" __mycobacterium_tuberculosis___h37rv="" mycobacterium="" ___tuberculosis_h37rv.gca___000277735.1.19.gtf.gz=""> >> > >> >> >> <ftp: ftp.ensemblgenomes.org="" **__pub="" release-19="" bacteria="" **="">> gtf/__bacteria_1_collection/__**mycobacterium_tuberculosis___** >> h37rv/Mycobacterium___**tuberculosis_h37rv.GCA___**000277735.1.19.g tf.gz<ftp: ftp.ensemblgenomes.org="" __pub="" release-19="" bacteria="" gtf="" __ba="" cteria_1_collection="" __mycobacterium_tuberculosis___h37rv="" mycobacterium="" ___tuberculosis_h37rv.gca___000277735.1.19.gtf.gz=""> >> <ftp: ftp.ensemblgenomes.org="" **pub="" release-19="" bacteria="" gtf="" **="">> bacteria_1_collection/**mycobacterium_tuberculosis_**h37rv/Mycobact erium_ >> **tuberculosis_h37rv.GCA_**000277735.1.19.gtf.gz<ftp: ftp.ensemblg="" enomes.org="" pub="" release-19="" bacteria="" gtf="" bacteria_1_collection="" mycobact="" erium_tuberculosis_h37rv="" mycobacterium_tuberculosis_h37rv.gca_00027773="" 5.1.19.gtf.gz=""> >> >>) >> >> and RNASeq data used here were downloaded from ( >> http://www.ncbi.nlm.nih.gov/__**__geo/query/acc.cgi?acc=** >> GSE40846 <http: www.ncbi.nlm.nih.gov="" ____geo="" query="" acc.cgi?acc="GSE40846"> >> <http: www.ncbi.nlm.nih.gov="" _**_geo="" query="" acc.cgi?acc="**GS" e40846<http:="" www.ncbi.nlm.nih.gov="" __geo="" query="" acc.cgi?acc="GSE40846"> >> > >> >> <http: www.ncbi.nlm.nih.gov="" _**_geo="" query="" acc.cgi?acc="**GS" e40846<http:="" www.ncbi.nlm.nih.gov="" __geo="" query="" acc.cgi?acc="GSE40846"> >> <http: www.ncbi.nlm.nih.gov="" **geo="" query="" acc.cgi?acc="GSE408" 46<http:="" www.ncbi.nlm.nih.gov="" geo="" query="" acc.cgi?acc="GSE40846"> >> **>__>__) >> >> >> and aligned >> with bowtie2. >> >> >> library(GenomicFeatures) >> txdb <- >> >> makeTranscriptDbFromGFF(file="**____Mycobacterium_** >> tuberculosis_____h37rv.GCA_**000277735.1.19.gtf"__,__**format="gtf") >> saveDb(txdb,file="____**MycoTubeH37Rv.sqlite") >> >> >> load("MycoTubeH37Rv.sqlite") >> gnModel <- exonsBy(txdb,"gene") ### *also tried with >> "transcripts", "cds", >> but getting same * >> >> >> bamFiles <- list.files(".", "bam$", full=TRUE) >> names(bamFiles) <- sub("\\..*","",basename(____** >> bamFiles)) >> counter <- function(fl, gnModel){ >> aln <- GenomicRanges::____**readGappedAlignments(fl) >> strand(aln) >> hits <- countOverlaps(aln,gnModel) >> counts <- countOverlaps(gnModel,aln[____**hits==5]) >> names(counts) <- names(gnModel) >> counts >> } >> >> counts <- sapply(bamFiles,counter,____**gnModel) >> >> >> >> Note: method with signature ‘Vector#GRangesList’ chosen >> for function >> ‘countOverlaps’, >> target signature ‘GappedAlignments#GRangesList’** >> ____. >> >> >> "GappedAlignments#Vector" would also be valid >> Note: method with signature ‘GRangesList#Vector’ chosen >> for function >> ‘countOverlaps’, >> target signature ‘GRangesList#GappedAlignments’** >> ____. >> >> >> "Vector#GappedAlignments" would also be valid >> Warning messages: >> 1: In .Seqinfo.mergexy(x, y) : >> Each of the 2 combined objects has sequence levels >> not in >> the other: >> - in 'x': gi|448814763|ref|NC_000962.3| >> - in 'y': Chromosome >> Make sure to always combine/compare objects based >> on the >> same reference >> genome (use suppressWarnings() to suppress this >> warning). >> 2: In .Seqinfo.mergexy(x, y) : >> Each of the 2 combined objects has sequence levels >> not in >> the other: >> - in 'x': Chromosome >> - in 'y': gi|448814763|ref|NC_000962.3| >> Make sure to always combine/compare objects based >> on the >> same reference >> genome (use suppressWarnings() to suppress this >> warning). >> 3: In .Seqinfo.mergexy(x, y) : >> Each of the 2 combined objects has sequence levels >> not in >> the other: >> - in 'x': gi|448814763|ref|NC_000962.3| >> - in 'y': Chromosome >> Make sure to always combine/compare objects based >> on the >> same reference >> genome (use suppressWarnings() to suppress this >> warning). >> 4: In .Seqinfo.mergexy(x, y) : >> Each of the 2 combined objects has sequence levels >> not in >> the other: >> - in 'x': Chromosome >> - in 'y': gi|448814763|ref|NC_000962.3| >> Make sure to always combine/compare objects based >> on the >> same reference >> genome (use suppressWarnings() to suppress this >> warning). >> 5: In .Seqinfo.mergexy(x, y) : >> Each of the 2 combined objects has sequence levels >> not in >> the other: >> - in 'x': gi|448814763|ref|NC_000962.3| >> - in 'y': Chromosome >> Make sure to always combine/compare objects based >> on the >> same reference >> genome (use suppressWarnings() to suppress this >> warning). >> 6: In .Seqinfo.mergexy(x, y) : >> Each of the 2 combined objects has sequence levels >> not in >> the other: >> - in 'x': Chromosome >> - in 'y': gi|448814763|ref|NC_000962.3| >> Make sure to always combine/compare objects based >> on the >> same reference >> genome (use suppressWarnings() to suppress this >> warning). >> >> head(counts) >> >> SRR568038 SRR568039 SRR568040 >> RVBD_0001 0 0 0 >> RVBD_0002 0 0 0 >> RVBD_0003 0 0 0 >> RVBD_0004 0 0 0 >> RVBD_0005 0 0 0 >> RVBD_0006 0 0 0 >> >> sessionInfo() >> >> R version 3.0.1 (2013-05-16) >> Platform: x86_64-redhat-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C >> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 >> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 >> [7] LC_PAPER=C LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils >> datasets >> methods >> [8] base >> >> other attached packages: >> [1] Rsamtools_1.12.3 Biostrings_2.28.0 >> GenomicFeatures_1.12.3 >> [4] AnnotationDbi_1.22.6 Biobase_2.20.1 >> GenomicRanges_1.12.4 >> [7] IRanges_1.18.2 BiocGenerics_0.6.0 >> >> loaded via a namespace (and not attached): >> [1] biomaRt_2.16.0 bitops_1.0-5 >> BSgenome_1.28.0 >> DBI_0.2-6 >> >> [5] RCurl_1.95-4.1 RSQLite_0.11.3 >> rtracklayer_1.20.4 >> stats4_3.0.1 >> >> [9] tools_3.0.1 XML_3.96-1.1 >> zlibbioc_1.6.0 >> >> >> Kind regards >> >> >> >> >> ______________________________**_____________________ >> >> Bioconductor mailing list >> Bioconductor@r-project.org <mailto:bioconductor@r-**project.org<bioconductor@r-project.org> >> > >> <mailto:bioconductor@r-__**project.org<bioconductor@r-__project.org> >> <mailto:bioconductor@r-**project.org <bioconductor@r-project.org=""> >> >> >> https://stat.ethz.ch/mailman/_**___listinfo/bioconductor<ht tps:="" stat.ethz.ch="" mailman="" ____listinfo="" bioconductor=""> >> <https: stat.ethz.ch="" mailman="" **__listinfo="" bioconductor<htt="" ps:="" stat.ethz.ch="" mailman="" __listinfo="" bioconductor=""> >> > >> >> >> <https: stat.ethz.ch="" mailman="" **__listinfo="" biocond="" uctor<https:="" stat.ethz.ch="" mailman="" __listinfo="" bioconductor=""> >> <https: stat.ethz.ch="" mailman="" **listinfo="" bioconductor<https="" :="" stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> >> >> >> Search the archives: >> http://news.gmane.org/gmane.__**__science.biology.informatics.** >> ____conductor<http: news.gmane.org="" gmane.____science.biology.infor="" matics.____conductor=""> >> <http: news.gmane.org="" gmane._**_science.biology.informatics._**="">> _conductor<http: news.gmane.org="" gmane.__science.biology.informatic="" s.__conductor=""> >> > >> >> >> >> <http: news.gmane.org="" gmane._**_science.biology.informatics._**="">> _conductor<http: news.gmane.org="" gmane.__science.biology.informatic="" s.__conductor=""> >> <http: news.gmane.org="" gmane.**science.biology.informatics.**="">> conductor<http: news.gmane.org="" gmane.science.biology.informatics.c="" onductor=""> >> >> >> >> >> >> >> -- >> Reema Singh >> PhD Scholar >> Computational Biology and Bioinformatics >> School of Computational and Integrative Sciences >> Jawaharlal Nehru University >> New Delhi-110067 >> INDIA >> >> >> >> >> >> -- >> Reema Singh >> PhD Scholar >> Computational Biology and Bioinformatics >> School of Computational and Integrative Sciences >> Jawaharlal Nehru University >> New Delhi-110067 >> INDIA >> > > -- Reema Singh PhD Scholar Computational Biology and Bioinformatics School of Computational and Integrative Sciences Jawaharlal Nehru University New Delhi-110067 INDIA [[alternative HTML version deleted]]
ADD REPLYlink written 6.3 years ago by Reema Singh570
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 308 users visited in the last hour