Count reads with package "bamsignals" from a female bam file (no chrY)
2
1
Entering edit mode
steige_e ▴ 10
@steige_e-7978
Last seen 9.0 years ago
Germany

Hi!

I got my gene annotation as a GRanges-object from Biomart. Now I try to count the reads from a bamfile using this annotation with the bamCount function from the bamsignals package. Unfortunately this otherwise excellent function gives me an

Error: chromosome chrY not present in the bam file

(no surprise, since the data comes from a woman). Since I look at more bamfiles which are from men and women, I'd like something like NA for the chrY gene counts. What would be the best way to tackle this problem? I'm sure I'm not the first one to encounter a woman.

Thanks, Edgar

bamsignals genomicfeatures • 1.3k views
ADD COMMENT
2
Entering edit mode
@martin-morgan-1513
Last seen 19 days ago
United States

Usually reads are aligned to reference genomes, and reference genomes contain chry, so probably your up-stream analysis has done something different from usual and excluded chrY for this sample.

Here's a bam file, and a way to discover the 'seqlevels' (chromsomes) it knows about

> fl = system.file("extdata", "ex1.bam", package="Rsamtools")
> seqlevels(BamFile(fl))
[1] "seq1" "seq2"

Here's a GRanges with another chromosome

> gr
GRanges object with 3 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     seq1   [1, 10]      *
  [2]     seq2   [1, 10]      *
  [3]     seq3   [1, 10]      *
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths

And finally an update to the GRanges to keep only those seqlevels present in the BAM file

> keepSeqlevels(gr, seqlevels(BamFile(fl)))
GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     seq1   [1, 10]      *
  [2]     seq2   [1, 10]      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

 

 

ADD COMMENT
1
Entering edit mode
mammana ▴ 20
@mammana-7957
Last seen 8.4 years ago
European Union

Thanks for the post. To insert NAs for those GRanges in chromosomes that are not in the header of the bam file is a solution I already thought about. It is probably not a very clean solution, but on the other hand changing the header of bam files is much more time consuming. So I will add this feature in the development version of bamsignals.

To comment on the already excellent answer given by Martin, and to illustrate the future behaviour of bamsignals, this is a workaround:

>gr
GRanges object with 3 ranges and 0 metadata columns:
      seqnames       ranges strand
         <Rle>    <IRanges>  <Rle>
  [1]     chr3 [5000, 5009]      *
  [2]     chr3 [6000, 6009]      *
  [3]     chr4 [1000, 1009]      *

> seqlevels(BamFile(bampath))
[1] "chr1" "chr2" "chr3"
> counts <- rep(NA, length(gr))
> valid <- as.logical(seqnames(gr) %in% seqlevels(BamFile(bampath)))
> counts[valid] <- bamCount(bampath, gr[valid])

 

Thanks for the suggestion!

Best,

Ale

ADD COMMENT

Login before adding your answer.

Traffic: 788 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6