RNA-Seq : zero counts
1
0
Entering edit mode
Brian Smith ▴ 120
@brian-smith-6197
Last seen 3.5 years ago
United States

Hi,

This is my first time with RNA-Seq analysis, so I might be doing something wrong! I got some bam files from a collaborator that I need to process. I followed the pipeline, but I seem to be getting 0 counts! The warnings given are "The 2 combined objects have no sequence levels in common." What do I need to do?

I followed the bioconductor pipeline https://www.bioconductor.org/help/course-materials/2016/CSAMA/lab-3-rnaseq/rnaseq_gene_CSAMA2016.html

Here is my code and screen output:

*********************

library("Rsamtools")
library("GenomicFeatures")
library("GenomicAlignments")
library("BiocParallel")
register(MulticoreParam(1))

refdir <- "/home/xxxx/ref_databases/rna_gtf/"
refgtf <- paste0(refdir,"Homo_sapiens.GRCh38.90.gtf") # downloaded from ftp://ftp.ensembl.org/pub/release-90/gtf/homo_sapiens


> txdb <- makeTxDbFromGFF(refgtf, format="gtf")
Import genomic features from the file as a GRanges object ... 
OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .get_cds_IDX(type, phase) :
  The "phase" metadata column contains non-NA values for features of type
  stop_codon. This information was ignored.

> ebg <- exonsBy(txdb, by="gene")
> ebg
GRangesList object of length 58302:
$ENSG00000000003 
GRanges object with 20 ranges and 2 metadata columns:
       seqnames                 ranges strand |   exon_id       exon_name
          <Rle>              <IRanges>  <Rle> | <integer>     <character>
   [1]        X [100627109, 100629986]      - |    676055 ENSE00003730948
   [2]        X [100628670, 100629986]      - |    676056 ENSE00001459322
   [3]        X [100630759, 100630866]      - |    676057 ENSE00000868868
   [4]        X [100632063, 100632068]      - |    676058 ENSE00003731560
   [5]        X [100632485, 100632568]      - |    676059 ENSE00000401072
   ...      ...                    ...    ... .       ...             ...
  [16]        X [100635558, 100635746]      - |    676070 ENSE00003733424
  [17]        X [100636191, 100636689]      - |    676071 ENSE00001886883
  [18]        X [100636608, 100636806]      - |    676072 ENSE00001855382
  [19]        X [100636793, 100637104]      - |    676073 ENSE00001863395
  [20]        X [100639945, 100639991]      - |    676074 ENSE00001828996

...
<58301 more elements>
-------
seqinfo: 47 sequences (1 circular) from an unspecified genome; no seqlengths

> bamfiles <- BamFileList(filenames[1:2], yieldSize=1000000)

> se <- summarizeOverlaps(features=ebg, 
+                         reads=bamfiles,
+                         mode="Union",
+                         singleEnd=FALSE,
+                         ignore.strand=TRUE,
+                         fragments=TRUE )


There were 28 warnings (use warnings() to see them)

> warnings()
Warning messages:
1: call dbDisconnect() when finished working with a connection
2: In .Seqinfo.mergexy(x, y) :
  The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
3: In .Seqinfo.mergexy(x, y) :
  The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)



> print(se)
class: RangedSummarizedExperiment 
dim: 58302 2 
metadata(0):
assays(1): counts
rownames(58302): ENSG00000000003 ENSG00000000005 ... ENSG00000284747
  ENSG00000284748
rowData names(0):
colnames(2):
  run1798_lane12_read_indexN701-S517=BR-337-All_Aligned_Reads.bam
  run1798_lane12_read_indexN702-S502=BR-101-All_Aligned_Reads.bam
colData names(0):




> sum(assay(se))
[1] 0

*****************************

How/What do I troubleshoot? The gtf file seems ok (each line has ENS ids, coordinates etc.).

thanks!

 

> sessionInfo()

R version 3.4.2 (2017-09-28)

Platform: x86_64-redhat-linux-gnu (64-bit)

Running under: Scientific Linux 7.3 (Nitrogen)

 

Matrix products: default

BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

 

locale:

[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              

[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    

[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   

[7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 

[9] LC_ADDRESS=C               LC_TELEPHONE=C            

[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

 

attached base packages:

[1] stats4    parallel  stats     graphics  grDevices utils     datasets

[8] methods   base     

 

other attached packages:

[1] BiocParallel_1.12.0        Rsubread_1.28.0           

[3] GenomicAlignments_1.14.1   SummarizedExperiment_1.8.0

[5] DelayedArray_0.4.1         matrixStats_0.52.2        

[7] GenomicFeatures_1.30.0     AnnotationDbi_1.40.0      

[9] Biobase_2.38.0             Rsamtools_1.30.0          

[11] Biostrings_2.46.0          XVector_0.18.0            

[13] GenomicRanges_1.30.0       GenomeInfoDb_1.14.0       

[15] IRanges_2.12.0             S4Vectors_0.16.0          

[17] BiocGenerics_0.24.0       

 

loaded via a namespace (and not attached):

[1] Rcpp_0.12.14            compiler_3.4.2          prettyunits_1.0.2      

[4] bitops_1.0-6            tools_3.4.2             zlibbioc_1.24.0        

[7] progress_1.1.2          biomaRt_2.34.0          digest_0.6.12          

[10] bit_1.1-12              lattice_0.20-35         RSQLite_2.0            

[13] memoise_1.1.0           tibble_1.3.4            pkgconfig_2.0.1        

[16] rlang_0.1.4             Matrix_1.2-11           DBI_0.7                

[19] GenomeInfoDbData_0.99.1 rtracklayer_1.38.0      stringr_1.2.0          

[22] grid_3.4.2              bit64_0.9-7             R6_2.2.2               

[25] XML_3.98-1.9            RMySQL_0.10.13          blob_1.1.0             

[28] magrittr_1.5            assertthat_0.2.0        stringi_1.1.6          

[31] RCurl_1.95-4.8

rnaseq bioconductor • 1.6k views
ADD COMMENT
1
Entering edit mode

Hi,

Have you checked that the chromosome names in your bam files corresponds to the annotations you are using? If there is mismatches in chromosome name between annotations and bams (most often either 1 or chr1 for chromosome 1) you might get 0 counts.

ADD REPLY
1
Entering edit mode

Thanks thokall. You were right!

ADD REPLY
2
Entering edit mode
@james-w-macdonald-5106
Last seen 9 hours ago
United States

These two errors:

2: In .Seqinfo.mergexy(x, y) :
  The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
3: In .Seqinfo.mergexy(x, y) :
  The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)

Are saying that your bamfiles and GRanges objects have no sequences in common. My guess is that you have aligned your data against a UCSC genome, rather than Ensembl. Mixing those two is probably not ideal. In addition, you are probably working too hard to do this. You could just do

library(BiocInstaller)

biocLite("TxDb.Hsapiens.UCSC.hg38.knownGene")

library(TxDb.Hsapiens.UCSC.hg38.knownGene)

ebg <- exonsBy(TxDb.Hsapiens.UCSC.hg38.knownGene, by="gene")

And proceeding from there.

ADD COMMENT
0
Entering edit mode

Hi James,

Thanks for the reply. Yep, you were right - checking back, it was aligned using UCSC while I was trying with ensembl. 

As per your suggestion above, I tried using UCSC but get a new error:

 

> ebg <- exonsBy(TxDb.Hsapiens.UCSC.hg38.knownGene, by="gene")

> bamfiles <- BamFileList(filenames[1:5], yieldSize=1000000)

> se <- summarizeOverlaps(features=ebg,

+                         reads=bamfiles,

+                         mode="Union",

+                         singleEnd=FALSE,

+                         ignore.strand=TRUE,

+                         fragments=TRUE )

Error: BiocParallel errors

  element index: 1, 2, 3, 4, 5

  first error: sequences chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY have incompatible seqlengths:

  - in 'x': 248956422, 242193529, 198295559, 190214555, 181538259, 170805979, 159345973, 145138636, 138394717, 133797422, 135086622, 133275309, 114364328, 107043718, 101991189, 90338345, 83257441, 80373285, 58617616, 64444167, 46709983, 50818468, 156040895, 57227415

  - in 'y': 249250621, 243199373, 198022430, 191154276, 180915260, 171115067, 159138663, 146364022, 141213431, 135534747, 135006516, 133851895, 115169878, 107349540, 102531392, 90354753, 81195210, 78077248, 59128983, 63025520, 48129895, 51304566, 155270560, 59373566 

 

>

>

> traceback()

9: stop(.error_bplist(res))

8: bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = BPPARAM)

7: bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = BPPARAM)

6: bplapply(setNames(seq_along(reads), names(reads)), function(i,

       FUN, reads, features, mode, ignore.strand, inter.feature,

       param, preprocess.reads, ...) {

       bf <- reads[[i]]

       .countWithYieldSize(FUN, features, bf, mode, ignore.strand,

           inter.feature, param, preprocess.reads, ...)

   }, FUN, reads, features, mode = match.fun(mode), ignore.strand = ignore.strand,

       inter.feature = inter.feature, param = param, preprocess.reads = preprocess.reads,

       ...)

5: bplapply(setNames(seq_along(reads), names(reads)), function(i,

       FUN, reads, features, mode, ignore.strand, inter.feature,

       param, preprocess.reads, ...) {

       bf <- reads[[i]]

       .countWithYieldSize(FUN, features, bf, mode, ignore.strand,

           inter.feature, param, preprocess.reads, ...)

   }, FUN, reads, features, mode = match.fun(mode), ignore.strand = ignore.strand,

       inter.feature = inter.feature, param = param, preprocess.reads = preprocess.reads,

       ...)

4: .dispatchBamFiles(features, reads, mode, ignore.strand, inter.feature = inter.feature,

       singleEnd = singleEnd, fragments = fragments, param = param,

       preprocess.reads = preprocess.reads, ...)

3: .local(features, reads, mode, ignore.strand, ...)

2: summarizeOverlaps(features = ebg, reads = bamfiles, mode = "Union",

       singleEnd = FALSE, ignore.strand = TRUE, fragments = TRUE)

1: summarizeOverlaps(features = ebg, reads = bamfiles, mode = "Union",

       singleEnd = FALSE, ignore.strand = TRUE, fragments = TRUE)

 

Not sure how I should go about correcting this!

 

ADD REPLY
0
Entering edit mode

You are still mixing things, but this time it's hg19 and hg38.

> library(BSgenome.Hsapiens.UCSC.hg19)

> seqinfo(Hsapiens)
Seqinfo object with 93 sequences (1 circular) from hg19 genome:
  seqnames       seqlengths isCircular genome
  chr1            249250621      FALSE   hg19
  chr2            243199373      FALSE   hg19
  chr3            198022430      FALSE   hg19
  chr4            191154276      FALSE   hg19
  chr5            180915260      FALSE   hg19
  ...                   ...        ...    ...
  chrUn_gl000245      36651      FALSE   hg19
  chrUn_gl000246      38154      FALSE   hg19
  chrUn_gl000247      36422      FALSE   hg19
  chrUn_gl000248      39786      FALSE   hg19
  chrUn_gl000249      38502      FALSE   hg19

> library(BSgenome.Hsapiens.UCSC.hg38)

> seqinfo(Hsapiens)
Seqinfo object with 455 sequences (1 circular) from hg38 genome:
  seqnames         seqlengths isCircular genome
  chr1              248956422      FALSE   hg38
  chr2              242193529      FALSE   hg38
  chr3              198295559      FALSE   hg38
  chr4              190214555      FALSE   hg38
  chr5              181538259      FALSE   hg38

So for example, chr1 in hg38 is 248956422 bp long, and in hg19 it's 249250621 bp. Your error indicates the reads were aligned against hg19, and now you are trying to count overlaps using hg38.

 

ADD REPLY
0
Entering edit mode

ah, I now see what the error message was trying to tell me!

James, thanks for all your help!!

ADD REPLY

Login before adding your answer.

Traffic: 878 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6