Question

Question: qCount in QuasR for RNAseq quantification

1

Entering edit mode

tarek.mohamed ▴ 10

@tarekmohamed-9489

Last seen 5.3 years ago

Hi All,

I am using QuasR to analyze mt paired end RNAseq data, now I have bam files for my 12 samples and the next step is to generate a count table for Differential gene expression analysis. I follwed the QuasR vignette and I used qCount() but I had a problem (R could not find my genome file)

sampleFile <- "DOX_RNAseq.txt"
genomeFile<-"BSgenome.Hsapiens.NCBI.GRCh38"
proj_RNASeq<- qAlign(sampleFile, genomeFile,cacheDir = "E:/Doxorubicin Project RNA-seq Data")

library(GenomicFeatures)
annotationFile <- "BSgenome.Hsapiens.NCBI.GRCh38_annotation.gtf"
chrLen <- scanFaIndex(genomeFile)

Error in scanFaIndex(open(FaFile(file))) :
error in evaluating the argument 'file' in selecting a method for function 'scanFaIndex': Error in .io_check_exists(path(con)) : file(s) do not exist:
'BSgenome.Hsapiens.NCBI.GRCh38'

Thanks

Tarek

quasr error • 2.3k views

ADD COMMENT • link updated 8.3 years ago by Hotz, Hans-Rudolf ▴ 400 • written 8.3 years ago by tarek.mohamed ▴ 10

score 1 · Answer 1 · 2016-01-15

1

Entering edit mode

Hotz, Hans-Rudolf ▴ 400

@hotz-hans-rudolf-3951

Last seen 3.5 years ago

Switzerland

Hi Tarek

It looks like you are mixing up two different things here - though, I admit, QuasR is confusing in this regards.

First, you are running 'qAlign'. Here, you provide the string "BSgenome.Hsapiens.NCBI.GRCh38" for the second, ie the 'genome' argument. You do not provide a file (path) to a genome in fasta format, but you provide the name of a BSgenome package, which will be downloaded automatically from Bioconductor.

Next, you work with an annotation file (I assume you want to create a 'GRanges' or 'GRangesList' in order to provide it as the 'query' argument to 'qCount', do you?). Here, I don't understand why you are calling the 'scanFaIndex'. Nevertheles, 'scanFaIndex' requires a file (path) to a genome in fasta format. But you are just providing the name of a BSgenome package, which is obvioulsy not a fasta file in your path. Hence it is not a QuasR error, isn't-it?

So, you first need to make a local fasta file out of the BSgenome package:

export(BSgenome.Hsapiens.NCBI.GRCh38, "BSgenome.Hsapiens.NCBI.GRCh38.fasta")

and then you can call

scanFaIndex("BSgenome.Hsapiens.NCBI.GRCh38.fasta")

Hans-Rudolf

PS: Have you considered working with a TxDb as query?

ADD COMMENT • link 8.3 years ago Hotz, Hans-Rudolf ▴ 400

0

Entering edit mode

Hi Hans,

Thanks for you reply

-Sorry for the confusion, actually I am trying to generate a TxDB as query. According to QausR vignette they generate a TxDB using this script. They used scanFaIndex first then they proceed till TxDb

library(GenomicFeatures)
> annotFile <- "extdata/hg19sub_annotation.gtf"
> chrLen <- scanFaIndex(genomeFile)
> chrominfo <- data.frame(chrom=as.character(seqnames(chrLen)),length=width(chrLen),is_circular=rep(FALSE, length(chrLen)))
> txdb <- makeTxDbFromGFF(file=annotFile, format="gtf",chrominfo=chrominfo, dataSource="Ensembl",
organism="Homo sapiens")

- Can I get annotation file from BSgenome? so that I can use makeTxDbFromGFF () to get TxDb

- Alternatively, I tried to generate a TxDb from biomart which was successful, but when calling qCount() I got an error (see below)

>TxDb<- makeTxDbFromBiomart(dataset="hsapiens_gene_ensembl")

>TxDb

TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: BioMart
# Organism: Homo sapiens
# Taxonomy ID: 9606
# Resource URL: www.ensembl.org:80
# BioMart database: ENSEMBL_MART_ENSEMBL
# BioMart database version: Ensembl Genes 83
# BioMart dataset: hsapiens_gene_ensembl
# BioMart dataset description: Homo sapiens genes (GRCh38.p5)
# BioMart dataset version: GRCh38.p5
# Full dataset: yes
# miRBase build ID: NA
# transcript_nrow: 217425
# exon_nrow: 735779
# cds_nrow: 295205
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2016-01-15 14:29:41 -0600 (Fri, 15 Jan 2016)
# GenomicFeatures version at creation time: 1.22.7
# RSQLite version at cr

>proj_RNASeq_geneLevels <- qCount(proj_RNASeq, TxDb, reportLevel="gene")

Error in qCount(proj_RNASeq, TxDb, reportLevel = "gene") :
sequence levels in 'query' not found in alignment files: CHR_HG126_PATCH, CHR_HG1342_HG2282_PATCH, CHR_HG1362_PATCH, CHR_HG142_HG150_NOVEL_TEST, CHR_HG151_NOVEL_TEST, CHR_HG1651_PATCH, CHR_HG1832_PATCH, CHR_HG2021_PATCH, CHR_HG2022_PATCH, CHR_HG2030_PATCH, CHR_HG2058_PATCH, CHR_HG2062_PATCH, CHR_HG2066_PATCH, CHR_HG2072_PATCH, CHR_HG2095_PATCH, CHR_HG2104_PATCH, CHR_HG2116_PATCH, CHR_HG2128_PATCH, CHR_HG2191_PATCH, CHR_HG2213_PATCH, CHR_HG2216_PATCH, CHR_HG2217_PATCH, CHR_HG2232_PATCH, CHR_HG2233_PATCH, CHR_HG2235_PATCH, CHR_HG2237_PATCH, CHR_HG2239_PATCH, CHR_HG2241_PATCH, CHR_HG2242_HG2243_PATCH, CHR_HG2244_HG2245_PATCH, CHR_HG2247_PATCH, CHR_HG2249_PATCH, CHR_HG2288_HG2289_PATCH, CHR_HG2290_PATCH, CHR_HG2291_PATCH, CHR_HG2334_PATCH, CHR_HG23_PATCH, CHR_HG26_PATCH, CHR_HG986_PATCH, CHR_HSCHR10_1_CTG1, CHR_HSCHR10_1_CTG2, CHR_HSCHR10_1_CTG3, CHR_HSCHR10_1_CTG4, CHR_HSCHR10_1_CTG6, CHR_HSCHR11_1_CTG1_1, CHR_HSCHR11_1_CTG1_2, CHR_HSCHR11_1_CTG2, CHR_HSCHR11_1_CTG3, CHR_HSCHR11

Tarek

ADD REPLY • link 8.3 years ago tarek.mohamed ▴ 10

0

Entering edit mode

Hi Tarek

In the QuasR vignette, they can use 'chrLen <- scanFaIndex(genomeFile)' because "genome" file was defined like this:'genomeFile <- "extdata/hg19sub.fa"'.

Have you tried making a local fasta file out of the BSgenome package? and then called scanFaIndex , as I was suggesting last week?

WRT you new questions:

- BSgenome packages do not contain gene annotation

- yes, you can make a TxDb from Biomart, but you have to be careful, to make sure, it is the same assembly and the individual chromosome are called the same. Which is obviously not the case in your situation: "sequence levels in 'query' not found in alignment files"

Hans-Rudolf

ADD REPLY • link 8.3 years ago Hotz, Hans-Rudolf ▴ 400

0

Entering edit mode

Hi Hans,

Thanks a lot, I downloaded the annotation file from FTP ensemble download page (same asembly) and it worked.

>genome<-export(BSgenome.Hsapiens.NCBI.GRCh38, "BSgenome.Hsapiens.NCBI.GRCh38.fasta")
>genome_fa<-scanFaIndex("BSgenome.Hsapiens.NCBI.GRCh38.fasta")
>annotationfile<-"Homo_sapiens.GRCh38.83.chr.gtf.gz"
>chrominfo <- data.frame(chrom=as.character(seqnames(genome_fa)),length=width(genome_fa),is_circular=rep(FALSE, length(genome_fa)))
>txdb <- makeTxDbFromGFF(file=annotationfile, format="gtf",chrominfo=chrominfo,dataSource="NCBI",organism="Homo sapiens")

Tarek

ADD REPLY • link 8.3 years ago tarek.mohamed ▴ 10