Search
Question: Question: qCount in QuasR for RNAseq quantification
0
gravatar for tarek.mohamed
22 months ago by
tarek.mohamed0 wrote:

Hi All,

I am using QuasR to analyze mt paired end RNAseq data, now I have bam files for my 12 samples and the next step is to generate a count table for Differential gene expression analysis. I follwed the QuasR vignette and I used qCount() but I had a problem (R could not find my genome file)

sampleFile <- "DOX_RNAseq.txt"
genomeFile<-"BSgenome.Hsapiens.NCBI.GRCh38"
proj_RNASeq<- qAlign(sampleFile, genomeFile,cacheDir = "E:/Doxorubicin Project RNA-seq Data")

library(GenomicFeatures)
annotationFile <- "BSgenome.Hsapiens.NCBI.GRCh38_annotation.gtf"
chrLen <- scanFaIndex(genomeFile)

Error in scanFaIndex(open(FaFile(file))) : 
  error in evaluating the argument 'file' in selecting a method for function 'scanFaIndex': Error in .io_check_exists(path(con)) : file(s) do not exist:
  'BSgenome.Hsapiens.NCBI.GRCh38'

 

Thanks

Tarek

ADD COMMENTlink modified 22 months ago by Hotz, Hans-Rudolf380 • written 22 months ago by tarek.mohamed0
0
gravatar for Hotz, Hans-Rudolf
22 months ago by
Switzerland
Hotz, Hans-Rudolf380 wrote:

Hi Tarek

It looks like you are mixing up two different things here - though, I admit, QuasR is confusing in this regards.

 

First, you are running 'qAlign'. Here, you provide the string "BSgenome.Hsapiens.NCBI.GRCh38" for the second, ie the 'genome' argument. You do not provide a file (path) to a genome in fasta format, but you provide the name of a BSgenome package, which will be downloaded automatically from Bioconductor.

Next, you work with an annotation file (I assume you want to create a 'GRanges' or 'GRangesList' in order to provide it as the 'query' argument to 'qCount', do you?). Here, I don't understand why you are calling the 'scanFaIndex'. Nevertheles, 'scanFaIndex' requires a file (path) to a genome in fasta format. But you are just providing the name of a BSgenome package, which is obvioulsy not a fasta file in your path.  Hence it is not a QuasR error, isn't-it?

So, you first need to make a local fasta file out of the BSgenome package:

export(BSgenome.Hsapiens.NCBI.GRCh38, "BSgenome.Hsapiens.NCBI.GRCh38.fasta")

and then you can call

scanFaIndex("BSgenome.Hsapiens.NCBI.GRCh38.fasta")

 

Hans-Rudolf

PS: Have you considered working with a TxDb as query?

 

 

ADD COMMENTlink written 22 months ago by Hotz, Hans-Rudolf380

Hi Hans,

Thanks for you reply

-Sorry for the confusion, actually I am trying to generate a TxDB as query. According to QausR vignette  they generate a TxDB using this script. They used scanFaIndex first then they proceed till TxDb

library(GenomicFeatures)
> annotFile <- "extdata/hg19sub_annotation.gtf"
> chrLen <- scanFaIndex(genomeFile)
> chrominfo <- data.frame(chrom=as.character(seqnames(chrLen)),length=width(chrLen),is_circular=rep(FALSE, length(chrLen)))
> txdb <- makeTxDbFromGFF(file=annotFile, format="gtf",chrominfo=chrominfo, dataSource="Ensembl",
organism="Homo sapiens")

- Can I get annotation file from BSgenome? so that I can use makeTxDbFromGFF () to get TxDb 

- Alternatively, I tried to generate a TxDb from biomart which was successful, but when calling qCount() I got an error (see below)

>TxDb<- makeTxDbFromBiomart(dataset="hsapiens_gene_ensembl")

>TxDb

TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: BioMart
# Organism: Homo sapiens
# Taxonomy ID: 9606
# Resource URL: www.ensembl.org:80
# BioMart database: ENSEMBL_MART_ENSEMBL
# BioMart database version: Ensembl Genes 83
# BioMart dataset: hsapiens_gene_ensembl
# BioMart dataset description: Homo sapiens genes (GRCh38.p5)
# BioMart dataset version: GRCh38.p5
# Full dataset: yes
# miRBase build ID: NA
# transcript_nrow: 217425
# exon_nrow: 735779
# cds_nrow: 295205
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2016-01-15 14:29:41 -0600 (Fri, 15 Jan 2016)
# GenomicFeatures version at creation time: 1.22.7
# RSQLite version at cr

>proj_RNASeq_geneLevels <- qCount(proj_RNASeq, TxDb, reportLevel="gene")

Error in qCount(proj_RNASeq, TxDb, reportLevel = "gene") : 
  sequence levels in 'query' not found in alignment files: CHR_HG126_PATCH, CHR_HG1342_HG2282_PATCH, CHR_HG1362_PATCH, CHR_HG142_HG150_NOVEL_TEST, CHR_HG151_NOVEL_TEST, CHR_HG1651_PATCH, CHR_HG1832_PATCH, CHR_HG2021_PATCH, CHR_HG2022_PATCH, CHR_HG2030_PATCH, CHR_HG2058_PATCH, CHR_HG2062_PATCH, CHR_HG2066_PATCH, CHR_HG2072_PATCH, CHR_HG2095_PATCH, CHR_HG2104_PATCH, CHR_HG2116_PATCH, CHR_HG2128_PATCH, CHR_HG2191_PATCH, CHR_HG2213_PATCH, CHR_HG2216_PATCH, CHR_HG2217_PATCH, CHR_HG2232_PATCH, CHR_HG2233_PATCH, CHR_HG2235_PATCH, CHR_HG2237_PATCH, CHR_HG2239_PATCH, CHR_HG2241_PATCH, CHR_HG2242_HG2243_PATCH, CHR_HG2244_HG2245_PATCH, CHR_HG2247_PATCH, CHR_HG2249_PATCH, CHR_HG2288_HG2289_PATCH, CHR_HG2290_PATCH, CHR_HG2291_PATCH, CHR_HG2334_PATCH, CHR_HG23_PATCH, CHR_HG26_PATCH, CHR_HG986_PATCH, CHR_HSCHR10_1_CTG1, CHR_HSCHR10_1_CTG2, CHR_HSCHR10_1_CTG3, CHR_HSCHR10_1_CTG4, CHR_HSCHR10_1_CTG6, CHR_HSCHR11_1_CTG1_1, CHR_HSCHR11_1_CTG1_2, CHR_HSCHR11_1_CTG2, CHR_HSCHR11_1_CTG3, CHR_HSCHR11
 

 

Tarek

 

 

ADD REPLYlink modified 22 months ago • written 22 months ago by tarek.mohamed0

Hi Tarek

In the QuasR vignette, they can use 'chrLen <- scanFaIndex(genomeFile)' because "genome" file was defined like this:'genomeFile <- "extdata/hg19sub.fa"'.

Have you tried making a local fasta file out of the BSgenome package? and then called scanFaIndex , as I was suggesting last week?

 

WRT you new questions:

 - BSgenome packages do not contain gene annotation

 - yes, you can make a TxDb from Biomart, but you have to be careful, to make sure, it is the same assembly and the individual chromosome are called the same. Which is obviously not the case in your situation: "sequence levels in 'query' not found in alignment files"

 

Hans-Rudolf



 

ADD REPLYlink written 22 months ago by Hotz, Hans-Rudolf380

Hi Hans,

Thanks a lot, I downloaded the annotation file from  FTP ensemble download page (same asembly) and it worked.

>genome<-export(BSgenome.Hsapiens.NCBI.GRCh38, "BSgenome.Hsapiens.NCBI.GRCh38.fasta")
>genome_fa<-scanFaIndex("BSgenome.Hsapiens.NCBI.GRCh38.fasta")
>annotationfile<-"Homo_sapiens.GRCh38.83.chr.gtf.gz"
>chrominfo <- data.frame(chrom=as.character(seqnames(genome_fa)),length=width(genome_fa),is_circular=rep(FALSE, length(genome_fa)))
>txdb <- makeTxDbFromGFF(file=annotationfile, format="gtf",chrominfo=chrominfo,dataSource="NCBI",organism="Homo sapiens")

Tarek

ADD REPLYlink written 22 months ago by tarek.mohamed0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 182 users visited in the last hour