Search
Question: Question: qCount in QuasR for RNAseq quantification
0
gravatar for tarek.mohamed
2.4 years ago by
tarek.mohamed0 wrote:

Hi All,

I am using QuasR to analyze mt paired end RNAseq data, now I have bam files for my 12 samples and the next step is to generate a count table for Differential gene expression analysis. I follwed the QuasR vignette and I used qCount() but I had a problem (R could not find my genome file)

sampleFile <- "DOX_RNAseq.txt"
genomeFile<-"BSgenome.Hsapiens.NCBI.GRCh38"
proj_RNASeq<- qAlign(sampleFile, genomeFile,cacheDir = "E:/Doxorubicin Project RNA-seq Data")

library(GenomicFeatures)
annotationFile <- "BSgenome.Hsapiens.NCBI.GRCh38_annotation.gtf"
chrLen <- scanFaIndex(genomeFile)

Error in scanFaIndex(open(FaFile(file))) : 
  error in evaluating the argument 'file' in selecting a method for function 'scanFaIndex': Error in .io_check_exists(path(con)) : file(s) do not exist:
  'BSgenome.Hsapiens.NCBI.GRCh38'

 

Thanks

Tarek

ADD COMMENTlink modified 2.4 years ago by Hotz, Hans-Rudolf390 • written 2.4 years ago by tarek.mohamed0
0
gravatar for Hotz, Hans-Rudolf
2.4 years ago by
Switzerland
Hotz, Hans-Rudolf390 wrote:

Hi Tarek

It looks like you are mixing up two different things here - though, I admit, QuasR is confusing in this regards.

 

First, you are running 'qAlign'. Here, you provide the string "BSgenome.Hsapiens.NCBI.GRCh38" for the second, ie the 'genome' argument. You do not provide a file (path) to a genome in fasta format, but you provide the name of a BSgenome package, which will be downloaded automatically from Bioconductor.

Next, you work with an annotation file (I assume you want to create a 'GRanges' or 'GRangesList' in order to provide it as the 'query' argument to 'qCount', do you?). Here, I don't understand why you are calling the 'scanFaIndex'. Nevertheles, 'scanFaIndex' requires a file (path) to a genome in fasta format. But you are just providing the name of a BSgenome package, which is obvioulsy not a fasta file in your path.  Hence it is not a QuasR error, isn't-it?

So, you first need to make a local fasta file out of the BSgenome package:

export(BSgenome.Hsapiens.NCBI.GRCh38, "BSgenome.Hsapiens.NCBI.GRCh38.fasta")

and then you can call

scanFaIndex("BSgenome.Hsapiens.NCBI.GRCh38.fasta")

 

Hans-Rudolf

PS: Have you considered working with a TxDb as query?

 

 

ADD COMMENTlink written 2.4 years ago by Hotz, Hans-Rudolf390

Hi Hans,

Thanks for you reply

-Sorry for the confusion, actually I am trying to generate a TxDB as query. According to QausR vignette  they generate a TxDB using this script. They used scanFaIndex first then they proceed till TxDb

library(GenomicFeatures)
> annotFile <- "extdata/hg19sub_annotation.gtf"
> chrLen <- scanFaIndex(genomeFile)
> chrominfo <- data.frame(chrom=as.character(seqnames(chrLen)),length=width(chrLen),is_circular=rep(FALSE, length(chrLen)))
> txdb <- makeTxDbFromGFF(file=annotFile, format="gtf",chrominfo=chrominfo, dataSource="Ensembl",
organism="Homo sapiens")

- Can I get annotation file from BSgenome? so that I can use makeTxDbFromGFF () to get TxDb 

- Alternatively, I tried to generate a TxDb from biomart which was successful, but when calling qCount() I got an error (see below)

>TxDb<- makeTxDbFromBiomart(dataset="hsapiens_gene_ensembl")

>TxDb

TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: BioMart
# Organism: Homo sapiens
# Taxonomy ID: 9606
# Resource URL: www.ensembl.org:80
# BioMart database: ENSEMBL_MART_ENSEMBL
# BioMart database version: Ensembl Genes 83
# BioMart dataset: hsapiens_gene_ensembl
# BioMart dataset description: Homo sapiens genes (GRCh38.p5)
# BioMart dataset version: GRCh38.p5
# Full dataset: yes
# miRBase build ID: NA
# transcript_nrow: 217425
# exon_nrow: 735779
# cds_nrow: 295205
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2016-01-15 14:29:41 -0600 (Fri, 15 Jan 2016)
# GenomicFeatures version at creation time: 1.22.7
# RSQLite version at cr

>proj_RNASeq_geneLevels <- qCount(proj_RNASeq, TxDb, reportLevel="gene")

Error in qCount(proj_RNASeq, TxDb, reportLevel = "gene") : 
  sequence levels in 'query' not found in alignment files: CHR_HG126_PATCH, CHR_HG1342_HG2282_PATCH, CHR_HG1362_PATCH, CHR_HG142_HG150_NOVEL_TEST, CHR_HG151_NOVEL_TEST, CHR_HG1651_PATCH, CHR_HG1832_PATCH, CHR_HG2021_PATCH, CHR_HG2022_PATCH, CHR_HG2030_PATCH, CHR_HG2058_PATCH, CHR_HG2062_PATCH, CHR_HG2066_PATCH, CHR_HG2072_PATCH, CHR_HG2095_PATCH, CHR_HG2104_PATCH, CHR_HG2116_PATCH, CHR_HG2128_PATCH, CHR_HG2191_PATCH, CHR_HG2213_PATCH, CHR_HG2216_PATCH, CHR_HG2217_PATCH, CHR_HG2232_PATCH, CHR_HG2233_PATCH, CHR_HG2235_PATCH, CHR_HG2237_PATCH, CHR_HG2239_PATCH, CHR_HG2241_PATCH, CHR_HG2242_HG2243_PATCH, CHR_HG2244_HG2245_PATCH, CHR_HG2247_PATCH, CHR_HG2249_PATCH, CHR_HG2288_HG2289_PATCH, CHR_HG2290_PATCH, CHR_HG2291_PATCH, CHR_HG2334_PATCH, CHR_HG23_PATCH, CHR_HG26_PATCH, CHR_HG986_PATCH, CHR_HSCHR10_1_CTG1, CHR_HSCHR10_1_CTG2, CHR_HSCHR10_1_CTG3, CHR_HSCHR10_1_CTG4, CHR_HSCHR10_1_CTG6, CHR_HSCHR11_1_CTG1_1, CHR_HSCHR11_1_CTG1_2, CHR_HSCHR11_1_CTG2, CHR_HSCHR11_1_CTG3, CHR_HSCHR11
 

 

Tarek

 

 

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by tarek.mohamed0

Hi Tarek

In the QuasR vignette, they can use 'chrLen <- scanFaIndex(genomeFile)' because "genome" file was defined like this:'genomeFile <- "extdata/hg19sub.fa"'.

Have you tried making a local fasta file out of the BSgenome package? and then called scanFaIndex , as I was suggesting last week?

 

WRT you new questions:

 - BSgenome packages do not contain gene annotation

 - yes, you can make a TxDb from Biomart, but you have to be careful, to make sure, it is the same assembly and the individual chromosome are called the same. Which is obviously not the case in your situation: "sequence levels in 'query' not found in alignment files"

 

Hans-Rudolf



 

ADD REPLYlink written 2.4 years ago by Hotz, Hans-Rudolf390

Hi Hans,

Thanks a lot, I downloaded the annotation file from  FTP ensemble download page (same asembly) and it worked.

>genome<-export(BSgenome.Hsapiens.NCBI.GRCh38, "BSgenome.Hsapiens.NCBI.GRCh38.fasta")
>genome_fa<-scanFaIndex("BSgenome.Hsapiens.NCBI.GRCh38.fasta")
>annotationfile<-"Homo_sapiens.GRCh38.83.chr.gtf.gz"
>chrominfo <- data.frame(chrom=as.character(seqnames(genome_fa)),length=width(genome_fa),is_circular=rep(FALSE, length(genome_fa)))
>txdb <- makeTxDbFromGFF(file=annotationfile, format="gtf",chrominfo=chrominfo,dataSource="NCBI",organism="Homo sapiens")

Tarek

ADD REPLYlink written 2.4 years ago by tarek.mohamed0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 138 users visited in the last hour