GRanges and reference genome
Asma rabe ▴ 290
@asma-rabe-4697
Last seen 6.2 years ago
Japan

I have a file with three columns:

Chr   start  end

I read it into a GRanges object:

library(GenomicRanges)
Data <- read.table("file", header=TRUE)
# create a GRanges object from the three columns
object <- GRanges(seqnames=Rle(Data$Chr),
                  ranges=IRanges(Data$start, end=Data$end),
                  seqlengths=c(chr1=249250621, chr2=243199373, chr3=198022430, chr4=191154276))

I have two questions:

Q1: How is it known that the nucleotide at a given position is G or A when no reference genome is provided? Nothing specifies whether this is, for example, the human or the mouse genome. Or can that be inferred from the chromosome lengths?

Q2: If I have the reference sequence in FASTQ format, how can I read my data into a GRanges object so that the FASTQ file is used as the reference genome?


Re Q1, where have you seen any indications about sequence information? Here is what I have, based on the code you provided:

> library("GenomicRanges")
> Data = data.frame(Chr = paste0("chr", 1:4), 
                    start = 1000001:1000004, end = 2000001:2000004)
> Data
   Chr   start     end
1 chr1 1000001 2000001
2 chr2 1000002 2000002
3 chr3 1000003 2000003
4 chr4 1000004 2000004
> object <- GRanges(seqnames=Rle(Data$Chr),
                    ranges=IRanges(Data$start,end=Data$end),
                    seqlengths=c(chr1=249250621,chr2=243199373,
                                 chr3=198022430,chr4=191154276))

> object
GRanges with 4 ranges and 0 metadata columns:
      seqnames             ranges strand
         <Rle>          <IRanges>  <Rle>
  [1]     chr1 [1000001, 2000001]      *
  [2]     chr2 [1000002, 2000002]      *
  [3]     chr3 [1000003, 2000003]      *
  [4]     chr4 [1000004, 2000004]      *
  ---
  seqlengths:
        chr1      chr2      chr3      chr4
   249250621 243199373 198022430 191154276
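
To make the point about Q1 concrete, here is a minimal sketch (not part of the original reply) showing that a GRanges object built this way holds only coordinates and sequence lengths. It uses two of the ranges from above. The "hg19" label is an assumption based on the chromosome lengths in the question; setting it only annotates the object and does not attach any nucleotides.

```r
library(GenomicRanges)

# Two of the ranges from the example above, built directly for brevity.
object <- GRanges(seqnames = Rle(c("chr1", "chr2")),
                  ranges = IRanges(start = c(1000001, 1000002),
                                   end = c(2000001, 2000002)),
                  seqlengths = c(chr1 = 249250621, chr2 = 243199373))

# The object knows sequence names and lengths, but no genome and no bases.
seqinfo(object)
genome_before <- genome(object)   # NA for every sequence
n_metadata <- ncol(mcols(object)) # 0: no per-range columns, so no nucleotides

# Assumption: the lengths look like hg19, so we label the object as such.
# This is only a text label; it does not load or attach any sequence.
genome(object) <- "hg19"
genome_after <- genome(object)
```

Any base such as G or A has to come from a separate sequence source that is queried with these coordinates.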

Dear Asma, just wanted to let you know that I updated some formatting to distinguish code chunks from text and improve readability. Best wishes, Laurent

@sean-davis-490
Last seen 3 months ago
United States

Have a look at the getSeq() function in the Biostrings package and FaFile() in the Rsamtools package. In short, you'll need to create an FaFile object from your FASTA reference sequence and then call getSeq() with the FaFile object and your GRanges object. If you have questions about the details, please ask in the comments.
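
As a rough illustration of those steps, here is a minimal sketch that is not taken from the original answer. The two-record toy reference and the temporary file are made up so the example is self-contained; in practice `fa_path` would point to your own FASTA file. It assumes the sequence names in the GRanges object match the record names in the FASTA file.

```r
library(GenomicRanges)
library(Biostrings)
library(Rsamtools)

# Hypothetical toy reference, written to a temporary FASTA file.
ref <- DNAStringSet(c(chr1 = "ACGTACGTACGTACGTACGT",
                      chr2 = "TTTTGGGGCCCCAAAATTTT"))
fa_path <- tempfile(fileext = ".fa")
writeXStringSet(ref, fa_path)

# FaFile relies on a .fai index next to the FASTA file; create it once.
indexFa(fa_path)
fa <- FaFile(fa_path)

# Ranges to look up; seqnames must match the FASTA record names.
gr <- GRanges(seqnames = c("chr1", "chr2"),
              ranges = IRanges(start = c(1, 5), end = c(4, 8)))

# Extract the reference bases covered by each range as a DNAStringSet.
seqs <- getSeq(fa, gr)
seqs
```

With an indexed file, only the requested regions are read from disk, so this approach also works for full-size genomes.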


Yes, that's basically it, provided that your reference genome is in a FASTA file and not in a FASTQ file as you said. I don't think it's possible to store a reference genome in FASTQ format anyway...


Hi Hervé,

Sorry, that was a typo! I meant FASTA, not FASTQ.
