Question

GRanges and reference genome

Asma rabe ▴ 290

I have a file with 3 columns:

Chr   start  end

I read it into a GRanges object:

Data <- read.table("file", header=TRUE)
# create GRanges object
object <- GRanges(seqnames=Rle(Data$Chr),
                  ranges=IRanges(Data$start, end=Data$end),
                  seqlengths=c(chr1=249250621, chr2=243199373,
                               chr3=198022430, chr4=191154276))

I have two questions:

Q1: How is it known that the nucleotide at a given position is G or A when no information about the reference genome is provided? Nothing specifies whether it is the human or the mouse genome, for example. Or can this be inferred from the chromosome lengths?

Q2: If I have the reference sequence in FASTQ format and would like to read the data into a GRanges object with the FASTQ file as the reference genome, how do I do that?

granges • 1.9k views

updated 9.6 years ago by Sean Davis 21k • written 9.6 years ago by Asma rabe ▴ 290

Re Q1, where have you seen any indications about sequence information? Here is what I have, based on the code you provided:

> library("GenomicRanges")
> Data = data.frame(Chr = paste0("chr", 1:4), 
                    start = 1000001:1000004, end = 2000001:2000004)
> Data
   Chr   start     end
1 chr1 1000001 2000001
2 chr2 1000002 2000002
3 chr3 1000003 2000003
4 chr4 1000004 2000004
> object <- GRanges(seqnames=Rle(Data$Chr),
                    ranges=IRanges(Data$start,end=Data$end),
                    seqlengths=c(chr1=249250621,chr2=243199373,
                                 chr3=198022430,chr4=191154276))

> object
GRanges with 4 ranges and 0 metadata columns:
      seqnames             ranges strand
         <Rle>          <IRanges>  <Rle>
  [1]     chr1 [1000001, 2000001]      *
  [2]     chr2 [1000002, 2000002]      *
  [3]     chr3 [1000003, 2000003]      *
  [4]     chr4 [1000004, 2000004]      *
  ---
  seqlengths:
        chr1      chr2      chr3      chr4
   249250621 243199373 198022430 191154276
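
To make the point behind Q1 concrete, here is a minimal sketch (not part of the original exchange) showing that a GRanges object built this way holds only coordinates: no genome is recorded and no nucleotides are stored. The shortened two-range object and the "hg19" label are assumptions made purely for illustration; setting the label annotates the object but does not attach any sequence.

```r
library(GenomicRanges)

# Two of the ranges from above; seqlengths given as integers
object <- GRanges(seqnames = c("chr1", "chr2"),
                  ranges = IRanges(c(1000001, 1000002),
                                   end = c(2000001, 2000002)),
                  seqlengths = c(chr1 = 249250621L, chr2 = 243199373L))

# No genome is recorded unless you set one yourself
genome_before <- genome(object)
genome_before

# No metadata columns, hence no per-position nucleotide information
n_meta <- ncol(mcols(object))
n_meta

# Assumption for illustration: label the ranges as hg19 coordinates.
# This is only an annotation; it does not load any sequence.
genome(object) <- "hg19"
seqinfo(object)
```

Sequence only enters the picture when the ranges are combined with a separate reference, such as a FASTA file or a BSgenome package.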

9.6 years ago Laurent Gatto 1.6k

Dear Asma, I just wanted to let you know that I updated some of the formatting to distinguish code chunks from text and improve readability. Best wishes, Laurent

9.6 years ago Laurent Gatto 1.6k

score 1 · Answer 1 · 2014-09-25

Sean Davis 21k

Have a look at getSeq() in the Biostrings package and FaFile() in the Rsamtools package. In short, you'll need to make an FaFile object from your FASTA reference sequence and then call getSeq() with the FaFile object and your GRanges object. If you have questions about the details, please ask in the comments.
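
The approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive recipe: the tiny two-sequence reference, the temporary file, and the example ranges are all made up so that the snippet runs on its own. With a real genome you would point FaFile() at your own FASTA file instead.

```r
library(GenomicRanges)
library(Biostrings)
library(Rsamtools)   # FaFile(), indexFa(), and the getSeq() method for FaFile

# Hypothetical toy reference, written to a temporary FASTA file
ref <- DNAStringSet(c(chr1 = "ACGTACGTACGTACGTACGT",
                      chr2 = "TTTTGGGGCCCCAAAATTTT"))
fa_path <- tempfile(fileext = ".fa")
writeXStringSet(ref, fa_path)

# FaFile needs an index (.fai) alongside the FASTA file
indexFa(fa_path)
fa <- FaFile(fa_path)

# Hypothetical ranges; seqnames must match the FASTA record names
gr <- GRanges(seqnames = c("chr1", "chr2"),
              ranges = IRanges(start = c(1, 5), end = c(4, 8)))

# Extract the reference sequence under each range
seqs <- getSeq(fa, gr)
seqs
```

The result is a DNAStringSet with one sequence per range. Note that the sequence names in the GRanges object have to match the record names in the FASTA file exactly (for example "chr1" versus "1").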

9.6 years ago Sean Davis 21k

Yes, that's basically it, provided that your reference genome is in a FASTA file and not in a FASTQ file as you said. I don't think it's possible to store a reference genome in FASTQ format anyway...

9.6 years ago Hervé Pagès 16k

Hi Hervé,

Sorry, that was a typo! I meant FASTA, not FASTQ.

9.6 years ago Asma rabe ▴ 290