New to BioC, just trying to learn. I’ve got a sequence file (FASTA) and an annotation file (BED) with the various ranges of genes and other features. I would like to use the BED file to annotate my sequence, and then be able to pull out portions of the sequence not found in the BE file (eg, 200 bp next to one of the genes listed in the BED file). I’m just starting out with BioC, right now I’m learning the GenomicRanges, Biostrings, and rtracklayer packages. I can import the BED file (bedfile <- import(“bedfile.txt”, format = bed) and the portion of the FASTA file that I want (sequenceICareAbout <- readDNAStringSet(“myFile.fasta”, nrec = 1, skip = 8, seek.first.red = FALSE, use.names = FALSE), but I have no real idea how to combine them. I looked a bit at the annotatr package and annorate_regions, but it’s asking for a GRanges object and I only see how to manually make a fake GRanges object with example found in the An Introduction to the Genomic Ranges Package page. Thank you for all the help!
Hi, thank you for the help and sorry for the late reply, I had an emergency come up and didn’t have time for BioC this last week.
I’ll try to clarify a bit more. I can make a GRange object with the BED file like you showed, but it’s not referencing a genome. If I do seqinfo(x) under “genome” it’s “<NA>, so while my GRange object knows there is an IRange from 14010-14201 with a character name for the gene and which strand it’s on, it doesn’t actually have a sequence (ATGCG etc) for that range. My actual sequence is in a FASTA file on my computer, it’s not public ally available yet. There are several chromosomes in the file, I know how to pull out the portion I want as a string with readDNAStringSet, but I don’t know how to make that DNAStringSet with my sequence be the genome the GRanges object uses.
Thank you again for the help!
If you just have a FASTA file, you might need to forge a BSGenome object. You might also need to add the seqnames to your GRanges object. But conceptually if you have a BSGenome object and a GRanges object, you can get the sequences like I showed you.
Thank you for the help. I worked on making a BSGenome object for a bit but it was a bit beyond me right now. I found I could pull it up by, after making sure all the names were the same, using "getSeq(myDNAStringSet, myGRangesObject). I can pull out individual genes marked on the BED file I used for the GRanges object. Just working now on how to get chunks of DNA between them so I know what to put in my plasmids.
Thank you for all the help!