Question: Working with Arabidopsis Databases
1
gravatar for oliwindram
3.8 years ago by
oliwindram10
United Kingdom
oliwindram10 wrote:

Hi 

I have been trying to follow this tut (http://www.bioconductor.org/help/workflows/gene-regulation-tfbs/#sequence-search) and apply it to working with some gene in Arabidopsis.

I try this:

library(MotifDb)
library(seqLogo)
library(motifStack)
library(Biostrings)
library(GenomicFeatures)
library(org.At.tair.db)
library(BSgenome.Athaliana.TAIR.TAIR9)
library(TxDb.Athaliana.BioMart.plantsmart25)

myb108 <- "AT3G06490"
chrommyb108.loc <- transcriptsBy(TxDb.Athaliana.BioMart.plantsmart25, by="gene") [myb108]
promoter.myb108 <- getPromoterSeq(chrommyb108.loc , Athaliana, upstream=1000, downstream=0)

Error in .getOneSeqFromBSgenomeMultipleSequences(x, name, start, NA, width,  : 
  sequence 3 not found
In addition: Warning message:
In .Seqinfo.mergexy(x, y) :
  The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)

So I had a look at the seqinfo for a few objects namelyTxDb.Athaliana.BioMart.plantsmart25 and Athaliana and I noticed that the chromosome names differ along with the isCircular statement for the last chromosome:

> seqinfo(TxDb.Athaliana.BioMart.plantsmart25)
Seqinfo object with 7 sequences (1 circular) from an unspecified genome:
  seqnames seqlengths isCircular genome
  1          30427671      FALSE   <NA>
  2          19698289      FALSE   <NA>
  3          23459830      FALSE   <NA>
  4          18585056      FALSE   <NA>
  5          26975502      FALSE   <NA>
  Mt           366924       TRUE   <NA>
  Pt           154478      FALSE   <NA>
> seqinfo(Athaliana)
Seqinfo object with 7 sequences (2 circular) from TAIR9 genome:
  seqnames seqlengths isCircular genome
  Chr1       30427671      FALSE  TAIR9
  Chr2       19698289      FALSE  TAIR9
  Chr3       23459830      FALSE  TAIR9
  Chr4       18585056      FALSE  TAIR9
  Chr5       26975502      FALSE  TAIR9
  ChrM         366924       TRUE  TAIR9
  ChrC         154478       TRUE  TAIR9

I wondered if I am perhaps using the wrong TxDb? Are there any others I should be using?

Anyway I could get this to work?

 

ADD COMMENTlink modified 3.8 years ago • written 3.8 years ago by oliwindram10
Answer: Working with Arabidopsis Databases
1
gravatar for Hervé Pagès
3.8 years ago by
Hervé Pagès ♦♦ 13k
United States
Hervé Pagès ♦♦ 13k wrote:

Hi,

You're using the right TxDb. It's just that the data in TxDb.Athaliana.BioMart.plantsmart25 is coming from the Ensembl Plants database and the Ensembl Plants folks use different chromosome names than the TAIR people. Normally switching from one chromosome naming convention to the other should be easy, it's just a matter of doing something like:

seqlevelsStyle(TxDb.Athaliana.BioMart.plantsmart25) <- seqlevelsStyle(BSgenome.Athaliana.TAIR.TAIR9)

However this doesn't seem to work properly here so we'll need to look into it a little bit more. In the meantime you can do:

seqlevels(TxDb.Athaliana.BioMart.plantsmart25) <- seqlevels(BSgenome.Athaliana.TAIR.TAIR9)

instead, which is usually not the right way of doing it but it happens to work fine here.

Cheers,

H.

ADD COMMENTlink written 3.8 years ago by Hervé Pagès ♦♦ 13k
Answer: Working with Arabidopsis Databases
0
gravatar for oliwindram
3.8 years ago by
oliwindram10
United Kingdom
oliwindram10 wrote:

 

Hi Hervé,

Thanks for your comment,  this does help to a degree it changes the seqnames. However the isCircular statement for ChrC which I imagine is the plastid chromosome still differs between the two? I think this is what is causing the error I am getting now which is:

> promoter.myb108 <- getPromoterSeq(chrommyb108.loc , Athaliana, upstream=1000, downstream=0)
Error in mergeNamedAtomicVectors(isCircular(x), isCircular(y), what = c("sequence",  : 
  sequence ChrC has incompatible circularity flags:
  - in 'x': TRUE
  - in 'y': FALSE

I tried a few commands like these:

> isCircular(TxDb.Athaliana.BioMart.plantsmart25) <- isCircular(BSgenome.Athaliana.TAIR.TAIR9)
Error in .seqinfo.TxDbReplace(x, new2old = new2old, force = force, value) : 
  'new2old' must be specified when replacing the 'seqinfo' of a TxDb object

> seqinfo(TxDb.Athaliana.BioMart.plantsmart25) <- seqinfo(BSgenome.Athaliana.TAIR.TAIR9)
Error in .seqinfo.TxDbReplace(x, new2old = new2old, force = force, value) : 
  'new2old' must be specified when replacing the 'seqinfo' of a TxDb object

but they throw the above errors, also I am also unsure what structure the new2old argument should have?

Any more suggestions are welcome.

 

ADD COMMENTlink written 3.8 years ago by oliwindram10

Hi,

Unfortunately the isCircular() setter is currently broken on TxDb objects so I'm also going to look at this one. In the meantime, the workaround is to use it on any GRanges or GRangesList object you extract from the TxDb object (e.g. chrommyb108.loc) before you do anything with the object (like calling getPromoterSeq()).

Cheers,

H.

ADD REPLYlink written 3.8 years ago by Hervé Pagès ♦♦ 13k

This seems to have solved the problem:

seqlevels(TxDb.Athaliana.BioMart.plantsmart25) <- seqlevels(BSgenome.Athaliana.TAIR.TAIR9)
chrommyb108.loc <- transcriptsBy(TxDb.Athaliana.BioMart.plantsmart25, by="gene") [myb108]
isCircular(chrommyb108.loc) <- isCircular(BSgenome.Athaliana.TAIR.TAIR9)
promoter.myb108 <- getPromoterSeq(chrommyb108.loc, Athaliana, upstream=1000, downstream=0)

 

ADD REPLYlink modified 3.8 years ago • written 3.8 years ago by oliwindram10

Hi oliwindram,

Just to let you know that in BioC 3.2 (released 2 days ago),

seqlevelsStyle(TxDb.Athaliana.BioMart.plantsmart25) <- seqlevelsStyle(BSgenome.Athaliana.TAIR.TAIR9)

works and does the right thing.

Modifying the sequence circularity flags of a TxDb object with e.g.

isCircular(TxDb.Athaliana.BioMart.plantsmart25) <- isCircular(BSgenome.Athaliana.TAIR.TAIR9)

is still not supported though but will be soon.

Cheers,

H.

ADD REPLYlink modified 3.6 years ago • written 3.6 years ago by Hervé Pagès ♦♦ 13k

I meant seqlevelsStyle instead of seqlevels in the above code. I edited the code.

H.

ADD REPLYlink written 3.6 years ago by Hervé Pagès ♦♦ 13k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 341 users visited in the last hour