Question: drawProteins from Uniprot entry (missing chain information)
mblango wrote 10 months ago:

Hi, I am using the drawProteins package to draw protein domains, as described nicely in several other places. My problem is that some Uniprot entries are missing the CHAIN information, which is required for drawing the background chain in the plot. The CHAIN feature essentially provides the length of a given protein. Is there a way to add this information to the data.frame produced by drawProteins::feature_to_dataframe? I am new to R, so there is probably an embarrassingly simple solution to this problem. I understand how to add rows to a data.frame, but I do not understand how to add this information to the slightly more complicated data.frame created by drawProteins. Alternatively, I could contact Uniprot.

Here is the code I am using. If you replace Uniprot ID Q4WXX3 with Q4WVE3 (a different protein), then you can see what is missing.

Thanks in advance!


library(drawProteins)
library(ggplot2)

prot <- drawProteins::get_features("Q4WXX3")

prot_data <- drawProteins::feature_to_dataframe(prot)

p <- draw_canvas(prot_data)
p <- draw_chains(p, prot_data,
                 labels = c("AgoA"))
p <- draw_domains(p, prot_data,
                  label_domains = FALSE)
p <- draw_regions(p, prot_data)
p <- draw_repeat(p, prot_data)
p <- draw_motif(p, prot_data)
p <- draw_phospho(p, prot_data, size = 8)

p <- p + theme_bw(base_size = 20) + # white background
  theme(panel.grid.major = element_blank()) +
  theme(axis.ticks = element_blank(),
        axis.text.y = element_blank()) +
  theme(panel.border = element_blank())
p <- p + theme(legend.position = "bottom") + labs(fill = "")

prot_subtitle <- paste0("\nsource:Uniprot")
p <- p + labs(title = "Protein Domains",
              subtitle = prot_subtitle)

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] biomaRt_2.38.0       BiocInstaller_1.32.1 forcats_0.3.0        stringr_1.3.1       
 [5] dplyr_0.7.8          purrr_0.2.5          readr_1.3.1          tidyr_0.8.2         
 [9] tibble_2.0.1         tidyverse_1.2.1      ggplot2_3.1.0        drawProteins_1.2.0  

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0           lubridate_1.7.4      lattice_0.20-38      prettyunits_1.0.2   
 [5] assertthat_0.2.0     digest_0.6.18        R6_2.3.0             cellranger_1.1.0    
 [9] plyr_1.8.4           backports_1.1.3      stats4_3.5.1         RSQLite_2.1.1       
[13] httr_1.4.0           pillar_1.3.1         rlang_0.3.1          progress_1.2.0      
[17] lazyeval_0.2.1       curl_3.3             readxl_1.2.0         rstudioapi_0.9.0    
[21] blob_1.1.1           S4Vectors_0.20.1     labeling_0.3         RCurl_1.95-4.11     
[25] bit_1.1-14           munsell_0.5.0        broom_0.5.1          compiler_3.5.1      
[29] modelr_0.1.2         pkgconfig_2.0.2      BiocGenerics_0.28.0  tidyselect_0.2.5    
[33] IRanges_2.16.0       XML_3.98-1.16        crayon_1.3.4         withr_2.1.2         
[37] bitops_1.0-6         grid_3.5.1           nlme_3.1-137         jsonlite_1.6        
[41] gtable_0.2.0         DBI_1.0.0            magrittr_1.5         scales_1.0.0        
[45] cli_1.0.1            stringi_1.2.4        bindrcpp_0.2.2       xml2_1.2.0          
[49] generics_0.0.2       tools_3.5.1          bit64_0.9-7          Biobase_2.42.0      
[53] glue_1.3.0           hms_0.4.2            parallel_3.5.1       yaml_2.2.0          
[57] AnnotationDbi_1.44.0 colorspace_1.4-0     rvest_0.3.2          memoise_1.1.0       
[61] bindr_0.1.1          haven_2.0.0         
Answer: drawProteins from Uniprot entry (missing chain information)
James W. MacDonald (United States) wrote 10 months ago:

When you query for Q4WXX3, you end up at the Uniprot record for that accession, and you can see that no chain information is provided. In fact a lot of data are missing, presumably because this is a putative protein. If the annotation service doesn't have the data you need to make a plot, there isn't much that drawProteins can do to fix the situation. If you have more data, then it wouldn't be that difficult to add it by hand. For example, I can get much of the protein drawn by just adding the chain information by hand:

> prot_data
                 type description begin end length accession    entryName
featuresTemp   DOMAIN         PAZ   302 391     89    Q4WXX3 Q4WXX3_ASPFU
featuresTemp.1 DOMAIN        Piwi   564 871    307    Q4WXX3 Q4WXX3_ASPFU
                taxid order
featuresTemp   330879     1
featuresTemp.1 330879     1

> prot_data <- rbind(data.frame(type = "CHAIN",
+   description = "Eukaryotic translation initiation factor eIF-2C4",
+   begin = 1, end = 320, length = 320,
+   accession = "Q4WXX3", entryName = "Q4WXX3_ASPFU",
+   taxid = 330879, order = 1), prot_data)
> prot_data
                 type                                      description begin
1               CHAIN Eukaryotic translation initiation factor eIF-2C4     1
featuresTemp   DOMAIN                                              PAZ   302
featuresTemp.1 DOMAIN                                             Piwi   564
               end length accession    entryName  taxid order
1              320    320    Q4WXX3 Q4WXX3_ASPFU 330879     1
featuresTemp   391     89    Q4WXX3 Q4WXX3_ASPFU 330879     1
featuresTemp.1 871    307    Q4WXX3 Q4WXX3_ASPFU 330879     1

But I have no idea if that is the correct protein length! If you have those data, you can easily add them. But if you don't, then there's no way to add anything, because you don't have the data.
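
The manual fix above can be wrapped in a small function so that the CHAIN row is only added when it is actually missing. This is a minimal sketch, not part of drawProteins: `add_chain_row` is a hypothetical helper, the data frame is rebuilt by hand from the printed output above so the example runs without a network connection, and using the end of the last feature as the chain length is only a lower bound, not the real protein length.

```r
# Rebuild the data frame printed above by hand (values copied from that output).
prot_data <- data.frame(
  type = c("DOMAIN", "DOMAIN"),
  description = c("PAZ", "Piwi"),
  begin = c(302, 564),
  end = c(391, 871),
  length = c(89, 307),
  accession = "Q4WXX3",
  entryName = "Q4WXX3_ASPFU",
  taxid = 330879,
  order = 1,
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a drawProteins function): prepend a CHAIN row
# only if the data frame does not already contain one.
add_chain_row <- function(df, chain_length, description = "chain") {
  if ("CHAIN" %in% df$type) {
    return(df)
  }
  # A chain shorter than the last annotated feature cannot be right.
  if (chain_length < max(df$end)) {
    warning("chain_length is shorter than the last annotated feature")
  }
  chain <- data.frame(
    type = "CHAIN",
    description = description,
    begin = 1,
    end = chain_length,
    length = chain_length,
    accession = df$accession[1],
    entryName = df$entryName[1],
    taxid = df$taxid[1],
    order = df$order[1],
    stringsAsFactors = FALSE
  )
  rbind(chain, df)
}

# Assumption: with no real length available, fall back to the end of the
# last feature. Replace this with the true protein length when it is known.
with_chain <- add_chain_row(prot_data, chain_length = max(prot_data$end))
```

Calling `add_chain_row` on a data frame that already has a CHAIN row returns it unchanged, so the same call should be safe for entries such as Q4WVE3 where the chain is reported.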


Hi James, Nice job. To add to your answer, we can use the amino acid sequence to calculate the protein length. Here is some code that will do that. Best wishes, Paul

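# Load the package required to read JSON files.
# (rjson is assumed here; the package was not named in this post.)
library(rjson)
# 'url' must hold the address of the JSON record for the protein;
# it is not defined in this post.
data <- readLines(url)
# extract JSON (collapse the lines into one string first)
result <- fromJSON(paste(data, collapse = ""))
# here is the sequence
sequence <- result[[1]]$sequence
# count the number of characters.
# (named prot_length to avoid masking base::length)
prot_length <- nchar(sequence)

prot_data <- rbind(data.frame(type = "CHAIN",
  description = "Eukaryotic translation initiation factor eIF-2C4",
  begin = 1,
  end = prot_length, length = prot_length,
  accession = "Q4WXX3",
  entryName = "Q4WXX3_ASPFU",
  taxid = 330879, order = 1), prot_data)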
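
The length calculation can be tried offline on a small JSON string before pointing it at a real record. This is only a sketch under stated assumptions: the JSON below is made up (toy accession and toy sequence) and merely mimics the shape that `result[[1]]$sequence` expects, and jsonlite is used as one package that provides `fromJSON()`; the reply does not say which package it relied on.

```r
library(jsonlite)

# Made-up JSON: a list with one entry that carries a "sequence" string.
# Both the accession and the sequence are toy values, not a real record.
json_text <- '[{"accession": "X00000", "sequence": "MKTAYIAKQR"}]'

# simplifyVector = FALSE keeps the parsed result as nested lists, so the
# first entry can be addressed as result[[1]] and its fields with `$`.
result <- fromJSON(json_text, simplifyVector = FALSE)

# The protein length is the number of characters in the sequence.
prot_length <- nchar(result[[1]]$sequence)
prot_length
```

A quick sanity check on real data is to compare the computed length with the largest `end` value among the features: the chain should never be shorter than the last annotated domain.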

Reply written 10 months ago by Paul Brennan