I am trying to interpret the figure plotted by syntenet (https://github.com/almeidasilvaf/syntenet) that visualizes the phylogenomic profiles for the different species and groups. What exactly do the different colors, which correspond to increasing numbers (e.g. red -> 3+), represent? In addition, what does the clustering tree on top of the figure represent? Are these groups of clusters that are shared among different species? Some more information in the documentation would help.
The heatmap is based on a matrix m in which each entry m_ij is the number of genes from synteny cluster j found in species i, and the colors represent these counts. Thus, if a synteny cluster (column) is red for a particular species (row), that species contains 3 or more genes in that synteny cluster.
The clustering on the columns is a simple hierarchical clustering (Ward's clustering on a matrix of Euclidean distances) to group similar synteny clusters together.
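To make the description above concrete, here is a minimal base R sketch of how such a matrix and column tree could be built. It is not syntenet's own code: the toy gene table, the object names, and the choice of "ward.D2" (R's implementation of Ward's criterion on unsquared Euclidean distances) are assumptions for illustration, and the package may use a different Ward variant.

```r
# Toy data (made-up values): one row per gene, with its species and synteny cluster
genes <- data.frame(
    species = c("sp1", "sp1", "sp1", "sp1", "sp1", "sp2", "sp2", "sp3", "sp3", "sp3"),
    cluster = c("c1",  "c1",  "c1",  "c1",  "c2",  "c1",  "c3",  "c2",  "c3",  "c3")
)

# m[i, j] = number of genes of species i found in synteny cluster j
m <- unclass(table(genes$species, genes$cluster))

# Cap counts at 3 so the highest color class means "3 or more"
m_capped <- pmin(m, 3)

# Cluster the columns (synteny clusters): Euclidean distances + Ward's method
# (assumption: "ward.D2"; the variant used by the package is not stated here)
col_tree <- hclust(dist(t(m_capped)), method = "ward.D2")

# Column order implied by the tree
col_tree$labels[col_tree$order]
```

In this toy example, sp1 has four genes in cluster c1, so that cell falls into the "3+" class after capping, while cells with a count of 1 correspond to the light blue areas of the heatmap.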
With my own files, after going through check_input() and creating pdata <- process_input(), I run the code below and get the errors that follow:
seq_2 <- process_input(proteomes1, annotation1)$seq[1:4]
if (diamond_is_installed()) {
    blast_list <- run_diamond(seq_2)
}
The resulting blast_list seems to come from your example data, data(annotation) and data(proteomes), not from my files.
Error: The sequences are expected to be proteins but only contain DNA letters. Use the option --ignore-warnings to proceed.
Error: The sequences are expected to be proteins but only contain DNA letters. Use the option --ignore-warnings to proceed.
Error: The sequences are expected to be proteins but only contain DNA letters. Use the option --ignore-warnings to proceed.
Error: The sequences are expected to be proteins but only contain DNA letters. Use the option --ignore-warnings to proceed.
No such file or directory
Error: Error opening file C:\Users\lwand\AppData\Local\Temp\Rtmp0cCOSb/diamond/dbs/Creinhardtii_281
No such file or directory
Error: Error opening file C:\Users\lwand\AppData\Local\Temp\Rtmp0cCOSb/diamond/dbs/Creinhardtii_281
No such file or directory
Error: Error opening file C:\Users\lwand\AppData\Local\Temp\Rtmp0cCOSb/diamond/dbs/Creinhardtii_281
No such file or directory
Error: Error opening file C:\Users\lwand\AppData\Local\Temp\Rtmp0cCOSb/diamond/dbs/Creinhardtii_281
No such file or directory
Error: Error opening file C:\Users\lwand\AppData\Local\Temp\Rtmp0cCOSb/diamond/dbs/Czofingiensis_461
No such file or directory
Error: Error opening file C:\Users\lwand\AppData\Local\Temp\Rtmp0cCOSb/diamond/dbs/Czofingiensis_461
No such file or directory
Error: Error opening file C:\Users\lwand\AppData\Local\Temp\Rtmp0cCOSb/diamond/dbs/Czofingiensis_461
No such file or directory
Error: Error opening file C:\Users\lwand\AppData\Local\Temp\Rtmp0cCOSb/diamond/dbs/Czofingiensis_461
No such file or directory
Error: Error opening file C:\Users\lwand\AppData\Local\Temp\Rtmp0cCOSb/diamond/dbs/Dsalina_325_v1
No such file or directory
Error: Error opening file C:\Users\lwand\AppData\Local\Temp\Rtmp0cCOSb/diamond/dbs/Dsalina_325_v1
No such file or directory
Error: Error opening file C:\Users\lwand\AppData\Local\Temp\Rtmp0cCOSb/diamond/dbs/Dsalina_325_v1
No such file or directory
Error: Error opening file C:\Users\lwand\AppData\Local\Temp\Rtmp0cCOSb/diamond/dbs/Dsalina_325_v1
I noticed that your question is not related to this post. When asking questions, please open a new post with your own question rather than asking it as a comment on another question. Make sure to also include all the steps you took to get to the error; otherwise, people won't be able to reproduce your problem and help you.
Thanks Fabricio,
It is then somewhat strange that my heatmap includes a lot of light blue areas (as does the vignette example), which according to your description indicate only 1 gene per synteny cluster for that species. How can a region be part of a synteny cluster based on only one gene? Or maybe I did not fully understand your description.
Best Alex
Hi, Alex
I believe you have not fully understood the concept of synteny clusters. I think Figure 1 of https://www.nature.com/articles/s41467-021-23665-0 will be helpful.
Best, Fabricio
Hi Fabricio,
Many thanks, I think I got this now. Initially I thought that a cluster included all the genes of a single synteny block across the analyzed genomes, but as far as I understand it now, each cluster is composed of separate anchor gene pairs from the defined blocks, extended over several genomes. Am I correct?
What does it mean, then, if you have more than 1 gene per cluster? Could it be whole-genome duplication or even uncollapsed haplotypes in the genome assembly? If tandem arrays are collapsed and are not part of the individual clusters, I assume they cannot account for the pattern of multiple genes per cluster.
Thanks Alex
Hi, Alex.
Your interpretation is correct. Multiple genes per cluster typically indicate polyploidization events (whole-genome duplication, triplication, etc.), but uncollapsed haplotypes are also a reasonable explanation.
As an example, in Figure 1B of the syntenet manuscript (https://www.biorxiv.org/content/10.1101/2022.08.16.504079v2), you can see some species with orange (2) and red (3+) for most of the synteny clusters, which is explained by the fact that these species are recent polyploids.
Best, Fabricio
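One way to put a number on the pattern discussed above is to compute, for each species, the fraction of its synteny clusters that contain 2 or more genes. The sketch below uses a hypothetical profile matrix (made-up species names and counts), not real syntenet output. A high fraction is consistent with recent polyploidy, but uncollapsed haplotypes could give the same signal.

```r
# Hypothetical profile matrix: rows = species, columns = synteny clusters
profiles <- matrix(
    c(2, 2, 3, 2,
      1, 1, 0, 1,
      1, 0, 1, 1),
    nrow = 3, byrow = TRUE,
    dimnames = list(
        c("polyploid_sp", "diploid_sp1", "diploid_sp2"),
        c("c1", "c2", "c3", "c4")
    )
)

# Among the clusters present in each species, the fraction with 2 or more genes
present <- profiles > 0
multi_copy_fraction <- rowSums(profiles >= 2) / rowSums(present)

multi_copy_fraction
```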