NOTE: This post is advanced and primarily intended for those who wants to know how Polygene works under the hood. This article could be helped with diagrams, but…
Since the Apache Polygene 3.0 was released, we have been hard at work of creating a nicer SQL Entity Store, that lays out the data model in a Enterprise-like manner, rather than the Key-Value approach that has been around in Polygene for years but never really used, since it doesn’t really play well with external tools.
SQL Model
First of all, if you are not familiar with Polygene, please see the Polygene Introduction from earlier this year (2017).
The relational model is fully supportive of the Polygene entities model, and it achieves this with 2 tables for overhead and two tables per mixin type;
- Types table – which maps Polygene/Java types to Mixin tables.
- Entities table – which maps Identity to current value rows in the respective Mixin table.
- Mixin tables – One field for each Property and each Association (not ManyAssociation and NamedAssociation though) declared in the mixin type. Inherited Properties are always stored in their own tables.
- Mixin Association tables – These tables contains the many-to-many mappings of ManyAssociation and NamedAssoication references in the mixin type.
Table names follows, if possible, the simple name of the mixin types.
Example
package org.hedhman.niclas; import org.apache.polygene.api.association.Association; import org.apache.polygene.api.common.Optional; import org.apache.polygene.api.property.Property; import org.apache.polygene.api.common.UseDefaults; public interface Person { Property<String> name(); @Optional Assocation<Person> spouse(); @UseDefaults ManyAssociation<Person> children(); }
This will cause the following DDL (Postgresql);
create schema "POLYGENE" create table if not exists "POLYGENE"."TYPES" ( "_identity" varchar null, "_table_name" varchar null, "_created_at" timestamp null, "_modified_at" timestamp null, primary key ("_identity") ) create table if not exists "POLYGENE"."ENTITIES" ( "_identity" varchar null, "_app_version" varchar null, "_value_id" varchar null, "_version" varchar null, "_type" varchar null, "_modified_at" timestamp null, "_created_at" timestamp null, primary key ("_identity") ) create table "POLYGENE"."Person" ( "_identity" varchar null, "_created_at" timestamp null, "name" varchar null, "spouse" varchar null, primary key ("_identity") ) create table"POLYGENE"."Person_ASSOCS" ( "_identity" varchar null, "_name" varchar null, "_index" varchar null, "_reference" varchar null ) create index "IDX_Person_ASSOCS" on "POLYGENE"."Person_ASSOCS"( "_identity" )
This should almost be self-explanatory. But the Person_ASSOCS table might require some explanation.
When we add two Person instances to the children() ManyAssocation, one row will be written into the Person_ASSOCS, such as;
insert into "POLYGENE"."Person_ASSOCS" ( "_identity", "_name", "_index", "_reference" ) values ( '0a6e89dd-89f7-46a9-9ce4-305f5ddda47c', 'children', '0', 'c8116128-0f4c-4225-813c-c89a8e480c6c' ) insert into "POLYGENE"."Person_ASSOCS" ( "_identity", "_name", "_index", "_reference" ) values ( '0a6e89dd-89f7-46a9-9ce4-305f5ddda47c', 'children', '1', '0a6e89dd-89f7-46a9-9ce4-305f5ddda47c' )
The “index” column is used both to preserve the order the children were added to the ManyAssociation, but also used for the name in the NamedAssociation case.
If there are multiple Polygene types in different packages, the Types table will keep them separated and start adding a sequence number to them. Person, Person_1, Person_2 and so on.
Fetching an Entity
So, in reality all entities are spread out over two or more tables. And when these are loaded back into memory an outer join query will take place. Well, in fact there will be two queries, one for the entity itself and one for the associations. So for a simple entity like Person above, these are a fairly small queries.
select * from "POLYGENE"."ENTITIES" left outer join "POLYGENE"."Person" on "_value_id" = "Person"."_identity" where "ENTITIES"."_identity" = 'c8116128-0f4c-4225-813c-c89a8e480c6c' select * from "POLYGENE"."ENTITIES" join "POLYGENE"."Person_ASSOCS" on "_value_id" = "Person_ASSOCS"."_identity" where "ENTITIES"."_identity" = 'c8116128-0f4c-4225-813c-c89a8e480c6c'
Say that the Person type was a bit more complex;
public interface Person extends LegalEntity, AccountHolder, Customer {}
Then there will be one “left outer join” per mixin type it inherits.
select * from "POLYGENE"."ENTITIES" left outer join "POLYGENE"."Person" on "_value_id" = "Person"."_identity" left outer join "POLYGENE"."LegalEntity" on "_value_id" = "LegalEntity"."_identity" left outer join "POLYGENE"."AccountHolder" on "_value_id" = "AccountHolder"."_identity" left outer join "POLYGENE"."Customer" on "_value_id" = "Customer"."_identity" where "ENTITIES"."_identity" = 'c8116128-0f4c-4225-813c-c89a8e480c6c'
FAQ
Is it available now?
This is still work in development. The ‘develop’ branch right now has the implementation working, but further stress testing is needed. This extension is scheduled to be released in Polygene 3.1 before end of the year.
What Polygene features are not supported?
As far as we know, all Polygene persistence features are supported by the SQL Entity Store extension.
An auxiliary feature in Polygene persistence system is Migration Support, which allows for models to be modified over time and users can add support for entities to be “upgraded” on-the-fly. The Migration support is predicated on the JSON backed storage model that most Entity Stores are using. That is not currently available in this Entity Store, but something similar should come in a future release.
At the time of writing, the underlying JOOQ framework seems to have a bug in MySQL/MariaDb table creation script generation, and those databases don’t work at this moment, but we are working with them to solve this. And we expect that to be resolved before the Apache Polygene 3.1 release itself.
How does Polygene SQL Entity Store differ from JPA/JDO ?
First of all, JPA and JDO don’t have explicit modeling of entities built into their definitions. One is expected to work with CLASSES which are annotated either directly in the code, or in attached XML files.
In Polygene, you work with explicit Entities, Properties and various Associations types, and therefor no additional annotations are required. The entities model in Polygene is a top-down design, where the Polygene model drives that SQL database layout. Once used to the Polygene metaphor, it is much more natural, easier and less boiler-plate code. Additionally, all the AOP features of Polygene applies to entities and many advanced features are easily implemented in Polygene, but requires Hibernate hooks and other special implementation features to achieve in JPA/JDO.
In JPA and JDO, it is possible to “map” almost arbitrarily relational models on to Java classes, and that feature creates an immense complexity both in terms of design/usage as well as the internals. It is not likely that Polygene will attempt to support “existing DBs” to be mapped transparently.
Is Indexing/Query handled as well?
One of the promises of using SQL is that it might be possible to reuse the entity data for querying, and not require a separate indexing store. This is currently not the case, but something we will look into in the future.