The Darlean Tables Service provides a scalable solution for storing and querying structured data in a NoSQL-like manner. It is the Darlean equivalent to table services like AWS DynamoDB and Azure Tables.
Introduction
Even though Darlean is an actor-oriented framework, and actors can store their state via persistence, plain actors with standard persistence are sometimes not enough to organize and query your data.
Example situation
Let’s consider a forum application that may store a huge amount of forum posts (possibly organized in threads). These posts are typically modeled as actors that persist their data via standard persistence. But users of the forum application also want to see a list of all available posts (or threads). And they also want to search in the data for a specific post. Those operations are difficult to implement in a scalable and performing way by using just actors and standard persistence.
One way of solving the issue using actors and standard persistence is by making one “posts-actor” that has the list of all available posts/threads (and key attributes like the id and title) as its state. This is shown in the left part of the below figure. A search can then be simply and very efficiently implemented by iterating over these items in memory.
The downside: the lack of scalability, because of the limits on the maximum size of blobs that can be persisted and the amount of memory you want to use for your posts-actor.
Another way of solving the issue is shown in the right part of the above figure. It is possible to store the state for all post actors in the same partition (by supplying them all with the same partition key, but each with a different, unique sort key). That gives a PostSearchActor the ability to collect a list of all the unique PostActor id’s, and it can then invoke the Filter action method on each of those actors, which returns whether the actor matches the search criteria or not.
The downside: it is quite a lot of implementation work, and performance-wise it is expensive to have to load each of those actors and to invoke an action method on them. Even when we do this in parallel, it is an overly expensive operation to activate and invoke all actors when we are only interested in the small fraction of them that match our search criteria.
Using tables
A much simpler, scalable approach is by using tables. Darlean provides a special type of persistence, ITablePersistence, that can be used instead of the IPersistence that is normally used for persistence of actor state. ITablePersistence can be used as a drop-in replacement for the normal IPersistence, provided that the actor state already is a dictionary (map) of keys and values (which typically is the case when using objects/interfaces to contain actor state).
The underlying Tables service (which is used by the ITablePersistence) provides out-of-the-box functionality:
- To automatically generate one or more indexes on specific key fields of the actor state. This facilitates a very efficient search, provided that the search can be done on exact key values, on key field value prefix, or based on comparison of key field values (≤, ≥).
- To perform efficient table scans when an index can not be used, for example for a fuzzy search. During a table scan, powerful filtering like substring matching can be performed on the fields of the actor state.
When the table persistence is used for the post actors, and proper indexing is configured, it becomes trivial to select those posts that:
- Belong to a certain parent post
- Have no parent post (that is, they are a ‘root post’ or thread)
- Contain a certain word or combination of words in their title or contents
- Are posted before or after a certain date
- Et cetera.
Persistence
Tables are implemented on top of persistence. So, tables do not store their data themselves, but they use the persistence service to store the data and to perform the underlying queries. That means that tables behave the same on different platforms and regardless of the type of underlying storage.
A Table must have a unique name. All data for a single table is stored in one persistence partition; the table name determines which partition that is. Because all data for a table is stored in one partition, tables provide atomic commit operations: the update of an entity and the additional work of keeping the indexes up to date is either performed entirely, or not at all.
When more throughput is required than can be provided by a single table, it is possible to use multiple tables (with different names, like my-table-01, my-table-02 and my-table-03). Each entity is stored in exactly one of those tables, for example by deriving the sub-table name from a hash of the entity id. This mechanism provides increased throughput, at the price of having to query all tables (sequentially or in parallel, depending on the requirements on latency versus the amount of processing / data transfer) to get all results.
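The hash-based sub-table derivation could be sketched as follows. The hash function (FNV-1a) and the three table names are illustrative choices, not something Darlean prescribes:

```typescript
// Illustrative: deterministically map an entity id to one of the sub-tables.
const SUB_TABLES = ['my-table-01', 'my-table-02', 'my-table-03'];

function subTableFor(id: string[]): string {
    // FNV-1a hash over the joined id parts; deterministic across processes
    let hash = 0x811c9dc5;
    for (const ch of id.join('\u0000')) {
        hash ^= ch.charCodeAt(0);
        hash = Math.imul(hash, 0x01000193) >>> 0;
    }
    return SUB_TABLES[hash % SUB_TABLES.length];
}
```

Because the mapping is deterministic, an entity can always be loaded from the same sub-table it was stored in; a query over all entities still has to visit every sub-table.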
Entities
Tables store entities. An entity consists of an id and data. The id is an array of strings (string[]). The data must be a dictionary (map) of key-value pairs. Values can be primitives (booleans, numbers, strings, et cetera) or they can be arrays or dictionaries.
Example
The figure below shows a simple table that contains songs. It consists of one entity with Id ['12345'], and with a value dictionary that contains the Title (Waterloo) and Artist (Abba).
This simple table makes it possible to directly load a certain record when the id is known. But when more advanced queries are to be performed efficiently, we need additional indexes, like the ByArtist index.
The above picture shows the Songs table with an additional index. As key we use the name of the artist. Internally, the Tables Service adds an Id field that references the main record (that is, the corresponding record in the main table).
This new index makes it possible to:
- Efficiently get the id’s of all songs by a certain artist (like all songs by Abba).
- Efficiently get the id’s of all songs with artist names that start with 'Ab', or artist names <= 'G'.
When we do a lot of queries where we are not only interested in the Id of the main record, but also need the title of the song, we can project the Title field onto the index:
This approach allows us to do only one query that returns both the Id of the main record as well as the Title, at the expense of needing a bit more storage (because the title is now stored in two places: once in the main table, and once in the index table). The Tables Service makes sure that the projected data stays in sync with the values in the main table.
Tables Service vs Table Persistence
There are two ways of storing, loading and searching entities: directly via the table service, or more conveniently via the table persistence interface. Application code (the blue area in the figure below) can choose whether to use the convenient Table Persistence interface, or directly invoke the raw Tables Service.
The tables service provides low-level access to table data. It consists of service actors that implement the ITablesService interface. Calls to the action methods get, put and search can be made to interact with the table service.
The table persistence interface provides more convenient, higher-level access to table data. It consists of a regular class (not an actor) that forms a convenience wrapper around the table service. During startup of your application, you create an instance of TablePersistence with the proper configuration (like how to derive the values for index keys for an entity). You can use that instance to load data or to perform a search. When you load data, you get an IPersistable instance that you can use to access the data, to modify it and to store it, just like regular persistence. The complexity of maintaining version numbers and handling pagination is hidden from the developer.
So, which one should you choose? Normally, you would use the more convenient table persistence; only for very specific low-level tasks would you use the table service directly.
Storing entities
Directly via ITablesService.put
An entity can be stored by calling ITablesService.put. It expects the following input:
- id: string – The id that uniquely identifies the entity.
- data: {[key: string]: unknown} – The (structured) data of the entity as a dictionary of key-value pairs.
- indexes: IIndexItem[] – An optional list of index items. For every index in which the entity should be present, one index item should be provided.
- version: string – The version of the data to be stored. The table is only updated with the new data when the provided version is lexicographically larger than the version of the current data in the table.
- baseline?: string – An optional baseline that should either not be specified, or be copied literally from the most recent store or load operation for the entity. The baseline value contains information about which index records are present for the entity in the table. When provided, the store operation becomes more efficient because it saves an additional load of this data.
- specifier?: string – The specifier that determines in which underlying persistence compartment the entity is stored. All entities for the same table must use the same specifier.
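As a sketch of how such a request could be assembled (the id, the ISO-timestamp version scheme and the specifier below are illustrative assumptions, not prescribed values):

```typescript
// Illustrative shape of a put request; field names follow the list above.
interface IIndexItem {
    name: string;
    keys: string[];
    data?: { [key: string]: unknown };
}

const putRequest = {
    id: '12345',                                 // entity id (illustrative)
    data: { title: 'Waterloo', artist: 'Abba' }, // entity data as key-value dictionary
    indexes: [
        // One index item: the entity appears in the 'byartist' index
        { name: 'byartist', keys: ['Abba'], data: { title: 'Waterloo' } }
    ] as IIndexItem[],
    version: new Date().toISOString(),           // lexicographically increasing versions
    specifier: 'my-specifier'                    // illustrative specifier
};
// await tablesService.put(putRequest);  // tablesService obtained via the portal
```

An ISO-8601 timestamp is one convenient way to obtain versions that grow lexicographically; any scheme with that property works.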
Indexing
Darlean Tables does not require upfront configuration of indexes. It is sufficient that during the storing of entities, a list of IIndexItems is provided, one for each index in which the entity should be present. When the entity should not be indexed at all, or should not be indexed anymore, the array can be omitted.
An index item consists of:
- name: string – The name of the index. Can be anything, but must be the same name as used when performing later index queries.
- keys: string[] – List of key values for the index item, on which queries can be performed. When you have a table of songs, and the index consists of the keys 'artist' and 'title', the value for keys could look like ['Abba', 'Waterloo'].
- data?: {[key: string]: unknown} – Optional dictionary of data that should be stored in the index. Typically, the data is a subset (projection) of the fields of the data dictionary of the entity. The data can be returned on index searches, which avoids the need for another lookup of the main table.
An entity can occur multiple times in an index (but with different key values). Duplicate keys are allowed in indexes, but not for the same entity: new data for a certain entity, index and index key replaces the previous data for that entity, index and index key.
When you directly invoke the Tables service via the ITablesService, you have to manually derive the index items for an entity prior to storing the item.
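For example, an entity can be made to occur once per value of a multi-valued field by deriving one index item per value. The genres field and the bygenre index name here are hypothetical additions for illustration:

```typescript
interface IIndexItem {
    name: string;
    keys: string[];
    data?: { [key: string]: unknown };
}

interface ISongState {
    title: string;
    artist: string;
    genres: string[]; // hypothetical multi-valued field
}

// Derive the index items for a song prior to calling put: one 'bygenre'
// item per genre, so the song occurs multiple times in that index,
// each time under a different key value.
function deriveIndexItems(song: ISongState): IIndexItem[] {
    return song.genres.map((genre) => ({
        name: 'bygenre',
        keys: [genre],
        data: { title: song.title }
    }));
}
```

A song with genres ['pop', 'disco'] then yields two index items, one under 'pop' and one under 'disco'.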
Conveniently via ITablePersistence
When you indirectly store an entity in the Tables service via the ITablePersistence interface, Darlean takes care of the mapping from entity data to the index items. To do this, you have to specify a mapping from entity data to index items when you create a new TablePersistence instance:
const tableService = portal.retrieve<ITablesService>(TABLE_SERVICE, ['songstable']);
const persistence = new TablePersistence<ISongState>(
tableService,
(item) => [{
name: 'byartist',
keys: [item.artist],
data: { title: item.title }
}],
'my-specifier');
//...
const data = await persistence.load(['12345']);
data.change({title: 'Waterloo', artist: 'Abba'});
await data.store(); // Will also update the 'byartist' index
This code snippet illustrates the use of ITablePersistence, where an index called byartist is created that uses the name of the artist as the key and stores the song title in the index. That makes it possible to find all song titles for a given artist very efficiently:
const options: ITableSearchRequest = {
index: 'byartist',
keys: [{operator: 'eq', value: 'Abba'}]
};
for await (const result of persistence.searchItems(options)) {
console.log(result.id[0], result.tableFields['title']);
}
This query prints the id and song title for all songs by Abba that the Table is aware of:
12345 Waterloo
Loading entities
Directly via ITablesService
When the exact id of a previously stored entity is known, it can be loaded directly. An entity is loaded by means of the ITablesService.get action method. It has the following inputs:
- keys: string[] – The keys (id) of the entity.
- specifier?: string – Optional specifier that indicates from which compartment the entity must be resolved. Must be the same value as used when storing the entity (and also, must be the same value for all entities in a table).
- projection?: string[] – Optional projection filter that selects a subset of fields for inclusion in the result. Projection can be used to reduce the amount of data that is transferred.
The response contains:
- data?: {[key: string]: unknown} – The dictionary of entity data, or the dictionary of projected fields when a projection filter is specified. Is not present (undefined) when there is no entity with the provided key.
- version: string – The version of the data. This corresponds to the version as provided in the most recent successful put operation for the entity.
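To make the get contract concrete, the following stand-alone mock (not the real service; the version handling for missing entities is simplified) shows how the optional projection filter shapes the response:

```typescript
interface IGetRequest { keys: string[]; specifier?: string; projection?: string[]; }
interface IGetResponse { data?: { [key: string]: unknown }; version?: string; }

// In-memory stand-in for the main table of the Songs example
const table = new Map<string, { data: { [key: string]: unknown }; version: string }>([
    ['12345', { data: { title: 'Waterloo', artist: 'Abba' }, version: '001' }]
]);

function mockGet(request: IGetRequest): IGetResponse {
    const entry = table.get(request.keys.join('|'));
    if (!entry) {
        return {}; // data is undefined when there is no entity with the provided key
    }
    // When a projection is requested, only the selected fields are returned
    const data = request.projection
        ? Object.fromEntries(request.projection.map((field) => [field, entry.data[field]] as [string, unknown]))
        : entry.data;
    return { data, version: entry.version };
}
```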
Conveniently via ITablePersistence
To load an entity via ITablePersistence, which often is more convenient than directly via ITablesService, use ITablePersistence.load. See the above example for the use of the load method. Just provide the key, and load returns an IPersistable that you can use to access the data and to change and store data.
Searching for entities
The ability to search for certain entities in a simple and efficient manner is the key feature of the Tables Service. Depending on the query options, a query consists of one or more of the following steps:
- Using the index on the key fields to retrieve the range of entities that match the given key constraints.
- Matching each of the resulting entities against the provided filter.
- Selecting a subset of the data fields that are present in the index.
- Looking up the corresponding main record in the main table, and projecting a subset of its fields onto the result set.
The following section provides a detailed description of the query options.
Directly via ITablesService.search
To search for entities via ITablesService.search, provide the following options:
- index?: string – Optional name of the index to be used in the search. When not present, the search is performed on the main table (not on an index).
- keys?: IKeyConstraint[] – Optional list of key constraints. For every key field in the main table or the index, an operator can be specified, as well as a value (or optional value2 for the between operator).
- keysOrder: 'ascending' | 'descending' – Specifies how the entities are sorted.
- filter?: IFilter – Optional filter that is applied to each of the entities that match the provided key constraints. For a query on the main table, the filter can access the entity keys (id) and the data fields. For a query on an index table, the filter can access the index keys and the data fields that are projected onto the index (present in the data field of the index); it does not have access to the fields of the entity data itself. See item filters for more info.
- tableProjection?: string[] – When present, indicates which entity fields from the main table are projected onto the result. When the search is done on an index table, a performance penalty is involved, as this requires an additional (internal) query on the main table. But it is still more efficient than doing that additional query client-side afterwards. When not present, no fields from the main table are projected onto the result for searches on index tables. See projection filters for more info.
- indexProjection?: string[] – When present, indicates which fields from the index data are projected onto the result. Only applicable when the search is done on an index table. See projection filters for more info.
- specifier?: string – Optional specifier that indicates from which compartment the entities must be resolved. Must be the same value as used when storing the entities (and also, must be the same value for all entities in a table).
- maxItems?: number – Limits the result set to contain no more than the specified number of items.
- continuationToken?: string – When a previous query returned a continuation token, it should be passed to a subsequent request to obtain the next chunk of results. When supplying a continuation token, all other query fields must be filled exactly as they were in the initial request.
The search returns a continuationToken when there can be more results, and a list of items:
- id: string[] – The id (keys of the main table) of the entity.
- keys?: string[] – The keys of the index table for the entity.
- tableFields?: {[key: string]: unknown} – The dictionary of original or projected fields of the entity itself (from the main table).
- indexFields?: {[key: string]: unknown} – The dictionary of original or projected fields from the index data.
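The continuation-token handshake can be sketched as follows. The minimal interfaces below mirror the fields described above; the loop (repeat the otherwise unchanged request until no token is returned) is the part to take away:

```typescript
interface ISearchItem {
    id: string[];
    keys?: string[];
    tableFields?: { [key: string]: unknown };
    indexFields?: { [key: string]: unknown };
}

interface ISearchResponse {
    items: ISearchItem[];
    continuationToken?: string;
}

interface ISearchRequest {
    index?: string;
    keys?: { operator: string; value: string; value2?: string }[];
    maxItems?: number;
    continuationToken?: string;
}

// Collect all results by re-issuing the request with each returned token.
async function searchAll(
    search: (request: ISearchRequest) => Promise<ISearchResponse>,
    request: ISearchRequest
): Promise<ISearchItem[]> {
    const items: ISearchItem[] = [];
    let continuationToken: string | undefined;
    do {
        const response = await search({ ...request, continuationToken });
        items.push(...response.items);
        continuationToken = response.continuationToken;
    } while (continuationToken);
    return items;
}
```

The searchChunks and searchItems helpers of the table persistence (next section) implement exactly this kind of loop for you.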
Conveniently via ITablePersistence
A search via ITablePersistence is very similar to a search via ITablesService, but it is a bit more convenient:
- It is not required to specify the specifier (as the table persistence already knows the specifier).
- When using searchChunks, Darlean takes care of pagination (handling the continuationToken). Client code only has to iterate over the returned chunks via the for await ... of construct for asynchronous generators.
- When using searchItems, Darlean takes care of pagination and also unchunks the chunks, so that client code only has to iterate over the returned entities, again using the for await ... of construct for asynchronous generators.
Key constraints and operators
When performing a search, it is possible to specify constraints on the keys for the main table or the index on which the search is performed.
The key constraints consist of a list of IKeyConstraint records, one for each key of the main table or index. A key constraint record consists of the following fields:
- operator – The type of constraint (see the list below).
- value: string – The value for the constraint.
- value2?: string – Additional value for the between operator.
The following operators are supported:
- eq – Evaluates to true when the corresponding key value is equal to value.
- lte – Evaluates to true when the corresponding key value is <= value.
- gte – Evaluates to true when the corresponding key value is >= value.
- prefix – Evaluates to true when the corresponding key value starts with value. The match is performed literally (that is, unnormalized and case-sensitive). See the description of normalize.
- between – Evaluates to true when the corresponding key value is >= value and <= value2.
- contains – Evaluates to true when the corresponding key value contains value. The match is performed literally (that is, unnormalized and case-sensitive). See the description of normalize.
- containsni – Evaluates to true when the corresponding key value contains value. The match is performed normalized and case-insensitively. See the description of normalize.
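To make the operator semantics concrete, here is a small stand-alone evaluator. It is an illustration of the comparison rules, not Darlean's implementation; in particular, containsni's normalization is approximated here by simple lower-casing:

```typescript
interface IKeyConstraint {
    operator: 'eq' | 'lte' | 'gte' | 'prefix' | 'between' | 'contains' | 'containsni';
    value: string;
    value2?: string;
}

function matches(keyValue: string, c: IKeyConstraint): boolean {
    switch (c.operator) {
        case 'eq': return keyValue === c.value;
        case 'lte': return keyValue <= c.value;             // lexicographic comparison
        case 'gte': return keyValue >= c.value;
        case 'prefix': return keyValue.startsWith(c.value); // literal, case-sensitive
        case 'between': return keyValue >= c.value && keyValue <= (c.value2 ?? c.value);
        case 'contains': return keyValue.includes(c.value); // literal, case-sensitive
        case 'containsni':
            // Approximation: the real normalization is described under 'normalize'
            return keyValue.toLowerCase().includes(c.value.toLowerCase());
        default:
            return false; // unreachable for the operators listed above
    }
}
```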
All operators work on string values that are compared lexicographically. When it is required to compare numbers, they can be encoded to and decoded from strings via encodeNumber and decodeNumber. These functions encode and decode positive and negative numbers, with or without fractions, in such a way that the encoded values can be compared lexicographically.
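The need for such an encoding is easy to see: as plain strings, '10' sorts before '9'. As a simplified illustration (non-negative integers only; this is NOT Darlean's encoding, which also handles negative numbers and fractions), a fixed-width zero-padded encoding restores numeric order:

```typescript
// Naive fixed-width scheme for non-negative integers only, to illustrate
// why numbers must be encoded before lexicographic comparison.
function encodeSimple(n: number): string {
    return Math.trunc(n).toString().padStart(12, '0');
}

// Plain strings compare wrongly:    '9' > '10'
// Encoded strings compare numerically: encodeSimple(9) < encodeSimple(10)
```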