Unified persistence is one of the key functionalities of Darlean. Persistence is the ability to store data for a short or long term, to reload specific data at a later moment, and to search the data. Unified means that applications can use one single API to use a wide variety of underlying storage services.
Persistence makes it possible that actors can reincarnate (move to different nodes without loss of state) and provides the foundation for integrated services like tables, queues, files and persistent timers.
A layered approach
Darlean uses a layered approach to persistence. As can be seen in the image below, actors can directly use persistence, but they can also use all kinds of other services (tables, queues, timers, files, tracing). These services use the persistence layer to persist their data. The persistence layer is an abstraction over a wide range of storage services, including the physical or virtual file system, AWS (S3, DynamoDB), Azure (Files, Tables), SQL, NoSQL.
Benefits of the layered approach are:
- The layered approach makes it possible to provide application developers with a consistent set of services that behave the same, regardless of the chosen underlying storage service. Application developers can focus on developing their application without worrying about the platform the application will be deployed to.
- The layered approach makes it possible for operations engineers to deploy the application on an entirely different platform with entirely different storage services without having to change a single line of code.
- The layered approach decouples the application code from the platform and its native services, thus removing the vendor lock-in that is typically found in cloud applications.
Compartments
Application data is organized in compartments. A compartment contains related data. You can put all data for your application in one compartment, or split your data up over various compartments, like one for the user accounts, one for the shopping carts, one for your product catalogue, and one for archived orders.
A compartment is bound to exactly one specific underlying storage service. Depending on the type of underlying storage, various configuration options are present (for example the amount of redundancy).
The idea of compartments is that related data goes in one compartment. With related data, we mean: data that is functionally related to each other, has the same availability/redundancy requirements, has about the same life cycle, has a similar backup schedule, et cetera. This makes it possible to perform administrative tasks (like restoring a backup, or moving to another storage service) on an isolated part of your system. It also allows you to configure more expensive, faster and redundant storage for critical parts of your application, and cheaper, slower or less redundant storage for other parts.
Application code can provide a hint to Darlean in which compartment data should be stored. This is done by means of specifiers. A specifier is a string that reflects the functional role of data. Example specifiers are shoppingcart.active, shoppingcart.archive and useraccounts.
Application code passes such a specifier to every store, load or query operation. The configuration of the persistence provides a mapping from these specifiers to the corresponding compartment.
Partition keys and sort keys
Darlean supports the concept of partitioning – but it also works with storage services that do not use partitioning. Partitioning is a mechanism that facilitates horizontal scalability by dividing a data set over multiple partitions (places where the data is stored). Every piece of data is stored in only one partition.
To provide applications with fine-grained control over partitioning (like which data should be stored together in which partition), the core concepts of partitioning form an integral part of the Darlean persistence API in the form of partition keys and sort keys that must be provided to every load, store or query operation.
Partition keys and sort keys in Darlean are similar in concept to the same terms in AWS DynamoDB or Azure Tables (in the latter, the sort key is called row key):
- The combination of the two must uniquely identify an item.
- The partition key of an item determines in which partition the item is stored. Two items with the same partition key are guaranteed to be stored in the same partition. Two items with different partition keys can be stored in the same partition, but they can also be stored in different partitions.
When the underlying storage service does not use partitioning, it can simply be considered a service with partitioning, but with only one partition, which means that all data ends up in the same partition. The characteristics mentioned above still hold in this situation, so application developers do not have to worry about whether the underlying storage system supports partitioning or not: for maximum flexibility and portability, they should simply write their software as if it does.
In Darlean, partition keys and sort keys each can consist of multiple elements (they are arrays of strings). The advantage is that any unicode string can be used as a key element without the need to do client-side escaping of certain reserved characters, such as those internally used as path delimiters. All this escaping is performed by the framework.
Storing data
To store data, the following fields must/can be supplied:
specifier?: string;
An optional specifier that is used to determine in which compartment the data is to be stored.
partitionKey: string[];
The partition key that determines in which partition the item is stored. The combination of partitionKey and sortKey must uniquely identify an item.
sortKey?: string[];
The optional sort key that makes it possible to perform efficient queries on items with the same partitionKey.
value?: Buffer;
The binary value that needs to be stored. When not present, the item is removed from the store. It is up to the application developer to choose the format, but BSON is the only format that is understood by Darlean itself and for which advanced querying functionality (like projection filters and item filters) is available.
version: string;
The mandatory version of the data. The item is stored only when the provided version string is lexicographically larger than the version of the current item in the store (if any).
See also: API documentation for IPersistenceService.store() and IPersistenceStoreOptions
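The store rules above (remove when value is absent, accept only a lexicographically larger version) can be illustrated with a small in-memory model. This is a sketch under assumptions, not the Darlean implementation: the names StoreOptions, StoredItem and InMemoryStore are hypothetical, Uint8Array stands in for Buffer to keep the example dependency-free, and returning false for a stale version is an illustrative choice (the source does not say how a rejected store is reported).

```typescript
// Hypothetical in-memory model of the store rules; illustrative names only.
interface StoreOptions {
  specifier?: string;
  partitionKey: string[];
  sortKey?: string[];
  value?: Uint8Array; // the documented API uses Buffer
  version: string;
}

interface StoredItem {
  value: Uint8Array;
  version: string;
}

class InMemoryStore {
  private items: { [id: string]: StoredItem } = {};

  // JSON-encoding both key arrays gives an unambiguous identity for an item.
  private id(partitionKey: string[], sortKey?: string[]): string {
    return JSON.stringify([partitionKey, sortKey || []]);
  }

  // Returns true when the operation was applied, false when the version was not larger.
  store(options: StoreOptions): boolean {
    const id = this.id(options.partitionKey, options.sortKey);
    const current = this.items[id];
    // Only a lexicographically larger version may replace the current item.
    if (current !== undefined && !(options.version > current.version)) {
      return false;
    }
    if (options.value === undefined) {
      delete this.items[id]; // no value means: remove the item
    } else {
      this.items[id] = { value: options.value, version: options.version };
    }
    return true;
  }

  load(partitionKey: string[], sortKey?: string[]): StoredItem | undefined {
    return this.items[this.id(partitionKey, sortKey)];
  }
}
```

Note that this simplified model forgets the version of an item once it is removed; a real storage service may behave differently.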
Loading data
To load previously stored data, the following fields must/can be supplied:
specifier?: string;
An optional specifier that is used to determine from which compartment the data is to be loaded.
partitionKey: string[];
The partition key of the item to be loaded. Must be identical to the partition key used when the item was stored.
sortKey?: string[];
The sort key of the item to be loaded. Must be identical to the sort key used when the item was stored.
projectionFilter?: string[];
Optional list of fields that should be present in the result object. When present, the value must be encoded as a BSON object, and the selected fields are transformed into a new object and encoded as BSON. When not present, the exact (literal) stored value is returned. See Projection filters.
Darlean returns the following data:
value?: Buffer;
The value of the item, or not present when the item was not found. When a projectionFilter is applied, value contains a BSON encoded object with the projection fields.
version?: string;
The version of the item. That is, the version provided in the last successful store operation. The caller can store the version and use it to derive a new (lexicographically larger) version later on when it wants to update the item via a new store operation.
See also: API documentation for IPersistenceService.load(), IPersistenceLoadOptions and IPersistenceLoadResult.
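One way to derive a new, lexicographically larger version from a loaded version is a fixed-width, zero-padded counter. The sketch below assumes that scheme; the helper nextVersion and the width of 12 digits are hypothetical choices, not something Darlean prescribes. The padding matters because plain decimal strings do not sort numerically ('10' is lexicographically smaller than '9').

```typescript
// Hypothetical versioning scheme: fixed-width decimal counters,
// so that string order equals numeric order.
const VERSION_WIDTH = 12;

function nextVersion(current?: string): string {
  // No current version means the item does not exist yet: start counting at 1.
  const n = current === undefined ? 0 : parseInt(current, 10);
  let digits = String(n + 1);
  while (digits.length < VERSION_WIDTH) {
    digits = '0' + digits;
  }
  return digits;
}
```

A caller would pass the version from the load result to nextVersion and use the outcome as the version field of the following store operation.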
Querying data
Querying of data is a powerful way of efficiently finding the data that you want. On a functional level, the execution of a query consists of the 3 subsequent steps as illustrated in the figure below:
- The sort key filter step uses constraints on the sort key to narrow down the result set in a very efficient way. That is possible because the sort keys are indexed, and only very specific types of constraints (greater-than-equal and less-than-equal) are allowed.
- The item filter step is applied to all items resulting from the sort key filter step. The item filter step allows more sophisticated filtering, both on key fields and on data fields (provided that the data is stored as a BSON object). See the section on Item filters for more information.
- The field projection step is applied to all items resulting from the item filter step. When field projection is requested, only a configurable subset of fields is returned.
Query Input
Generic options
The following generic fields must/can be provided:
specifier?: string;
An optional specifier that is used to determine in which compartment the query is to be performed.
partitionKey: string[];
The partition key of the items to be queried. This is a mandatory field, and only queries on items with the same partition key are allowed. Data should be organized in such a way that this requirement is met.
Sort key options
The following fields control the selection of items based on the sort key. Because most storage services will have an index on the sort key, using these sort key constraints yields very efficient queries:
sortKeyFrom?: string[];
Smallest value (inclusive) that the sort key of an item must have for the item to be returned.
sortKeyTo?: string[];
Largest value (inclusive) that the sort key of an item must have for the item to be returned. Please see the note below about returning child items.
sortKeyToMatch?: 'strict' | 'loose';
Indicates whether the last element of sortKeyTo must match exactly (full string match) with the corresponding sort key element of items ('strict', which is the default) or whether a prefix match is sufficient ('loose').
sortKeyOrder?: 'ascending' | 'descending';
Indicates whether the resulting items are sorted according to ascending order of the sort keys (the default) or in descending order.
Note: The query always returns all child items of the items that match with the provided sortKeyFrom, sortKeyTo and sortKeyToMatch values. Child items of a base item are items for which the sort key starts with the sort key of the base item. So, when sortKeyTo = ['T'], an imaginary item with sort key ['T', 'A'] would also be returned because it is a child of ['T'].
Tip: Prefix queries (select all items where the sort key starts with x) can be performed by providing the same value as sortKeyFrom and sortKeyTo together with sortKeyToMatch = 'loose'.
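The interplay of sortKeyFrom, sortKeyTo, sortKeyToMatch and child items can be made concrete with a small matching function. This is one possible reading of the rules above, not Darlean's actual algorithm: the functions compareKeys and matchesSortKey are hypothetical, and the sketch assumes that key arrays are compared element by element (with a proper prefix sorting first) and that the upper bound is checked against the leading elements of the sort key only, which is what makes child items pass.

```typescript
// Element-wise comparison of two key arrays; a proper prefix sorts first.
function compareKeys(a: string[], b: string[]): number {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

function matchesSortKey(
  sortKey: string[],
  from?: string[],
  to?: string[],
  toMatch: 'strict' | 'loose' = 'strict'
): boolean {
  // Lower bound (inclusive).
  if (from !== undefined && compareKeys(sortKey, from) < 0) return false;
  if (to !== undefined) {
    // Cut the key down to the length of sortKeyTo, so that child items
    // compare like their base item.
    const head = sortKey.slice(0, to.length);
    if (toMatch === 'loose' && to.length > 0 && head.length === to.length) {
      // Loose: only a prefix of the last element has to match.
      const last = to.length - 1;
      head[last] = head[last].slice(0, to[last].length);
    }
    // Upper bound (inclusive).
    if (compareKeys(head, to) > 0) return false;
  }
  return true;
}
```

With this reading, matchesSortKey(['T', 'A'], undefined, ['T']) accepts the child item from the note, and passing ['ab'] as both bounds with 'loose' accepts every key whose first element starts with 'ab'.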
Item filtering options
When filtering on just the sort key is not sufficient, it is also possible to apply a more powerful item filter. This filter is applied to every item resulting from the previous step (the sort key constraints). Only items that match the filter are returned.
Item filtering can be performed on key fields and on the contents of value. For the latter to work, value must be a BSON encoded object.
filterExpression?: unknown[];
Nested-list structure of filter operations. A list consists of a keyword, followed by zero or more arguments. See Item filters for more information.
filterFieldBase?: string;
Optional name of a root element in value that is used as root for finding field values that are part of the filter expression.
filterPartitionKeyOffset?: number;
Optional offset that indicates how many leading partition key elements are ignored when deriving the value for a certain partition key field that is part of the filter expression.
filterSortKeyOffset?: number;
Optional offset that indicates how many leading sort key elements are ignored when deriving the value for a certain sort key field that is part of the filter expression.
Field projection options
After filtering is performed, it is possible to apply a projection filter. This is only possible when the values are encoded as BSON. When a projection filter is specified, only the specified subset of fields is included in the response.
projectionFilter?: string[];
An optional projection filter. See Projection filters for more information.
Pagination options
Pagination can be controlled via these 2 options:
maxItems?: number;
When present, limits the result set to the specified number of items.
continuationToken?: string;
Instructs Darlean to resume a previous query and return the next part of the result set. The token should be the exact same continuation token as returned from a previous query. In addition to that, all other fields must be exactly the same as for the original query.
Query Output
The output of a query is a list of items, together with an optional continuationToken that can be used for pagination.
Pagination
A continuation token is a string that is returned when there may be more items that could not be returned in the current response. That can happen when the provided maxItems was reached, when the maximum allowed response size was reached, or for any other reason.
The presence of a continuation token is not a guarantee that there are in fact more items remaining. However, the absence of a continuation token is a guarantee that there are no more items remaining.
When items are added/removed from the storage between pagination requests for a query, the added/modified/deleted items may not be included in the result set, or may be included multiple times.
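The pagination contract above (repeat the same query with the returned token until no token comes back) can be sketched as a simple loop. The types QueryOptions, QueryResult and QueryFn below are simplified, hypothetical stand-ins for the real query API, and queryAll is an illustrative helper rather than a Darlean function.

```typescript
// Simplified, hypothetical shapes; the real options contain more fields.
interface QueryItem<T> {
  sortKey: string[];
  value?: T;
}

interface QueryResult<T> {
  items: QueryItem<T>[];
  continuationToken?: string;
}

interface QueryOptions {
  partitionKey: string[];
  maxItems?: number;
  continuationToken?: string;
}

type QueryFn<T> = (options: QueryOptions) => Promise<QueryResult<T>>;

// Collects all pages of a query by following continuation tokens.
async function queryAll<T>(query: QueryFn<T>, options: QueryOptions): Promise<QueryItem<T>[]> {
  const all: QueryItem<T>[] = [];
  let token: string | undefined = undefined;
  do {
    // All fields stay identical between requests; only the token changes.
    const result: QueryResult<T> = await query({ ...options, continuationToken: token });
    all.push(...result.items);
    token = result.continuationToken;
    // A page may be empty while a token is still present, so only the
    // absence of a token ends the loop.
  } while (token !== undefined);
  return all;
}
```

Because items can change between requests, a caller that needs uniqueness may still have to de-duplicate the collected items by sort key.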
Result items
Per item, the following fields are returned:
sortKey: string[];
The sort key for the item. Note: the partition key is not returned, because, unlike the sort key, it should already be known by the caller, because it is a mandatory field of the request.
value?: T;
The value for the item. When a projectionFilter is applied, only the projected fields are present as a BSON encoded object.
See also: API documentation for IPersistenceService.query(), IPersistenceQueryOptions and IPersistenceQueryResult.
Projection filters
Projection filters make it possible to only include a part of a structured document in a query result, which saves bandwidth and other resources.
Projection filters can only be used when the data is stored as BSON object.
Projection is performed by:
- recursively iterating over all fields from the document (arrays are ignored here)
- reconstructing the full path to that field (in the form parent.child.subchild)
- matching the full path with the provided projection filter
- if the path matches with the filter, the field is included in the resulting document.
The projection filter is a ‘multifilter’, which means that it consists of comma-separated parts. Each part starts with a symbol ('+' or '-') followed by a wildcard-match expression. The parts are evaluated left-to-right.
The first part for which the wildcard-match expression matches determines what happens. When the symbol is a '+', the field is included. Otherwise, the field is not included.
Example
Let’s assume the following simple document:
{
  "Hello": "World",
  "Moon": {
    "State": "full",
    "Color": "yellow"
  }
}
+* – Returns the entire structure
-* – Returns {}
+Hello.*, -* – Returns { "Hello": "World" }
+Hello.*, -Moon.State, +Moon.*, -* – Returns { "Hello": "World", "Moon": { "Color": "yellow" } }
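The projection steps and the first-match-wins rule can be sketched as follows. This is an illustration that reproduces the four example results above, not Darlean's implementation. The function names are hypothetical, and several details are assumptions: a part like +Hello.* is taken to match the field Hello itself as well (which the examples suggest), fields that match no part are left out, arrays are treated as plain values that are not descended into, and the filter is passed as one comma-separated string as written in the examples.

```typescript
type Doc = { [key: string]: unknown };

// Turns a wildcard expression into a RegExp in which '*' matches any run of characters.
function wildcardToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// Evaluates the multifilter left to right; the first matching part decides.
function isIncluded(path: string, filter: string): boolean {
  for (const raw of filter.split(',')) {
    const part = raw.trim();
    const re = wildcardToRegExp(part.slice(1));
    // Assumption: testing 'path.' as well lets 'Hello.*' match the field 'Hello'.
    if (re.test(path) || re.test(path + '.')) {
      return part.charAt(0) === '+';
    }
  }
  return false; // Assumption: no matching part means the field is left out.
}

function isPlainObject(v: unknown): v is Doc {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

function project(doc: Doc, filter: string, prefix: string = ''): Doc {
  const result: Doc = {};
  for (const key of Object.keys(doc)) {
    const value = doc[key];
    const path = prefix === '' ? key : prefix + '.' + key;
    if (isPlainObject(value)) {
      // Nested objects are kept only when at least one of their fields is included.
      const child = project(value, filter, path);
      if (Object.keys(child).length > 0) result[key] = child;
    } else if (isIncluded(path, filter)) {
      result[key] = value;
    }
  }
  return result;
}
```

Calling project with the example document and the filter '+Hello.*, -Moon.State, +Moon.*, -*' keeps Hello and Moon.Color and drops Moon.State.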
Item filters
When sort key filters are not suitable, it is possible to use item filters. Item filters are less efficient than sort key filters because they require a (partial) table scan, but they are more efficient than ruthlessly querying all data and then performing the filtering on the client side.
Item filters are represented as a structure of nested lists, where the first element of a list represents the operator to perform, and the remainder of the list are the arguments.
Item filter operators
The following operators are supported:
- or – Represents the or of two or more values by returning the first value that is truthy (or false otherwise).
- and – Represents the and of two or more values by returning true when all values are truthy.
- eq – Returns true when two values are strictly equal to each other (in JavaScript terms, using the === operator). For example, 'a' is not equal to 'A', and the string '123' is not equal to the number 123.
- neq – Not equal, the inverse of eq.
- lte, lt, gte, gt – Returns true when the first value is <=, <, >= or > the second value.
- literal – Returns the first argument as it is specified. Can be used to use a list as a literal value (instead of as a filter operator).
- not – Returns true when the first argument is falsy, false otherwise.
- prefix – Returns true when the first argument starts with the second argument. Arguments must be strings.
- contains – Returns the number of arguments that are contained in the first argument. Arguments must be strings. Comparison is performed case-sensitive without normalization.
- containsni – Normalized case-insensitive version of contains. Returns the number of arguments that are contained in the first argument. Arguments must be strings. Comparison is performed case-insensitive with normalization. That is, all arguments are first normalized and converted to lowercase, and then compared. See normalize for more information on normalization.
- field – Returns the value of the root field in the BSON item data with the name specified in the first argument.
- pk – Returns the value of the i-th field of the partition key of the current item, where i is the value of the first argument.
- sk – Returns the value of the i-th field of the sort key of the current item, where i is the value of the first argument.
- wildcardmatch – Returns true when the first argument conforms to the pattern of the second argument, false otherwise.
- uppercase – Returns the uppercase of the first argument.
- lowercase – Returns the lowercase of the first argument.
- normalize – Returns the normalized version of the first argument. In the normalized version, diacritic signs are removed (so that é becomes e) and compound characters are split up (so that the single character Æ becomes AE (two characters) and the ligature ﬀ becomes ff). The normalized version is useful in string matching as it is quite robust against variations in how users type special characters.
Some of these operations use the concepts of truthy and falsy. The definitions of these are:
- Falsy – undefined, empty string (''), number 0, boolean false, empty array []
- Truthy – otherwise.
Items for which the filter expression evaluates to truthy are included in the result set.
Examples:
["eq", ["sk", 2], "Bar"] – Include items where the 3rd (!) sort key field is equal to "Bar".
["contains", ["field", "Foo"], "a"] – Include items where data field Foo includes the letter "a".
["and", [...], [...]] – Include items for which both the expressions that should be placed at the position of the dots evaluate to truthy.
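To show how such nested lists could be evaluated, here is a minimal evaluator for a subset of the operators. It is a sketch for illustration only: the Item shape and the functions evaluate and matches are hypothetical, operators such as normalize, containsni and wildcardmatch are left out, offsets and filterFieldBase are ignored, and the way mixed string/number comparisons are handled in compare is an assumption.

```typescript
interface Item {
  partitionKey: string[];
  sortKey: string[];
  data: { [field: string]: unknown }; // the decoded BSON object
}

// Falsy values as defined above; everything else is truthy.
function isTruthy(v: unknown): boolean {
  if (v === undefined || v === '' || v === 0 || v === false) return false;
  if (Array.isArray(v) && v.length === 0) return false;
  return true;
}

// Assumption: numbers compare numerically, everything else as strings.
function compare(a: unknown, b: unknown): number {
  if (typeof a === 'number' && typeof b === 'number') return a - b;
  const x = String(a);
  const y = String(b);
  return x < y ? -1 : x > y ? 1 : 0;
}

function evaluate(expr: unknown, item: Item): unknown {
  if (!Array.isArray(expr)) return expr; // plain values evaluate to themselves
  const op = String(expr[0]);
  if (op === 'literal') return expr[1]; // returned as specified, not evaluated
  const args = expr.slice(1).map((a) => evaluate(a, item));
  switch (op) {
    case 'or': {
      for (const a of args) {
        if (isTruthy(a)) return a;
      }
      return false;
    }
    case 'and': return args.every(isTruthy);
    case 'not': return !isTruthy(args[0]);
    case 'eq': return args[0] === args[1];
    case 'neq': return args[0] !== args[1];
    case 'lt': return compare(args[0], args[1]) < 0;
    case 'lte': return compare(args[0], args[1]) <= 0;
    case 'gt': return compare(args[0], args[1]) > 0;
    case 'gte': return compare(args[0], args[1]) >= 0;
    case 'prefix': return String(args[0]).indexOf(String(args[1])) === 0;
    case 'contains': {
      const haystack = String(args[0]);
      return args.slice(1).filter((n) => haystack.indexOf(String(n)) >= 0).length;
    }
    case 'field': return item.data[String(args[0])];
    case 'pk': return item.partitionKey[Number(args[0])];
    case 'sk': return item.sortKey[Number(args[0])];
    case 'uppercase': return String(args[0]).toUpperCase();
    case 'lowercase': return String(args[0]).toLowerCase();
    default: throw new Error('Unsupported operator: ' + op);
  }
}

// An item is part of the result set when the expression evaluates to truthy.
function matches(expr: unknown, item: Item): boolean {
  return isTruthy(evaluate(expr, item));
}
```

For an item with sort key ['a', 'b', 'Bar'], matches(['eq', ['sk', 2], 'Bar'], item) yields true, in line with the first example above.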
Atomicity and transactions
Atomicity is the property that a set of multiple store operations is either performed completely or not at all, with nothing in between, not even when hardware, software or other failures occur along the way.
Darlean depends on the characteristics regarding atomicity that underlying storage services offer.
In general, store operations that are issued in a batch are processed atomically when all of the items have the same partition key. When the batch consists of items of more than one partition key, those groups of items with the same partition key are processed atomically, but it could be that one such group succeeds (is committed) and another such group fails (is reverted or not performed at all).
When it is not possible to design your data structures in such a way that atomic operations operate on the same partition key, applications must take care of ensuring atomicity by themselves.
One such possibility is to ensure eventual consistency by:
- Scheduling a persistent timer that will repeat the operation
- Performing the individual actions
- When all is successful, remove the scheduled timer
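The three steps above can be sketched as follows. The TimerService interface and the runEventually helper are hypothetical; in particular, a real persistent timer would not receive an in-memory callback but would re-invoke an actor method after a restart. The sketch also assumes that the individual actions are idempotent, because the timer may repeat actions that already succeeded.

```typescript
// Hypothetical stand-in for a persistent timer service.
interface TimerService {
  schedule(id: string, action: () => Promise<void>): Promise<void>;
  cancel(id: string): Promise<void>;
}

async function runEventually(
  timers: TimerService,
  id: string,
  steps: Array<() => Promise<void>>
): Promise<void> {
  const runAll = async (): Promise<void> => {
    for (const step of steps) {
      await step();
    }
  };
  // 1. Schedule the repeat first, so the work is retried even when this
  //    process fails halfway through.
  await timers.schedule(id, runAll);
  // 2. Perform the individual actions. When one of them throws, the timer
  //    stays scheduled and repeats the whole operation later.
  await runAll();
  // 3. Everything succeeded, so the scheduled repeat is no longer needed.
  await timers.cancel(id);
}
```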
Configuration
Recap of persistence
The runtime applications provide persistence to the applications in the cluster. When applications want to load or store data, they provide a specifier that indicates the functional role of the data. For example, shoppingcart.state to indicate that the internal state of the shopping cart is to be stored or retrieved.
The runtime then determines which persistence provider should handle the request, and the persistence provider determines in which compartment the data should be stored or loaded from.
A persistence provider is one implementation of persistence. Darlean comes out-of-the-box with a persistence provider that stores data on disk (local or shared, physical or virtual, that does not matter). But other providers (like SQL or cloud storage) can be added as well.
A compartment represents a relatively independent part of the storage. For file system storage, a compartment could be a separate folder in which data is stored. For a SQL provider, it could be a separate table or even a separate database. Compartments provide separation of data, and allow management tasks (like backing up or restoring or moving data to another persistence provider) to take place without affecting other parts of the system.
Because of the complexity of persistence, it is usually configured via configuration files only.
Example configuration file
Here is an example of a persistence configuration that:
- Maps shopping cart related data to a dedicated compartment.
- Overrides the shard count for that dedicated compartment.
- Registers the FileSystem Persistence Service as provider for fs.* compartments.
{
  runtime: {
    // Configuration of the generic persistence service
    persistence: {
      // Mapping from specifiers to compartment.
      // Let's map our shoppingcart related data to a dedicated compartment.
      specifiers: [{ specifier: 'shoppingcart.*', compartment: 'fs.shoppingcart' }],
      // Mapping from compartment mask to which actor type implements the persistence service
      // Note: This default mapping is already provided by default, we just show it here
      // for illustration
      handlers: [{ compartment: 'fs.*', actorType: 'io.darlean.fspersistenceservice' }],
    },
    // Configuration of the file-system persistence service
    fspersistence: {
      compartments: [
        // Default settings for all compartments
        { compartment: '*', basePath: './persistence', shardCount: 8 },
        // Settings for the compartment where shopping cart info is stored. We choose here
        // to increase the shard count to allow even more throughput than Darlean already
        // provides.
        { compartment: 'fs.shoppingcart', shardCount: 32 }
      ]
    }
  }
}
Specifier mapping
The optional specifier mapping (specifiers: [...]) determines in which compartment requests with a certain specifier end up. In other words, it maps the specifiers that the persistence service receives from applications that want to load or store data to compartment names.
The specifier mapping configuration consists of a list of records, where each record maps one or more specifiers to one or more compartments. A record has the following fields:
- specifier – A wildcard filter expression that determines to which specifier(s) this record applies.
- compartment – The compartment in which data for the specifier(s) must be stored.
The specifier filter expression can contain wildcards, like a.* or a.*.b. That makes it possible to map multiple specifiers on a certain compartment.
The compartment name can contain wildcard placeholders. Wildcard placeholders look like ${*} or ${**}. The placeholders are replaced with the value from the specifier that matches with the first, respectively second, wildcard match.
Example. The record { specifier: 'foo.*.z', compartment: 'fs.bar.${*}' } applied to specifier foo.abc.z results in compartment fs.bar.abc.
The first record for which the specifier filter matches applies, all remaining records are ignored.
Two extreme configurations are:
- All specifiers map to the same compartment: { specifier: '*', compartment: 'fs.default' }.
- Each specifier has its own compartment: { specifier: '*', compartment: 'fs.${*}' }.
By default, all specifiers map to one single compartment, fs.default.
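The specifier mapping rules (first matching record wins, wildcard matches fill the placeholders, fs.default as fallback) can be sketched as a small resolver. The function resolveCompartment is hypothetical and not part of Darlean. Two details are assumptions: a wildcard is allowed to match across dots, and placeholders with more than two stars continue the first/second pattern described above.

```typescript
interface SpecifierRecord {
  specifier: string;
  compartment: string;
}

function resolveCompartment(
  specifier: string,
  records: SpecifierRecord[],
  fallback: string = 'fs.default'
): string {
  for (const record of records) {
    const escaped = record.specifier.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
    // Every '*' becomes a capture group, so that its match can fill a placeholder.
    const re = new RegExp('^' + escaped.replace(/\*/g, '(.*)') + '$');
    const match = re.exec(specifier);
    if (match !== null) {
      // '${*}' takes the first wildcard match, '${**}' the second.
      return record.compartment.replace(/\$\{(\*+)\}/g, (_whole: string, stars: string) => {
        const captured = match[stars.length];
        return captured === undefined ? '' : captured;
      });
    }
  }
  return fallback; // no record matches: use the default compartment
}
```

Applied to the example record, resolveCompartment('foo.abc.z', [{ specifier: 'foo.*.z', compartment: 'fs.bar.${*}' }]) gives 'fs.bar.abc'.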
Handler configuration
The optional handler configuration (handlers: [...]) determines which persistence provider handles the requests for a certain compartment. The configuration consists of a list of records, where each record maps one or more compartments to one handler. The record has the following fields:
- compartment – A wildcard filter expression that determines to which compartments this record applies.
- actorType – The type of the actor that implements the persistence provider.
All requests whose compartment matches the compartment filter expression are forwarded to an actor with the specified actorType and with an id consisting of one element, namely the full compartment name from the request (not the compartment wildcard filter expression).
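The forwarding rule can be sketched in the same style as the specifier mapping. The function resolveHandler and the HandlerTarget shape are hypothetical; the sketch assumes that the first matching record wins, which the text states for specifier records but not explicitly for handler records.

```typescript
interface HandlerRecord {
  compartment: string;
  actorType: string;
}

interface HandlerTarget {
  actorType: string;
  id: string[];
}

function resolveHandler(compartment: string, records: HandlerRecord[]): HandlerTarget | undefined {
  for (const record of records) {
    const escaped = record.compartment.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
    const re = new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
    if (re.test(compartment)) {
      // The actor id is the full compartment name from the request,
      // not the wildcard expression from the record.
      return { actorType: record.actorType, id: [compartment] };
    }
  }
  return undefined; // no handler is configured for this compartment
}
```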
By default, Darlean already provides the listed mapping from fs.* to io.darlean.fspersistenceservice, so it is not necessary to include this in your configuration file unless you want to use a different actor type for compartments that start with fs.
See also: Configuration options for persistence