Crank Specifications

Crank is the origin of LDAP data, entered through a web interface which feeds into an (internalised) LDAP component. The intention of the Crank is to originate the data that travels through the SteamWorks.

This site reflects work in progress.

The Crank is an LDAP client, which inserts changes in as transactional a manner as possible. The changes are presented in a manner that is suitable for web interfaces, which is the interface of choice for making these changes.

Realtime update pulling: LDAP SyncRepl

SteamWorks is based on LDAP SyncRepl (RFC 4533), which offers a refreshAndPersist mode that is well suited to receiving real-time updates to the results of past queries. Services that find it easier to process information in batches may use refreshOnly to collect data for such batches (perhaps as cron jobs). The result would not be real-time, but there would be a maximum delay before a service processed updates. Applications that stay up (such as daemons) could use refreshAndPersist mode to receive the current state and then continue to receive further updates in real time.
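
The difference between the two modes can be sketched with a toy model. This is not an LDAP client: ToyProvider, batch_poll and the integer cookie are hypothetical names invented for illustration, and a real SyncRepl cookie is an opaque value chosen by the server.

```python
from dataclasses import dataclass, field


@dataclass
class ToyProvider:
    """Hypothetical in-memory stand-in for a SyncRepl provider (not an LDAP API)."""
    log: list = field(default_factory=list)        # ordered (dn, entry) changes
    listeners: list = field(default_factory=list)  # persistent subscribers

    def write(self, dn, entry):
        """Record a rewrite (entry is a dict) or a deletion (entry is None)."""
        self.log.append((dn, entry))
        for notify in self.listeners:
            notify(dn, entry)

    def refresh_only(self, cookie):
        """Return the changes made since `cookie` and a cookie to resume from."""
        return self.log[cookie:], len(self.log)

    def refresh_and_persist(self, cookie, notify):
        """Replay the changes since `cookie`, then keep notifying on new ones."""
        for dn, entry in self.log[cookie:]:
            notify(dn, entry)
        self.listeners.append(notify)


def apply_change(cache, dn, entry):
    """Apply one change to a consumer-side cache, a dict keyed by DN."""
    if entry is None:
        cache.pop(dn, None)
    else:
        cache[dn] = entry


def batch_poll(provider, cache, cookie):
    """One refreshOnly-style run, for instance started from a cron job."""
    changes, new_cookie = provider.refresh_only(cookie)
    for dn, entry in changes:
        apply_change(cache, dn, entry)
    return new_cookie
```

A batch service would store the cookie returned by batch_poll between runs; a daemon would call refresh_and_persist once and keep its callback registered.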

Design around Idempotence

Idempotence is a mathematical property; one could say it is the impotence of idem; repeating things is not going to add anything; repeating things is not going to add anything.

When an LDAP object is sent somewhere as an update, and it turns out to not cause a change, then that change can be ignored. This is especially useful when servers are chaining updates to each other; the forwarding process stops automatically, and this property permits arbitrary update graphs, even including cycles. The flow of updates triggering updates terminates as soon as it reaches a server that has already seen it before.
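
As an illustration of why cycles are harmless, the following sketch forwards an update through a graph of servers and stops wherever the update changes nothing. The function name propagate, the graph layout and the dict-based stores are assumptions made for this example only.

```python
from collections import deque


def propagate(graph, stores, origin, dn, entry):
    """Deliver (dn, entry) starting at `origin`; return the number of deliveries.

    graph maps a server name to the servers it forwards to (cycles allowed);
    stores maps a server name to its dict of DN -> entry.
    """
    deliveries = 0
    queue = deque([origin])
    while queue:
        server = queue.popleft()
        deliveries += 1
        if stores[server].get(dn) == entry:
            continue                      # no change here, so nothing is forwarded
        stores[server][dn] = entry        # a real change: store it and pass it on
        queue.extend(graph.get(server, ()))
    return deliveries
```

In a ring of three servers, the first submission is delivered four times (the fourth arrives back at the origin and is dropped), and submitting the same object again is delivered once and dropped immediately.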

Idempotence is incorporated into the design of the SteamWorks wherever it helps to put an end to repeated updates.

Design around Access Control

The Crank deposits updates into an LDAP network; it incorporates a directory service that supports SyncRepl. The purpose of Crank is not to incorporate information from other sources, it is purely a source of LDAP information.

The clients that push information into a Crank service are web-based and take the shape of a FastCGI interface. The access control to this interface is arranged through a surrounding web server and network protection measures around its link to the FastCGI interface. The FastCGI interface has full read/write access to its built-in LDAP service.

The clients that pull information from a Crank service in a SteamWorks configuration are its Shaft components. These log in across realms using Kerberos5, as built into GSS-API, as built into SASL, as built into LDAP. As an alternative, other SASL mechanisms could also be used, so an existing SASL solution will be quite usable. The LDAP stack must be able to support the Start TLS extension, and be able to enforce its use. Services using Kerberos are as straightforward to set up as pre-shared keys; we can provision a KDC and service tickets for this project.

The tree structure will often be stored as cn=TOPIC,SUFFIX, which matches the FreeIPA structure. Underneath there may be multiple branches, for instance dc=example,dc=com,cn=TOPIC,SUFFIX or associatedDomain=example.com,cn=TOPIC,SUFFIX might be used to distinguish more specific setups. The SUFFIX is expected to hold a realm name of some kind, possibly as a domain name or a Kerberos5 realm name. There may be several of these suffixes, covering multiple configuration realms. The data stored in the tree objects is perceived to mainly deal with configuration, which means that a lot of LDAP configuration may pop up here.
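
The naming patterns above can be composed mechanically. The helper below is a sketch with a hypothetical name (topic_dn) and it assumes that none of the values need DN escaping.

```python
def topic_dn(topic, suffix, domain=None, style="dc"):
    """Compose a DN under cn=TOPIC,SUFFIX, optionally narrowed to a domain.

    style "dc" spells the domain as dc= components; style "associatedDomain"
    uses a single associatedDomain= component. Values are assumed to be free
    of characters that require escaping in a DN.
    """
    base = f"cn={topic},{suffix}"
    if domain is None:
        return base
    if style == "dc":
        branch = ",".join(f"dc={label}" for label in domain.split("."))
    elif style == "associatedDomain":
        branch = f"associatedDomain={domain}"
    else:
        raise ValueError(f"unknown style: {style}")
    return f"{branch},{base}"
```

For instance, topic_dn("TOPIC", "SUFFIX", "example.com") yields dc=example,dc=com,cn=TOPIC,SUFFIX.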

Access control for clients that pull information from a Crank service over an authenticated LDAP connection should be flexible enough to allow for a dedicated setup per pulling client:

modify/add/delete access is never granted to pulling clients;
the attributes that are visible/searchable to a certain pulling client;
the group of DNs that are visible/searchable to a certain pulling client;
LDAP SyncRepl is available and implements these rules.
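
A minimal sketch of these rules, applied to an in-memory directory, could look as follows. PullPolicy, visible_view and pulling_client_may are hypothetical names; a real deployment would express the same rules in the access control language of the LDAP server. Attribute names are matched exactly here, although LDAP treats them as case-insensitive.

```python
from dataclasses import dataclass

# Operations that are never granted to a pulling client.
WRITE_OPERATIONS = frozenset({"add", "delete", "modify", "modrdn"})


@dataclass(frozen=True)
class PullPolicy:
    """Per-client policy: which DNs and which attributes the client may see."""
    visible_dns: frozenset
    visible_attrs: frozenset


def pulling_client_may(operation):
    """Return True unless the operation would change the directory."""
    return operation not in WRITE_OPERATIONS


def visible_view(directory, policy):
    """Return the part of `directory` (DN -> attributes) this client may read."""
    view = {}
    for dn, entry in directory.items():
        if dn not in policy.visible_dns:
            continue
        attrs = {name: values for name, values in entry.items()
                 if name in policy.visible_attrs}
        if attrs:
            view[dn] = attrs
    return view
```

Any result sent to a pulling client, including SyncRepl updates, would be passed through visible_view with that client's policy.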

The addition of DNs to the group of DNs that are visible/searchable as well as the attributes visible/searchable to a certain pulling client are available over a separate FastCGI service. It would be useful to support a simple callback API to control the access to this particular part of the service.

Note that a group of DNs is not the only possible format of storing access rights in LDAP; another option would be to add access-granted users and groups inside objects when they are made accessible.

Design around Optimistic Transactions

The SyncRepl specification separates runs of work, each usually consisting of a series of complete object rewrites and object deletions. It marks the end of such runs of work explicitly, but it does not assign any semantics to it.

The SteamWorks read meaning into these runs of work, namely as database transactions. Normally in LDAP, object updates are considered atomic, but there is no mechanism to make atomic updates across multiple objects. Formally, any assumption of such semantics can at best be called optimistic, but if all components of SteamWorks implement the approach then a realistic opportunity arises to actually make it work.

This is why the runs of work presented by SyncRepl will be downloaded completely by SteamWorks components before they are applied.
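
The download-then-apply behaviour can be sketched as a small buffer. RunBuffer and its method names are assumptions for illustration; the store is a plain dict keyed by DN.

```python
class RunBuffer:
    """Collect one run of work and apply it only when the run is complete."""

    def __init__(self, store):
        self.store = store      # dict of DN -> entry, the applied state
        self.pending = []       # changes received in the current run

    def receive(self, dn, entry):
        """Buffer a rewrite (entry is a dict) or a deletion (entry is None)."""
        self.pending.append((dn, entry))

    def end_of_run(self):
        """Apply the whole run to a copy, then swap it in as one step."""
        staged = dict(self.store)
        for dn, entry in self.pending:
            if entry is None:
                staged.pop(dn, None)
            else:
                staged[dn] = entry
        self.store.clear()
        self.store.update(staged)
        self.pending.clear()

    def connection_lost(self):
        """Discard an incomplete run; the applied state is left untouched."""
        self.pending.clear()
```

Nothing becomes visible in the store until end_of_run is called, so a run that is cut short leaves no partial changes behind.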

The intention of Crank is to pass changes out in transactions as much as possible. This can be approached with SyncRepl, which collects changes into two phases; one for adding objects and another for deleting objects. Even if these changes are not made atomically and in isolation, they still come very close to transactional updates, and are probably the best possible approach to transactions with LDAP.

When submitting changes, it is useful to keep this in mind and therefore to submit updates in bulk, and perform those updates atomically; that is, in case of failure of part of the updates the entire bulk update should fail.

LDAP has a few mechanisms for doing this, but not all are mandatory parts of the standard. We propose to try the most advanced first, and to fall back to lesser versions otherwise, under the assumption that this has no dramatic impact on the amount of work. The result should definitely work with the optimistic transactions on the LDAP server selected to run inside Crank; but being able to swap that server avoids adding yet another fixed-solution piece to the scene that already has to deal with incompatibilities of FreeIPA and Samba4.

ACID transactions. The most advanced are the ACID transaction semantics of RFC 5805. These do not seem to have been implemented anywhere. The standard introduces an LDAP Control that can be used to inquire if the facilities are available on the server. When they are, handling transactions is easy: the LDAP server can be asked to begin, abort and commit transactions, just as is possible with an SQL database.

Although ACID has been considered for OpenLDAP, it has hitherto not been implemented, nor has it been added to Red Hat's 389 Directory Server. The desire to prepare for this addition in Crank stems from a wish to ask for movement in this direction in software, and perhaps to implement this in a later ARPA2 project; furthermore, it is assumed to be a very simple bypass for the code required for the other two methods that are needed to work on today's servers.

Prereading. Crank is based on a single writing process, and under the assumption that the FastCGI interface does indeed write to LDAP one update at a time, it is quite possible to use a simpler update model. RFC 4527 introduces a facility of pre-reading; this means that whenever an object is updated, which is always done atomically in LDAP, it will return the old value of the object. This old value can be stored for a manually performed rewinding of the LDAP update if parts of it were to fail.
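
The rewinding idea can be sketched against an in-memory store, where reading the old value just before each write stands in for the pre-read control. apply_bulk and UpdateFailed are hypothetical names, and the single-writer assumption made above is what makes the undo list trustworthy.

```python
class UpdateFailed(Exception):
    """Raised when one update of a bulk fails; earlier ones are rewound."""


def apply_bulk(store, updates):
    """Apply (dn, entry) updates in order; entry None means delete.

    The old value of each object is remembered before it is changed, and
    all changes made so far are undone in reverse order when one fails.
    """
    undo = []
    try:
        for dn, new_entry in updates:
            old_entry = store.get(dn)          # the "pre-read" value
            if new_entry is None:
                if old_entry is None:
                    raise UpdateFailed(f"cannot delete missing entry {dn}")
                del store[dn]
            else:
                store[dn] = new_entry
            undo.append((dn, old_entry))
    except UpdateFailed:
        for dn, old_entry in reversed(undo):
            if old_entry is None:
                store.pop(dn, None)            # the object did not exist before
            else:
                store[dn] = old_entry          # restore the previous value
        raise
```

Unlike a true transaction, the intermediate states are briefly visible to readers, which is why this can only approximate atomic behaviour.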

Manual rollback. Finally, LDAP is modified through add, delete, modify and modrdn operations. The state prior to each can be retrieved manually just before performing the work, and the result can be rolled back manually if so required. This is close to prereading, albeit manually implemented. As with prereading, this is dependent on the property that Crank is active with only a single transaction on the LDAP server at a time; which is a realistic assumption for Crank.
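
Choosing between the three mechanisms can be driven by what the server advertises in its root DSE, in the supportedExtension and supportedControl attributes. The sketch below assumes those two lists have already been read; the function name choose_strategy and the returned labels are illustrative. The OIDs are the ones assigned by RFC 5805 and RFC 4527.

```python
START_TXN_OID = "1.3.6.1.1.21.1"     # RFC 5805 Start Transaction operation
TXN_CONTROL_OID = "1.3.6.1.1.21.2"   # RFC 5805 Transaction Specification control
END_TXN_OID = "1.3.6.1.1.21.3"       # RFC 5805 End Transaction operation
PRE_READ_OID = "1.3.6.1.1.13.1"      # RFC 4527 Pre-Read control


def choose_strategy(supported_extensions, supported_controls):
    """Pick the most advanced update mechanism that the server advertises."""
    extensions = set(supported_extensions)
    controls = set(supported_controls)
    if {START_TXN_OID, END_TXN_OID} <= extensions and TXN_CONTROL_OID in controls:
        return "acid"
    if PRE_READ_OID in controls:
        return "preread"
    return "manual"
```

A server that advertises neither facility still ends up with the manual rollback, so the fallback chain always yields a usable method.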

Design around Web Interaction

The interaction with Crank is done over FastCGI, over which it communicates JSON data. The basic idea is that a web client accesses this interface through an HTTP server which takes care of access control and redirects acceptable clients to this FastCGI interface. In support of this, the JSON interface has an entrance point for retrieval of structural definitions, unpacked into a JSON-format and keeping all OIDs available. More information can be found in RFC 4512 starting from the subschemaSubentry attribute in the directory root.

Through the FastCGI interface, the web client can collect information as it is currently stored in Crank's LDAP. Based on this information, it builds up a difference to be applied. This difference includes explicitly required assumptions about prior state, which can serve as a precondition to the changes. The conceptual model of LDIF Updates could be followed, but translated into JSON for easy access in a web environment. Preconditions could take the shape of merely mentioning a DN, or a fifth kind of "update" to the format, basically a "check" to be performed on the value prior to an update.

The collected changes are submitted, including these preconditions, to Crank which will apply them as a transaction, and report back whether it succeeded or failed, and in case of failure where things went awry. It is beneficial if as many problems as possible are reported back in case of failure. The failure notice should help an interface to present messages along the lines of "attribute X of object Y changed to Z, do you still want to submit this change?" -- possibly using colours and asking whether the user agrees to retry from the present situation.
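
To make the idea concrete, here is a sketch of one possible JSON shape and its handling. The field names (changetype, dn, attributes), the "check" change type and the report layout are all assumptions, not a defined Crank format. The sketch is simplified: modify replaces the listed attributes, modrdn is left out, and each DN is assumed to be updated at most once per request.

```python
import copy
import json


def find_conflicts(store, changes):
    """Compare every change with the stored state; return all conflicts found."""
    conflicts = []
    for change in changes:
        dn, kind = change["dn"], change["changetype"]
        entry = store.get(dn)
        if kind == "add":
            if entry is not None:
                conflicts.append({"dn": dn, "problem": "entry already exists"})
        elif entry is None:
            conflicts.append({"dn": dn, "problem": "no such entry"})
        elif kind == "check":
            for attr, expected in change.get("attributes", {}).items():
                actual = entry.get(attr, [])
                if sorted(actual) != sorted(expected):
                    conflicts.append({"dn": dn, "attribute": attr,
                                      "expected": expected, "actual": actual})
    return conflicts


def submit(store, request_json):
    """Apply a JSON change list to `store` all-or-nothing; return a JSON report."""
    changes = json.loads(request_json)
    conflicts = find_conflicts(store, changes)
    if conflicts:
        return json.dumps({"ok": False, "conflicts": conflicts})
    staged = copy.deepcopy(store)
    for change in changes:
        dn, kind = change["dn"], change["changetype"]
        if kind == "add":
            staged[dn] = change["attributes"]
        elif kind == "delete":
            del staged[dn]
        elif kind == "modify":          # simplified: replace the listed attributes
            staged[dn].update(change["attributes"])
    store.clear()
    store.update(staged)
    return json.dumps({"ok": True})
```

Because find_conflicts collects every mismatch instead of stopping at the first, the report carries enough detail for an interface to show what changed in the meantime.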

The same FastCGI interface also permits reading information from the data stored in the Crank's LDAP server, and having it delivered in a JSON mapping that matches the concepts of LDIF. As part of such reads, a client that has previously subscribed to Server-Sent Events may receive any future changes to the data that has been read. The Crank internally routes incoming changes, after their transactions have succeeded, to any such listeners. The idea of this interface is to permit user interfaces to show dynamically what changes have been made by other users of Crank, since these may impact current work.
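
A committed change could be framed for Server-Sent Events roughly as follows. The event name "change", the payload layout and the ChangeFeed class are assumptions for illustration; only the id/event/data line framing with a terminating blank line comes from the Server-Sent Events format itself.

```python
import json


def sse_event(change, event="change", event_id=None):
    """Frame one committed change as a Server-Sent Events message."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps without indentation yields a single line, so one data field suffices.
    lines.append("data: " + json.dumps(change, sort_keys=True))
    return "\n".join(lines) + "\n\n"


class ChangeFeed:
    """Route changes of succeeded transactions to subscribed listeners."""

    def __init__(self):
        self.listeners = []
        self.next_id = 1

    def subscribe(self, send):
        """Register `send`, a callable that writes text to one client."""
        self.listeners.append(send)

    def publish(self, change):
        """Send one change to every listener, numbered for later resumption."""
        message = sse_event(change, event_id=self.next_id)
        self.next_id += 1
        for send in self.listeners:
            send(message)
```

The id field lets a reconnecting browser report the last event it saw, which a fuller implementation could use to replay missed changes.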

Finally, as described under access control, a JSON format must exist to change what pulling clients may read or search for. This format would most likely look like normal LDAP updates, but sent to another node where a DN is treated like a client identifier and where groups of DNs and attributes may be described.

Alternative Implementation Strategies

There are (at least) two possible ways of implementing the Crank.

Client-style. This implementation style addresses an existing LDAP server over the standard LDAP protocol, usually in the role of a directory manager, and submits data. Transactions must be run over LDAP, which takes some extra effort for negotiating the best variety. Cancelling a transaction may be visible to pulling clients, although this would fall within a very small period. There is only one Crank addressing the LDAP server with updates, so sequencing transactions can be done outside of this server, which takes some of the pressure off transaction handling.

Server-style. This implementation style provides a minimalist LDAP server that is just enough to feed a listening Shaft component with SyncRepl updates. There is no use for searching data, let alone doing this efficiently; basically a log of succeeded transactions must be replayed in response to a refreshOnly or a refreshAndPersist request, where the updates are combined into one large transaction (adding and deleting the same object, in whatever order, can be reduced to the last of these operations). Such merging may also be required from time to time for garbage collection of historic transactions; SyncRepl has a mechanism to replay from this initial data if a client asks for updates since an outdated moment.
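
The merging step of the server-style approach can be sketched as a reduction of the log to the last operation per object. The function name compact, the (dn, entry) representation and the drop_deletes option are assumptions for this example.

```python
def compact(transactions, drop_deletes=False):
    """Merge a log of transactions into one, keeping the last operation per DN.

    Each transaction is a list of (dn, entry) pairs, where entry None means
    delete. With drop_deletes=True the result holds only surviving objects,
    which is what a client starting from nothing would need.
    """
    last = {}
    for transaction in transactions:
        for dn, entry in transaction:
            last.pop(dn, None)     # re-insert so the order follows the last operation
            last[dn] = entry
    merged = list(last.items())
    if drop_deletes:
        merged = [(dn, entry) for dn, entry in merged if entry is not None]
    return merged
```

The same function could serve both for answering a refresh request and for occasional garbage collection of the transaction history.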

We request, if possible, a client-style implementation based on the 389 Directory Server that is used in FreeIPA, because we see parallels between the FreeIPA project and our intentions of sharing some configuration details across the boundaries of organisations.

FreeIPA focusses chiefly on identity management (for users and hosts) under a single administrative/security realm, whereas we are trying to support hosting environments with multiple such realms, and we want to cross over to external providers for plugin services: parties with whom a directory may be willing to co-operate, but to whom it is not widely open. The use of 389 DS makes it more likely that we will be able to integrate parts of the FreeIPA solution, bind into it for user administration, and so on. One very interesting aspect of FreeIPA is its ability to set up a trust link to Active Directory domains.

Design around Availability

The presumed client for Crank is a Shaft component, which has local caches to survive downtime of an upstream component. This is because the path between Crank and Shaft often crosses organisational boundaries, routing through the general Internet and being outside the control of either or both organisations involved.

This means that the Crank design does not need to take special precautions in relation to availability, other than reproducibility of the data after downtime.

Design for Monitoring

An interface from Crank to Nagios is desired. This may be shaped as a checker script with potential FastCGI support, or it may be an activity of the daemon to check its internal state and trigger external scripts in case of trouble; such scripts could then report to a monitoring application such as Nagios. Of course there should also be support for Nagios' assumption of regular uptime checks, both pulling and pushing. The actual Nagios scripts may be supplied if easy to do, but they don't have to be.
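
A pulling uptime check could be as small as the sketch below, which follows the Nagios plugin convention of one status line plus an exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The choice to probe a TCP port, the one-second warning threshold and the function names are assumptions; the deliverable itself is expected to be written in C, so this only illustrates the shape of such a checker.

```python
import socket
import time

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3   # Nagios plugin exit codes


def probe(host, port, timeout=5.0):
    """Return the connect time in seconds, or None if the port is unreachable."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def nagios_result(elapsed, warn_seconds=1.0):
    """Translate a probe result into an exit code and a one-line status text."""
    if elapsed is None:
        return CRITICAL, "CRANK CRITICAL - cannot connect"
    if elapsed > warn_seconds:
        return WARNING, f"CRANK WARNING - connect took {elapsed:.2f}s"
    return OK, f"CRANK OK - connect took {elapsed:.2f}s"


def main(argv):
    """Run as a check: print the status line and return the exit code."""
    if len(argv) != 3:
        print("CRANK UNKNOWN - usage: check_crank HOST PORT")
        return UNKNOWN
    code, message = nagios_result(probe(argv[1], int(argv[2])))
    print(message)
    return code
```

A wrapper script would end with sys.exit(main(sys.argv)) so that Nagios receives the exit code.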

Program Requirements

Crank will be one or two daemon programs written in C, using an asynchronous style -- so no threads or process forking other than to fork off to become a daemon. The code shall be suitable for compilation on POSIX and Windows. Testing must be done on Linux, and we may be able to help out with ports to Windows if the code is kept sufficiently general.

Included is a commandline utility that can start and stop this server (or a given one of these two servers) when provided with an IP/port combination.

All software interfaces will be documented in common formats; man pages for commands and daemons; annotated text formats such as Markdown for APIs, including JSON data formats.

The software will be delivered in open source, for instance as pull requests to a GitHub repository to be determined. The following license will be applied to the code:

Copyright (c) 2014 InternetWide.org and the ARPA2.net project
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in
   the documentation and/or other materials provided with the
   distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.