Pulley Specifications
Pulley is a realtime-synchronised subscription to LDAP data. This information is pulled into servers, where the data is used to generate local configuration databases.
This site reflects work in progress.
This specification details how Pulley can be used to co-ordinate satellite services that are “plugged into” a domain.
Realtime pulling: LDAP SyncRepl
The Pulley specification is based on LDAP SyncRepl (RFC 4533) which permits a mode refreshAndPersist which is perfect for receiving updates over a query outcome in real-time. Services could find it difficult to use information immediately, and those may initially use refreshOnly to run updates in batches (probably as cron jobs). The result would not be real-time, but there could be a maximum delay before a service processed updates.
Design around Idempotence
Idempotence is a mathematical property; one could say it is the impotence of idem; repeating things is not going to add anything.
When an LDAP object is sent somewhere as an update, and it turns out to not make any difference, then that change is ignored and not further distributed. The result will be that the flow of updates triggering updates stops.
Idempotence is incorporated into the design of the Pulley wherever it helps to put an end to repeated updates.
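As a minimal sketch of this idea (not taken from the Pulley code base; the class name AssertionSet and its methods are hypothetical), a store can report whether an update made any difference, so that only real changes are distributed further:

```python
class AssertionSet:
    """Hypothetical store: repeated additions or removals report no change."""

    def __init__(self):
        self._facts = set()

    def add(self, fact):
        """Return True only if the fact is new and worth passing on."""
        if fact in self._facts:
            return False
        self._facts.add(fact)
        return True

    def remove(self, fact):
        """Return True only if the fact was present and is now retracted."""
        if fact not in self._facts:
            return False
        self._facts.remove(fact)
        return True


store = AssertionSet()
changes = [
    store.add(("uid", "alice")),     # new: propagate
    store.add(("uid", "alice")),     # repeated: ignore
    store.remove(("uid", "alice")),  # retract: propagate
    store.remove(("uid", "alice")),  # repeated: ignore
]
print(changes)  # [True, False, True, False]
```

A caller would forward an update downstream only when add or remove returns True, which is what stops the flow of updates triggering updates.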
Use a Local Shaft for Local Consistency
It is advised to install Shaft instances on every network; usually this means within the boundaries of network administration and control. This establishes a locally consistent view on the information that is posted across this network from remote areas of the Internet, and makes the local network tolerant to faults. But most importantly, this improves local consistency; when multiple servers (for instance, IMAP and SMTP) take information from the same Shaft, then it is useful if they are always in sync. In contrast, when their services independently pull from a remote LDAP, they might notice intermediate downtime in other ways, and end up in inconsistent states that can look nasty in terms of end user experience.
Instead, it is better to have a single Shaft instance pulling from the remote LDAP, and all local Pulley instances pulling from that instance.
One Upstream LDAP Server
As described above, it is assumed that a single Shaft instance serves the local network. If there is a need to listen to multiple upstream Crank and/or Shaft sources, then this is arranged in the Shaft uplink, which presents a single directory hierarchy to Pulley.
This means that the Pulley can safely make the simplifying assumption that there is only one upstream LDAP server; this means it will not query multiple sources and it will not force sensitive cross-realm credentials to be installed on local services.
Downloading Data
The data to be downloaded for Pulley's configuration desires is described as one or more LDAP queries. These are sent to the upstream LDAP server, further assumed to be a Shaft instance.
The administrator will not construct explicit LDAP queries; instead, those will be constructed from the attempts to derive information to be stored in configuration file formats. The only thing that is needed in terms of LDAP configuration is pointing to the server running Shaft and possibly credentials to access it. All the rest is automatically derived by Pulley.
Global Configuration File
The global configuration is stored in a file, usually /etc/steamworks/pulley.conf, and filled with the following details:
- The hostname(s) that run the upstream LDAP service, usually in the form of a Shaft component.
- Optionally, the credentials to use when accessing that LDAP service.
- Daemon control options, such as whether to background Pulley, where to send log messages, and the directory holding the Machine Configuration Files.
Machine Configuration Files
Each of these files follows a similar format, but each also stands on its own. The recommended use for these files within a distribution is to package them with a piece of software that can be configured with Pulley, and then to leave it to the systems administrator whether Pulley will actually be installed to take care of the configuration.
The purpose of a Machine Configuration File is to download data from the globally configured upstream LDAP service, and turn it into a configuration for the Machine, or in other words, for the program being configured. Most of these programs permit inclusion of configurations, and/or of data from databases, and that can be used to update configurations on the fly.
The actual production of configurations is a matter that is highly dependent on the receiving Machine; for this reason, a plugin architecture has been constructed that permits plugins with a well-defined Python API. These plugins are provided with additions and removals as instructed from the Machine Configuration File. The task of Pulley is to construct a data flow machine from LDAP updates to these plugins, and to take care of minimal work and maximal reliability in passing on the configuration data.
Conceptual Configuration Model
Conceptually, these configuration files are closer to a declarative program, intensively rewritten into a computable form, than to most other configuration files.
The format is perhaps best described as set comprehension, with each of the guards written out on a separate line and producing lines usually at the end.
One effect of this style is what could be called "implicit forking" of results: When a variable could take on multiple values, then the definitions that follow can be considered replicated for each of these values. There is no need to spell out iteration, it is implied. A side-effect of this approach is that when there is no possible value available, then no instances of the definitions are made at all. This applies for instance when multiple-value attributes are bound to an attribute variable, or when a DN is specified in a form that permits variation.
It is important to understand that Pulley surpasses the expressiveness of LDAP, by permitting multiple queries whose results are joined; in other words, where the forks lead to a bounded-but-potentially-large number of combinations, they are effectively reduced in number by equating forks; this is not dissimilar to the join function in SQL. A difference however, is that unrelated forks are not combined into a combinatorial product as may be the case in SQL queries.
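The forking and joining behaviour can be pictured with Python set comprehensions. This is only an analogy under assumed data: the dictionaries whitelists and mailboxes are hypothetical stand-ins for LDAP entries, and Pulley scripts do not spell out the loops shown here.

```python
# Hypothetical directory contents, keyed by domain name.
whitelists = {
    "example.com": ["alice", "bob"],  # multi-valued attribute: two forks
    "example.org": [],                # no values: no forks at all
}
mailboxes = {
    "example.com": ["alice", "carol"],
    "example.org": ["dave"],
}

# One generator: every (dom, uid) combination is a fork of its own.
forks = {(dom, uid) for dom, uids in whitelists.items() for uid in uids}

# A second generator is joined by equating the shared variables, which
# reduces the number of combinations instead of multiplying them.
joined = {
    (dom, uid)
    for dom, uid in forks
    for box in mailboxes.get(dom, [])
    if box == uid
}

print(sorted(forks))   # [('example.com', 'alice'), ('example.com', 'bob')]
print(sorted(joined))  # [('example.com', 'alice')]
```

Note that example.org contributes nothing to either result: with no value to bind, no instance of the following definitions is made.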
The configuration works towards output assertions that contain a list of variables that may, in any way supported by a backend plugin, be processed into a configuration. Such assertions are idempotent, meaning that once stated there is no added value in stating it again; and once retracted, there is no added value in retracting it again. Such repeated (de)assertions are therefore silently ignored; they are not false or illegal, but just useless.
Special situations are handled with grace. If a change in the offered LDAP configuration removes a "fork" for a certain value, then any assertions that were derived from it in the past will be retracted. This is not shown in the configuration file, because it suffices to specify assertions in a declarative manner to know when to retract it.
Another aspect not shown in the configuration is the transaction processing. This is useful because it permits instant replacement of an old dataset with a new one, insofar as LDAP groups such changes into its SyncRepl updates; the idea is to avoid a change to a variable being visible as an intermittent drop of the variable's value. For instance, when changing an email account there should preferably be no glitch during which it does not exist.
Every backend plugin must support two-phase commit operations. This means that a backend will support setting a binary string as the commit identifier, and the last of these identifiers can be dug up later on. In addition, it is possible to ask a backend if it has successfully done a prepare_to_commit to a certain identifier, but is still waiting for the actual commit to happen. After a restart, if some of the backends did commit and others just prepared (for the same identifier) then it is safe to commit the last ones too. The commit identifier can be used in the SyncRepl mechanism, which offers to use such state to avoid repeating older configuration.
In fact, the commit identifier does not only contain the state identifier from SyncRepl, but also a timestamp and/or checksum for the configuration file. This is used to permit complete recalculation (and flushing of any caches) whenever the configuration files are changed at the time that they are reloaded. This ensures that the incremental nature of script processing cannot be confused when the script is modified halfway.
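The recovery rule for two-phase commit can be sketched as follows. The Backend class, its attributes and the recover function are hypothetical illustrations and not the actual plugin API; only the name prepare_to_commit is taken from the text above, and the identifier is treated as an opaque binary string.

```python
class Backend:
    """Hypothetical backend that remembers prepare and commit identifiers."""

    def __init__(self, name):
        self.name = name
        self.prepared = None   # identifier prepared but not yet committed
        self.committed = None  # last committed identifier

    def prepare_to_commit(self, ident: bytes) -> bool:
        self.prepared = ident
        return True

    def commit(self):
        self.committed, self.prepared = self.prepared, None


def recover(backends):
    """After a restart, finish a commit that some backends already made."""
    done = {b.committed for b in backends if b.committed is not None}
    for b in backends:
        if b.prepared is not None and b.prepared in done:
            b.commit()


imap, smtp = Backend("imap"), Backend("smtp")
for backend in (imap, smtp):
    backend.prepare_to_commit(b"cookie-42")
imap.commit()          # assume a crash here, before smtp commits
recover([imap, smtp])
print(smtp.committed)  # b'cookie-42'
```

A backend that only prepared is committed during recovery solely when another backend already committed the same identifier; otherwise it is left waiting.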
Configuration File Parsing
The Machine Configuration File is structured on a line-by-line basis. The description that follows indicates how the files are processed. Since the grammar is not LL(1), but rather based on breaking up the line piece by piece, the configuration file format is best described through its parser. (And rest assured that we have heard of XML as a possible file format, but these specifications are meant for human consumption, and so their readability and evasion of misinterpretation is preferred over living up to a random/popular syntax style guide.)
While parsing lexemes, the special symbol # followed by arbitrary text until the end of the line is considered to be (an extension of) whitespace. Whitespace that follows any lexeme is silently removed. Lines with no lexemes are entirely removed from the configuration. Whitespace prefixing a line's first lexeme currently constitutes a syntax error; in the future this may be used to block parts of the script together, in a similar style to Python and Miranda.
The result is now a sequence of lines. Each line is modelled as a numeric indentation level (currently only 0 is possible) followed by a sequence of lexemes. The textual representation of this format, with \n at the end of each line, is hashed for purposes of detecting configuration script changes.
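The normalisation and hashing described above might look like the sketch below. Several things are assumptions made for illustration: lexemes are split on whitespace only (a real lexer must respect quoted strings and the # that introduces a BLOB), lexemes are joined by single spaces in the textual representation, and SHA-256 is chosen arbitrarily since the text does not name a hash function.

```python
import hashlib


def normalise(script: str):
    """Return a list of (indentation, lexemes) pairs for a script."""
    lines = []
    for raw in script.splitlines():
        text = raw.split("#", 1)[0]   # a comment runs to the end of the line
        lexemes = text.split()        # naive: whitespace-separated lexemes
        if not lexemes:
            continue                  # lines without lexemes are removed
        if text[0].isspace():
            raise SyntaxError("leading whitespace is currently not allowed")
        lines.append((0, lexemes))    # only indentation level 0 exists
    return lines


def script_hash(lines) -> str:
    """Hash the textual representation, one newline-terminated line each."""
    text = "".join(" ".join(lexemes) + "\n" for _, lexemes in lines)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a = "UserID:uid <- world  # comment\n\n# only a comment\n"
b = "UserID:uid <- world\n"
print(script_hash(normalise(a)) == script_hash(normalise(b)))  # True
```

Because comments, blank lines and trailing whitespace are dropped before hashing, cosmetic edits to a script do not trigger a recalculation.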
Words that start with a capital are reserved words; they refer to LDAP attribute types and, perhaps at some point, object classes. This is a modification relative to LDAP, for reasons of readability. All variable names start with a lower case letter.
The lines that now remain may have an optional guard. A guard signals one or more variables that must be known for the remainder of the line to be activated. The format is var,var,var => rest_of_line where the var are guards and the rest_of_line is the guarded line. When a guard disables a line, its effects will be cancelled out or removed, as appropriate. This means that a guard has dynamic impact on the rest_of_line and actively decides on the parts of the configuration that are enacted. Note that any free variables occurring in rest_of_line may be thought of as part of the guarding variable list; if one does not exist, the rest_of_line is also, implicitly, disabled. Further processing applies to the rest_of_line part.
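A guard can be separated from its line with a few lines of Python. This is a sketch under the assumption that the two-character sequence => never occurs inside a string constant on the line; the function names split_guard and is_active are hypothetical.

```python
def split_guard(line: str):
    """Split 'var,var => rest_of_line' into (guards, rest_of_line)."""
    if "=>" in line:
        guard, rest = line.split("=>", 1)
        guards = [v.strip() for v in guard.split(",")]
    else:
        guards, rest = [], line
    return guards, rest.strip()


def is_active(guards, bound_vars):
    """A guarded line is enacted only when every guard variable is known."""
    return all(g in bound_vars for g in guards)


guards, rest = split_guard("dom,uid => UserID:uid <- world")
print(guards, rest)                 # ['dom', 'uid'] UserID:uid <- world
print(is_active(guards, {"dom"}))   # False
```

An unguarded line yields an empty guard list and is therefore always active as far as guards are concerned; free variables in the rest of the line would still need the implicit check described above.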
The remaining line can take a number of forms:
- Lines of the form [binding] <- nodes are called generator lines; they bind variables in binding to existing entries specified in nodes.
  Constant values mean the same thing in both binding and nodes declarations, namely a value that should be matched.
  The nodes declaration ends in a starting point, which is the name of a variable of type DistinguishedName; the name world is predefined to point to the root entry configured for Pulley. Other variables may be picked up as attributes or even RDNs in a binding part, or they may be bound to any entry using the @ construct in a binding part.
  Variables occurring in nodes are dug up from prior declarations and will be used to select entries and attributes. It is good to realise that this can often be read as a DN formed as binding,nodes but with a different interpretation for variables to the left and right of the <- arrow: to the left they are being bound and to the right they are being included.
  The binding can take one of a few forms:
  - It may be a DN, where variables occur in places that should be bound.
  - There may be a @Variable prefix to bind the named Variable to the DN of the path at hand.
  - There may be one or more comma-separated AttrType: Variable constructs to bind an attribute of a certain AttrType to a Variable that can further be used in the context. When multiple values exist, then these can be considered to be forked; the rest of the declarations will individually be applied for each of the values found.
- Lines of the form [binding] <-- nodes are functionally equivalent to binding <- nodes, but the results are cached in a key-to-value mapping. The key consists of the sequence of variable values used in nodes (which is expanded until it ends in world) and the mapped value is a set of potential bindings, regardless of further constraint evaluation, each consisting of the sequence of variable values bound in binding.
  TODO: Take note of whether a binding is active or disabled? And if it is disabled, perhaps the reason(s)?
  The choice whether to cache a query is left to the administrator, because it involves a subjective trade-off between local storage and network traffic. Caches persist across restarts, but they will be flushed when the configuration script has been altered or when the SyncRepl mechanism calls for a complete resync.
  Note however, that it is incorrect to assume that <- is uncached; it may be wholly or partially cached if Pulley finds reasons to do so.
- Lines of the form attrvar comparator attrval compare an attrvar bound above with an attrval value which may be an expression, including constant values and another attrvar. The comparator can be anything possible inside a filter. Note the uselessness of the filter form (!(attrvar=*)) to detect presence of an attribute, because presence is implied if a variable exists at all. This type of expression is meant to be translated back to the LDAP query from which it originated, and intended to form a filter expression.
- Lines of the form driver (args) -> attrvars send data to a configuration target. What such a line does is establish to the given internal/plugin driver type, with given args for setting up an instance of the given driver type, that the attrvars are available, in the given combination. The API of the driver will support both insertion and removal of such statements, but these are not shown in the configuration file. TODO: Will there be multiple paths to come to possible attrvars to include?
Generally, a DN entry takes the shape of zero or more RDNs prefixed to a DN. (Well, actually an RDN is a plus-separated set of attribute-value pairs, but that is not the point here.) This recursive definition calls for "root" DNs, which may either be a DN-typed variable or a predefined one. One variable is predefined: world contains all the objects in the upstream LDAP service.
It is possible to define what constitutes "virtual attribute types", which would be compositions of an RDN sequence rewritten into a combined form. Specifically consider the case of DCList, which takes any sequence of DomainComponent= RDNs and combines them into a DNS name. As a result, the following would produce a domain dom from the top of the DN:
@domains, DCList=dom <- world
It is now possible to further zoom in with things like
UserID:friend <- CommonName="Whitelist", domains
We currently do not believe that the concept of virtual attribute types needs general treatment, so it is hard-wired into the Pulley code instead of through, say, a user-defined Python function. The following hard-wired definitions are currently available:
- DCList combines DomainComponent= RDNs into a DNS name; it will deliver all results with at least one such component, so dc=example,dc=com will result in bindings for example.com as well as for com; it is likely that further generation constraints remove the fork for com but that is not done by default. The generated DNS names are in lowercase and not terminated with a dot, since all names are considered to hang under the DNS root.
- RDN matches one RDN, and stores its outcome in a variable typed as an RDN list. It can be used as a condition that one RDN can be inserted, but it is more likely to be used intermediately. There is no way of finding back the skipped level.
- RDNList matches zero or more RDNs, and stores its outcome in a variable typed as an RDN list. It adds no value as a condition because zero or more RDNs can be inserted underneath any entry. There is no way of finding back the skipped levels. The use of this virtual attribute type will lead to an LDAP query for subtree scope. This makes it a different kind of SyncRepl subscription than one for another scope.
- Note the absence of a subtree match for a sequence of one or more RDNs, as this is easily constructed by using the form RDNList x,RDN y or RDN x,RDNList y which will do the desired thing.
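The DCList behaviour described in the list above can be sketched in Python. The function name dclist_names is hypothetical, and the DN is split naively on commas, so escaped commas and multi-valued RDNs are not handled; it is meant only to show which DNS names are produced.

```python
def dclist_names(dn: str):
    """Return every DNS name formed by the DC RDNs at the top of a DN."""
    rdns = [rdn.strip() for rdn in dn.split(",") if rdn.strip()]
    labels, names = [], []
    for rdn in reversed(rdns):  # walk from the root of the tree downwards
        attr, _, value = rdn.partition("=")
        if attr.strip().lower() not in ("dc", "domaincomponent"):
            break               # the run of DC components has ended
        labels.insert(0, value.strip().lower())
        names.append(".".join(labels))  # lowercase, no trailing dot
    return names


print(dclist_names("cn=Whitelist,dc=Example,dc=com"))
# ['com', 'example.com']
```

Both com and example.com are delivered, matching the description; removing the fork for com is left to further constraints in the script.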
The virtual attribute type is a future extension mechanism that will probably include Python functions to make them general; for now, the DCList is the only useful application of this mechanism, so the name will be hard-wired into Pulley.
Virtual attributes are not just useful to bind variables with more dynamic RDNs, but they may also be used to match such structures when used on the right side of the <- arrow.
The syntax of the Machine Configuration Files can be extended in a number of ways, but none of these appear to be necessary at this moment. Conveniences that may be useful in a later stage include block processing, e.g. for guards, and additional forms of conditions and expressions.
LDAP Query Derivation
In most scripts, there will be a few variables that refer to entries, and those may fork over multiple outcomes. The predefined variable world is an example of one that is not forked, but its further matches may fork. Any further use of these entries to come to subtree entries constitutes a further forking possibility, at least when variables are being bound.
Every place where a fork can occur is a useful place for SyncRepl subscription of a query, because it is a place where new forks can be added, or old ones removed. Places which do not fork still represent outcomes that may be created and destroyed, albeit not as a set but rather as a single value.
It is easy to go overboard with SyncRepl however. Take this example:
UserID:uid <- CommonName="Whitelist", DCList=dom, world
This instructs Pulley to bind uid values from a whitelist entry under a variable domain dom that has already been bound before. It is not a good idea to create a SyncRepl subscription for every possible dom, as that would lead to unbounded subscriptions; in general, the number of subscriptions should remain in the order of the number of lines in the configuration script, to avoid running into design inefficiencies. This means that the subscription in this case would be to world, and any further matching is done according to the constraints of this generator line. Needless to say, multiple subscriptions to the same LDAP query are also senseless, and should be optimised out.
This does however introduce a responsibility that works in two directions:
- Whenever a new dom is forked (or when one is taken away) the bindings of uid resulting from it must be expanded (or removed);
- Whenever a new uid is added to (or removed from) any entry matching this pattern, a new uid must only be forked when there is a fork for dom in the position presented in this line.
This complicates Pulley's processing of LDAP information, but the declarative style of the script makes it much more straightforward to describe the right thing in the configuration file; and simplicity has the direct benefit of reducing the number of mistakes made.
In general, the number of variables involved determines the complexity of the processing strategy. It is very useful that such complexities are dealt with automatically, and not left to a programmer, because they are not so much conceptual complexities as implementation complexities that occur when a more procedural style of description is used. This is the sole reason for wanting to describe the configuration script in a declarative style: it enables automation through back-and-forth calculations.
Note that other lines may add extra cross-relations; further forks do this, and so do attribute bindings and conditions on them. Sometimes, a condition could be passed back to the original LDAP query, but that would only be possible when LDAP is sufficiently expressive; it is limited to comparisons of attributes with constants, such as (uid!=root). Such comparisons are not going to be as common as comparisons between variables, and for that reason (as well as reducing the number of shared SyncRepl subscriptions) it is not a high priority to actually implement this traffic-optimising facility.
Note that there is support for constant comparisons in RDNs, as this really does save a lot of unnecessary entries from being sent downstream. It is actually advised to introduce domain names underneath a classification of the kind of information needed for a service, for these reasons of efficiency. The following two structures may seem equivalent, but their SyncRepl subscriptions are cut off at different points, making the latter far more efficient:
UserID:uid <- CommonName="Whitelist", DCList=dom, world
UserID:uid <- DCList=dom, CommonName="Whitelist", world
When configuring the Crank and/or Shaft components it is good to take note of the efficiency variations caused by this directory structure design choice.
TODO: Derive attribute list to be requested.
Data Flow Implementation
The best implementation for Pulley is probably to construct an internal data flow mechanism, and feed it with any changes that come in from the upstream LDAP service. Note that this combines well with an asynchronous style of programming for all LDAP interactions.
The start for all work is an incoming LDAP update. Entries may be added or removed, and attributes may be added or removed. Note that changes can be treated as remove-then-add because the transactional model for LDAP at a minimum places transactional boundaries around object updates, and this will be reflected in the transactional interface to the plugin/internal modules that actually generate configurations.
Obviously, caches should be updated when a change is made to the stored data. Since the semantics of caches are transparency without a need to query for data over LDAP, the constraints to the implementation need not be further specified here.
Variables do not occur in isolation; they are bound in groups by generators and they are used together in conditions and configuration writers. Their partitioning is a driving force to processing updates. The result is a sort of data flow to process updates: a generator sends an update, some data on other variables must be collected from other generators, conditions applied and configurations adapted.
The following steps prepare Pulley for processing LDAP updates:
- Let Vars be the set of all variables that occur in the configuration script.
- For each variable v in Vars, find the single generator Generator(v) that binds v. Also construct the inverse, that is the set of variables generated by each generator g, and name it GeneratedVars(g).
- For each variable v in Vars, find the set of writers Writers(v) that use v in either a prefixed guard or in the actual write statement. Also construct the inverse, that is a set of variables needed by each writer w, and name it WriterVars(w).
- For each variable v in Vars, find the set of conditions Conditions(v) that refer to v in either a prefixed guard or in the actual condition. Also construct the inverse, that is a set of variables referenced by each condition c, and name it ConditionVars(c).
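The preparation steps above amount to building a handful of lookup tables. The sketch below assumes the script has already been parsed into three hypothetical dictionaries (generated_vars, writer_vars, condition_vars) that map each line to the variables it binds or uses; rejecting a variable bound by two generators is an assumption drawn from the phrase "the single generator".

```python
# Hypothetical script summary: which variables each line binds or uses.
generated_vars = {"g1": {"dom"}, "g2": {"uid", "udom"}}  # GeneratedVars(g)
writer_vars = {"w1": {"uid", "udom"}}                    # WriterVars(w)
condition_vars = {"c1": {"dom", "udom"}}                 # ConditionVars(c)


def invert(table):
    """Turn a name -> variables table into a variable -> names table."""
    inverse = {}
    for name, variables in table.items():
        for v in variables:
            inverse.setdefault(v, set()).add(name)
    return inverse


all_vars = set().union(*generated_vars.values(),
                       *writer_vars.values(),
                       *condition_vars.values())         # Vars

# Generator(v): assumed to be unique, so a second binding is an error.
generator_of = {}
for g, variables in generated_vars.items():
    for v in variables:
        if v in generator_of:
            raise ValueError(f"{v} is bound by more than one generator")
        generator_of[v] = g

writers_of = invert(writer_vars)        # Writers(v)
conditions_of = invert(condition_vars)  # Conditions(v)

print(generator_of["udom"], writers_of["uid"], conditions_of["dom"])
# g2 {'w1'} {'c1'}
```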
Based on this representation of the configuration script, we proceed to analyse the structures that will help us handle updates to generator output:
- Construct VarPartitions, a partition of all variables v based on whether they occur together in one or more conditions. We will write VarPartition(v) to return the element of VarPartitions, so the set of variables, that includes v.
Formally,
VarPartition(v) = { j | j ∈ ConditionVars(Conditions(v)) }
VarPartitions = { VarPartition(v) | v ∈ Vars }.
This partitioning is global to the configuration script, because conditions introduce dependencies among variables in always the same manner. Its use is to indicate cross-influences between variables, so it can be established what else needs to be taken into account when
Note that only conditions influence partitions:
  - Nodes in generators just consume whatever is made available;
  - Bindings in generators just make data available to anyone who is listening;
  - Writers simply take what they are given, without influencing it;
  - Guards on a condition are included in the variable set on which it depends;
  - Guards on generators are of just as little influence as the nodes;
  - Guards on writers are of just as little influence as the rest of the writers.
- For each writer w, find the set WriterImpacters(w) of variables that must be brought together to determine impact on w with
WriterImpacters(w) = { j | i ∈ WriterVars(w), j ∈ VarPartition(i) }.
- For each writer w, find the set WriterGenerators(w) of all generators that must be run to collect all its variables, with WriterGenerators(w) = { Generator(v) | v ∈ WriterImpacters(w) }. Also determine the inverse, which is the function GeneratorWriters(g) with the set of writers that are influenced by the given generator g, formally
GeneratorWriters(g) = { w | v ∈ GeneratedVars(g), w ∈ Writers(v) }.
- Construct a set PlanningDomain of sets of variables that may be unknown at the same time. The following definition describes all elements in the set:
  - If V ∈ VarPartitions, then V ∈ PlanningDomain.
  - If V ∈ PlanningDomain and ∀ v ∈ V ⋅ V ≠ GeneratedVars(Generator(v)), then V ∖ GeneratedVars(Generator(v)) ∈ PlanningDomain.
- Choose a set Planning of tuples ⟨V,g⟩ where
  - V is an element of PlanningDomain; each of these must occur at least once as a first tuple element in Planning;
  - g is a generator, and GeneratedVars(g) must overlap with V.
These tuples provide a planning of generators g to run when a given set of variables V is still unknown.
There are many ways of defining this Planning, so it is truly an implementation choice. The main challenge would be to find an efficient planning.
A simple approach would be to only insert the first acceptable generator from the configuration script for each element in PlanningDomain, and simply instruct its author to write the most selective rules before less selective ones. More refined choices could take into consideration whether a generator is locally cached, whether its output is expected to be small, and so on. When multiple choices are available for a given variable set in PlanningDomain, then an implementation must choose dynamically from the alternative continuations.
- For every tuple ⟨V,g⟩ in Planning, there may be conditions to be verified. The general assumption is that the to-be-invoked generator g will generate at least the objects called for, and that conditions may further constrain the set. Needless to say, implementation efficiency is determined by how tightly the output of g fits the desires. To this end, conditions can be split into a precondition that is calculated "prior to" (or more accurately, during) the query performed by the (not cached) generator g, and a postcondition that is run over the output of the generator g.
  - For each tuple ⟨V,g⟩ in Planning, derive the set of conditions to apply as PlannedConditions(⟨V,g⟩); the idea is to include those conditions that are possible with the newly generated variables from g, but that at the same time require no remaining unknown variables. Formally,
PlannedConditions(⟨V,g⟩) = { c | v ∈ GeneratedVars(g), c ∈ Conditions(v), ConditionVars(c) ∩ (V ∖ GeneratedVars(g)) = ∅ }.
  - For each tuple ⟨V,g⟩ in Planning, derive a precondition p that implements as tight a selection as possible on the generation process, and pair it with a postcondition q to serve as an alternative to the PlannedConditions outcome.
The precondition can be any logic expression that matches the computational model of LDAP, that is, it selects attributes for matching, but should otherwise be as open as possible. The postcondition is basically the planned condition without any certainties that are already established in the precondition.
This is an implementation choice; the simplest choice would be to set the precondition to True and equate the postcondition to the planned condition.
Formally, the function PlannedPrePostConditions(⟨V,g⟩) returns a tuple ⟨p,q⟩ with a precondition p and a postcondition q to be applied when generator g is applied under unknown variable set V. The correctness criterion for this function is that
{ ⟨V,g,c⟩ | ⟨V,g⟩ in Planning, c ∈ PlannedConditions(⟨V,g⟩) } = { ⟨V,g,p ∧ q⟩ | ⟨V,g⟩ in Planning, ⟨p,q⟩ ∈ PlannedPrePostConditions(⟨V,g⟩) }.
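Some of the derived sets from the analysis steps can be computed as sketched below. The input dictionaries are hypothetical, and two interpretations are assumptions: the partitions are computed as a transitive closure that always contains the variable itself (so that the result is a true partition), and PlannedConditions is read as excluding conditions that still need a variable from V outside GeneratedVars(g).

```python
# Hypothetical script summary (all names are illustrative only).
generated_vars = {"g1": {"dom"}, "g2": {"uid", "udom"}}  # GeneratedVars(g)
condition_vars = {"c1": {"dom", "udom"}}                 # ConditionVars(c)
writer_vars = {"w1": {"uid", "udom"}}                    # WriterVars(w)

all_vars = set().union(*generated_vars.values())
generator_of = {v: g for g, vs in generated_vars.items() for v in vs}


def var_partitions(variables, conditions):
    """Map each variable to its partition; conditions merge partitions."""
    part = {v: {v} for v in variables}
    for cvars in conditions.values():
        merged = set().union(*(part[v] for v in cvars))
        for v in merged:
            part[v] = merged
    return part


def writer_impacters(w, part):
    """WriterImpacters(w): every variable that can influence writer w."""
    return set().union(*(part[v] for v in writer_vars[w]))


def writer_generators(w, part):
    """WriterGenerators(w): generators needed to collect those variables."""
    return {generator_of[v] for v in writer_impacters(w, part)}


def planned_conditions(unknown, g):
    """Conditions touching g's variables that need no other unknown."""
    remaining = unknown - generated_vars[g]
    return {c for c, cvars in condition_vars.items()
            if cvars & generated_vars[g] and not (cvars & remaining)}


part = var_partitions(all_vars, condition_vars)
print(sorted(writer_impacters("w1", part)))              # ['dom', 'udom', 'uid']
print(sorted(writer_generators("w1", part)))             # ['g1', 'g2']
print(planned_conditions({"dom", "uid", "udom"}, "g2"))  # set()
print(planned_conditions({"dom"}, "g1"))                 # {'c1'}
```

In the example, condition c1 cannot be checked while dom is still unknown, so it is only planned once the last generator involved has run.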
When a fork is added to the output of a generator g, the following actions are performed:
- Let O be the tuple with the output added in generator g.
- WriterLoop: Iterate over the set of influenced writers w in GeneratorWriters(g).
- Determine the set of unknown variables, U = WriterVars(w).
- Use PlannedConditions(⟨U,g⟩) to filter out unwanted elements from O.
- Loop: Remove the GeneratedVars(g) from U.
- Jump to Stop: when U has become the empty set.
- Overwrite the generator g with one of { g' | ⟨V,g'⟩ ∈ Planning, V=U }.
- Determine ⟨p,q⟩=PlannedPrePostConditions(⟨U,g⟩).
- Run generator g while applying the precondition p to its output.
- Use postcondition q to filter out unwanted elements from O.
- Replace O by the cartesian product of O and the remaining output.
- Jump back to Loop:.
- Stop: Produce the output O with writer w.
- Continue the WriterLoop: iteration with the next writer w.
When a fork is removed from the output of a generator g, the same actions are taken as though that output were being freshly added; the only difference is that when producing the output O with writer w, the action on the writer is not to add entries, but rather to remove them.
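The update procedure can be sketched for a single writer as below. This is a simplification under stated assumptions rather than the specified algorithm: generator outputs are fixed lists held in the hypothetical outputs table, the precondition is always True, every condition whose variables are all bound serves as the postcondition, and filtering happens after the cartesian product is formed.

```python
from itertools import product

# Hypothetical data: generator outputs as lists of variable bindings.
generated_vars = {"g_dom": {"dom"}, "g_uid": {"uid", "udom"}}
outputs = {
    "g_dom": [{"dom": "example.com"}],
    "g_uid": [{"uid": "alice", "udom": "example.com"},
              {"uid": "bob", "udom": "example.org"}],
}
# Conditions as (variables, predicate) pairs.
conditions = [({"dom", "udom"}, lambda row: row["dom"] == row["udom"])]
# Planning: the generator to run for a given set of unknown variables.
planning = {frozenset({"uid", "udom"}): "g_uid", frozenset({"dom"}): "g_dom"}


def keep(row):
    """Apply every condition whose variables are all bound in this row."""
    return all(pred(row) for cvars, pred in conditions if cvars <= set(row))


def fork_rows(g, new_row, needed_vars):
    """Rows to hand to a writer when new_row is added to generator g."""
    rows = [new_row]
    unknown = set(needed_vars)
    while True:
        unknown -= generated_vars[g]          # Loop: these are now known
        rows = [r for r in rows if keep(r)]
        if not unknown:
            return rows                       # Stop: produce the output
        g = planning[frozenset(unknown)]      # pick the next generator
        rows = [{**a, **b} for a, b in product(rows, outputs[g])]


needed = {"dom", "uid", "udom"}
written = set()
for row in fork_rows("g_dom", {"dom": "example.com"}, needed):
    written.add((row["dom"], row["uid"]))     # an addition at the writer
print(written)  # {('example.com', 'alice')}
```

For a removed fork the same rows would be computed, and the writer would discard them instead of adding them.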
ABNF Syntax for Pulley scripts
See RFC 5234 for ABNF specifications.
Lines:
Lexeme processing is assumed to have been applied; whitespace following lexemes, comment lines and empty lines are removed as described in the main text. The remaining lines are processed in the following manner:
line = binding "<-" dnvar annotations
line =/ filter annotations
line =/ varlist "->" IDENTIFIER "(" parmlist ")" annotations
annotations = [ "[" varlist "]" ] [ "*" FLOAT ]
Similar information may be published in LDAP attribute values, each produced according to the syntax of line. An LDAP attribute value SHOULD mention the * FLOAT annotation, to help ordering the values. The lexemes in LDAP may be followed by whitespace, as seen fit. Comment lines are also supported in LDAP.
TODO: Define an objectClass for these lines; define an attributeType for the line format; define an attributeType for inheritance from other LDAP entries by DN; perhaps define an attributeType to flag that an entry is abstract?
Bindings:
binding = [ bindatr ] bindstep *( "," bindstep )
bindstep = "@" dnvar
bindstep =/ bindrdn *( "+" bindrdn )
bindrdn = attributeType "=" ( varnm / const )
bindatr = atmatch *( "+" atmatch )
atmatch = attributeType ":" ( varnm / const )
Filters:
filter = "(" filtbase ")"
filter =/ "(" "!" filter ")"
filter =/ "(" "&" *filter ")"
filter =/ "(" "|" *filter ")"
filtbase = varnm filtcmp varnm
filtbase =/ varnm filtcmp const
filtcmp = "=" / "!=" / "<" / "<=" / ">" / ">="
Note: Filtering is done on variable names, and not as with LDAP, on attribute values. Filters are only run for existing values, so filters that check for existence of a value such as (UserID="*") have no effect, and (!(UserID=*)) is a complicated way of saying False.
Note: Equality with string constants involves substring syntax, so filtering on equality to the string constant "(a=*x*)" finds attribute values a that include a substring "x". Apply escaping as prudent.
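The filter semantics in the two notes can be illustrated with a small evaluator. It is a sketch under assumptions: filters are supplied as already-parsed nested tuples (a hypothetical representation), all values are strings, escaping is ignored, and only constants are given the substring treatment.

```python
import operator
import re

ORDERING = {"<": operator.lt, "<=": operator.le,
            ">": operator.gt, ">=": operator.ge}


def substring_match(value, pattern):
    """Equality where '*' in the constant matches any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def evaluate(flt, bindings):
    """Evaluate a nested-tuple filter over variables that are all bound."""
    op = flt[0]
    if op == "!":
        return not evaluate(flt[1], bindings)
    if op == "&":
        return all(evaluate(f, bindings) for f in flt[1:])
    if op == "|":
        return any(evaluate(f, bindings) for f in flt[1:])
    _, var, (kind, operand) = flt
    left = bindings[var]
    if kind == "var":
        right = bindings[operand]
        equal = left == right
    else:
        right = operand
        equal = substring_match(left, operand)
    if op == "=":
        return equal
    if op == "!=":
        return not equal
    return ORDERING[op](left, right)


bindings = {"uid": "alexander", "owner": "alexander"}
print(evaluate(("=", "uid", ("const", "*x*")), bindings))       # True
print(evaluate(("!", ("=", "uid", ("const", "*"))), bindings))  # False
```

The second call shows why a negated presence test is always False: the filter is only ever run for variables that already hold a value.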
Variables, constants, parameters:
dnvar = var
value = const / var
varlist = var *( "," var )
var = IDENTIFIER
parmlist = parm *( "," parm )
parm = IDENTIFIER "=" ( value / "[" value *( "," value ) "]" )
const = STRING / INTEGER / FLOAT / BLOB
Imported definitions
From RFC 4514 and RFC 2241.
Lexemes
The lexeme IDENTIFIER is a sequence of letters, underscores and digits, not starting with a digit.
A STRING lexeme is a string of arbitrary characters, enclosed by double quotes " and possibly containing escaped characters as specified in Section 2.4 of RFC 4514, but with the exception that space characters require no escape (due to the surrounding quotes of the STRING lexeme).
An INTEGER lexeme is anything acceptable to the POSIX strtol function, using base 0 to indicate that prefixes such as 0 and 0x are acceptable but decimal encoding is the default.
A FLOAT lexeme is anything acceptable to the POSIX strtof function. Note that floating point numbers are primarily used for estimations, so any higher level of precision would be a waste of effort.
A BLOB lexeme is the encoding of hexadecimal digit pairs preceded by a # character as described in Section 2.4 of RFC 4514.
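Two of these lexemes can be recognised in Python as sketched below. The helper names are hypothetical, letters are assumed to be ASCII, and the whole lexeme must match (unlike strtol, which skips leading whitespace and stops at the first invalid character). Python's own int(text, 0) is not used because it rejects a bare leading 0 as an octal prefix.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
INTEGER = re.compile(r"([+-]?)(0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*)")


def is_identifier(lexeme: str) -> bool:
    """Letters, underscores and digits, not starting with a digit."""
    return IDENTIFIER.fullmatch(lexeme) is not None


def parse_integer(lexeme: str) -> int:
    """Parse like strtol with base 0: 0x is hex, leading 0 is octal."""
    match = INTEGER.fullmatch(lexeme)
    if match is None:
        raise ValueError(f"not an INTEGER lexeme: {lexeme!r}")
    sign, digits = match.groups()
    if digits[:2].lower() == "0x":
        value = int(digits[2:], 16)
    elif digits.startswith("0"):
        value = int(digits, 8)      # also covers a plain "0"
    else:
        value = int(digits, 10)
    return -value if sign == "-" else value


print(parse_integer("0x1F"), parse_integer("010"), parse_integer("-7"))
# 31 8 -7
print(is_identifier("uid_2"), is_identifier("2uid"))
# True False
```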
TODO: The # character also marks the beginning of a comment
Near-formal Definitions for Virtual Attributes
The following definitions are similar to those for attribute types, except that no OID is assigned to them, because they are not intended to be sent over network connections.
Remember that Pulley changes the capitalisation of attribute names, so these near-formal specifications reflect this in their NAME fields.
The DCList virtual attribute looks similar to RFC 4524's associatedDomain attribute type definition:
NAME 'dCList'
EQUALITY caseIgnoreIA5Match
SUBSTR caseIgnoreIA5SubstringsMatch
SYNTAX 1.3.6.1.4.1.1466.115.121.1.26
The RDNList virtual attribute looks similar to RFC 4524's associatedName attribute type definition:
NAME 'rDNList'
EQUALITY distinguishedNameMatch
SYNTAX 1.3.6.1.4.1.1466.115.121.1.12
The RDN virtual attribute looks similar to RFC 4524's associatedName attribute type definition:
NAME 'rDN'
EQUALITY distinguishedNameMatch
SYNTAX 1.3.6.1.4.1.1466.115.121.1.12
See RFC 4517 for syntaxes and matching rules.