Policy Engine
Summary
This ADR defines the management and execution API and the workflow of the DCM Policy Engine.
Motivation
The Policy Engine operates as a specialized microservice within the Data Center Management (DCM) application. It governs the creation and modification of services (e.g., VirtualMachines, Containers). It enables Admins, Tenant-Admins, and Users to inject logic that validates request payloads (Approve/Reject), mutates them (Defaulting/Altering), and assigns Service Providers to them, using an embedded Open Policy Agent (OPA) engine and Rego.
OPA is embedded as a Go library within the Policy Engine process rather than deployed as a separate sidecar service. Rego source code is persisted in the database alongside policy metadata, ensuring policies survive service restarts. On startup, the engine recompiles all stored policies from the database, eliminating the persistence gap inherent in an external OPA deployment.
Goal
Define how Policies are managed and used by the Policy Engine:
- Define Policy types - Global, Tenant, User
- Define Policy management
  - How policies should be added/updated
  - How policies will be stored
- Define Policy execution
  - Policy priority
  - Value immutability and constraints
- Determine the enforcement engine and policy language
  - For V1, OPA (embedded as a Go library) and Rego will be used
  - Other alternatives may be considered in future versions
- Define the input format
- Define the output format
Non-Goals
- Policy implementation
- Actionable OpenAPI specification
- While this ADR references Tenant-level Policy and ID, Tenants are not supported in V1
Core Concepts & Definitions
Policy Responsibilities
Every policy may return one or more of the following outputs:
- Reject: Requests are approved by default. Policies may decide whether the request should be Rejected.
- Mutation: Modifying the request payload (e.g., injecting default labels) by providing a patch map.
- Field Constraints: Defining the mutability of fields for subsequent policies in the chain.
- Service Provider Selection: Policies may select a service provider and/or set constraints on which providers may be selected.
Policy Scope & Hierarchy (Execution Order)
The execution order is strictly determined by Level first, then Priority.
- Global: (Super Admin) - Runs first.
- Tenant: (Tenant Admin) - Runs second.
- User: (End User) - Runs last.
Within each level, policies are sorted by priority: lower integers indicate higher priority.
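The ordering rule can be sketched as a two-part sort key. This is a minimal illustration in Python (the engine itself is written in Go); the Policy class, its field names, and the LEVEL_RANK table are hypothetical and not defined by this ADR.

```python
from dataclasses import dataclass

# Hypothetical rank table: a smaller rank runs earlier.
LEVEL_RANK = {"GLOBAL": 0, "TENANT": 1, "USER": 2}


@dataclass(frozen=True)
class Policy:
    name: str
    level: str      # "GLOBAL", "TENANT" or "USER"
    priority: int   # lower integer = higher priority


def execution_order(policies):
    """Sort by level first, then by ascending priority number."""
    return sorted(policies, key=lambda p: (LEVEL_RANK[p.level], p.priority))


policies = [
    Policy("user-defaults", "USER", 1),
    Policy("global-quota", "GLOBAL", 20),
    Policy("tenant-labels", "TENANT", 5),
    Policy("global-billing", "GLOBAL", 10),
]
ordered_names = [p.name for p in execution_order(policies)]
print(ordered_names)
```

Note that the user policy with priority 1 still runs last: priority only breaks ties within a level.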
The “Rego Contract”
Input
The input payload includes:
- spec - The current patched request payload
  - Assumption - While policies do not have to be specific for Service Types, they will need to know the expected content
- constraints - The current constraints context (accumulated from prior policies)
- provider - The currently selected service provider (empty string initially, populated as policies are evaluated)
- service_provider_constraints - The current service provider constraints (accumulated from prior policies)
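As an illustration, the input handed to the first policy and to a later policy in the chain might look as follows. This is a sketch in Python; only the four top-level keys come from this ADR, while the build_input helper, the fields inside spec, and all concrete values are invented for the example.

```python
import json


def build_input(spec, constraints=None, provider="", sp_constraints=None):
    """Assemble the document handed to a policy as its input."""
    return {
        "spec": spec,
        "constraints": constraints or {},
        "provider": provider,
        "service_provider_constraints": sp_constraints or {},
    }


# First policy in the chain: nothing has been accumulated yet.
first_input = build_input({"cpu": 2, "region": "us-east"})

# A later policy sees the patched spec plus the accumulated context.
later_input = build_input(
    {"cpu": 2, "region": "us-east", "billing_tag": "engineering"},
    constraints={"billing_tag": {"const": "engineering"}},
    provider="provider-a",
    sp_constraints={"allow_list": ["provider-a", "provider-b"]},
)
print(json.dumps(later_input, indent=2))
```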
Output
Following the policy responsibilities, the output should comprise the following elements:
- rejected (bool) - since requests are approved by default, policies may reject them.
- rejection_reason (string, optional) - reason for rejection
- selected_provider (string, optional) - the name of the service provider chosen to fulfill the request
- service_provider_constraints (object, optional)
  - allow_list - list of allowed service provider names
  - patterns - list of regex patterns for matching allowed providers
- patch (map, optional) - a dictionary of the corresponding service type for setting values. Each internal key is optional
- constraints (map, optional) - follows JSON Schema (draft 2020-12).
This standard supports:
- Immutable: const
- Numeric constraints: minimum, maximum, multipleOf
- String patterns: pattern, minLength, maxLength
- Enumerations: enum
- Array constraints: minItems, maxItems
- Conditional logic: if/then/else
For the complete validation vocabulary, see the JSON Schema Validation specification.
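To make the constraint vocabulary concrete, the sketch below checks a value against a small subset of the keywords listed above (const, enum, minimum, maximum, minLength, maxLength, pattern). It is written in Python for brevity and is not a JSON Schema implementation; a real engine would presumably delegate to a complete draft 2020-12 validator. The violations function and the sample policy output are hypothetical.

```python
import re


def violations(value, schema):
    """Return the keywords in `schema` that `value` violates (subset only)."""
    failed = []
    if "const" in schema and value != schema["const"]:
        failed.append("const")
    if "enum" in schema and value not in schema["enum"]:
        failed.append("enum")
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if is_number:
        if "minimum" in schema and value < schema["minimum"]:
            failed.append("minimum")
        if "maximum" in schema and value > schema["maximum"]:
            failed.append("maximum")
    if isinstance(value, str):
        if "minLength" in schema and len(value) < schema["minLength"]:
            failed.append("minLength")
        if "maxLength" in schema and len(value) > schema["maxLength"]:
            failed.append("maxLength")
        # JSON Schema patterns are not implicitly anchored, hence search().
        if "pattern" in schema and re.search(schema["pattern"], value) is None:
            failed.append("pattern")
    return failed


# A hypothetical policy output that patches two fields and constrains both.
policy_output = {
    "rejected": False,
    "patch": {"cpu": 4, "region": "us-east"},
    "constraints": {
        "cpu": {"minimum": 1, "maximum": 8},
        "region": {"const": "us-east"},
    },
}

checks = {
    field: violations(value, policy_output["constraints"].get(field, {}))
    for field, value in policy_output["patch"].items()
}
print(checks)
print(violations(16, policy_output["constraints"]["cpu"]))
```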
Policy Code Ownership and Responsibilities
- DCM Admins, Tenant-Admins and Users implement the policies’ Rego code
- DCM Admins, Tenant-Admins and Users are responsible for correct registration of the policies
- DCM Admins, Tenant-Admins and Users are responsible for the accuracy and performance of the policies
- Registration of Rego code that fails compilation is rejected with an error
System Architecture
The Policy API serves two distinct functions:
- Management Plane: CRUD operations for Policy definitions. Rego source code is persisted in the database and the embedded OPA engine is recompiled after every CRUD mutation.
- Execution Plane: Evaluation of service requests against active policies using the embedded OPA engine.
Policy Management
Policy Registration Flow
sequenceDiagram
participant User
participant PolicyEngine
participant Database
User->>PolicyEngine: POST /api/v1/policies
PolicyEngine->>Database: Check unique Name and Priority for policy type
alt Uniqueness check failed
PolicyEngine-->>User: Error response
else Uniqueness check passed
PolicyEngine->>PolicyEngine: Generate UUID
PolicyEngine->>Database: Store policy metadata and Rego code
Note right of Database: UUID, Name, RegoCode,<br/>LabelSelector, Policy Type, Priority
PolicyEngine->>PolicyEngine: Recompile embedded OPA engine
alt REGO compilation failed
PolicyEngine->>PolicyEngine: Preserve previous compiled state
PolicyEngine->>Database: Rollback stored metadata
PolicyEngine-->>User: Error response
else REGO compilation succeeded
PolicyEngine-->>User: Return UUID
end
end
Pseudo API
POST /api/v1/policies
Payload
- Name
  - Must be unique at its level. That is:
    - All global policies must have unique names
    - All tenant policies must have unique names within their tenant
    - All user policies must have unique names for their user
- Policy Matching Criteria. Multiple criteria are combined with AND.
  - Label Selector
- Policy Type
  - Global, Tenant, User
- Priority
  - Must be unique at its level
  - A lower number means a higher priority and therefore will be evaluated first
- Rego Code
- Enabled
  - Optional. Default true
Response Payload
- Generated UUID
Execution Logic & Flow
- Validate the Policy Name and Priority
  - If not unique return an error
- Generate a UUID
- Store the following information in the DB
  - UUID
  - Name
  - Rego Code
  - Policy Type
  - Priority
  - Label Selector
- Recompile the embedded OPA engine with all stored policies
  - The package name is resolved from the compiled AST
  - Compilation is atomic: if it fails, the previous compiled state is preserved
  - If failed, rollback DB and return an error
- Return UUID to caller
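The registration flow above can be sketched as follows. This is a Python illustration under loud assumptions: fake_compile is a stand-in for the OPA compiler (it only checks for a leading package declaration), PolicyRegistry keeps the "database" in a dictionary, and uniqueness is checked per level only, ignoring the per-tenant and per-user scoping. None of these names come from the ADR.

```python
import uuid


class RegistrationError(Exception):
    pass


def fake_compile(sources):
    """Stand-in for the OPA compiler: requires a `package` declaration."""
    compiled = {}
    for policy_id, source in sources.items():
        first = source.strip().split("\n", 1)[0]
        if not first.startswith("package "):
            raise ValueError(f"policy {policy_id}: missing package declaration")
        compiled[policy_id] = first.split()[1]  # resolved package name
    return compiled


class PolicyRegistry:
    def __init__(self):
        self.rows = {}      # stands in for the database table
        self.compiled = {}  # stands in for the in-memory compiled state

    def register(self, name, level, priority, rego):
        # 1. Uniqueness of name and priority at the policy's level.
        for row in self.rows.values():
            if row["level"] == level and row["name"] == name:
                raise RegistrationError("name already used at this level")
            if row["level"] == level and row["priority"] == priority:
                raise RegistrationError("priority already used at this level")
        # 2. Generate the ID and store metadata plus Rego source.
        policy_id = str(uuid.uuid4())
        self.rows[policy_id] = {"name": name, "level": level,
                                "priority": priority, "rego": rego}
        # 3. Recompile everything; swap the compiled state only on success.
        try:
            new_state = fake_compile({i: r["rego"] for i, r in self.rows.items()})
        except ValueError as err:
            del self.rows[policy_id]  # roll back the stored row
            raise RegistrationError(str(err)) from err
        self.compiled = new_state
        return policy_id


registry = PolicyRegistry()
good_id = registry.register("billing", "GLOBAL", 10, "package billing\n")
try:
    registry.register("broken", "GLOBAL", 20, "not rego")
    failed = False
except RegistrationError:
    failed = True
print(failed, registry.compiled)
```

Building the new compiled state in a local variable and assigning it only after compilation succeeds is what keeps the previous state intact on failure.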
GET /api/v1/policies
Return the list of policies, optionally filtered.
GET /api/v1/policies/{policyId}
Return the specific policy
DELETE /api/v1/policies/{policyId}
Delete the specific policy
PUT /api/v1/policies/{policyId}
Update the specific policy. Policy name and type are immutable
Payload
- Policy Matching Criteria
- Priority
- Rego Code
- Enabled
Execution Plane
Sequence
sequenceDiagram
participant User
participant PlacementManager
participant PolicyEngine
participant Database
User->>PlacementManager: Create Service request
PlacementManager->>PolicyEngine: Validate Payload
PolicyEngine->>Database: Get matching policies by serviceType and labelSelector
Database-->>PolicyEngine: List of policies
loop For each policy
PolicyEngine->>PolicyEngine: Evaluate policy (embedded OPA)
PolicyEngine->>PolicyEngine: Enforce constraints
PolicyEngine->>PolicyEngine: Mutate payload
alt Policy rejected or constraint violation
PolicyEngine-->>PlacementManager: Request rejected
PlacementManager-->>User: Request rejected
end
end
PolicyEngine-->>PlacementManager: Success with updated payload
PlacementManager-->>User: Service created
Pseudo API
POST /api/v1alpha1/policies:evaluateRequest
Payload
- Service Instance
  - spec - the service specification (flexible schema)
Execution Logic & Flow
The Engine acts as an orchestrator. It evaluates policies using the embedded OPA engine, which holds all compiled Rego modules in memory. Evaluation is concurrency-safe via a read-write lock, allowing policy evaluation to proceed in parallel with policy management operations.
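The locking discipline can be sketched as below. Python has no read-write lock in its standard library, so this illustration builds a simple one on threading.Condition; a Go implementation would more likely use sync.RWMutex, although the ADR does not name the primitive. The ReadWriteLock and Engine classes and the shape of the module map are hypothetical.

```python
import threading


class ReadWriteLock:
    """Many concurrent readers, or one exclusive writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class Engine:
    def __init__(self):
        self._lock = ReadWriteLock()
        self._modules = {}

    def evaluate(self, name):
        # Evaluations share the read lock, so they can run in parallel.
        self._lock.acquire_read()
        try:
            return self._modules.get(name)
        finally:
            self._lock.release_read()

    def swap(self, modules):
        # A recompile takes the write lock only while swapping state.
        self._lock.acquire_write()
        try:
            self._modules = dict(modules)
        finally:
            self._lock.release_write()


engine = Engine()
engine.swap({"billing": "v1"})
results = []
readers = [
    threading.Thread(target=lambda: results.append(engine.evaluate("billing")))
    for _ in range(4)
]
for t in readers:
    t.start()
for t in readers:
    t.join()
engine.swap({"billing": "v2"})
print(results, engine.evaluate("billing"))
```

This simple lock can starve writers under a constant stream of readers; it is meant only to show which operations take which side of the lock.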
Pipeline Logic (The “Chain of Responsibility”)
The Policy API maintains a ConstraintContext map in memory for the duration of the request.
Fetch & Sort:
- Query DB for enabled policies matching the request payload based on the policy’s matching criteria.
- Sort by Level (Global -> Tenant -> User), then by Priority (ascending numeric value, so higher-priority policies run first).
If no policies match the request payload, the request returns successfully.
Iterate for each policy P:
- Evaluate policy using the embedded OPA engine:
  - Invoke the policy’s package main rule
  - Pass
    - spec - the current patched request payload
    - provider - the currently selected service provider
    - constraints - the accumulated constraint context (if any)
    - service_provider_constraints - the accumulated SP constraints (if any)
- Check Reject
  - If Reject is true, ABORT IMMEDIATELY (Fail Fast). Return 406.
- Validate Constraints:
  - A lower-level policy cannot “unlock” a field locked by a higher-level policy.
  - If it does, ABORT with “Policy Conflict Error”
- Update ConstraintContext:
  - Merge new Constraints from Policy P into ConstraintContext.
- Validate Patch:
  - Validate Patch against ConstraintContext.
  - Example: If ConstraintContext.region is immutable and Policy P tries to patch the region, ABORT with “Policy Conflict Error”
- Apply Patch
  - Update service_payload with valid patches.
- Validate ServiceProvider
  - If Policy P returned a selected_provider and service_provider_constraints exist, validate the selected provider against the constraints.
Finalize: Return the final payload, selected provider, and status to Placement Manager.
- Status is APPROVED if the payload was not modified, MODIFIED if any patches were applied.
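The chain described above can be sketched end to end as follows. This is a Python illustration, not the engine's implementation: each policy is a plain callable standing in for an OPA evaluation, only the const keyword is enforced, and several behaviors are assumptions the ADR does not spell out (a provider passes if it is in allow_list or matches any pattern; later service provider constraints replace earlier ones key by key; "unlocking" means replacing an existing const rule with a different rule).

```python
import re


class Rejected(Exception):
    """A policy rejected the request (the ADR maps this to HTTP 406)."""


class PolicyConflict(Exception):
    """A policy violated a constraint accumulated earlier in the chain."""


def provider_allowed(provider, sp_constraints):
    # Assumption: allowed if listed in allow_list OR matching any pattern.
    names = sp_constraints.get("allow_list", [])
    patterns = sp_constraints.get("patterns", [])
    if not names and not patterns:
        return True
    return provider in names or any(re.search(p, provider) for p in patterns)


def run_pipeline(spec, policies):
    """Run already-sorted policies; each callable stands in for OPA."""
    original = dict(spec)
    spec = dict(spec)
    context, sp_constraints, provider = {}, {}, ""
    for policy in policies:
        out = policy({
            "spec": spec,
            "constraints": context,
            "provider": provider,
            "service_provider_constraints": sp_constraints,
        })
        # Check Reject: fail fast.
        if out.get("rejected"):
            raise Rejected(out.get("rejection_reason", "rejected by policy"))
        # Validate Constraints: a locked field may not be redefined.
        for field, rule in out.get("constraints", {}).items():
            locked = context.get(field, {})
            if "const" in locked and rule != locked:
                raise PolicyConflict(f"cannot unlock {field}")
        # Update ConstraintContext.
        context.update(out.get("constraints", {}))
        # Validate Patch (only `const` is checked in this sketch).
        patch = out.get("patch", {})
        for field, value in patch.items():
            rule = context.get(field, {})
            if "const" in rule and value != rule["const"]:
                raise PolicyConflict(f"{field} is immutable")
        # Apply Patch.
        spec.update(patch)
        # Validate ServiceProvider.
        sp_constraints.update(out.get("service_provider_constraints", {}))
        if out.get("selected_provider"):
            provider = out["selected_provider"]
            if not provider_allowed(provider, sp_constraints):
                raise PolicyConflict(f"provider {provider} is not allowed")
    status = "MODIFIED" if spec != original else "APPROVED"
    return {"spec": spec, "selected_provider": provider, "status": status}


def global_policy(inp):
    return {
        "patch": {"region": "us-east"},
        "constraints": {"region": {"const": "us-east"}},
        "service_provider_constraints": {"patterns": ["^provider-"]},
    }


def user_policy(inp):
    return {"selected_provider": "provider-a", "patch": {"cpu": 4}}


result = run_pipeline({"cpu": 2}, [global_policy, user_policy])
print(result["status"], result["selected_provider"], result["spec"])
```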
Constraint Validation Example
- Step 1 (Global Policy):
  - Patch: {"billing_tag": "engineering"}
  - Constraint: {"billing_tag": {"const": "engineering"}}
  - Result: Payload has billing_tag. Context marks billing_tag as immutable (const).
- Step 2 (User Policy):
  - Patch: {"billing_tag": "marketing"}
  - Action: Engine checks Context. billing_tag is immutable.
  - Result: Error. The User policy violates the Global constraint.
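The two steps can be traced in a few lines of Python. The sketch expresses the lock as a JSON Schema const constraint, as described in the Rego Contract section; apply_step is a hypothetical helper, and only the const keyword is checked.

```python
def apply_step(payload, context, patch, constraints):
    """Apply one policy's constraints and patch; raise ValueError on conflict."""
    context = {**context, **constraints}
    for field, value in patch.items():
        rule = context.get(field, {})
        if "const" in rule and value != rule["const"]:
            raise ValueError(f"{field} is immutable")
    return {**payload, **patch}, context


# Step 1 (Global Policy): set the tag and lock it.
payload, context = apply_step(
    {}, {},
    patch={"billing_tag": "engineering"},
    constraints={"billing_tag": {"const": "engineering"}},
)

# Step 2 (User Policy): try to overwrite the locked tag.
try:
    apply_step(payload, context, patch={"billing_tag": "marketing"}, constraints={})
    outcome = "accepted"
except ValueError as err:
    outcome = f"error: {err}"
print(outcome)  # error: billing_tag is immutable
```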