Policy Engine

Summary

This ADR defines the Management and Execution API and Workflow of the DCM Policy Engine

Motivation

The Policy Engine operates as a specialized microservice within the Data Center Management (DCM) application responsible for governing service creation and modification (e.g., VirtualMachines, Containers). It enables Admins, Tenant-Admins, and Users to inject logic that validates (Approve/Reject), mutates (Defaulting/Altering) and assigns Service Providers to request payloads using an embedded Open Policy Agent (OPA) engine and Rego.

OPA is embedded as a Go library within the Policy Engine process rather than deployed as a separate sidecar service. Rego source code is persisted in the database alongside policy metadata, ensuring policies survive service restarts. On startup, the engine recompiles all stored policies from the database, eliminating the persistence gap inherent in an external OPA deployment.

Goal

Define the flow of how Policies are managed and used by the Policy Engine

Define Policy types - Global, Tenant, User
Define Policy management
- How policies should be added/updated
- How policies will be stored
Define Policy execution
- Policy priority
- Value immutability and constraints
Determine the enforcement engine and policy language
- For V1, OPA (embedded as a Go library) and Rego will be used
- Other alternatives may be considered in future versions
Define the input format
Define the output format

Non-Goals

Policy implementation
Actionable OpenAPI specification
While this ADR references Tenant level Policy and ID, Tenants are not supported in V1

Core Concepts & Definitions

Policy Responsibilities

Every policy may return one or more of the following outputs

Reject: Requests are approved by default. Policies may decide whether the request should be Rejected.
Mutation: Modifying the request payload (e.g., injecting default labels) by providing a patch map.
Field Constraints: Defining the mutability of fields for subsequent policies in the chain.
Service Provider Selection: Policies may set a value and/or constraints

Policy Scope & Hierarchy (Execution Order)

The execution order is strictly determined by Level first, then Priority.

Global: (Super Admin) - Runs first.
Tenant: (Tenant Admin) - Runs second.
User: (End User) - Runs last.

Within each level, policies are sorted by priority: lower integers indicate higher priority.

The “Rego Contract”

Input

The input payload includes:

spec - The current patched request payload
- Assumption - While policies do not have to be specific for Service Types they will need to know the expected content
constraints - The current constraints context (accumulated from prior policies)
provider - The currently selected service provider (empty string initially, populated as policies are evaluated)
service_provider_constraints - The current service provider constraints (accumulated from prior policies)

Output

Following the policy responsibilities, the output should be comprised of the following elements

rejected (bool) - since requests are approved by default, policies may reject them.
rejection_reason (string, optional) - reason for rejection
selected_provider (string, optional) - the name of the service provider chosen to fulfill the request
service_provider_constraints (object, optional) -
- allow_list - list of allowed service provider names
- patterns - list of regex patterns for matching allowed providers
patch (map, optional) - a dictionary of the corresponding service type for setting values. Each internal key is optional
constraints (map, optional) - follows JSON Schema (draft 2020-12).
This standard supports:
- Immutable: const
- Numeric constraints: minimum, maximum, multipleOf
- String patterns: pattern, minLength, maxLength
- Enumerations: enum
- Array constraints: minItems, maxItems
- Conditional logic: if/then/else
For the complete validation vocabulary, see the JSON Schema Validation specification.

Policy Code Ownership and Responsibilities

DCM Admins, Tenant-Admins and Users implement the policies’ REGO code
DCM Admins, Tenant-Admins and Users are responsible for correct registration of the policies
DCM Admins, Tenant-Admins and Users are responsible for the accuracy and performance of the policies
Trying to register a REGO code snippet that fails compilation will fail

System Architecture

The Policy API serves two distinct functions:

Management Plane: CRUD operations for Policy definitions. Rego source code is persisted in the database and the embedded OPA engine is recompiled after every CRUD mutation.
Execution Plane: Service requests evaluation against active policies using the embedded OPA engine.

Policy Management

Policy Registration Flow

    sequenceDiagram
    participant User
    participant PolicyEngine
    participant Database

    User->>PolicyEngine: POST /api/v1/policies
    PolicyEngine->>Database: Check unique Name and Priority for policy type
    alt Uniqueness check failed
        PolicyEngine-->>User: Error response
    else Uniqueness check passed
        PolicyEngine->>PolicyEngine: Generate UUID
        PolicyEngine->>Database: Store policy metadata and Rego code
        Note right of Database: UUID, Name, RegoCode,<br/>LabelSelector, Policy Type, Priority
        PolicyEngine->>PolicyEngine: Recompile embedded OPA engine
        alt REGO compilation failed
            PolicyEngine->>PolicyEngine: Preserve previous compiled state
            PolicyEngine->>Database: Rollback stored metadata
            PolicyEngine-->>User: Error response
        else REGO compilation succeeded
            PolicyEngine-->>User: Return UUID
        end
    end

Pseudo API

POST /api/v1/policies

Payload

Name
- Must be unique at its level. That is:
  - All global policies must have unique names
  - All tenant policies must have unique names within their tenant
  - All user policies must have unique names for their user
Policy Matching Criteria. Treated with AND.
- Label Selector
Policy Type
- Global, Tenant, User
Priority
- Must be unique at its level
- A lower number means a higher priority and therefore will be evaluated first
REGO Code
Enabled
- Optional. Default true

Response Payload

Generated UUID

Execution Logic & Flow

Validate the Policy Name and Priority
- If not unique return an error
Generate a UUID
Store the following information in the DB
- UUID
- Name
- Rego Code
- Policy Type
- Priority
- Label Selector
Recompile the embedded OPA engine with all stored policies
- The package name is resolved from the compiled AST
- Compilation is atomic: if it fails, the previous compiled state is preserved
- If failed, rollback DB and return an error
Return UUID to caller

GET /api/v1/policies

Return the list of policies. Allow for filtering

GET /api/v1/policies/{policyId}

Return the specific policy

DELETE /api/v1/policies/{policyId}

Delete the specific policy

PUT /api/v1/policies/{policyId}

Update the specific policy. Policy name and type are immutable

Payload

Policy Matching Criteria
Priority
REGO Code
Enabled

Execution Plane

Sequence

    sequenceDiagram
    participant User
    participant PlacementManager
    participant PolicyEngine
    participant Database

    User->>PlacementManager: Create Service request
    PlacementManager->>PolicyEngine: Validate Payload
    PolicyEngine->>Database: Get matching policies by serviceType and labelSelector
    Database-->>PolicyEngine: List of policies

    loop For each policy
        PolicyEngine->>PolicyEngine: Evaluate policy (embedded OPA)
        PolicyEngine->>PolicyEngine: Enforce constraints
        PolicyEngine->>PolicyEngine: Mutate payload
        alt Policy rejected or constraint violation
            PolicyEngine-->>PlacementManager: Request rejected
            PlacementManager-->>User: Request rejected
        end
    end

    PolicyEngine-->>PlacementManager: Success with updated payload
    PlacementManager-->>User: Service created

Pseudo API

POST /api/v1alpha1/policies:evaluateRequest

Payload

Service Instance
- spec - the service specification (flexible schema)

Execution Logic & Flow

The Engine acts as an orchestrator. It evaluates policies using the embedded OPA engine, which holds all compiled Rego modules in memory. Evaluation is concurrent-safe via a read-write lock, allowing policy evaluation to proceed in parallel with policy management operations.

Pipeline Logic (The “Chain of Responsibility”)

The Policy API maintains a ConstraintContext map in memory for the duration of the request.
Fetch & Sort:
- Query DB for enabled policies matching the request payload based on the policy’s matching criteria.
- Sort by Level (Global -> Tenant -> User) then Priority (Desc).
If no policies matching the request payload were found, the request will return successfully
Iterate for each policy P:
- Evaluate policy using the embedded OPA engine:
  - Invoke the policy’s package main rule
  - Pass
    - spec - the current patched request payload
    - provider - the currently selected service provider
    - constraints - the accumulated constraint context (if any)
    - service_provider_constraints - the accumulated SP constraints (if any)
- Check Reject
  - If Reject is true, ABORT IMMEDIATELY (Fail Fast). Return 406.
- Validate Constraints:
  - A lower-level policy cannot “unlock” a field locked by a higher-level policy.
  - If it does, ABORT with “Policy Conflict Error”
- Update ConstraintContext:
  - Merge new Constraints from Policy P into ConstraintContext.
- Validate Patch:
  - Validate Patch against ConstraintContext.
  - Example: If ConstraintContext.region is immutable and Policy P tries to patch the region, ABORT with “Policy Conflict Error”
- Apply Patch
  - Update service_payload with valid patches.
- Validate ServiceProvider
  - If Policy P returned a selected_provider and service_provider_constraints exist, validate the selected provider against the constraints.
Finalize: Return the final payload, selected provider, and status to Placement Manager.
- Status is APPROVED if the payload was not modified, MODIFIED if any patches were applied.

Constraint Validation Example

Step 1 (Global Policy):
- Patch: {“billing_tag”: “engineering”}
- Constraint: {“billing_tag”: {“mode”: “immutable”}}
- Result: Payload has billing_tag. Context has billing_tag=immutable.
Step 2 (User Policy):
- Patch: {“billing_tag”: “marketing”}
- Action: Engine checks Context. billing_tag is immutable.
Result: Error. The User policy violates the Global constraint.

Service Provider Health Check KubeVirt SP