Policy Engine

Summary

This ADR defines the management and execution APIs and the workflow of the DCM Policy Engine.

Motivation

The Policy Engine operates as a specialized microservice within the Data Center Management (DCM) application responsible for governing service creation and modification (e.g., VirtualMachines, Containers). It enables Admins, Tenant-Admins, and Users to inject logic that validates (Approve/Reject), mutates (Defaulting/Altering) and assigns Service Providers to request payloads using Open Policy Agent (OPA) and Rego.

Goal

Define how policies are managed and used by the Policy Engine:

  • Define Policy types - Global, Tenant, User
  • Define Policy management
    • How policies should be added/updated
    • How policies will be stored
  • Define Policy execution
    • Policy priority
    • Value immutability and constraints
  • Determine the enforcement engine and policy language
    • For V1, OPA and Rego will be used
    • Other alternatives may be considered in future versions
  • Define the input format
  • Define the output format

Non-Goals

  • Policy implementation
  • Actionable OpenAPI specification
  • While this ADR references Tenant level Policy and ID, Tenants are not supported in V1

Core Concepts & Definitions

Policy Responsibilities

Every policy may return one or more of the following outputs:

  1. Reject: Requests are approved by default. Policies may decide whether the request should be Rejected.
  2. Mutation: Modifying the request payload (e.g., injecting default labels) by providing a patch map.
  3. Field Constraints: Defining the mutability of fields for subsequent policies in the chain.
  4. Service Provider Selection: Policies may set a value and/or constraints

Policy Scope & Hierarchy (Execution Order)

The execution order is strictly determined by Level first, then Priority.

  1. Global: (Super Admin) - Runs first.
  2. Tenant: (Tenant Admin) - Runs second.
  3. User: (End User) - Runs last.

Within each level, policies are sorted by priority: lower integers indicate higher priority.
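
The ordering rule above can be sketched as a two-part sort key. This is a minimal illustration, assuming a hypothetical Policy record with level and priority fields; the record shape and the LEVEL_ORDER table are not defined by this ADR.

```python
from dataclasses import dataclass

# Hypothetical level ranks: a smaller rank runs earlier.
LEVEL_ORDER = {"global": 0, "tenant": 1, "user": 2}


@dataclass(frozen=True)
class Policy:
    name: str
    level: str     # "global", "tenant", or "user"
    priority: int  # lower integer = higher priority


def execution_order(policies):
    """Sort by level first, then by ascending priority value."""
    return sorted(policies, key=lambda p: (LEVEL_ORDER[p.level], p.priority))


policies = [
    Policy("user-defaults", "user", 1),
    Policy("global-quota", "global", 20),
    Policy("tenant-labels", "tenant", 5),
    Policy("global-region", "global", 10),
]
ordered_names = [p.name for p in execution_order(policies)]
print(ordered_names)
```

Note that the user policy with priority 1 still runs last: priority only breaks ties within a level.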

The “Rego Contract”

Input

The input payload includes:

  • The current patched request payload
    • Assumption: policies need not be specific to a Service Type, but they must know the expected structure of the payload they evaluate
  • The current constraints
  • User information
    • User ID
    • Tenant ID
  • The service provider (empty at first and populated while evaluating policies)
    • Value
    • Constraints
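
As an illustration of the input listed above, the following sketch assembles one possible input document. The key names (payload, constraints, user, service_provider) and the sample values are assumptions made for this example; the ADR does not fix a concrete schema.

```python
import json


def build_policy_input(payload, constraints, user_id, tenant_id,
                       sp_value=None, sp_constraints=None):
    """Assemble the input document passed to a policy (illustrative key names)."""
    return {
        "payload": payload,          # current patched request payload
        "constraints": constraints,  # constraints accumulated so far
        "user": {"id": user_id, "tenant_id": tenant_id},
        "service_provider": {
            "value": sp_value,       # empty until a policy selects a provider
            "constraints": sp_constraints or {},
        },
    }


policy_input = build_policy_input(
    payload={"service_type": "VirtualMachine", "cpu": 4},
    constraints={},
    user_id="user-123",
    tenant_id="tenant-abc",
)
print(json.dumps(policy_input, indent=2))
```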

Output

Following the policy responsibilities, the output consists of the following elements:

  • Reject - since requests are approved by default, policies may reject them.

  • Service Provider

    • Value - the name of the service provider chosen to fulfill the request
    • Constraints - the allowed service providers, expressed as an allowlist of names or regular expressions
  • Patch - a dictionary of the corresponding service type for setting values. Each internal key is optional

  • Constraints - follows JSON Schema (draft 2020-12).

    This standard supports:

    • Immutable: const
    • Numeric constraints: minimum, maximum, multipleOf
    • String patterns: pattern, minLength, maxLength
    • Enumerations: enum
    • Array constraints: minItems, maxItems
    • Conditional logic: if/then/else

    For the complete validation vocabulary, see the JSON Schema Validation specification.
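
To make the output shape concrete, the sketch below shows one possible policy result and a deliberately tiny checker for a handful of the keywords listed above. The key names, the sample values, and the use of a pattern schema for the service provider constraint are assumptions; a real implementation would use a full JSON Schema validator rather than this subset.

```python
import re

# Illustrative policy output; all key names and values are assumptions.
policy_output = {
    "reject": False,
    "service_provider": {"value": "provider-a",
                         "constraints": {"pattern": "^provider-"}},
    "patch": {"billing_tag": "engineering"},
    "constraints": {
        "properties": {
            "region": {"const": "eu-west"},
            "cpu": {"minimum": 1, "maximum": 16},
            "tier": {"enum": ["dev", "prod"]},
        }
    },
}


def violations(value, schema):
    """Check one value against a few JSON Schema keywords (not a full validator)."""
    errors = []
    if "const" in schema and value != schema["const"]:
        errors.append("const")
    if "enum" in schema and value not in schema["enum"]:
        errors.append("enum")
    if "minimum" in schema and value < schema["minimum"]:
        errors.append("minimum")
    if "maximum" in schema and value > schema["maximum"]:
        errors.append("maximum")
    # JSON Schema patterns are unanchored, hence re.search rather than re.match.
    if "pattern" in schema and not re.search(schema["pattern"], value):
        errors.append("pattern")
    return errors


props = policy_output["constraints"]["properties"]
print(violations(32, props["cpu"]))            # cpu above the allowed maximum
print(violations("eu-west", props["region"]))  # matches the const value
```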

Policy Code Ownership and Responsibilities

  • DCM Admins, Tenant-Admins and Users implement the policies’ Rego code
  • DCM Admins, Tenant-Admins and Users are responsible for correct registration of the policies
  • DCM Admins, Tenant-Admins and Users are responsible for the accuracy and performance of the policies
  • Registering a Rego code snippet that fails compilation is rejected with an error

System Architecture

The Policy API serves two distinct functions:

  1. Management Plane: CRUD operations for Policy definitions and synchronization with the Policy Engine.
  2. Execution Plane: Evaluation of service requests against active policies using a stored-policy model.

Policy Management

Policy Registration Flow

  sequenceDiagram
    participant User
    participant PolicyEngine
    participant Database
    participant OPA

    User->>PolicyEngine: POST /api/v1/policies
    PolicyEngine->>Database: Check unique Name and Priority for policy type
    alt Uniqueness check failed
        PolicyEngine-->>User: Error response
    else Uniqueness check passed
        PolicyEngine->>PolicyEngine: Generate UUID
        PolicyEngine->>Database: Store policy metadata
        Note right of Database: UUID, Name, ServiceType,<br/>LabelSelector, Policy Type, Priority
        PolicyEngine->>OPA: Push REGO code with UUID
        alt REGO compilation failed
            OPA-->>PolicyEngine: Compilation error
            PolicyEngine->>Database: Rollback stored metadata
            PolicyEngine-->>User: Error response
        else REGO compilation succeeded
            OPA-->>PolicyEngine: Success
            PolicyEngine-->>User: Return UUID
        end
    end

Pseudo API

POST /api/v1/policies
Payload
  • Name
    • Must be unique at its level. That is:
      • All global policies must have unique names
      • All tenant policies must have unique names within their tenant
      • All user policies must have unique names for their user
  • Policy Matching Criteria. Treated with AND.
    • ServiceType
    • Label Selector
  • Policy Type
    • Global, Tenant, User
  • Priority
    • Must be unique at its level
    • A lower number means a higher priority and therefore will be evaluated first
  • REGO Code
  • Enabled
    • Optional. Default true
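
The AND semantics of the matching criteria can be sketched as follows. The ADR does not define the label selector format, so this example assumes an equality-based selector (a map of label keys to required values) and hypothetical key names for the policy and the request.

```python
def matches(policy, request):
    """A policy applies only if BOTH criteria match (logical AND)."""
    type_ok = policy["service_type"] == request["service_type"]
    # Assumption: every selector entry must be present on the request with an
    # equal value; an empty or missing selector matches any request.
    labels = request.get("labels", {})
    selector = policy.get("label_selector", {})
    selector_ok = all(labels.get(key) == value for key, value in selector.items())
    return type_ok and selector_ok


policy = {"service_type": "VirtualMachine", "label_selector": {"env": "prod"}}
prod_vm = {"service_type": "VirtualMachine", "labels": {"env": "prod", "team": "a"}}
dev_vm = {"service_type": "VirtualMachine", "labels": {"env": "dev"}}
prod_container = {"service_type": "Container", "labels": {"env": "prod"}}

print(matches(policy, prod_vm), matches(policy, dev_vm), matches(policy, prod_container))
```
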
Response Payload
  • Generated UUID
Execution Logic & Flow
  • Validate the Policy Name and Priority
    • If not unique return an error
  • Generate a UUID
  • Store the following information in the DB
    • UUID
    • Name
    • Service Type
    • Label Selector
    • Policy Type
    • Priority
  • Push the Rego code to OPA
    • Use the UUID for naming to avoid collisions
    • If the push fails, roll back the stored DB record and return an error
  • Return UUID to caller
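
The registration steps above, including the rollback on a failed push, can be sketched with in-memory stand-ins. The PolicyRegistry class, the owner argument (a tenant or user ID, None for global policies), and the fake_opa_push stub are all hypothetical. In a real deployment the push would probably go through OPA's REST Policy API (PUT /v1/policies/<id>), which is not called here.

```python
import uuid


class CompilationError(Exception):
    """Raised by the (stubbed) OPA client when Rego fails to compile."""


class PolicyRegistry:
    def __init__(self, opa_push):
        self.db = {}              # stand-in for the database: uuid -> metadata
        self.opa_push = opa_push  # callable(policy_id, rego_code)

    def register(self, name, service_type, policy_type, priority, rego_code,
                 owner=None):
        # Name and priority must each be unique within the policy's level.
        for meta in self.db.values():
            same_scope = (meta["policy_type"], meta["owner"]) == (policy_type, owner)
            if same_scope and (meta["name"] == name or meta["priority"] == priority):
                raise ValueError("name and priority must be unique at their level")

        policy_id = str(uuid.uuid4())
        self.db[policy_id] = {"name": name, "service_type": service_type,
                              "policy_type": policy_type, "priority": priority,
                              "owner": owner}
        try:
            self.opa_push(policy_id, rego_code)
        except CompilationError:
            del self.db[policy_id]  # roll back the stored metadata
            raise
        return policy_id


def fake_opa_push(policy_id, rego_code):
    # Stub only: treats a module without a package declaration as uncompilable.
    if not rego_code.lstrip().startswith("package "):
        raise CompilationError("module failed to compile")


registry = PolicyRegistry(fake_opa_push)
policy_id = registry.register("require-labels", "VirtualMachine", "global", 10,
                              "package example\n")
print(policy_id in registry.db)
```

The uniqueness check runs before any write, so only a compilation failure needs a rollback.
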
GET /api/v1/policies

Return the list of policies, with optional filtering.

GET /api/v1/policies/{policyId}

Return the specified policy.

DELETE /api/v1/policies/{policyId}

Delete the specified policy.

PUT /api/v1/policies/{policyId}

Update the specified policy. The policy name and type are immutable.

Payload
  • Policy Matching Criteria
  • Priority
  • REGO Code
  • Enabled

Execution Plane

Sequence

  sequenceDiagram
    participant User
    participant PlacementManager
    participant PolicyEngine
    participant Database
    participant OPA

    User->>PlacementManager: Create Service request
    PlacementManager->>PolicyEngine: Validate Payload
    PolicyEngine->>Database: Get matching policies by serviceType and labelSelector
    Database-->>PolicyEngine: List of policies

    loop For each policy
        PolicyEngine->>OPA: Evaluate policy
        OPA-->>PolicyEngine: Policy result
        PolicyEngine->>PolicyEngine: Enforce constraints
        PolicyEngine->>PolicyEngine: Mutate payload
        alt Policy rejected or constraint violation
            PolicyEngine-->>PlacementManager: Request rejected
            PlacementManager-->>User: Request rejected
        end
    end

    PolicyEngine-->>PlacementManager: Success with updated payload
    PlacementManager-->>User: Service created

Pseudo API

POST /api/v1/engine/evaluate
Payload
  • Request Payload
  • User ID
  • Tenant ID
Execution Logic & Flow

The Engine acts as an orchestrator. It does not send Rego code during evaluation; it calls pre-loaded modules in OPA.

Pipeline Logic (The “Chain of Responsibility”)
  • The Policy Engine maintains a ConstraintContext map in memory for the duration of the request.

  • Fetch & Sort:

    • Query DB for enabled policies matching the request payload based on the policy’s matching criteria.
    • Sort by Level (Global -> Tenant -> User), then by Priority with the highest priority first (that is, ascending integer value).
  • If no policies matching the request payload were found, the request will return successfully

  • Iterate for each policy P:

    • Call OPA:
      • Invoke data.dcm.policy.<P.id>.result
      • Pass
        • CurrentRequestPayload
        • ConstraintContext
        • UserID
        • TenantID
        • ServiceProvider
    • Check Reject
      • If Reject is true, ABORT IMMEDIATELY (Fail Fast). Return 403.
    • Validate Constraints:
      • A lower-level policy cannot “unlock” a field locked by a higher-level policy.
      • If it does, ABORT with “Policy Conflict Error”
    • Update ConstraintContext:
      • Merge new Constraints from Policy P into ConstraintContext.
    • Validate Patch:
      • Validate Patch against ConstraintContext.
      • Example: If ConstraintContext.region is immutable and Policy P tries to patch the region, ABORT with “Policy Conflict Error”
    • Apply Patch
      • Update CurrentRequestPayload with the validated patches.
    • Validate ServiceProvider
      • If Policy P returned a ServiceProvider value and ServiceProvider constraints exist, validate the value against those constraints.
  • Finalize: Return the final CurrentRequestPayload and ServiceProvider to Placement Manager.
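
The chain described above can be sketched as a single loop. This is a simplified illustration under several assumptions: call_opa is a stand-in for querying the pre-loaded module (for example through OPA's Data API), field constraints are a flat map of field name to schema with only const enforced, any change to an existing field constraint is treated as a conflict (stricter than the "unlock" rule requires), and service provider constraints are plain allowlists that later policies may only narrow. All names are hypothetical.

```python
class PolicyRejected(Exception):
    """A policy returned reject=true (the ADR maps this to a 403)."""


class PolicyConflict(Exception):
    """A policy changed an existing constraint or violated one."""


def evaluate(policy_ids, payload, user_id, tenant_id, call_opa):
    """Run already-sorted policies as a chain; return (payload, provider)."""
    payload = dict(payload)
    context = {}  # ConstraintContext: field name -> schema
    provider = {"value": None, "constraints": None}

    for policy_id in policy_ids:
        result = call_opa(policy_id, {
            "payload": payload, "constraints": context,
            "user_id": user_id, "tenant_id": tenant_id,
            "service_provider": provider,
        })
        if result.get("reject"):
            raise PolicyRejected(policy_id)  # fail fast

        # Validate and merge constraints before looking at the patch.
        for field, schema in result.get("constraints", {}).items():
            if field in context and context[field] != schema:
                raise PolicyConflict(f"{policy_id} changes constraint on {field}")
            context[field] = schema

        # Validate the whole patch first, then apply it.
        patch = result.get("patch", {})
        for field, value in patch.items():
            rule = context.get(field, {})
            if "const" in rule and value != rule["const"]:
                raise PolicyConflict(f"{policy_id} patches immutable field {field}")
        payload.update(patch)

        sp = result.get("service_provider", {})
        new_allowed = sp.get("constraints")
        if new_allowed is not None:
            current = provider["constraints"]
            provider["constraints"] = (
                list(new_allowed) if current is None
                else [name for name in current if name in new_allowed])
        if sp.get("value") is not None:
            allowed = provider["constraints"]
            if allowed is not None and sp["value"] not in allowed:
                raise PolicyConflict(f"{policy_id} selects a disallowed provider")
            provider["value"] = sp["value"]

    return payload, provider["value"]


# Canned results stand in for real OPA evaluations.
canned = {
    "global-1": {"patch": {"billing_tag": "engineering"},
                 "constraints": {"billing_tag": {"const": "engineering"}}},
    "user-1": {"patch": {"cpu": 2}, "service_provider": {"value": "provider-a"}},
}
final_payload, chosen = evaluate(
    ["global-1", "user-1"], {"name": "vm-1"}, "user-123", "tenant-abc",
    lambda policy_id, _input: canned[policy_id],
)
print(final_payload, chosen)
```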

Constraint Validation Example
  • Step 1 (Global Policy):
    • Patch: {"billing_tag": "engineering"}
    • Constraint: {"billing_tag": {"mode": "immutable"}}
    • Result: Payload has billing_tag. Context has billing_tag=immutable.
  • Step 2 (User Policy):
    • Patch: {"billing_tag": "marketing"}
    • Action: Engine checks Context. billing_tag is immutable.
    • Result: Error. The User policy violates the Global constraint.
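
The two steps can be replayed in a few lines. This sketch assumes immutability is encoded with the JSON Schema const keyword listed in the Output section; the apply_step helper is hypothetical and checks only that keyword.

```python
def apply_step(payload, context, patch, constraints=None):
    """Merge a step's constraints, then apply its patch if the context allows it."""
    context = {**context, **(constraints or {})}
    for field, value in patch.items():
        rule = context.get(field, {})
        if "const" in rule and value != rule["const"]:
            raise ValueError(f"Policy Conflict Error: {field} is immutable")
    return {**payload, **patch}, context


# Step 1 (Global policy): set billing_tag and lock it.
payload, context = apply_step(
    {}, {},
    patch={"billing_tag": "engineering"},
    constraints={"billing_tag": {"const": "engineering"}},
)

# Step 2 (User policy): attempt to overwrite the locked field.
try:
    apply_step(payload, context, patch={"billing_tag": "marketing"})
    outcome = "accepted"
except ValueError as err:
    outcome = str(err)

print(payload)
print(outcome)
```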