Dokploy/dokploy

github.com/Dokploy/dokploy · audited 2026-06-04 · commit 6a0acd9

31% ERI composite

Dokploy is a self-hostable deployment platform — a Docker/Swarm-based alternative to Vercel/Heroku. Its 31% composite reads like a younger product that nailed a couple of enterprise primitives early and hasn’t yet built out the rest. The shape is consistent: strong where the team made a deliberate architectural choice, thin everywhere the platform-hardening work simply hasn’t happened yet.

Where it’s strong

Implementation & Customization (76%) is the standout — whitelabeling and branding are driven by configuration, not per-customer forks, exactly the pattern that survives scale. Identity & Access (63%) is real: federated SSO is wired through a centralized auth server (@better-auth/sso, plugin-based) with an enterprise SSO surface, rather than a hand-rolled login. After that the scores fall off.

Where the gaps are

The weak dimensions are broad, which is typical for a project at this stage. API & Extensibility (0%): there’s a script that generates an OpenAPI spec and a Swagger UI page, but no checked-in machine-readable contract artifact — so there’s nothing versioned for consumers to build against. Procurement Readiness (0%) and AI / Data Foundation (15%) are effectively greenfield. Engineering Org Resilience (19%): ownership is centralized with thin CODEOWNERS/review coverage — a real bus-factor risk. Tenancy Isolation (21%): tenant keys exist on some tables (projects.organizationId) but aren’t enforced consistently across business records, and Reliability Primitives (23%) lacks the retries/circuit-breakers/idempotency a deployment platform eventually needs.

The read

Dokploy has the two bones that are hardest to retrofit — clean config-driven customization and real SSO — which is a good sign about how the team thinks. But the breadth of low scores means an enterprise thesis here is an investment in platform hardening across many dimensions at once: a checked-in API contract, consistent tenant enforcement, reliability primitives, and ownership distribution. The dimension breakdown below is scored against the audited commit, evidence linked inline.

T1 Thesis Viability

AI / Data Foundation

Versioned data pipelines, pinned model versions, and a real vector or feature store — not scattered cron jobs and model="latest".

15% 17/19 scored

Declarative, tested transformations · 33% · 1/3 expected sites
Orchestrated pipelines · 0% · 0/3 expected sites
Data quality validation / contracts · 100% · 3/3 expected sites
Raw / immutable source layer · 0% · 0/2 expected sites · not present
Data + pipeline versioning · 0% · 0/2 expected sites · not present
Data lineage / provenance · 0% · 0/2 expected sites · not present
Feature management · 0% · 0/1 expected sites · not present
Model version pinning · 0% · 0/2 expected sites · not present
Reproducibility / determinism · 0% · 0/1 expected sites · not present
AI output validation · 0% · 0/3 expected sites
Grounding / wrongness check · 0% · 0/2 expected sites · not present
Self-correction / feedback loop · 0% · 0/3 expected sites · not present
Evaluation harness + scoring · 0% · 0/2 expected sites · not present
Runnable correctness checks · 0% · 0/1 expected sites · not present
Actionable diagnostics · 25% · 1/4 expected sites
Positive confirmation · 0% · 0/3 expected sites · not present
Machine-readable contracts · 92% · 4/4 expected sites

Declarative, tested transformations 33%

The codebase does include declarative, version-controlled transformation logic with unit tests—most clearly for Traefik routing label generation via `createDomainLabels`, which has extensive Vitest coverage including regression and edge cases. However, the broader “compose spec augmentation” pipeline (e.g., `addDomainToCompose`, compose YAML parsing, and write/serialization steps) appears less fully covered by similarly contract-driven transform tests, so the primitive is only partially implemented end-to-end.

high
Add dedicated unit/integration tests for `addDomainToCompose` covering boundary conditions: (a) empty `domains`, (b) missing target service, (c) composeType docker-compose vs stack/swarm, (d) isolatedDeployment/randomize branches, and (e) presence/absence of existing labels (ensuring idempotent unshift behavior and network injection).
- packages/server/src/utils/docker/domain.ts:113-170 — This is where domains are merged into the parsed compose spec (high-risk transformation logic that should be protected by per-transform tests).
med
Add tests for YAML parsing and remote compose loading behavior (`loadDockerCompose`/`loadDockerComposeRemote`) including invalid YAML, missing paths, and stderr/empty stdout cases to ensure the transformation fails closed (returns null) rather than producing incorrect outputs.
- packages/server/src/utils/docker/domain.ts:70-112 — Parsing/loading governs the correctness of downstream transformations; boundary tests are needed.
low
Consider extracting a small “transformation module” surface for compose augmentation (pure functions returning transformed specs, with serialization/writes at the edges) so that the tested transformations are more clearly separated from IO concerns and are easier to audit as a transformation layer.
- packages/server/src/utils/docker/domain.ts:214-252 — `createDomainLabels` is already pure and testable; the same style can be propagated to spec augmentation while keeping IO (read/write/remote) at boundaries.
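
The "pure functions at the core, IO at the edges" idea in the last recommendation can be sketched as follows. This is a minimal illustration, not the repo's implementation: `ComposeSpecSketch`, `ComposeService`, and `withDomainLabels` are hypothetical names, and the real compose spec type is richer than the shape assumed here.

```typescript
// Hypothetical minimal shape of a parsed compose file (assumption for this sketch).
type ComposeService = { labels?: string[]; networks?: string[] };
type ComposeSpecSketch = { services: Record<string, ComposeService> };

/**
 * Returns a new spec with `labels` prepended to the target service.
 * Pure: the input spec is never mutated.
 * Idempotent: labels already present on the service are skipped.
 * Fails closed: returns null when the target service is missing.
 */
function withDomainLabels(
  spec: ComposeSpecSketch,
  serviceName: string,
  labels: string[],
): ComposeSpecSketch | null {
  const service = spec.services[serviceName];
  if (!service) return null;

  const existing = service.labels ?? [];
  const missing = labels.filter((label) => existing.indexOf(label) === -1);

  return {
    ...spec,
    services: {
      ...spec.services,
      [serviceName]: { ...service, labels: [...missing, ...existing] },
    },
  };
}
```

Because the function takes and returns plain data, the boundary cases listed above (missing service, existing labels, repeated application) can be covered by unit tests without touching the filesystem or a remote host.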

Orchestrated pipelines 0%

This repo includes an orchestration layer for scheduled/recurring operations using BullMQ: schedules are represented as typed job payloads (Zod), jobs are scheduled centrally with cron repeat patterns, and workers execute them via a dedicated job runner. However, the pipeline layer does not clearly declare explicit retry/attempt/backoff behavior at the queue level, and the job runner catches errors and only logs (which can prevent the orchestrator from reliably marking runs as failed for retries).

high
Add explicit retry behavior to BullMQ job options in `jobQueue`/`defaultJobOptions` (e.g., `attempts` and `backoff`, alongside the existing `removeOnComplete`/`removeOnFail` settings), and ensure failure states propagate (don’t swallow errors).
- apps/schedules/src/queue.ts:1-80 — Queue/job options currently only set `removeOnComplete` and `removeOnFail`—no retry/attempt/backoff configuration is declared.
- apps/schedules/src/utils.ts:1-273 — `runJobs` wraps execution in `try/catch` and logs errors via `logger.error(error)` without rethrowing; this can inhibit BullMQ from treating the run as failed for retry purposes.
med
Improve run observability by recording structured metadata with each job execution (inputs, schedule/job ids, timezone/cron expression, and any script/command version), and include it in both success/failure logs.
- apps/schedules/src/queue.ts:1-80 — Job creation passes `job` as data and sets repeat pattern, but there is no visible mechanism for attaching versioned execution metadata beyond the job payload.
- apps/schedules/src/workers.ts:1-46 — Worker logs include `job.data`, but there is no evident structured/complete run context (versions/inputs captured for determinism).
low
If you intend true dependency-graph orchestration (DAG), model task dependencies explicitly (e.g., separate job types with `dependsOn` or a workflow engine), rather than using only repeatable single-step jobs.
- apps/schedules/src/queue.ts:1-80 — Scheduling maps directly to a single queue job per cron tick per job type; there is no explicit DAG/dependency structure.
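
The first recommendation (declare retries, stop swallowing errors) might look roughly like this. The option names `attempts`, `backoff`, `removeOnComplete`, and `removeOnFail` are BullMQ job options; the values, and the names `retryingJobOptions`, `JobHandler`, and `propagateFailures`, are assumptions made for this sketch. The block is kept free of imports so it runs on its own.

```typescript
// BullMQ job option names with illustrative values (assumptions, not the repo's settings).
const retryingJobOptions = {
  attempts: 3,
  backoff: { type: "exponential", delay: 5000 }, // base delay in milliseconds
  removeOnComplete: true,
  removeOnFail: false, // keep failed jobs available for inspection (assumption)
};

type JobHandler<T> = (payload: T) => Promise<void>;

/** Wraps a handler so failures are logged and then rethrown to the queue worker. */
function propagateFailures<T>(
  handler: JobHandler<T>,
  logError: (error: unknown) => void,
): JobHandler<T> {
  return async (payload) => {
    try {
      await handler(payload);
    } catch (error) {
      logError(error);
      // Rethrow so the worker records the job as failed and the retry policy applies.
      throw error;
    }
  };
}
```

The design point is that logging and failure propagation are not alternatives: the wrapper keeps the existing log line while still letting the queue see the rejection.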

Data quality validation / contracts 100%

This codebase does have data quality validation/contracts: it externalizes input validation into runnable Zod schemas and DB insert schemas (via `drizzle-zod`), and applies them at TRPC ingestion boundaries (router `.input(...)` gates) to reject invalid requests before side effects and persistence.

high
Confirm that every ingestion boundary that leads to persistence/side-effects has a schema gate (TRPC `.input(...)` or equivalent) and not only runtime checks. Add/extend schemas where gaps exist (e.g., ensure all operations that accept ids/paths/files use a shared Zod contract module).
- apps/dokploy/server/api/routers/docker.ts:1-220 — Representative example of correct schema-gated ingestion; use this pattern to audit and fill any missing `.input(...)` gates elsewhere.
med
Add explicit negative-path tests (unit/integration) that assert invalid inputs are rejected (400/typed TRPC error), and that no DB rows are created for invalid payloads.
- packages/server/src/db/schema/notification.ts:220-320 — There are strong schema contracts here; tests would ensure the contracts remain enforced as the API evolves.
low
Where schemas exist (e.g., `uploadFileToContainerSchema`), standardize error messages and error codes so diagnostics clearly indicate which contract failed and why (field-level errors from Zod).
- apps/dokploy/server/api/routers/docker.ts:220-302 — This site already validates and type-checks the file; improving diagnostic consistency would further harden the contract workflow.

Raw / immutable source layer 0%

No immutable/raw landing layer was found. The raw compose handling deletes and recreates the on-disk compose location and writes `docker-compose.yml` directly, which makes the original source non-recoverable for later audits/reprocessing.

high
Introduce an immutable/raw landing directory for raw compose inputs (e.g., store `composeFile` as `raw/docker-compose.yml` or with a content-hash/run-id). Do not delete it on subsequent renders/deploys; only create new versions when raw input changes.
- packages/server/src/utils/providers/raw.ts:1-20 — Current implementation removes and recreates `${outputPath}` before writing `docker-compose.yml`, eliminating any recoverable original source.
high
Update the raw-to-workdir wiring so downstream steps (randomizeSpecificationFile, domain injection, etc.) write into a separate mutable working directory, while the immutable/raw source remains untouched.
- packages/server/src/utils/docker/domain.ts:1-220 — `cloneCompose()` routes `sourceType === 'raw'` through `getCreateComposeFileCommand()` which writes directly to the compose workdir path; immutable raw storage should be added alongside it.
med
Add a manifest/metadata record (at least content hash, timestamp, and sourceType) for each raw compose input version to support reproducibility and audit trails.
- packages/server/src/utils/providers/raw.ts:1-20 — The current raw pipeline only base64-encodes and writes content; there is no evidence of versioned metadata or immutable preservation for audit/replay.
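
A minimal sketch of the append-only raw layer and manifest described above, under stated assumptions: `RawComposeArchive`, `RawManifestEntry`, and `fnv1aHex` are hypothetical names, storage is in memory rather than on disk, and the FNV-1a hash is only a dependency-free stand-in for a cryptographic digest such as SHA-256.

```typescript
// Non-cryptographic FNV-1a hash, used only to keep this sketch dependency-free.
// A real implementation would use a cryptographic digest (e.g., SHA-256).
function fnv1aHex(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

type RawManifestEntry = {
  contentHash: string;
  sourceType: "raw";
  storedAt: string; // ISO 8601 timestamp
};

/** Append-only archive: content is keyed by its hash and never overwritten or deleted. */
class RawComposeArchive {
  private blobs: Record<string, string> = {};
  private manifest: RawManifestEntry[] = [];

  /** Stores a new version only when the raw input changed; returns its manifest entry. */
  put(composeFile: string, now: Date = new Date()): RawManifestEntry {
    const contentHash = fnv1aHex(composeFile);
    const existing = this.manifest.filter((e) => e.contentHash === contentHash)[0];
    if (existing) return existing; // same hash: keep the original record untouched

    const entry: RawManifestEntry = {
      contentHash,
      sourceType: "raw",
      storedAt: now.toISOString(),
    };
    this.blobs[contentHash] = composeFile;
    this.manifest.push(entry);
    return entry;
  }

  get(contentHash: string): string | undefined {
    return this.blobs[contentHash];
  }

  history(): RawManifestEntry[] {
    return this.manifest.slice();
  }
}
```

Downstream steps would copy from the archive into a separate mutable working directory, so re-rendering or redeploying never touches the archived original.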

Data + pipeline versioning 0%

No governed “data + pipeline versioning” primitive was found. The codebase implements deployment/job orchestration and records deployment logs and (some) commit-derived identity, but there is no evidence of a versioning system that snapshots/governs both the pipeline logic and the data/state it runs on in a reproducible, traceable way.

high
Introduce a reproducibility contract for each deployment run: (1) persist a pipeline version manifest (build config/toolchain versions + repository code revision/commit) and (2) persist immutable data/state artifact identifiers (dataset snapshot/version or input artifact digests). Store both in the deployment/job database record and propagate them through the queue and execution paths.
- apps/dokploy/pages/api/deploy/[refreshToken].ts:260-310 — Deployment job is the run boundary where the system should externalize pipeline+data versions, but it currently only includes `deploymentHash` (commit-derived) as a log description.
high
Add explicit, queryable linkage between runs, pipeline versions, and data snapshots: create dedicated DB tables/fields (or an external system like DVC/lakeFS) for data artifact versions and pipeline manifests, and reference them from deployments.
- packages/server/src/services/deployment.ts:60-115 — Deployment creation writes deployment status/logPath but does not attach a data snapshot/version or pipeline manifest reference, limiting traceability and reproducibility.

Data lineage / provenance 0%

No data lineage/provenance primitive is implemented as a governed, queryable mechanism. The codebase contains an audit log schema (action/resource tracking) and an Inngest-to-UI mapping that reconstructs some source relationships (events → runs), but there is no automatic, complete lineage of datasets and their transformations, nor any explicit lineage emission that can be validated during change management.

high
Introduce a dedicated lineage/provenance data model and emission points in the pipeline/application layer: for every derived dataset/row returned by transformations (e.g., event/run aggregation), persist (1) source identifiers, (2) transformation/mapping version, (3) schema/version of output, and (4) timestamps and operator/job metadata. Expose it via API so it is queryable and change-management verifiable.
- apps/api/src/service.ts:1-240 — This is where derived job rows are produced from fetched event/run sources; it should be the primary lineage emission site.
med
Extend or complement the existing audit-log mechanism only if it is repurposed for data lineage: add dataset/derivation identifiers and explicit transformation edges. Otherwise, keep audit-log as security/compliance and implement lineage in a separate governed store.
- packages/server/src/db/schema/audit-log.ts:1-95 — Current audit log schema is action/resource oriented and lacks dataset derivation/transform-history semantics.

Feature management 0%

No implementation of the requested “Feature management” primitive exists in this codebase. The only related concept found is an enterprise license-based UI gate, which does not externalize or unify feature definitions for training vs. serving.

high
Confirm the intended scope: if you actually need ML feature-store style management (e.g., Feast/Tecton) for train/serve parity, introduce a dedicated feature layer with a single versioned definition source, and ensure both training and serving read the same compiled feature definitions.
- apps/dokploy/components/proprietary/enterprise-feature-gate.tsx:1-115 — Current “feature” handling is limited to an enterprise license gate; it does not represent shared train/serve feature definitions.
med
If the real requirement is product feature flags (not ML features), rename/re-scope this primitive in your audit rubric (feature flags vs. feature store/feature definitions).
- apps/dokploy/components/proprietary/enterprise-feature-gate.tsx:40-94 — This is explicitly a license-gating component, not a feature definition system.

Vector / embedding store N/A

No managed/vector embedding store primitive is present or wired in this codebase. Although there is an AI text-generation service with structured output validation, the implementation does not compute embeddings, does not persist vectors, and does not query any dedicated vector database.

high
If the product intends to support RAG/semantic search over stored project/app data, introduce a dedicated vector embedding store layer (with explicit schema, indexing, and a model+content-version linkage) and wire it into the request path where retrieval is needed.
- packages/server/src/services/ai.ts:1-258 — Current AI path is generation-only (`generateText`) with structured output validation; it provides a clear contrast point for where a retrieval step (embed→upsert/search→grounded generation) should be inserted.

Model version pinning 0%

Model calls exist (via `generateText`), but there is no evidence of “model version pinning” being enforced: the code passes through a model identifier chosen at runtime (`aiSettings.model` / `input.model`) without ensuring it is an explicit, pinned version (e.g., rejecting `latest`/`stable`-style floating aliases or requiring versioned IDs).

high
Enforce pinned, explicit model IDs at the boundary where models are selected: validate `aiSettings.model` (and `input.model` in `testConnection`) against a policy (reject `latest`/`stable` and require versioned IDs), and/or map friendly names to pinned model IDs from a single config file.
- packages/server/src/services/ai.ts:70-170 — Model ID flows directly from `aiSettings.model` into `generateText` (`provider(aiSettings.model)` → `generateText({ model })`) with no pinning enforcement.
- apps/dokploy/server/api/routers/ai.ts:240-380 — `testConnection` uses `provider(input.model)` and then `generateText` with no validation/pinning.
med
Add an internal “model registry” (versioned config) that stores supported pinned model IDs per provider, and ensure both runtime suggestions and connection tests draw from this registry.
- packages/server/src/services/ai.ts:70-170 — Central invocation point for AI suggestions; best place to route through a pinned model registry.
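
One way the boundary check could look is sketched below. The rule set is an assumption: it rejects only the `latest`/`stable` floating aliases named above (bare, or as a `-`, `:`, or `@` suffix) and does not verify that an ID exists at the provider. `isPinnedModelId` and `requirePinnedModelId` are hypothetical helper names.

```typescript
// Matches bare floating aliases and ids that end in one, e.g. "x-latest" or "x:stable".
const FLOATING_ALIAS_PATTERN = /^(.*[-:@])?(latest|stable)$/;

/** True when the id is non-empty and does not end in a floating alias. */
function isPinnedModelId(modelId: string): boolean {
  const id = modelId.trim().toLowerCase();
  return id !== "" && !FLOATING_ALIAS_PATTERN.test(id);
}

/** Returns the id unchanged, or throws with a message that names the policy. */
function requirePinnedModelId(modelId: string): string {
  if (!isPinnedModelId(modelId)) {
    throw new Error(
      `Model id "${modelId}" is not pinned: use an explicit versioned id instead of a floating alias.`,
    );
  }
  return modelId;
}
```

A call such as `requirePinnedModelId(aiSettings.model)` placed just before the provider is constructed would cover both the suggestion path and `testConnection`; a registry of allowed IDs per provider would be a stricter variant of the same gate.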

Prompt / model-call management N/A

No centralized prompt/config artifacts were found (no prompt/prompts directories or prompt-management libraries), and this check detected no direct OpenAI/Anthropic SDK call sites tied to chat/completions/generate-style calls, so the primitive was scored as not applicable. That reading is narrower than the rest of this audit: Model version pinning and AI output validation both document `generateText` calls in `packages/server/src/services/ai.ts` and `apps/dokploy/server/api/routers/ai.ts`, so the repo is not a purely non-LLM application.

high
Introduce a managed, versioned prompt/config layer (single source of truth) and route the existing `generateText` call sites, plus any LLM features added later, through it; do not force this primitive onto the non-LLM parts of the stack.

Reproducibility / determinism 0%

The codebase does not provide a reproducibility/determinism primitive that would allow recreating runs exactly from pinned inputs (e.g., captured seeds, environment/dependency versions, and deterministic execution controls). A clear non-deterministic component exists: password generation uses `Math.random()` without any seed management or run-input capture.

high
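Introduce a deterministic randomness mechanism for replayable runs, but only for non-secret values: where randomization protects nothing (e.g., “randomize” features), use a seeded PRNG (e.g., `seedrandom`/custom xorshift) and persist the seed alongside other run inputs. `generateRandomPassword` is a different case. `Math.random()` is not cryptographically secure, and a password derived from a stored seed can be recovered by anyone who can read that seed, so password generation should move to a CSPRNG (e.g., Node's `crypto.randomInt`) and be treated as a deliberately non-replayable input.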
- packages/server/src/auth/random-password.ts:1-21 — Current implementation uses `Math.random()` and has no parameterization or seed capture, blocking exact replay.
med
Add run-boundary metadata capture for determinism: at the entry points that trigger nondeterministic operations (any “randomize” features, job/worker execution), record (a) seed(s) used for non-secret randomization, (b) environment variables relevant to behavior, and (c) code/dependency versions (e.g., git SHA + lockfile hash). Auth flows should record that a secret was generated, never the secret itself or a seed that could reproduce it.
- packages/server/src/auth/random-password.ts:1-21 — The absence of any seed/environment capture in the randomness-producing function is the blocker; without run-boundary capture, determinism cannot be achieved.
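
For replayable, non-secret randomization, a seeded generator can be sketched as below. It uses mulberry32, a small well-known PRNG, as a stand-in for whatever the team would choose; `createSeededRandom` and `seededSuffix` are hypothetical names. It is not cryptographically secure, so it is unsuitable for passwords or tokens, which should come from a CSPRNG and stay non-replayable.

```typescript
/** mulberry32: a small seeded PRNG returning values in [0, 1). Never use it for secrets. */
function createSeededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

/** Builds a replayable suffix, e.g. for a randomized resource name (hypothetical use). */
function seededSuffix(random: () => number, length: number): string {
  const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
  let out = "";
  for (let i = 0; i < length; i++) {
    out += alphabet.charAt(Math.floor(random() * alphabet.length));
  }
  return out;
}
```

Persisting the seed next to the other run inputs is what makes the run replayable: constructing the generator again from the stored seed yields the same sequence.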

AI output validation 0%

AI output validation is partially implemented. For `suggestVariants`, the code uses a strict Zod schema at the model boundary via the AI SDK’s structured-output facility (`Output.object({ schema })`), ensuring invalid outputs are rejected before being used. However, other model call paths (e.g., `analyzeLogs`, `testConnection`) return raw model text or check it only lightly, with no strict schema gate, and there is no demonstrated retry/self-correction loop that reuses the same schema and returns consistent error messages.

high
Add strict structured output validation (Zod + `Output.object({ schema })`) to the `analyzeLogs` path so `result.text` is replaced by a schema-gated object (e.g., `{ summary, issues_found, root_cause, suggested_fix }`), rejecting non-conforming outputs instead of passing raw text through.
- apps/dokploy/server/api/routers/ai.ts:251-277 — Model output is returned as raw text (`return { analysis: result.text }`) without schema validation.
high
Tighten `testConnection` output validation to an explicit contract (e.g., require exact `text === 'ok'` or use structured output with a schema) rather than only checking that `result.text` is non-empty.
- apps/dokploy/server/api/routers/ai.ts:302-350 — Only checks `if (!result.text)` before returning success; does not validate expected content/shape.
med
Implement a closed-loop retry around schema validation failures for the structured-output calls (including reusing the same schema and returning a consistent validation error message), rather than failing immediately or only logging.
- packages/server/src/services/ai.ts:170-218 — Schema validation exists, but there is no visible retry/self-correction loop that re-prompts on validation failure with consistent messaging.
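
The two contract checks recommended above can be sketched without any library. The hand-rolled validator below stands in for the Zod + `Output.object({ schema })` gate the codebase already uses for `suggestVariants`; the field names come from the recommendation, while `parseLogAnalysis`, `isConnectionOk`, and the result types are hypothetical.

```typescript
type LogAnalysis = {
  summary: string;
  issues_found: string[];
  root_cause: string;
  suggested_fix: string;
};

type AnalysisOutcome =
  | { kind: "valid"; value: LogAnalysis }
  | { kind: "invalid"; error: string };

/** Accepts model output only if it is a JSON object with the expected fields. */
function parseLogAnalysis(rawText: string): AnalysisOutcome {
  let data: unknown;
  try {
    data = JSON.parse(rawText);
  } catch {
    return { kind: "invalid", error: "output is not valid JSON" };
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return { kind: "invalid", error: "output is not a JSON object" };
  }
  const record = data as Record<string, unknown>;
  for (const field of ["summary", "root_cause", "suggested_fix"]) {
    if (typeof record[field] !== "string") {
      return { kind: "invalid", error: `${field} must be a string` };
    }
  }
  const issues = record["issues_found"];
  if (!Array.isArray(issues) || !issues.every((item) => typeof item === "string")) {
    return { kind: "invalid", error: "issues_found must be an array of strings" };
  }
  return {
    kind: "valid",
    value: {
      summary: record["summary"] as string,
      issues_found: issues as string[],
      root_cause: record["root_cause"] as string,
      suggested_fix: record["suggested_fix"] as string,
    },
  };
}

/** Connection test contract: the trimmed reply must be exactly "ok". */
function isConnectionOk(rawText: string): boolean {
  return rawText.trim() === "ok";
}
```

Returning a tagged result rather than throwing keeps the specific validation error available to the caller, which is also what a retry loop would need.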

Grounding / wrongness check 0%

Grounding / wrongness checking for LLM outputs does not appear to be implemented as a reusable primitive. The `aiRouter` endpoints call `generateText` and directly return `result.text` (or only check it is non-empty) without any judge/validator pass that verifies claims against the provided context/logs.

high
Add a centralized grounding/wrongness-check wrapper for LLM outputs (e.g., `groundingWrongnessCheck({context: input.logs, outputText})`) that either (a) produces a structured verdict per claim supported by the context, or (b) rejects and triggers a bounded retry with an evidence-only prompt and strict schema validation. Wire it into `analyzeLogs` so the endpoint never returns unverified claims.
- apps/dokploy/server/api/routers/ai.ts:180-320 — Model output from `generateText` is returned directly (`analysis: result.text`) with no grounding/judge step.
high
Harden the `testConnection` endpoint with explicit contract validation (e.g., require trimmed `result.text` equals `ok`) and treat mismatch as a failure. This is a minimum wrongness check for even simple outputs.
- apps/dokploy/server/api/routers/ai.ts:180-320 — Prompt says "Reply with 'ok'", but code only checks `if (!result.text)` and does not verify the actual content equals `ok`.
med
Add an automated eval/verification harness (golden tests) for the `analyzeLogs` behavior: a set of log snippets with expected grounded findings, and a CI pass/fail that enforces that the grounding check gates the output.
- apps/dokploy/server/api/routers/ai.ts:180-320 — This endpoint defines the behavior that would require eval gating; currently no evidence of such a harness is present in the code paths shown.
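
As an illustration of the structured-verdict option, the sketch below uses a deliberately simple heuristic instead of a judge model: each claim must carry a quoted evidence string, and the claim counts as supported only if that string appears verbatim in the logs. `EvidencedClaim`, `GroundingVerdict`, `checkGrounding`, and `allGrounded` are hypothetical names, and verbatim matching is an assumption that will miss paraphrased but correct claims.

```typescript
type EvidencedClaim = { claim: string; evidence: string };
type GroundingVerdict = { claim: string; supported: boolean };

/** A claim is supported only if its non-empty evidence quote occurs verbatim in the logs. */
function checkGrounding(logs: string, claims: EvidencedClaim[]): GroundingVerdict[] {
  return claims.map(({ claim, evidence }) => {
    const quote = evidence.trim();
    return { claim, supported: quote.length > 0 && logs.indexOf(quote) !== -1 };
  });
}

/** Gate for the endpoint: at least one claim, and every claim supported. */
function allGrounded(verdicts: GroundingVerdict[]): boolean {
  return verdicts.length > 0 && verdicts.every((verdict) => verdict.supported);
}
```

An endpoint using this gate would return the analysis only when `allGrounded` is true and otherwise reject or trigger a bounded retry with an evidence-only prompt.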

Self-correction / feedback loop 0%

No self-correction / feedback loop exists. While the project uses structured output validation for the main AI suggestion generator (`suggestVariants`) via `Output.object({ schema: fullSchema })`, failures are not used to drive a bounded retry where the specific validation error is fed back into the model prompt and re-checked.

high
Implement a closed self-correction loop inside `suggestVariants`: catch schema/parse/validation errors from the AI SDK/Zod gate, extract the specific error message/path, append it to the prompt (e.g., “Your previous output failed because: <error>. Fix and retry.”), and re-run `generateText` for a bounded number of attempts (e.g., 2-3) before returning a safe fallback (like a deterministic error payload).
- packages/server/src/services/ai.ts:70-230 — Model call + strict schema gate exist, but on failure it only throws (open loop). This is the exact call site where the primitive should close the loop.
med
Ensure the API layer (`apps/dokploy/server/api/routers/ai.ts`) either (a) does not just propagate errors, or (b) returns a structured error that includes the last validation error and attempt count when the loop ultimately fails.
- apps/dokploy/server/api/routers/ai.ts:260-330 — `suggest` mutation calls `suggestVariants(...)` and propagates failures without any retry-with-feedback behavior at the boundary.
low
Add a unit/integration test that forces an invalid model output (or mocks `generateText` to return invalid JSON) and asserts that the second attempt includes the validation error and that attempts are bounded.
- packages/server/src/services/ai.ts:70-230 — The critical behavior to test is the retry-with-error-feedback around the structured output gate.
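
The closed loop described in the first recommendation can be sketched independently of any SDK. Here `callModel` stands in for the `generateText` call and `check` for the schema gate; both are injected, and `generateWithFeedback`, `ModelCall`, `OutputCheck`, and `LoopResult` are hypothetical names.

```typescript
type ModelCall = (prompt: string) => Promise<string>;
type OutputCheck<T> = (
  raw: string,
) => { kind: "valid"; value: T } | { kind: "invalid"; error: string };
type LoopResult<T> =
  | { kind: "valid"; value: T; attempts: number }
  | { kind: "failed"; lastError: string; attempts: number };

/** Calls the model up to `maxAttempts` times, feeding each validation error back into the prompt. */
async function generateWithFeedback<T>(
  callModel: ModelCall,
  check: OutputCheck<T>,
  basePrompt: string,
  maxAttempts: number = 3,
): Promise<LoopResult<T>> {
  let prompt = basePrompt;
  let lastError = "no attempts made";
  let attempts = 0;

  while (attempts < maxAttempts) {
    attempts++;
    const raw = await callModel(prompt);
    const outcome = check(raw);
    if (outcome.kind === "valid") {
      return { kind: "valid", value: outcome.value, attempts };
    }
    lastError = outcome.error;
    // Close the loop: the next prompt carries the specific validation error.
    prompt = `${basePrompt}\n\nYour previous output failed because: ${lastError}. Fix and retry.`;
  }
  // Deterministic fallback: a structured failure with the last error and attempt count.
  return { kind: "failed", lastError, attempts };
}
```

Because the model call is injected, a test can substitute a fake that returns invalid output first, then assert that the second prompt contains the error and that the attempt count never exceeds the bound.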

Evaluation harness + scoring 0%

I could not find any evaluation harness + scoring implementation for AI output quality (no `eval/`, `golden/`, `testset/`, or evaluation runner artifacts, and no code that logs model I/O into a scored, recurring eval process). The codebase does have runtime schema validation and AI calls for suggestion generation and log analysis, but the offline golden-set evaluation/scoring layer described by the primitive is missing.

high
Add an offline evaluation harness for the AI features (starting with `suggestVariants` and `analyzeLogs`): create a versioned golden dataset (inputs like user requests/log samples + expected properties), run model generations in batch, score outputs with deterministic judges, and store results per run (including model/version, prompt version, and seed/params).
- packages/server/src/services/ai.ts:74-258 — Production model call path (`suggestVariants`) that should be evaluated against golden cases and scored; currently only runtime schema gating is visible.
- apps/dokploy/server/api/routers/ai.ts:240-335 — Production model call path (`analyzeLogs`) that should feed into the eval harness + scoring and have run outputs logged for recurring evals.
high
Implement production logging specifically for eval readiness: persist (a) the feature name, (b) prompt/prompt-template version, (c) model identifier + provider, and (d) the structured model output (or model text), along with the input payload. Wire logs to the eval runner so the next recurring eval can replay and score them.
- packages/server/src/services/ai.ts:74-220 — The model invocation (`generateText`) occurs here; this is the natural hook point for logging inputs/outputs/model+prompt versions.
med
Create a documented one-command check/CI job that runs the eval suite, reports pass/fail with clear thresholds, and fails the build on regressions.
- apps/dokploy/server/api/routers/ai.ts:240-335 — A concrete second feature endpoint to include in CI evals once the harness exists (ensures broader coverage beyond just compose suggestions).
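
A minimal sketch of the scoring half of such a harness, assuming outputs have already been logged as recommended above. The deterministic judge here is a simple "must include these phrases" check, which is an assumption rather than a recommended metric; `GoldenCase`, `LoggedOutput`, `scoreAgainstGolden`, and `meetsThreshold` are hypothetical names.

```typescript
type GoldenCase = { id: string; input: string; mustInclude: string[] };
type LoggedOutput = { caseId: string; modelId: string; output: string };
type EvalReport = { total: number; passed: number; passRate: number; failures: string[] };

/** Case-insensitive check that every required phrase appears in the output. */
function includesAll(output: string, phrases: string[]): boolean {
  const haystack = output.toLowerCase();
  return phrases.every((phrase) => haystack.indexOf(phrase.toLowerCase()) !== -1);
}

/** Scores logged outputs against the golden set; a case with no logged output fails. */
function scoreAgainstGolden(golden: GoldenCase[], outputs: LoggedOutput[]): EvalReport {
  const failures: string[] = [];
  for (const goldenCase of golden) {
    const logged = outputs.filter((o) => o.caseId === goldenCase.id)[0];
    const passed = Boolean(logged) && includesAll(logged.output, goldenCase.mustInclude);
    if (!passed) failures.push(goldenCase.id);
  }
  const total = golden.length;
  const passedCount = total - failures.length;
  return {
    total,
    passed: passedCount,
    passRate: total === 0 ? 0 : passedCount / total,
    failures,
  };
}

/** CI gate: an empty golden set never passes. */
function meetsThreshold(report: EvalReport, minPassRate: number): boolean {
  return report.total > 0 && report.passRate >= minPassRate;
}
```

A CI job would run the scorer over the latest logged outputs and exit non-zero when `meetsThreshold` returns false, which gives the one-command pass/fail check described in the last recommendation.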

Runnable correctness checks 0%

The codebase contains a Vitest configuration and many unit tests, but I could not find a documented one-command “correctness check” entrypoint (or CI workflow) that an agent can run to get an unambiguous pass/fail verdict. Therefore, runnable correctness checks are not externally governed/documented in a way this primitive requires.

high
Add a root-level, documented command that runs the repository’s correctness suite and returns a clear exit code (e.g., a `test` script like `vitest run --config apps/dokploy/__test__/vitest.config.ts`), and ensure CI uses the same command.
- apps/dokploy/__test__/vitest.config.ts:1-35 — This is the existing test configuration that should be invoked by a single runnable command.
med
Add/confirm a CI workflow that runs the same command and fails the build on test failure, providing the positive confirmation needed for this primitive.
- apps/dokploy/__test__/setup.ts:1-44 — The test setup indicates tests are intended to run in CI without a real DB; a CI runner should invoke them reliably.
low
Document the command in a contributor-facing file (README/CONTRIBUTING) alongside expected duration and how to run locally (same command as CI).
- apps/dokploy/__test__/vitest.config.ts:1-35 — Because the test runner behavior is defined here, documentation should point users/agents to the exact command that uses it.

Actionable diagnostics 25%

The codebase does include actionable diagnostics for core authorization failures (structured `TRPCError` with error `code` and `message`). However, some upstream API failures in `apps/api/src/service.ts` are handled primarily via logging and then returning empty/truncated data, which reduces actionability for callers.

high
Standardize upstream API failure handling in `apps/api/src/service.ts`: when `fetchInngestEvents` / `fetchInngestRunsForEvent` encounter non-OK responses, propagate a structured error to callers (with a code/type and fix hint), rather than only logging and returning empty/partial results.
- apps/api/src/service.ts:44-83 — Non-OK response logs `status` and `body` but then `break`s and returns partial data without throwing an actionable error.
- apps/api/src/service.ts:93-125 — Non-OK response logs details but then returns `[]`, leaving the caller without a diagnostic failure signal.
med
Add actionable diagnostics around `resolveRole()` failure modes (especially `JSON.parse(entry.permission)`) so that malformed role permissions produce a clear message and stable error code rather than failing indirectly later.
- packages/server/src/services/permission.ts:73-119 — Role resolution parses `entry.permission` and can fail in ways not shown to be wrapped in structured, user-actionable errors.
low
Ensure environment-variable errors (e.g., missing `INNGEST_BASE_URL`) use the same structured diagnostic pattern (stable code/type) as other API errors, not just a raw `Error` string.
- apps/api/src/service.ts:175-193 — Throws generic `Error("INNGEST_BASE_URL is required...")` when config is missing; improve consistency with stable error codes/types.
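
The structured-diagnostic pattern these recommendations share can be sketched as one small error type plus helpers. The error codes, hint texts, and the names `DiagnosticError`, `upstreamFailure`, and `parsePermissionJson` are assumptions made for illustration; they are not the repo's existing API.

```typescript
/** Error carrying a stable machine-readable code and a human-readable fix hint. */
class DiagnosticError extends Error {
  readonly code: string;
  readonly hint: string;

  constructor(code: string, message: string, hint: string) {
    super(message);
    this.name = "DiagnosticError";
    this.code = code;
    this.hint = hint;
  }
}

/** Turns a non-OK upstream response into an error the caller can act on. */
function upstreamFailure(service: string, httpStatus: number, body: string): DiagnosticError {
  return new DiagnosticError(
    "UPSTREAM_REQUEST_FAILED", // hypothetical code
    `${service} responded with HTTP ${httpStatus}: ${body.slice(0, 200)}`,
    "Check the upstream base URL and credentials, then retry.",
  );
}

/** Parses a stored permission string; malformed JSON fails immediately with a stable code. */
function parsePermissionJson(roleName: string, rawPermission: string): unknown {
  try {
    return JSON.parse(rawPermission);
  } catch {
    throw new DiagnosticError(
      "ROLE_PERMISSION_MALFORMED", // hypothetical code
      `Role "${roleName}" has a permission value that is not valid JSON`,
      "Re-save the role's permissions so they are stored as valid JSON.",
    );
  }
}
```

Throwing `upstreamFailure(...)` where the code currently logs and returns `[]` or partial data lets callers distinguish "no results" from "the upstream call failed".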

Positive confirmation 0%

No explicit “positive confirmation” success contract (e.g., a CI workflow or other governed pass/fail gate that clearly indicates correctness/safe-to-stop) was found in the repository. While the project uses Vitest and includes test files, I did not find any CI/build definition that externally and unambiguously signals test success.

high
Add a GitHub Actions workflow (or equivalent CI) that runs the repo’s canonical test command (e.g., `vitest run` / `pnpm test`) and treats a passing run as the explicit positive-confirmation signal (green status). Ensure the workflow is the single source of truth for the one-command check.
- apps/dokploy/__test__/vitest.config.ts:1-35 — Shows tests are configured via Vitest, so CI can safely rely on `vitest run` exit code as the success/failure signal.
med
Create/verify a single documented root-level script for the “green check” (e.g., `pnpm test`), and ensure it runs the full relevant test suite with deterministic settings (no watch mode).
- apps/dokploy/__test__/vitest.config.ts:1-35 — Vitest configuration exists, but no explicit positive-confirmation wiring (like a documented script/CI gate) was found.

Machine-readable contracts 92%

Machine-readable contracts are present and well-externalized: the repo generates and commits an OpenAPI specification (openapi.json) from the server router, and the Swagger UI consumes the generated OpenAPI document. Additionally, Chatwoot integration expectations are expressed via a .d.ts declaration file.

high
Add/ensure a CI check that re-runs apps/dokploy/scripts/generate-openapi.ts and fails if the generated output differs from the committed openapi.json (keeps contract in strict sync with implementation).
- apps/dokploy/scripts/generate-openapi.ts:1-133 — The generator writes openapi.json (source-of-truth pipeline), but without a visible enforced diff check, contract drift is possible.

Not applicable to this codebase: Vector / embedding store, Prompt / model-call management.

Tenancy Isolation

A tenant_id on every business table, row-level security in the database, and tests that prove a cross-tenant request returns 403.

21% 11/12 scored

Tenant key on every record · 50% · 1/2 expected sites
Database-enforced isolation · 0% · 0/3 expected sites · not present
Default-scoped queries · 0% · 0/2 expected sites · not present
Tenant context at the boundary · 100% · 2/2 expected sites
Object/blob partitioning · 0% · 0/8 expected sites · not present
Tenant context in async work · 0% · 0/5 expected sites · not present
Per-tenant resource limits · 0% · 0/2 expected sites
Tenant-scoped key management · 0% · 0/2 expected sites · not present
Admin / role scoping · 80% · 4/5 expected sites
Uniform not-found vs. forbidden · 0% · 0/2 expected sites · not present
Cross-tenant isolation tests · 0% · 0/1 expected sites · not present

Tenant key on every record 50%

The codebase does have tenant keys on some tables (e.g., `projects.organizationId`), but the primitive is not enforced consistently across all business records at the schema/model layer. At least `deployment` lacks any tenant key column, and `session` does not enforce tenant keying on every row via an organization FK.

high
Add a tenant/owner key column to `deployment` (likely `organizationId` referencing `organization.id`) and ensure all access paths filter by/derive this tenant key from the row (or enforce with DB-level policies).
- packages/server/src/db/schema/deployment.ts:1-55 — Deployments table schema contains no `organizationId`/tenant identifier column.
high
For `session`, decide the intended tenant scoping model and enforce it at rest: either add `organizationId` with an FK (if the session is tenant-bound) or remove `activeOrganizationId` if it is purely UI state, ensuring no tenant-dependent logic relies on an unenforced field.
- packages/server/src/db/schema/session.ts:1-19 — Session table has `activeOrganizationId` but no tenant-scoping FK/key that is guaranteed on every row.
med
Run a repository-wide audit over all `pgTable(...)` writeable schemas to confirm every business table has a required tenant key column (e.g., `organizationId`). For any exceptions, justify and ensure tenant isolation is still enforced by DB-layer defaults (e.g., RLS/forced policies) or by an architectural partition (schema-per-tenant).
- packages/server/src/db/schema/project.ts:1-28 — Demonstrates the intended pattern (tenant key on rows) via `organizationId` on at least one core table.
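
The repository-wide schema audit can be approximated with a small check over table shapes. This is a sketch under stated assumptions: `TableShape`, `GLOBAL_TABLES`, and `tablesMissingTenantKey` are hypothetical, and the column lists in the usage note are illustrative rather than copied from the real schema.

```typescript
// Hypothetical, simplified description of a table: its name and column names.
type TableShape = { name: string; columns: string[] };

// Tables that are intentionally global and documented as exceptions (assumed list).
const GLOBAL_TABLES = new Set<string>(["organization"]);

// Return the names of business tables that lack the tenant key column.
function tablesMissingTenantKey(
  tables: TableShape[],
  tenantColumn: string = "organizationId",
): string[] {
  return tables
    .filter((t) => !GLOBAL_TABLES.has(t.name))
    .filter((t) => !t.columns.includes(tenantColumn))
    .map((t) => t.name);
}
```

Fed with `project` (which carries `organizationId`) and `deployment` (which does not), the function would report only `deployment`. Running it as a test keeps new tables from shipping without a tenant key or a documented exception.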

Database-enforced isolation 0%

Database-enforced isolation (RLS/schema-per-tenant/DB router enforcement that filters by tenant even for table owners) appears to be absent. The codebase defines org/tenant foreign keys on some tables (e.g., `projects.organizationId`), but schemas like `deployment` do not carry a tenant discriminator and no database policy mechanism is evident from the inspected DB schema files. As a result, isolation appears to rely on application-level checks (which is soft isolation and vulnerable to a single missed filter).

high
Add database-level tenant enforcement using Row Level Security (RLS) (or schema-per-tenant / database-per-tenant). For shared tables like `deployment`, enforce policies that automatically restrict rows to the caller’s organization, and ensure the policy is forced (e.g., `FORCE ROW LEVEL SECURITY`) so it applies even for table owners.
- packages/server/src/db/schema/deployment.ts:1-120 — Current `deployment` schema lacks a tenant column, so an RLS policy will need either (a) a tenant column added to the table, or (b) RLS predicates that resolve tenant via joins through related org-scoped tables.
high
Implement RLS policies for every tenant-scoped table (not only top-level ones). Verify by testing: issue queries from a role that is the table owner (or otherwise privileged) without applying org filters in the app query and confirm cross-organization reads are denied.
- packages/server/src/db/schema/project.ts:1-70 — `projects` includes `organizationId`, but this alone is insufficient for the primitive—DB policies must be enforced at the database layer and forced for privileged access.
med
Establish a cross-tenant integration test suite that attempts to list/read/export tenant data using only a different organization context, and assert denial. This should include direct DB-access attempts that bypass application query filters (e.g., through an admin/owner DB role) to validate the defense-in-depth goal.
- packages/server/src/db/schema/deployment.ts:1-120 — Because `deployment` does not expose `organizationId` in the row itself, it is especially important to validate DB-layer enforcement so a missed app filter cannot leak data.
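
As an illustration of forced row-level security, the sketch below builds the PostgreSQL statements for one table. The helper `rlsStatements`, the policy name `tenant_isolation`, and the session setting `app.current_org_id` are assumptions for illustration; the sketch also assumes the tenant column is a text column that already exists on the table.

```typescript
// Build the PostgreSQL statements that enable and force RLS on one table.
// `app.current_org_id` is an assumed setting the application would set per transaction.
function rlsStatements(table: string, tenantColumn: string): string[] {
  const t = `"${table}"`;
  const c = `"${tenantColumn}"`;
  // With missing_ok = true the setting is NULL when unset, so no row matches.
  const predicate = `${c} = current_setting('app.current_org_id', true)`;
  return [
    `ALTER TABLE ${t} ENABLE ROW LEVEL SECURITY;`,
    // FORCE makes the policy apply to the table owner as well.
    `ALTER TABLE ${t} FORCE ROW LEVEL SECURITY;`,
    `CREATE POLICY tenant_isolation ON ${t} USING (${predicate}) WITH CHECK (${predicate});`,
  ];
}
```

Superusers and roles with the `BYPASSRLS` attribute still bypass these policies, so the application would need to connect as an ordinary role for the enforcement to hold.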

Default-scoped queries 0%

I did not find a default-scoped query mechanism that automatically applies the tenant/org filter at the ORM/repository layer. Instead, tenant isolation appears to rely on schema FKs (e.g., `organizationId` on some tables) and higher-level access checks, while individual data-access functions can perform lookups without an automatic tenant filter.

high
Introduce (or verify) a tenant-aware base repository / default scope for Drizzle queries so that any `find/findMany/select` on tenant-partitioned resources automatically injects `organizationId` (or an equivalent tenant join condition). Ensure the default cannot be bypassed except via an explicitly audited escape hatch.
- packages/server/src/services/deployment.ts:49-68 — Example of an unscoped read path today: `findDeploymentById` queries by `deploymentId` only, with no visible automatic tenant/org constraint.
med
Audit all data-access functions that read by opaque IDs (e.g., `findXById(id)`) and ensure each call site either (a) uses the new default-scoped repository or (b) performs a tenant-scoped join/filter underneath in the data layer.
- packages/server/src/services/deployment.ts:70-89 — `findDeploymentByApplicationId` is another opaque-ID read that currently has no tenant filter visible in the data-access call.
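
A default-scoped repository can be sketched with an in-memory stand-in for the database. `OrgScopedDeployments` and `ScopedDeploymentRow` are hypothetical, and the sketch assumes deployments carry an `organizationId`, which the audited schema does not yet provide.

```typescript
type ScopedDeploymentRow = { deploymentId: string; organizationId: string; title: string };

// A repository bound to one organization; every read applies the tenant filter,
// so a caller cannot forget it.
class OrgScopedDeployments {
  private readonly rows: ScopedDeploymentRow[];
  private readonly organizationId: string;

  constructor(rows: ScopedDeploymentRow[], organizationId: string) {
    this.rows = rows;
    this.organizationId = organizationId;
  }

  findById(deploymentId: string): ScopedDeploymentRow | undefined {
    return this.rows.find(
      (r) => r.deploymentId === deploymentId && r.organizationId === this.organizationId,
    );
  }

  findMany(): ScopedDeploymentRow[] {
    return this.rows.filter((r) => r.organizationId === this.organizationId);
  }
}
```

The point of the design is that the organization id is supplied once, when the repository is constructed from the verified session, rather than on every query.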

Tenant context at the boundary 100%

This codebase implements “tenant context at the boundary” for the primary API surface (tRPC). `createTRPCContext` derives trusted session/user via `validateRequest(req)`, and `validateRequest` sets `session.activeOrganizationId` from the authenticated membership’s organization id. I did not find evidence that the tRPC path takes the organization id from request params or the request body.

high
Confirm all non-tRPC entry points (e.g., Next.js route handlers under `apps/dokploy/pages/api/**`, webhooks, background workers) also re-establish/derive `activeOrganizationId` (or equivalent) from verified identity, rather than accepting client-supplied org/tenant ids.
- apps/dokploy/pages/api/providers/github/setup.ts:1-69 — This API route reads `organizationId`-related values from query/URL state (`state` parsing and `req.query`) to call `createGithub(...)`. It is not part of the tRPC boundary implementation audited above, so it should be checked for tenant-context validation via verified identity.
med
Add/verify an automated test that attempts to call an authenticated API while forcing a different `orgId`/`organizationId`, asserting denial (or that the response is based strictly on `activeOrganizationId` from the membership).
- apps/dokploy/server/api/trpc.ts:1-83 — The boundary depends on `validateRequest(req)`; a regression test should ensure `ctx.session.activeOrganizationId` cannot be influenced by inputs.

Cache key namespacing N/A

I did not find any cache get/set usage where cache keys could be tenant-prefixed (no discovered `cache/redis` key-building patterns for `get/set/mget/mset/...`). The codebase uses Redis primarily for operational concerns (e.g., service setup/Swarm, queue connection) rather than an application-level shared cache with key namespaces, so this tenancy isolation primitive does not appear to be implemented or applicable as a cache-layer concern in this repo.

high
If the system uses a shared application cache in production (e.g., Redis-backed caching via a library like ioredis-cache, node-cache, cache-manager, or custom key/value helpers), locate that cache wrapper and ensure all keys are tenant-prefixed (e.g., `tenant:{id}:...`) and/or enforce cache-level ACLs; then add tests that attempt cross-tenant cache reads.
- packages/server/src/services/redis.ts:1-138 — Shows Redis being managed as a deployable service; no application cache key namespace logic is present here.
- apps/dokploy/server/queues/redis-connection.ts:1-9 — Shows Redis connection configuration for queues; does not include any cache get/set or tenant-scoped key construction.

Object/blob partitioning 0%

Across the blob/object storage interactions used for backups/restores (compose backups, web-server backups, and volume backups), the code constructs S3 object paths using `appName`/`prefix`/`backupFileName` and the configured `destination.bucket`, but does not include a tenant/org identifier in the object key/path. No evidence was found of per-tenant bucket/prefix partitioning (or tenant-bound signed/unguessable URL strategy) being enforced by default at the storage layer for these artifacts.

high
Make storage object keys tenant-scoped by default (e.g., `orgId/` or `tenantId/` prefix) at the point where `bucketDestination`/`s3Path`/`backupPath` are constructed for all backup and restore flows.
- packages/server/src/utils/backups/compose.ts:20-29 — Object key/path is derived from `s3AppName` and `prefix` only; add `organizationId`/`orgId` into the key.
- packages/server/src/utils/backups/web-server.ts:40-67 — S3 path uses `destination.bucket`, `backup.appName`, and `backup.prefix` without tenant scoping.
- packages/server/src/utils/volume-backups/backup.ts:31-45 — Volume backup object key/path lacks tenant/org component; add it into `bucketDestination`.
high
Ensure restores cannot read arbitrary objects across tenants by validating tenant-scoped prefixes (and/or using tenant-scoped paths stored in DB and enforced during restore).
- packages/server/src/utils/restore/compose.ts:20-33 — Restore builds `backupPath` from the shared `destination.bucket` and `backupInput.backupFile` without tenant scoping.
med
Add integration tests that attempt cross-tenant backup artifact read/list/restore using guessed keys/IDs and assert denial (or same-not-found behavior if you intentionally hide existence).
- packages/server/src/utils/backups/compose.ts:29-37 — This is a concrete write site; tests should cover whether a tenant can restore/read another tenant’s objects.
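
Tenant-scoped object keys can be sketched as follows. `tenantObjectKey` and `assertKeyBelongsToOrg` are hypothetical helpers; the sketch simplifies the real path layout (it treats `appName` and the file name as single segments and omits the configurable `prefix`).

```typescript
// Build an object key whose first path segment is always the organization id.
function tenantObjectKey(organizationId: string, appName: string, fileName: string): string {
  for (const part of [organizationId, appName, fileName]) {
    if (part === "" || part.includes("/") || part.includes("..")) {
      throw new Error(`invalid key segment: ${part}`);
    }
  }
  return `${organizationId}/${appName}/${fileName}`;
}

// On restore, accept only keys under the caller's own organization prefix.
// The error text deliberately does not reveal whether the object exists.
function assertKeyBelongsToOrg(organizationId: string, key: string): void {
  if (!key.startsWith(`${organizationId}/`)) {
    throw new Error("backup not found");
  }
}
```

The trailing slash in the prefix check matters: without it, organization `org1` would be allowed to read keys that belong to `org10`.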

Tenant context in async work 0%

Tenant context in async work is not implemented defensively in this codebase: queue job payloads and async workers (BullMQ) do not carry organization/tenant identifiers, and the worker-side data access helpers load entities by ID only (no tenant constraint visible). As a result, async processing appears to rely on implicit correctness rather than mandatory tenant context re-establishment from the job message.

high
Extend all async queue job payload schemas/types to include tenant context (e.g., organizationId/orgId) and ensure enqueue calls populate it from the verified request/session identity.
- apps/schedules/src/schema.ts:1-28 — QueueJob currently lacks any tenant/org field.
- apps/dokploy/server/queues/queue-types.ts:1-32 — DeploymentJob currently lacks any tenant/org field.
high
In each async worker, re-establish tenant context from the job payload before any DB access, and enforce that DB queries are tenant-scoped by default (either via a tenant-aware repository/base query layer or explicit mandatory tenant filtering).
- apps/schedules/src/workers.ts:1-46 — Worker runs with no tenant context restoration logic.
- apps/dokploy/server/queues/deployments-queue.ts:1-97 — Worker directly calls deployment/update functions with no tenant context restoration.
- apps/schedules/src/utils.ts:1-120 — Async handler performs ID-based entity lookups without tenant scoping visible at this layer.
med
Harden data-access helpers used by async workers to require tenant/org constraints (e.g., change findServerById/findBackupById/findScheduleById/findVolumeBackupById to accept tenant context and apply it in the query), so isolation is enforced beneath application code.
- packages/server/src/services/server.ts:1-169 — findServerById loads by serverId only (no organizationId filter).
- packages/server/src/services/backup.ts:1-91 — findBackupById loads by backupId only (no organizationId filter).
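
Carrying tenant context through a queue can be sketched like this. `TenantDeployJob`, `buildDeployJob`, and `loadAppForJob` are hypothetical names, and the in-memory array stands in for the database; the real `DeploymentJob` type has no such field today.

```typescript
type TenantDeployJob = { organizationId: string; applicationId: string };
type TenantAppRow = { applicationId: string; organizationId: string };

// Enqueue side: the organization id comes from the verified session, never from input.
function buildDeployJob(sessionOrgId: string, applicationId: string): TenantDeployJob {
  return { organizationId: sessionOrgId, applicationId };
}

// Worker side: re-establish tenant context from the payload before touching data,
// and load the entity with the tenant constraint applied.
function loadAppForJob(job: TenantDeployJob, apps: TenantAppRow[]): TenantAppRow {
  if (!job.organizationId) throw new Error("job is missing tenant context");
  const app = apps.find(
    (a) => a.applicationId === job.applicationId && a.organizationId === job.organizationId,
  );
  if (!app) throw new Error("application not found for this organization");
  return app;
}
```

A job whose `organizationId` does not match the target row fails in the worker instead of silently operating on another organization's resource.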

Per-tenant resource limits 0%

This codebase models rate limiting for API keys (rateLimitEnabled/rateLimitTimeWindow/rateLimitMax, etc.) and associates API key metadata with an organizationId. However, I did not find evidence of true per-tenant resource-limit enforcement via tenant-keyed quota/rate-limit buckets. What exists is closer to per-API-key throttling configuration than to per-tenant noisy-neighbor isolation.

high
Locate the enforcement path that actually consumes the API key rateLimitEnabled/timeWindow/max fields (likely in better-auth integration or a request middleware). Verify and implement tenant-scoped limiter keying (e.g., tenant:{orgId}:rate:...) so quota is shared/limited at the tenant level, not only per API key.
- packages/server/src/services/user.ts:470-515 — Only configuration/insertion of rate-limit parameters is visible; enforcement/keying by tenant is not shown here.
med
Add/adjust integration tests to confirm noisy-neighbor resistance: generate multiple API keys in two organizations, saturate limits for Org A, and verify Org B requests still succeed under concurrent load.
- apps/dokploy/components/dashboard/settings/api/add-api-key.tsx:1-220 — Frontend supports configuring the rate limit parameters per API key, making it practical to create test fixtures for two organizations.
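
Tenant-keyed limiting can be illustrated with a fixed-window counter. `OrgRateLimiter` is a hypothetical in-memory sketch; a real deployment would likely keep the counters in a shared store, and the key format simply follows the `tenant:{orgId}:rate:...` shape suggested above.

```typescript
// Fixed-window limiter keyed by organization: all API keys of one org share a bucket.
class OrgRateLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly max: number;
  private readonly windowMs: number;

  constructor(max: number, windowMs: number) {
    this.max = max;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed at time nowMs.
  allow(organizationId: string, nowMs: number): boolean {
    const key = `tenant:${organizationId}:rate`;
    const current = this.windows.get(key);
    if (!current || nowMs - current.start >= this.windowMs) {
      this.windows.set(key, { start: nowMs, count: 1 });
      return true;
    }
    if (current.count >= this.max) return false;
    current.count += 1;
    return true;
  }
}
```

Because the bucket key contains only the organization id, saturating one organization's quota leaves every other organization's bucket untouched, which is the property the noisy-neighbor test would assert.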

Tenant-scoped key management 0%

No evidence of tenant-scoped key management (per-tenant KMS/envelope encryption) was found. Secrets such as SSH private keys and certificate private keys appear to be handled/stored without any tenant-specific envelope-key reference or crypto-erase capable design. Existing code uses global env secrets (e.g., `INNGEST_SIGNING_KEY`) and directly writes sensitive key material to storage paths/files.

high
Introduce tenant-scoped envelope encryption for all persisted/transit key material (e.g., SSH `privateKey`, certificate `privateKey`). Implement a per-organization key-wrapping scheme (BYOK/CMEK-ready) and store only ciphertext + per-tenant key reference metadata.
- packages/server/src/db/schema/ssh-key.ts:1-82 — SSH table includes a `privateKey` column and only ties records to `organizationId`, with no tenant-scoped KMS/encryption integration shown.
- packages/server/src/services/certificate.ts:90-162 — Certificate private keys are written to disk by decoding stored `privateKey` with no tenant-scoped encryption.
med
Add cryptographic context plumbing so the lowest crypto layer always resolves tenant/organization from the verified session context, then uses that tenant’s key to encrypt/decrypt. Prevent any code paths from using a global encryption key for tenant secrets.
- apps/dokploy/server/api/routers/ssh-key.ts:1-133 — Tenant context exists (`ctx.session.activeOrganizationId`), but the secret-handling/encryption layer currently does not show tenant-scoped key usage.
low
Add integration tests that attempt cross-tenant secret access and verify crypto-erase behavior (after tenant deletion, decryption should fail; ciphertext should remain undecryptable).
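
Per-organization envelope encryption and crypto-erase can be sketched with Node's built-in `node:crypto` module. Everything named here (`orgKeys`, `seal`, `unseal`, `encryptForOrg`, `decryptForOrg`) is hypothetical, and the in-memory map is only a stand-in for a real KMS holding one key-encryption key per organization.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

type Sealed = { iv: Buffer; tag: Buffer; data: Buffer };
type OrgSecretRecord = { orgId: string; wrappedDek: Sealed; payload: Sealed };

// AES-256-GCM encrypt with a fresh 12-byte IV.
function seal(key: Buffer, plaintext: Buffer): Sealed {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), data };
}

function unseal(key: Buffer, sealed: Sealed): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
  decipher.setAuthTag(sealed.tag);
  return Buffer.concat([decipher.update(sealed.data), decipher.final()]);
}

// Hypothetical in-memory stand-in for a per-organization KMS key store.
const orgKeys = new Map<string, Buffer>();

// Encrypt the secret with a one-off data key, then wrap that key with the org's key.
function encryptForOrg(orgId: string, secret: string): OrgSecretRecord {
  const kek = orgKeys.get(orgId);
  if (!kek) throw new Error("no key for organization");
  const dek = randomBytes(32);
  return { orgId, wrappedDek: seal(kek, dek), payload: seal(dek, Buffer.from(secret, "utf8")) };
}

function decryptForOrg(record: OrgSecretRecord): string {
  const kek = orgKeys.get(record.orgId);
  if (!kek) throw new Error("no key for organization");
  const dek = unseal(kek, record.wrappedDek);
  return unseal(dek, record.payload).toString("utf8");
}
```

Deleting an organization's entry from the key store makes every record wrapped under it undecryptable, which is the crypto-erase behavior the test above is meant to verify.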

Admin / role scoping 80%

Admin/elevated roles appear to be correctly scoped to a tenant (organization) membership model. The codebase stores roles on `member` with `organizationId`, derives `activeOrganizationId` from the authenticated user’s verified membership, and tenant-checks are applied at major entrypoints (organization, custom-role, project) by filtering on `ctx.session.activeOrganizationId` / verifying membership before acting.

high
Add/verify a negative integration test that attempts to change/view custom roles or access projects under a different organizationId than `activeOrganizationId`, asserting denial (including “owner/admin” users). This ensures tenant scoping cannot regress silently.
- apps/dokploy/server/api/routers/proprietary/custom-role.ts:13-170 — Custom role APIs are tenant-scoped via `ctx.session.activeOrganizationId`; a cross-tenant denial test would validate this assumption end-to-end.
med
Audit `findUserById` / `findOrganizationById` / other generic admin/owner lookup helpers for missing tenant scoping guarantees in their callers. If these helpers are used in authorization decisions, ensure callers always provide/verify the target organizationId.
- packages/server/src/services/admin.ts:1-36 — `findUserById` is a generic lookup by user id with no organization scoping; correctness depends on callers not using it to bypass tenant checks.
low
Confirm that any remaining “admin presence” or “owner exists” checks are either (a) intentionally global and non-authorizing, or (b) properly constrained to the relevant organization. If these checks ever gate tenant-specific capabilities, they should take an organizationId parameter.
- packages/server/src/services/admin.ts:43-61 — `isAdminPresent` checks `member` for role `owner` without an organization filter; ensure this is not used to grant cross-tenant privileges.

Uniform not-found vs. forbidden 0%

The codebase does not implement a uniform “not-found vs forbidden” behavior. At least for organization and project item fetches, access denied and missing resources result in different tRPC error codes (e.g., "FORBIDDEN" / "UNAUTHORIZED" vs "NOT_FOUND"), which can leak resource existence across organizations/tenants.

high
For every single-resource read path that can be influenced by an id (e.g., organization/project/environment/etc. “one” queries), change authorization-denied outcomes to throw the same tRPC error code/message as the missing-resource outcome (typically "NOT_FOUND"). This should be done consistently at the lowest layer that decides between “not found” and “access denied”.
- apps/dokploy/server/api/routers/project.ts:101-135 — Currently throws "UNAUTHORIZED" when the project id is not in accessibleProjects, while missing resources later throw "NOT_FOUND"—these should be made uniform.
- apps/dokploy/server/api/routers/organization.ts:68-95 — Currently throws "FORBIDDEN" when membership check fails, while missing resources elsewhere in the codebase use "NOT_FOUND"—these should be made uniform.
med
Add integration tests that attempt cross-organization reads for each affected resource type (read by id, list, and any export or export-like endpoints) and assert that the client always receives the same not-found response whether the target id exists but is forbidden or truly does not exist.
- apps/dokploy/server/api/routers/project.ts:101-150 — Project fetch combines access checks and DB lookup null-handling with different TRPC codes; a cross-tenant existence test would directly validate the fix.
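
The uniform behavior can be sketched with a single lookup that raises the same error in both cases. `UniformNotFoundError`, `UniformProjectRow`, and `getProjectUniform` are hypothetical; a real handler would throw the tRPC error with code "NOT_FOUND" instead of this plain error class.

```typescript
type UniformProjectRow = { projectId: string; organizationId: string };

class UniformNotFoundError extends Error {
  readonly code = "NOT_FOUND";
  constructor() {
    super("Project not found");
  }
}

// One decision point: a missing row and a row owned by another organization
// produce exactly the same error.
function getProjectUniform(
  rows: UniformProjectRow[],
  activeOrgId: string,
  projectId: string,
): UniformProjectRow {
  const row = rows.find((r) => r.projectId === projectId);
  if (!row || row.organizationId !== activeOrgId) throw new UniformNotFoundError();
  return row;
}
```

Since the code and message are identical in both branches, a caller probing ids from another organization learns nothing about which ids exist.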

Cross-tenant isolation tests 0%

No cross-tenant isolation integration/security tests were found. Existing tests under `apps/dokploy/__test__` focus on permissions and environment access logic, but they do not attempt cross-tenant read/write/list/export/async operations to prove denial at the boundary.

high
Add a dedicated cross-tenant isolation security test suite that, for at least these paths, creates two tenants/orgs and verifies that tenant A cannot read/write/list/export tenant B resources: deployments, environments/env vars, backups, schedules, projects/services, audits/logs, and any export endpoints. Include async paths by enqueueing background jobs under tenant A and asserting workers cannot access tenant B without the correct tenant context.
- apps/dokploy/__test__/setup.ts:1-44 — Test setup exists but mocks only DB connectivity; it does not provide any cross-tenant isolation test coverage (no evidence of tenant-crossing security assertions in the found test files).
high
Implement explicit assertions in tests for both read and write denial (including list/export) and ensure responses are uniform (e.g., not leaking existence).
- apps/dokploy/__test__/permissions/check-permission.test.ts:1-187 — Current permission tests mock DB and validate permission logic (unit-style). This is not a cross-tenant isolation proof at the data-access boundary.
med
Extend the test harness to create real or sufficiently representative multi-tenant fixtures (two orgs/members) and run through actual API routes/routers instead of only calling permission helpers with mocked DB.
- apps/dokploy/__test__/permissions/service-access.test.ts:1-133 — Tests exercise service access checks via mocked member data and mocked services/DB; they do not demonstrate enforcement against cross-tenant resource IDs.

Not applicable to this codebase: Cache key namespacing.

Identity & Access

SAML/OIDC libraries, SCIM provisioning endpoints, and a real roles/permissions schema — not a hard-coded isAdmin boolean.

63% 11/11 scored

Federated SSO (SAML/OIDC) 67%

3/3 expected sites
Directory provisioning (SCIM) 0%

0/2 expected sites not present
RBAC modeled as data 33%

1/1 expected sites
Centralized authorization 117%

3/2 expected sites
No hardcoded privilege shortcuts 100%

2/2 expected sites
Deny-by-default 40%

2/5 expected sites
AuthN before AuthZ at the boundary 133%

4/3 expected sites
MFA / step-up auth 0%

0/2 expected sites
Session & token hygiene 100%

4/4 expected sites
Scoped machine credentials 100%

3/2 expected sites
IP allowlists / network constraints 0%

0/1 expected sites

Federated SSO (SAML/OIDC) 67%

Federated SSO is present and wired through a centralized authentication server using `@better-auth/sso` (plugin-based). The app also includes an enterprise SSO management router and an SSO provider data model supporting both OIDC and SAML configuration, plus a user-facing sign-in component that initiates the SSO flow via the auth client and redirects to the IdP flow URL. While the wiring is clear, direct evidence of cryptographic token/assertion validation at the boundary is not explicitly shown in the audited slices (it is likely handled inside the Better-Auth SSO plugin).

high
Add/confirm explicit boundary evidence for cryptographic verification (OIDC token signature verification / JWKS validation; SAML signature/assertion validation, audience/issuer checks) and ensure it occurs before any session creation or authorization logic. Document which module performs verification (Better-Auth SSO internal implementation) and add targeted tests for signature failures and audience/issuer mismatches.
- packages/server/src/lib/auth.ts:1-420 — SSO is wired via the `sso()` plugin, but the audited code slice does not show the actual verification logic, so verification correctness should be confirmed via deeper code reading or tests.
med
Verify the SSO callback/redirect handling path is fully covered end-to-end (including any ACS/callback endpoints if applicable) and ensure every callback result is tied to the correct organization/provider record (per-org connection enforcement).
- apps/dokploy/server/api/routers/proprietary/sso.ts:1-384 — The SSO configuration router exists, but the audited routes do not show explicit ACS/callback endpoints; ensure the callback path is correctly implemented by the auth layer and is tested.
low
Harden SSO admin UX paths with clearer enforceSSO behavior guarantees (e.g., when enforceSSO is enabled, ensure local password/email auth is properly suppressed at the UI and server policy layers, not only on the client).
- apps/dokploy/components/proprietary/sso/sign-in-with-sso.tsx:1-135 — The component supports an `enforce` prop, but enforcement correctness depends on the full server-side policy; verify server behavior under enforced SSO conditions.

Directory provisioning (SCIM) 0%

SCIM directory provisioning (SCIM v2.0 Users/Groups with create/update and, critically, deprovisioning that revokes access) is not implemented anywhere obvious in the codebase. The enterprise identity layer shows SSO/provider management, and user provisioning appears to be invitation/credential-based rather than directory-driven SCIM.

high
Add a dedicated SCIM v2.0 router (e.g., under the same proprietary/enterprise identity area as SSO) exposing the required endpoints for Users and Groups, including PATCH handling for attributes like `active`/`disabled`.
- apps/dokploy/server/api/routers/proprietary/sso.ts:1-260 — Identity/enterprise wiring currently exists for SSO but not SCIM; this is the expected architectural location to add a SCIM sibling surface.
high
Implement a true deprovision/revocation workflow for SCIM offboarding: when SCIM marks a user inactive, revoke their access by disabling their organization membership and invalidating active sessions/tokens (and ensure follow-on authorization checks stop granting access).
- apps/dokploy/server/api/routers/user.ts:220-420 — There is evidence of session deletion during password update, but there is no SCIM-driven inactive/deactivate handler in this user lifecycle path.
med
Add end-to-end tests that simulate SCIM lifecycle: create → update attributes → deactivate → verify access is actually revoked (authorization denied and sessions/tokens no longer work).
- apps/dokploy/server/api/routers/user.ts:1-420 — User lifecycle behavior is tested/implemented around manual flows; SCIM requires new lifecycle tests to prevent access drift.
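
The deprovisioning step can be sketched as a handler for a SCIM PATCH that sets `active` to false. The PatchOp schema URN comes from the SCIM 2.0 protocol; `DirectoryState` and `applyScimPatch` are hypothetical, with in-memory collections standing in for the membership and session tables.

```typescript
type ScimPatchOp = { op: string; path?: string; value?: unknown };
type ScimPatchRequest = { schemas: string[]; Operations: ScimPatchOp[] };

const PATCH_OP_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp";

type DirectoryState = {
  activeMembers: Set<string>; // user ids with an active organization membership
  sessionsByUser: Map<string, string[]>; // user id -> live session tokens
};

// When the IdP marks a user inactive, drop the membership and revoke all sessions.
function applyScimPatch(state: DirectoryState, userId: string, req: ScimPatchRequest): void {
  if (!req.schemas.includes(PATCH_OP_SCHEMA)) throw new Error("unsupported schema");
  for (const op of req.Operations) {
    const isReplace = op.op.toLowerCase() === "replace";
    if (isReplace && op.path === "active" && op.value === false) {
      state.activeMembers.delete(userId);
      state.sessionsByUser.delete(userId);
    }
  }
}
```

Identity providers vary in how they encode this operation (for example a path-less `replace` whose value is an object), so a production handler would need to accept more shapes than this sketch does.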

RBAC modeled as data 33%

RBAC modeled as data is present: the codebase has role/permission data tables (`organizationRole`, `member`) and a centralized, data-driven policy module (`packages/server/src/lib/access-control.ts`) plus custom-role persistence (`proprietary/custom-role.ts`). However, at least one important authorization surface (organization create/update/delete) still uses inline hardcoded role checks instead of routing the decision through the centralized, data-driven permission model.

high
Refactor `apps/dokploy/server/api/routers/organization.ts` to authorize organization mutations (create/update/delete) via the centralized permission/role engine (permission resolution based on `member.role` + `organizationRole.permission`) rather than inline checks against `ctx.user.role` / `userMember.role === 'owner'`.
- apps/dokploy/server/api/routers/organization.ts:1-240 — The handler contains embedded privileged checks using hardcoded role strings (`owner`, `admin`) and direct comparisons (`userMember.role === 'owner'`), which bypass centralized RBAC permission evaluation.
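
Routing organization mutations through role data rather than role-name comparisons could look like the sketch below. The table contents and the permission names (`organization:update`, `organization:delete`) are invented for illustration; in the real system the rows would come from `member.role` and `organizationRole.permission`.

```typescript
// role name -> granted "resource:action" permissions (illustrative data only)
type RolePermissionTable = Record<string, string[]>;

const rolePermissions: RolePermissionTable = {
  owner: ["organization:update", "organization:delete"],
  admin: ["organization:update"],
  member: [],
};

// The handler asks a question about a permission, never about a role name.
function roleCan(table: RolePermissionTable, role: string, permission: string): boolean {
  const granted = table[role];
  return granted !== undefined && granted.includes(permission);
}
```

With this shape, granting admins the right to delete organizations is a data change in the table, not an edit to the router.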

Centralized authorization 117%

The codebase does have centralized authorization: permissions are defined once in `packages/server/src/lib/access-control.ts`, evaluated in a single decision module `packages/server/src/services/permission.ts`, and enforced across the API via a tRPC permission wrapper `withPermission` in `apps/dokploy/server/api/trpc.ts`. Some parts still include direct role-string checks (e.g., owner/admin/member), but the actual resource+action permission checks are centralized through the shared permission engine.

high
Ensure every permission-requiring tRPC procedure uses `withPermission(...)` (or calls `checkPermission/hasPermission`) and remove/avoid inline per-handler authorization logic. Add a lint/test that fails builds when routers add ad-hoc `role`/permission checks instead of routing through the shared permission service.
- apps/dokploy/server/api/trpc.ts:1-266 — This is the centralized chokepoint that should be used broadly; it currently centralizes `checkPermission` for the `withPermission` wrapper.
med
Add auditable logging for every authorization decision at the centralized chokepoint (`checkPermission` / `withPermission`), so decisions are consistently logged for both allow and deny cases (not only where routers happen to import an audit helper).
- packages/server/src/services/permission.ts:1-203 — All allow/deny outcomes are determined here; this is the correct single place to guarantee complete decision logging.
low
Reduce the number of direct `memberRecord.role === "owner" || memberRecord.role === "admin"` and `ctx.user.role !== "owner" && ctx.user.role !== "admin"` checks by funneling any privileged bypass logic through the centralized permission engine, to minimize future drift.
- packages/server/src/services/permission.ts:1-203 — Static role bypass is implemented inside the centralized permission service; consolidating bypass behavior further would reduce the risk of drift between routers and the engine.
- apps/dokploy/server/api/trpc.ts:1-266 — Some procedures enforce admin/owner via role-string checks; centralizing these through the permission engine would make the authorization surface more uniform.
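
The lint/test idea above can be approximated with a pattern scan over router source text. This is a rough sketch: `AD_HOC_ROLE_CHECK` and `findAdHocRoleChecks` are hypothetical, and a regular expression will miss some spellings that an AST-based lint rule would catch.

```typescript
// Matches comparisons such as `role !== "owner"` or `role === 'admin'`.
const AD_HOC_ROLE_CHECK = /\brole\s*[!=]==?\s*["'](owner|admin)["']/g;

// Return every inline role-name comparison found in a source file's text.
function findAdHocRoleChecks(source: string): string[] {
  return source.match(AD_HOC_ROLE_CHECK) ?? [];
}
```

A test could read each router file, call `findAdHocRoleChecks`, and fail when the result is non-empty for any file outside an explicit allowlist such as the permission service itself.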

No hardcoded privilege shortcuts 100%

This codebase generally avoids hardcoded superuser privilege shortcuts: privilege is represented via a role string (`user.role`) and enforced via a centralized permission/policy service (`checkPermission` + `role.authorize(...)`). However, there is at least one router-level ad-hoc privileged-role shortcut based on role names (`member.role !== "owner" && member.role !== "admin"`) instead of routing the decision through the centralized permission layer.

high
Remove router-level privileged shortcuts like `member.role !== "owner" && member.role !== "admin"` and replace them with centralized permission checks (e.g., express the requirement in the permission model and enforce via `checkPermission` / `checkServicePermissionAndAccess`).
- apps/dokploy/server/api/routers/schedule.ts:49-78 — Inline privileged-role logic is implemented directly in the router via role-string comparisons instead of a single policy/permission chokepoint.
med
In `permission.ts`, ensure any “privileged static role” behavior (e.g., owner/admin bypass behavior) is consistently modeled within the policy engine, and document it as part of the role/permission model so it remains auditable and not treated as a scattered shortcut.
- packages/server/src/services/permission.ts:86-105 — The service implements special handling for `memberRecord.role === "owner" || memberRecord.role === "admin"`; keep this centralized and avoid duplicating it elsewhere.

Deny-by-default 40%

The codebase implements deny-by-default correctly at the main tRPC boundary (`validateRequest` + 401 on missing session/user) and at the procedure layer (`protectedProcedure` throws UNAUTHORIZED). However, several concrete Next.js API endpoints (health, Stripe webhook, OAuth callbacks, GitHub webhook) are public and rely on per-endpoint trust/verification rather than the shared deny-by-default gate.

high
For every public API entry point (health, Stripe webhook, OAuth callbacks, GitHub webhook), add/verify strong request authentication/verification and ensure there is an explicit, documented trust model (e.g., strict signature verification, state anti-CSRF guarantees). This prevents new endpoints from accidentally remaining open without a deny-by-default rationale.
- apps/dokploy/pages/api/health.ts:1-9 — Public handler with no auth gate.
- apps/dokploy/pages/api/stripe/webhook.ts:1-200 — No session/authz check; correctness depends on webhook signature verification and allowed event types.
- apps/dokploy/pages/api/providers/gitea/callback.ts:1-96 — No session/authz check; relies on `state` parsing and provider lookup/token exchange.
- apps/dokploy/pages/api/providers/github/webhook.ts:1-20 — No auth check; redirects and method gating only.
med
Create a checklist/pattern to ensure new endpoints are either (a) routed through the shared tRPC auth boundary, or (b) explicitly documented as public and secured with the correct verification mechanism (not merely “no session required”).
- apps/dokploy/pages/api/[...trpc].ts:1-31 — Shows the centralized deny-by-default pattern that new endpoints should follow when appropriate.
low
Audit the deployment service separately for consistent deny-by-default semantics across all its exposed endpoints (ensure every route that should be protected has the middleware applied, not just `/deploy` and `/cancel-deployment`).
- apps/api/src/index.ts:1-160 — Middleware enforces `X-API-Key` by default, but only explicitly bypasses `/health` and `/api/inngest`.
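
One way to make the public/protected decision explicit is a route registry in which anything unregistered is denied. `RouteAccess`, `routeAccess`, and `mayProceed` are hypothetical, and the paths and verification notes are illustrative entries rather than a description of the current handlers.

```typescript
type RouteAccess =
  | { kind: "session" }
  | { kind: "public"; verification: string }; // documents the trust model for the route

const routeAccess = new Map<string, RouteAccess>([
  ["/api/trpc", { kind: "session" }],
  ["/api/health", { kind: "public", verification: "none; returns no tenant data" }],
  ["/api/stripe/webhook", { kind: "public", verification: "provider signature" }],
]);

// Deny by default: a route that nobody registered is never served.
function mayProceed(path: string, hasValidSession: boolean): boolean {
  const access = routeAccess.get(path);
  if (!access) return false;
  if (access.kind === "public") return true; // the handler still runs its own verification
  return hasValidSession;
}
```

Because a public entry cannot be declared without a `verification` note, adding an open endpoint requires stating how it is secured.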

AuthN before AuthZ at the boundary 133%

This codebase implements a centralized boundary pattern for AuthN-before-AuthZ: tRPC requests authenticate first via `validateRequest(req)` (cryptographic API key verification and session validation) to populate `ctx.session`/`ctx.user`, and then authorization is enforced by `protectedProcedure`/role/enterprise guards that check `ctx.user`/`ctx.session`. Next.js handler(s) also call `validateRequest` before delegating to the tRPC/OpenAPI stack.

med
Audit any additional non-tRPC entrypoints (e.g., custom REST routes, webhook handlers, background workers with user context) to ensure they either (a) use the same `validateRequest` boundary before authorization, or (b) are explicitly public/non-authorized flows with no authz decisions based on unverified identity.
- apps/dokploy/server/api/trpc.ts:33-79 — Demonstrates the correct pattern for tRPC, which should be mirrored/confirmed for other entrypoints.
- apps/dokploy/pages/api/[...trpc].ts:1-17 — Shows another correct boundary for the Next.js tRPC handler; other handlers should follow this same ordering when they need authz.

MFA / step-up auth 0%

This codebase does implement MFA at sign-in time (2FA redirect + enforced TOTP/backup-code verification, plus 2FA enrollment UI). However, the “step-up auth” requirement for sensitive actions (additional second-factor enforcement for high-risk operations after baseline authentication) is not clearly implemented: the privileged admin procedures appear to rely on ordinary authorization/role checks without an additional step-up boundary.

high
Add an enforceable step-up MFA policy for high-risk routes/procedures (starting with admin/owner operations and any security/privilege-changing actions). Require a fresh second-factor verification (or a time-bound, audited break-glass) before executing the sensitive operation.
- apps/dokploy/server/api/routers/admin.ts:1-56 — Privileged admin procedure is used for sensitive setup (`setupMonitoring`) but there is no visible step-up MFA gate in this handler.
med
Implement server-side enforcement and auditing: record whether the request has a recent step-up verification (timestamp + method), require it in step-up-protected handlers, and include it in the audit log entries for sensitive actions.
- apps/dokploy/server/api/routers/security.ts:1-75 — There is auditing for security actions (`audit(ctx, ...)`) but no indication of a step-up MFA check prior to the sensitive mutation. Extend audit events to include step-up context and enforce the check.
low
Ensure the step-up requirement is consistent across both UI and API entry points (tRPC procedures and any REST endpoints). Avoid client-only gating (UI prompt) as the sole enforcement mechanism.
- apps/dokploy/pages/index.tsx:1-230 — The MFA challenge/verification is shown in the UI, but step-up for sensitive actions must be enforced server-side at the privileged procedure boundary.
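
A server-side step-up gate can be as small as a freshness check on the last second-factor verification. The sketch below is one possible shape under stated assumptions: `StepUpState`, `hasFreshStepUp`, `requireStepUp`, and the five-minute window are all hypothetical and not taken from the repository.

```typescript
// Hypothetical per-session record of the most recent second-factor check.
interface StepUpState {
  verifiedAt: number | null; // epoch milliseconds, or null if never verified
  method: "totp" | "backup-code" | null;
}

// Assumed policy: a step-up verification stays valid for five minutes.
const STEP_UP_MAX_AGE_MS = 5 * 60 * 1000;

function hasFreshStepUp(
  state: StepUpState,
  now: number,
  maxAgeMs: number = STEP_UP_MAX_AGE_MS,
): boolean {
  if (state.verifiedAt === null || state.method === null) {
    return false;
  }
  const age = now - state.verifiedAt;
  // Reject timestamps from the future as well as stale ones.
  return age >= 0 && age <= maxAgeMs;
}

// Runs the sensitive action only when a fresh verification exists.
function requireStepUp<T>(state: StepUpState, now: number, action: () => T): T {
  if (!hasFreshStepUp(state, now)) {
    throw new Error("STEP_UP_REQUIRED");
  }
  return action();
}
```

The `verifiedAt` and `method` values are also the fields an audit entry would need to record step-up context for the sensitive action.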

Session & token hygiene 100%

Session & token hygiene is implemented via better-auth with server-side sessions backed by a sessions table containing expiresAt and a unique token. The centralized auth configuration sets an explicit session TTL (3 days) and includes session.delete lifecycle handling for logout. Request authentication derives identity only from server-side validated session state (api.getSession).

med
Verify that refresh/rotation and revocation are fully enforced for sessions/tokens beyond expiry (e.g., confirm whether refresh rotates the session token or merely extends it, and whether logout actively deletes/invalidates the stored session row in the backing adapter).
- packages/server/src/lib/auth.ts:260-520 — The config defines session expiry and a session.delete hook, but the code slices reviewed here do not prove token rotation semantics for refresh.
low
Add explicit automated tests asserting that after logout (session.delete), subsequent requests with the same cookie/token are rejected.
- packages/server/src/lib/auth.ts:260-520 — A logout audit hook exists; tests would ensure the underlying revocation behavior matches the intended hygiene guarantee.

Scoped machine credentials 100%

This codebase does implement scoped machine credentials via an `apikey` table and an API-key verification path (`x-api-key`) in `packages/server/src/lib/auth.ts`. Keys are database-backed (not a single shared global secret) and can be revoked via deletion. However, the scope/least-privilege enforcement appears to rely primarily on `organizationId` binding and member role, while explicit use of the `apikey.permissions` field for fine-grained authorization is not visible in the reviewed boundary wiring.

high
Ensure API-key scopes/permissions are actually enforced at authorization time. Concretely: when `validateRequest` maps an API key to a session, propagate/derive the API key's scoped `permissions` (from `apikey.permissions`) into authorization checks, and/or enforce deny-by-default based on those permissions for every sensitive endpoint.
- packages/server/src/lib/auth.ts:420-520 — API key validation loads `apikeyRecord` and uses `apikeyRecord.metadata.organizationId` and `member.role` to set authorization context; reviewed code does not show usage of `apikey.permissions` for fine-grained scope enforcement.
- packages/server/src/db/schema/account.ts:220-252 — The schema includes a `permissions` field on `apikey`, which should be exercised for scoped/least-privilege enforcement but is not demonstrated in the boundary mapping we inspected.
med
Add explicit checks that API keys respect `enabled` and `expiresAt` (either via `api.verifyApiKey` behavior or an additional DB check). This ensures revoked/expired keys cannot authenticate even if the upstream verification library changes behavior.
- packages/server/src/db/schema/account.ts:220-252 — `enabled` and `expiresAt` exist on the `apikey` table, so the system should guarantee they are enforced in the auth path.
- packages/server/src/lib/auth.ts:420-520 — The boundary verifies the API key with `api.verifyApiKey`, then fetches the DB record, but the inspected mapping does not show explicit enforcement of `enabled`/`expiresAt` at the application layer.
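
The two recommendations above (enforce `permissions`, and enforce `enabled`/`expiresAt`) can be combined into one application-layer check. The sketch below is illustrative: `ApiKeyRecord`, `authorizeApiKey`, and the resource-to-actions shape of `permissions` are assumptions made for the example, not the repository's actual types.

```typescript
// Hypothetical view of an API key row after parsing.
interface ApiKeyRecord {
  enabled: boolean;
  expiresAt: Date | null;
  // Assumed shape: resource name mapped to the actions the key may perform.
  permissions: Record<string, string[]> | null;
}

type KeyCheck =
  | { ok: true }
  | { ok: false; reason: "disabled" | "expired" | "missing-scope" };

function authorizeApiKey(
  key: ApiKeyRecord,
  resource: string,
  action: string,
  now: Date,
): KeyCheck {
  if (!key.enabled) {
    return { ok: false, reason: "disabled" };
  }
  if (key.expiresAt !== null && key.expiresAt.getTime() <= now.getTime()) {
    return { ok: false, reason: "expired" };
  }
  // Deny by default: a key without an explicit grant gets nothing.
  const allowed = key.permissions?.[resource] ?? [];
  if (!allowed.includes(action)) {
    return { ok: false, reason: "missing-scope" };
  }
  return { ok: true };
}
```

Treating a missing `permissions` value as "no access" rather than "full access" is a design choice of this sketch; existing keys without scopes would need a migration path.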

IP allowlists / network constraints 0%

The repository includes CIDR matching logic (`isIPInCIDR`) and a Traefik `ipWhiteList` middleware type, but there is no concrete per-tenant IP allowlist enforcement wired into the request path before business logic. In the main Traefik security middleware construction, only BasicAuth middleware is created/managed, not IP allowlists.

high
Identify the tenant/app configuration source for network constraints (e.g., per-server or per-organization settings) and wire a Traefik `ipWhiteList` middleware into the same security chokepoint used for BasicAuth (`createSecurityMiddleware` / middleware chain attachment). Ensure the allowlist executes before any downstream handlers for protected routes/services.
- packages/server/src/utils/traefik/security.ts:1-90 — Middleware wiring point currently only adds/updates `basicAuth` users and attaches `auth-${appName}` via `addMiddleware(...)`; no IP allowlist middleware is created here.
med
Add automated tests that validate deny-by-default behavior for requests from disallowed client IPs at the edge (Traefik), including cases where allowlist is configured per tenant and where it is empty/disabled.
- apps/dokploy/__test__/traefik/server/update-server-config.test.ts:1-139 — Existing Traefik config tests cover redirect middleware application/rollback, but there are no tests covering IP allowlist deny-by-default behavior.
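
The allowlist semantics those tests would need to pin down can be sketched independently of Traefik. The example below is IPv4-only and does not reproduce the repository's `isIPInCIDR`; `ipv4ToInt`, `inCidr`, and `isClientAllowed` are hypothetical names. It also assumes one particular policy: `null` means the feature is disabled for the tenant, while an empty list denies everyone.

```typescript
// Converts dotted-quad IPv4 text to an integer, or null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// True when ip falls inside the CIDR block; malformed input never matches.
function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsText] = cidr.split("/");
  const bits =
    bitsText === undefined ? 32 : /^\d{1,2}$/.test(bitsText) ? Number(bitsText) : NaN;
  const ipInt = ipv4ToInt(ip);
  const baseInt = base === undefined ? null : ipv4ToInt(base);
  if (ipInt === null || baseInt === null) return false;
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) return false;
  // Two addresses share a block when they agree after dropping the host bits.
  const blockSize = 2 ** (32 - bits);
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize);
}

// Assumed policy: null disables the check, an empty list denies all clients.
function isClientAllowed(ip: string, allowlist: readonly string[] | null): boolean {
  if (allowlist === null) return true;
  return allowlist.some((cidr) => inCidr(ip, cidr));
}
```

Whether "empty" should mean "deny all" or "not configured" is exactly the kind of ambiguity the recommended tests would settle.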

Compliance Code Patterns

Envelope encryption, enforced TLS, validated inputs, and zero secrets anywhere in the full git history.

40% 9/11 scored

Encryption in transit 0%

0/5 expected sites not present
Encryption at rest 33%

1/3 expected sites
Secrets management 67%

2/2 expected sites
Input validation at boundaries 100%

7/7 expected sites
Injection-safe data access 100%

3/3 expected sites
Data classification & PII handling 0%

0/4 expected sites not present
Access logging on protected routes 0%

0/2 expected sites
Retention & secure deletion 8%

1/4 expected sites
Secure defaults / hardening 56%

2/3 expected sites

Encryption in transit 0%

I did not find evidence that encryption in transit (TLS forced everywhere, including internal hops) is enforced by the application code paths visible in this repo. The main server bootstrap creates a plain HTTP server (`http.createServer`) and starts it with `server.listen` without TLS wrapping in the same file, and an internal token exchange uses a configurable base URL without enforcing HTTPS.

high
Enforce TLS at the application edge entrypoint: replace the plain HTTP server bootstrap with an HTTPS server (or ensure the reverse proxy terminator is configured to redirect all HTTP to HTTPS and to set HSTS), and ensure this is consistently applied for every listener (including WebSocket upgrades).
- apps/dokploy/server/server.ts:33-49 — Plain HTTP server created and listened to (`http.createServer`, `server.listen`) with no TLS enforcement visible on this path.
high
Enforce HTTPS for all internal/external service-to-service HTTP calls: in `fetchAccessToken`, validate that `baseUrl` is https:// (or upgrade it), and fail closed (do not allow http) for token exchanges and any similar provider integrations.
- apps/dokploy/pages/api/providers/gitea/callback.ts:24-44 — Token exchange uses `fetch(`${baseUrl}/login/oauth/access_token`)` where `baseUrl` may point to an internal URL; no HTTPS enforcement/validation is shown.
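
A fail-closed check of this kind could look like the sketch below. `requireHttpsBaseUrl` and `tokenEndpoint` are hypothetical helpers written for illustration; only the `/login/oauth/access_token` path is taken from the finding above.

```typescript
// Hypothetical helper: parses the provider URL and rejects anything but https.
function requireHttpsBaseUrl(baseUrl: string): URL {
  let parsed: URL;
  try {
    parsed = new URL(baseUrl);
  } catch {
    throw new Error("Invalid provider base URL");
  }
  if (parsed.protocol !== "https:") {
    throw new Error(`Refusing non-HTTPS provider URL (${parsed.protocol})`);
  }
  return parsed;
}

// Builds the token-exchange URL only after the base URL has been validated.
function tokenEndpoint(baseUrl: string): string {
  requireHttpsBaseUrl(baseUrl);
  const trimmed = baseUrl.replace(/\/+$/, "");
  return `${trimmed}/login/oauth/access_token`;
}
```

Self-hosted installs that reach a provider over a private network may need an explicit, documented opt-out rather than a silent downgrade to HTTP.
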
med
Harden transport security headers/redirects for every public endpoint: ensure HSTS is set (and HTTP→HTTPS redirects happen) for user-agent flows that use `res.redirect`, and ensure the redirect targets are always HTTPS.
- apps/dokploy/pages/api/providers/github/webhook.ts:7-18 — Webhook redirects without visible HSTS/HTTPS enforcement.
- apps/dokploy/pages/api/providers/github/setup.ts:45-69 — Setup redirects without visible HSTS/HTTPS enforcement.
med
Add TLS configuration for the monitoring service listener (or ensure the deployment layer wraps it with TLS): confirm that `app.Listen` is behind HTTPS and that clients cannot reach this service over plain HTTP.
- apps/monitoring/main.go:145-165 — Monitoring server starts listening without visible TLS configuration in this code path.

Encryption at rest 33%

Encryption-at-rest appears partially implemented: there is explicit symmetric encryption/decryption for 2FA `secret` and `backupCodes` (including a re-encryption migration script). However, the database schema shows other sensitive credentials stored as plaintext text columns (e.g., account tokens/password and API keys), and no code paths for their at-rest field encryption were identified in this audit slice.

high
Implement and enforce field-level at-rest encryption for all sensitive columns shown in the schema as plaintext text fields (at minimum: `account.accessToken`, `account.refreshToken`, `account.idToken`, `account.password`, `two_factor.secret`, `two_factor.backupCodes`, and `apikey.key`), ensuring the same encryption is preserved for backup/export paths.
- packages/server/src/db/schema/account.ts:1-60 — Sensitive `account` credential columns are defined as plaintext text columns.
- packages/server/src/db/schema/account.ts:210-235 — `two_factor.secret` and `backupCodes` are defined as plaintext text columns.
- packages/server/src/db/schema/account.ts:235-252 — `apikey.key` is a required plaintext text column.
high
Remove or gate any insecure legacy defaults for encryption/auth secrets (hardcoded fallback secrets), and require runtime injection from environment/Docker secrets so encryption keys are not embedded in code.
- packages/server/src/lib/auth-secret.ts:1-29 — Contains a hardcoded legacy secret fallback (`HARDCODED_LEGACY_SECRET`) when BETTER_AUTH_SECRET / BETTER_AUTH_SECRET_FILE are not provided.
med
Add automated checks/tests to verify encrypted-at-rest guarantees: (1) encrypted ciphertext is stored for each sensitive column, (2) decrypt round-trips work, and (3) exports/backups do not introduce plaintext values.
- apps/dokploy/scripts/migrate-auth-secret.ts:51-83 — Current evidence of correct encryption is in a migration script; tests should be broadened to cover normal read/write flows and backup/export paths.

Centralized key management N/A

The codebase shows no evidence of centralized key management (a managed key store with rotation and revocation). Searches for KMS/Vault/KeyVault/SecretsManager-like dependencies and key-rotation/revocation symbols returned no results. In addition, a full-history secret scan found multiple committed secrets, which suggests that key and secret handling is ad hoc rather than governed and rotated through a managed key service.

high
Implement centralized key management using a managed KMS/Vault/KeyVault service, and refactor encryption key usage to fetch keys from the managed store only (no local/shadow keys). Add scheduled rotation and a tested emergency revocation workflow.
- repo (code graph query results):N/A — The codebase contains no detectable wiring to KMS/Vault/KeyVault/SecretsManager-style libraries and no detectable key rotation/revocation symbols, indicating the primitive is not implemented.
high
Remove/rotate any compromised secrets/keys found in git history; treat them as leaked and replace with runtime-secret retrieval from the managed secret/key system.
- repo (git history secret scan results):N/A — gitleaks detected many commits containing secrets (e.g., tokens/passwords), which strongly suggests missing centralized, centrally rotated key/secret management.

Secrets management 67%

This codebase has a partial secrets-management implementation: it supports runtime injection of sensitive values via environment variables and Docker secret files (e.g., BETTER_AUTH_SECRET_FILE and POSTGRES_PASSWORD_FILE). However, it also contains legacy plaintext fallbacks (a hardcoded better-auth secret and a hardcoded production database URL) that undermine the “never hardcoded” requirement, so the control is only partially enforced.

high
Remove insecure hardcoded fallbacks for BETTER_AUTH_SECRET and the production DATABASE URL. Enforce that the secret must be provided via BETTER_AUTH_SECRET or BETTER_AUTH_SECRET_FILE; otherwise fail fast at startup.
- packages/server/src/lib/auth-secret.ts:9-22 — Warns and falls back to HARDCODED_LEGACY_SECRET when env/file is not set (critical plaintext secret fallback).
high
Enforce POSTGRES_PASSWORD_FILE (or a dedicated secret manager integration) and delete plaintext hardcoded DB credentials. Fail startup if neither DATABASE_URL nor POSTGRES_PASSWORD_FILE is set (or if DATABASE_URL is set, require operational controls).
- packages/server/src/db/constants.ts:24-45 — Uses a hardcoded production postgres://dokploy:... URL when POSTGRES_PASSWORD_FILE is absent.
med
Add/verify rotation documentation and operational checks to ensure secrets are reloaded appropriately (e.g., container restart behavior) and that secret-file paths are used in all deployment manifests.
- packages/server/src/lib/auth-secret.ts:9-19 — The intended secure migration path is via Docker secret (BETTER_AUTH_SECRET_FILE), implying rotation requires updating the secret and restarting/redeploying.
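
The fail-fast behavior recommended above can be sketched as a resolver that returns a secret from the environment or a secret file and otherwise throws. This is a minimal illustration, not the contents of `auth-secret.ts`: `resolveAuthSecret` and `FileReader` are hypothetical, and the precedence (environment variable first, then file) is an assumption.

```typescript
// The file reader is injected so the logic can be exercised without touching disk.
type FileReader = (path: string) => string;

function resolveAuthSecret(
  env: Record<string, string | undefined>,
  readFile: FileReader,
): string {
  const direct = env["BETTER_AUTH_SECRET"]?.trim();
  if (direct) {
    return direct;
  }
  const filePath = env["BETTER_AUTH_SECRET_FILE"];
  if (filePath) {
    const fromFile = readFile(filePath).trim();
    if (fromFile) {
      return fromFile;
    }
  }
  // No hardcoded fallback: refuse to start without a real secret.
  throw new Error("BETTER_AUTH_SECRET or BETTER_AUTH_SECRET_FILE must be set");
}
```

At startup this would be called once, for example as `resolveAuthSecret(process.env, (p) => readFileSync(p, "utf8"))`, so that a missing secret stops the process instead of silently downgrading to a known value.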

No secrets in git history N/A

The codebase does not meet the requirement of “no secrets in git history”: it contains a hardcoded authentication secret fallback in `packages/server/src/lib/auth-secret.ts` (legacy default). Any such committed secret should be treated as compromised and rotated.

high
Remove the hardcoded legacy secret fallback and require `BETTER_AUTH_SECRET` or `BETTER_AUTH_SECRET_FILE` to be provided via environment variables/Docker secrets in all non-test deployments; then rotate any previously deployed secret(s) derived from the hardcoded value.
- packages/server/src/lib/auth-secret.ts:1-29 — Hardcoded fallback value is used when env/Docker-secret values are not set (in non-test environments), which violates the “no secrets in git history” requirement.
med
Use the full-history secret scan results to identify every committed credential/token/password/private key pattern, validate whether any are real (not placeholders/test fixtures), and rotate them; then rewrite history (if required by policy) and enforce secret scanning in CI.
- packages/server/src/lib/auth-secret.ts:1-29 — This is a concrete, confirmed real secret-in-source example; the same remediation should be applied repo-wide using the scan to locate all others.

Input validation at boundaries 100%

This codebase applies input validation at the API boundary using tRPC `.input(...)` with Zod schemas (defined under `packages/server/src/db/schema/*`). The router handlers consume `input` only after schema validation, and invalid inputs are handled through the tRPC/Zod error plumbing.

high
Continue enforcing a `.input(api*Schema)` on every tRPC procedure that reads `input.*` fields; add `.input(...)` for any remaining procedures that currently read request-derived values without a corresponding Zod schema.
- apps/dokploy/server/api/routers/compose.ts:1-420 — Example of correct wiring: multiple compose endpoints all use `.input(api*Schema)` before using untrusted `input`.
- apps/dokploy/server/api/routers/settings.ts:1-220 — Example of correct wiring: admin endpoints validate request input via `.input(apiServerSchema)` / `.input(apiEnableDashboard)`.
med
Audit schema coverage for range/constraint completeness (e.g., min/max, enums, ID formats) in all `api*` schemas to ensure boundary checks are not only present but also strict enough.
- packages/server/src/db/schema/compose.ts:1-220 — Schemas show strong constraints (e.g., `z.string().min/max`, `z.enum`, and `min(1)` for IDs), indicating the intended pattern.

Injection-safe data access 100%

The codebase applies injection-safe data access for the observed DB interaction paths: Drizzle ORM query builders are used for Node/TS routes (binding parameters), and the Go monitoring DB layer consistently uses `?` placeholders with argument binding. No evidence of SQL injection anti-patterns (untrusted values concatenated into SQL strings) was found in the inspected sites.

med
Extend auditing to the remaining DB access points (other routers/services) to ensure there are no raw SQL strings that interpolate request data (e.g., via `db.query.*` with `sql.raw(...)` or similar).
- apps/dokploy/server/api/routers/user.ts:40-110 — Positive example (parameterized via Drizzle expressions); continue this check across other routers.

Data classification & PII handling 0%

No clear implementation of “Data classification & PII handling” (sensitivity tagging + masking/minimization for PII in logs/audit/exports) was found. While audit logging and access-log cleanup exist, audit records write userEmail directly and there is no evidence of field-level redaction or sensitivity-based filtering on log persistence or retrieval.

high
Add a centralized PII/sensitivity classification and redaction layer for log/audit writes. Concretely: (1) implement masking/redaction for userEmail (e.g., store hash/pseudonym or redact) and (2) scrub/limit auditLog.metadata to remove/avoid PII before JSON.stringify/persistence.
- packages/server/src/services/proprietary/audit-log.ts:14-60 — userEmail/userRole/metadata are stored without masking or minimization.
high
Enforce masking on the audit-log retrieval boundary: ensure getAuditLogs either omits sensitive fields or returns masked values depending on authorization, and ensure metadata is filtered/sanitized before returning to clients.
- packages/server/src/services/proprietary/audit-log.ts:62-96 — getAuditLogs queries and returns auditLog rows without any redaction step.
med
Harden the global logger configuration with redaction/serializers so PII never reaches logs by default (rather than relying on each callsite).
- apps/api/src/logger.ts:1-11 — pino is configured but no redact/serializer logic is present.
med
Review and add sanitization to the access-log ingestion/storage path. Cleanup jobs alone are not sufficient; ensure sensitive fields (IPs, user identifiers, tokens) are redacted at write-time or prior to export.
- packages/server/src/utils/access-log/handler.ts:1-82 — Only truncation/cleanup is implemented; no redaction/sanitization control is shown.
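
A write-time redaction layer of the kind recommended in this section can start very small. The sketch below is illustrative only: `maskEmail`, `redactMetadata`, and the `SENSITIVE_KEYS` list are hypothetical, the redaction is shallow (nested objects are not traversed), and keeping the email domain is itself a policy choice that a stricter deployment might replace with a hash or pseudonym.

```typescript
// Keeps the first character and the domain, hides the rest of the local part.
function maskEmail(email: string): string {
  const at = email.lastIndexOf("@");
  if (at <= 0) {
    return "***"; // not a recognizable address, so reveal nothing
  }
  const local = email.slice(0, at);
  const domain = email.slice(at + 1);
  return `${local.charAt(0)}***@${domain}`;
}

// Assumed list of metadata keys that should never be persisted verbatim.
const SENSITIVE_KEYS: ReadonlySet<string> = new Set([
  "password",
  "token",
  "secret",
  "email",
  "ip",
]);

// Shallow redaction: replaces values of sensitive top-level keys.
function redactMetadata(metadata: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(metadata)) {
    out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : value;
  }
  return out;
}
```

Applying both functions immediately before serialization keeps the policy in one place instead of relying on each call site.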

Access logging on protected routes 0%

The codebase implements an audit logging mechanism for authenticated/sensitive actions using `audit(ctx, ...)` → `createAuditLog(...)`, which records a unique actor identifier (`ctx.user.id`) into the audit log table. However, access logging is not consistently applied to every authenticated/protected route: at least `settings.getWebServerSettings` and `application.one` are authenticated endpoints that return data/perform authorization without calling `audit(ctx, ...)` on those request paths.

high
Add audit/access log calls to authenticated read routes like `settings.getWebServerSettings` and `application.one` so that every protectedProcedure path emits an auditable entry with `ctx.user.id` (actor attribution).
- apps/dokploy/server/api/routers/settings.ts:60-90 — Protected route `getWebServerSettings` returns settings without any `audit(ctx, ...)` call on this path.
- apps/dokploy/server/api/routers/application.ts:95-170 — Protected route `one` performs checks and returns the application but does not call `audit(ctx, ...)`.
med
Consider centralizing the audit enforcement in tRPC middleware for `protectedProcedure`/`adminProcedure` (e.g., a single wrapper that logs on every successful/meaningful request), to avoid future “present-but-not-on-every-path” drift.
- apps/dokploy/server/api/trpc.ts:1-220 — Defines `protectedProcedure` and other procedure wrappers; moving/adding audit emission here would help guarantee coverage across all protected routes.
low
Revisit the “fire-and-forget safe” error swallowing in `createAuditLog` if audit completeness is required for compliance questionnaires; ensure failures are observable (e.g., metrics/alerts) even if they don’t break the main operation.
- packages/server/src/services/proprietary/audit-log.ts:1-96 — Audit insert errors are swallowed (`catch` only logs to console). For compliance assurance, add monitoring/alerting or reliable fallback.
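
The two ideas above (central enforcement, and failures that are observable rather than swallowed) can be sketched together. The example is deliberately synchronous and framework-free to stay short; a real audit sink would be asynchronous and wired into tRPC middleware. `createAuditor`, `wrap`, and `failureCount` are hypothetical names.

```typescript
interface AuditEntry {
  actorId: string;
  action: string;
}

// Hypothetical factory: wraps handlers so each successful call emits an entry.
function createAuditor(sink: (entry: AuditEntry) => void) {
  let failures = 0;
  return {
    wrap<A, R>(action: string, handler: (actorId: string, input: A) => R) {
      return (actorId: string, input: A): R => {
        const result = handler(actorId, input);
        try {
          sink({ actorId, action });
        } catch {
          // The main operation still succeeds, but the gap is counted
          // so it can be exported as a metric or alert.
          failures += 1;
        }
        return result;
      };
    },
    failureCount: (): number => failures,
  };
}
```

Because the wrapper is applied where the procedure is defined, a read route such as `application.one` would be audited without adding a call inside its handler.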

Retention & secure deletion 8%

The codebase does implement retention-style cleanup jobs (metrics row deletion on a time cutoff, scheduled access-log truncation, and S3 volume-backup retention via rclone deletions). However, for the 'secure deletion' portion, there is no evidence of cryptographic wipe or explicit secure-disposal guarantees reaching backups/derived data beyond ordinary delete operations. As a result, enforcement exists, but secure deletion quality appears weak.

high
Define and implement secure deletion semantics for backup/log/derived artifacts: document deletion guarantees (including impact on backups and retention copies), and where cryptographic wipe is required, use encryption-at-rest with per-item keys that can be revoked/destroyed so that deleted data becomes unrecoverable (including in backup systems).
- packages/server/src/utils/volume-backups/utils.ts:83-119 — Older backups are deleted from S3 via rclone delete, but there is no cryptographic wipe / key revocation mechanism shown.
- packages/server/src/utils/access-log/handler.ts:20-45 — Access-log cleanup truncates by keeping only last 1000 lines; it does not show secure wipe of overwritten/archived contents.
med
Add explicit deletion-on-request and ensure it cascades to all derived/related data and any external storage locations (e.g., backups, exports).
- apps/monitoring/database/cleanup.go:8-38 — Metrics cleanup is scheduled and time-based; no deletion-on-request API or cascade-to-backups logic is shown here.
med
For each retention job, add/configure measurable retention policy parameters (time window in days/months) and verify correctness via tests: e.g., ensure log retention is time-based (not just 'last N lines') and confirm cutoff behavior matches compliance requirements.
- packages/server/src/utils/access-log/handler.ts:11-45 — Cleanup keeps only the last 1000 lines; there is no explicit retention window in days/hard cutoff tied to timestamps.
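
The difference between "last N lines" and a time-based window can be made concrete with a small filter. This is a sketch under the assumption that each log line carries a parsed timestamp; `LogLine` and `applyRetention` are hypothetical names.

```typescript
// Hypothetical parsed log line with an epoch-millisecond timestamp.
interface LogLine {
  timestamp: number;
  text: string;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Keeps only lines whose timestamp falls inside the retention window.
function applyRetention(
  lines: readonly LogLine[],
  now: number,
  retentionDays: number,
): LogLine[] {
  const cutoff = now - retentionDays * MS_PER_DAY;
  return lines.filter((line) => line.timestamp >= cutoff);
}
```

With an explicit `retentionDays` parameter, the cutoff becomes something a test can assert against, which a fixed line count does not allow.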

Secure defaults / hardening 56%

Secure defaults/hardening is partially implemented: Traefik middleware forces HTTP→HTTPS redirects and auth logging is disabled in production. However, the codebase also ships unsafe defaults: Better-Auth cookie hardening is explicitly weakened (`useSecureCookies: false`, `secure: false` under `!IS_CLOUD`) and Traefik’s API is configured as insecure (`api.insecure: true`). These indicate that production hardening is not consistently enforced across all code and configuration paths.

high
Harden Better-Auth cookie attributes for production: change `advanced.useSecureCookies` and `defaultCookieAttributes.secure` to be Secure=true when running in production (and ensure this is not relaxed merely based on `!IS_CLOUD`).
- packages/server/src/lib/auth.ts:1-90 — Cookie hardening is explicitly disabled for the non-cloud branch: `useSecureCookies: false` and `defaultCookieAttributes.secure: false`.
high
Disable Traefik insecure API/dashboard in production: set `api.insecure` to false (and ensure dashboard access is protected via auth/middleware, not exposed insecurely).
- packages/server/src/setup/traefik-setup.ts:240-434 — `api: { insecure: true }` appears in both default Traefik config builders.
med
Audit debug/verbose error exposure across the full runtime path (Next.js and API handlers). Ensure production builds never return stack traces or verbose debug responses, and that any dev-mode behavior is fully disabled when `NODE_ENV === 'production'`.
- apps/dokploy/server/server.ts:1-81 — Only sets `dev = process.env.NODE_ENV !== "production"` for Next.js; additional hardening for error/debug responses is not established in this bootstrap slice.

Not applicable to this codebase: Centralized key management, No secrets in git history.

Audit, Governance, Residency

An append-only audit_events table, a queryable audit API, and per-region infrastructure keyed on each tenant’s region.

28% 7/10 scored

Dedicated audit event store 100%

3/3 expected sites
Append-only / tamper-evidence 0%

0/3 expected sites not present
Comprehensive event coverage 93%

5/5 expected sites
Queryable, provable audit access 0%

0/3 expected sites
Audit retention & separation of duties 0%

0/3 expected sites not present
No cross-region leakage 0%

0/4 expected sites not present
Data-subject rights (export & erase) 0%

0/2 expected sites not present

Dedicated audit event store 100%

This codebase includes a dedicated, structured audit event store (`audit_log`) with a dedicated schema and persistence layer. Sensitive governance/security mutations (e.g., security resource changes and custom role create/update/remove) are written to the dedicated audit store via a central `audit(ctx, ...)` helper, and there is a tenant-scoped audit log read endpoint. However, the audit writer explicitly swallows errors (fire-and-forget), and there’s no evidence of tamper-evidence (e.g., append-only enforcement or hash chaining) in the audited code slices.

high
Make audit recording reliability provable: replace the current “fire-and-forget safe” behavior (errors swallowed) with a mechanism that guarantees persistence (or a compensating strategy) and surfaces failures to an auditable monitoring channel. Evidence: audit writes are currently wrapped in a try/catch that only logs to console.
- packages/server/src/services/proprietary/audit-log.ts:1-96 — `createAuditLog` catches all errors and only `console.error(...)`, which can cause gaps in the audit trail without an enforced failure/alert mechanism.
high
Enforce audit immutability/tamper-evidence at the storage layer: add/verify DB constraints (no UPDATE/DELETE paths), and ideally implement integrity validation (e.g., append-only with hash chaining/signatures) so an auditor can detect tampering.
- packages/server/src/db/schema/audit-log.ts:1-95 — Schema defines an audit table and indexes, but the observed slices do not show append-only enforcement or any hash-chain/integrity mechanism.
med
Expand and validate coverage for all sensitive actions beyond the slices confirmed: ensure permission changes, exports/downloads, and authentication events (login/logout/session changes) all emit structured audit events through the same dedicated store. Use `audit(ctx, ...)` usage coverage to identify any sensitive handler that lacks it.
- apps/dokploy/server/api/utils/audit.ts:1-32 — Central audit helper exists; completeness depends on every sensitive action correctly calling this helper. The code slices reviewed show some usage, but do not prove full coverage.

Append-only / tamper-evidence 0%

The codebase has a structured audit_log table and a create/get API for audit entries, but there is no provable append-only / tamper-evidence implementation. The audit record schema lacks integrity/hash-chain fields and the write/read code does not implement or verify tamper-evidence (e.g., prev-hash chaining, signing, or integrity validation).

high
Add tamper-evidence to the audit_log record: introduce integrity fields (e.g., prev_hash and record_hash computed over the full canonical event payload + prev_hash) and store them immutably with every append. Optionally sign records (or sign hash roots) for stronger non-repudiation.
- packages/server/src/db/schema/audit-log.ts:1-41 — The audit log schema defines event fields but contains no integrity/tamper-evidence columns (e.g., prev_hash/record_hash/signature) that would allow alteration detection.
high
Enforce append-only behavior at the database layer for the audit evidence store: restrict UPDATE/DELETE on audit_log (via migrations/DB permissions, triggers, or storage engine policies). If deletes are required for lifecycle, require them to be restricted and separately audited and handled via a controlled purge mechanism.
- packages/server/src/db/schema/audit-log.ts:1-95 — The audit store is defined here, but no append-only/immutable constraint or deletion/update restriction mechanism is visible in the schema.
med
Implement verification support on the read/export path: when retrieving audit logs, compute/verify hash-chain continuity (or validate signatures) so an auditor can detect any gap/alteration in the returned evidence sequence.
- apps/dokploy/server/api/routers/proprietary/audit-log.ts:1-68 — The router only returns queried audit logs; there is no visible integrity verification or tamper-evidence exposure/validation.
low
Improve audit-log write semantics to ensure tamper-evidence computation is deterministic: canonicalize metadata serialization (stable JSON canonicalization) before hashing so that re-serialization cannot produce different hashes for the same logical event.
- packages/server/src/services/proprietary/audit-log.ts:1-46 — metadata is serialized using JSON.stringify without guaranteeing stable canonical ordering; this can break deterministic hashing unless canonicalization is added.
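
The hash-chain and canonicalization recommendations in this section fit together as sketched below. This is an illustrative in-memory model, not a schema proposal: `canonicalize`, `appendRecord`, `verifyChain`, and the `prevHash`/`recordHash` field names are hypothetical, and the canonicalization is a simplified sorted-key serialization rather than a full implementation of a published canonical JSON scheme.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Stable serialization: object keys are sorted so equal values hash identically.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalize(item)).join(",")}]`;
  }
  const keys = Object.keys(value).sort();
  const body = keys.map((k) => `${JSON.stringify(k)}:${canonicalize(value[k] as Json)}`);
  return `{${body.join(",")}}`;
}

interface ChainedRecord {
  payload: Json;
  prevHash: string;
  recordHash: string;
}

// Fixed starting value for the first record in a chain.
const GENESIS_HASH = "0".repeat(64);

function hashRecord(payload: Json, prevHash: string): string {
  return createHash("sha256").update(prevHash).update(canonicalize(payload)).digest("hex");
}

// Returns a new chain with the payload appended and linked to its predecessor.
function appendRecord(chain: readonly ChainedRecord[], payload: Json): ChainedRecord[] {
  const last = chain[chain.length - 1];
  const prevHash = last === undefined ? GENESIS_HASH : last.recordHash;
  return [...chain, { payload, prevHash, recordHash: hashRecord(payload, prevHash) }];
}

// Recomputes every hash; any edited, removed, or reordered record breaks the chain.
function verifyChain(chain: readonly ChainedRecord[]): boolean {
  let prev = GENESIS_HASH;
  for (const rec of chain) {
    if (rec.prevHash !== prev || rec.recordHash !== hashRecord(rec.payload, prev)) {
      return false;
    }
    prev = rec.recordHash;
  }
  return true;
}
```

A chain like this detects tampering but does not prevent truncation of the tail, which is why the database-level append-only restriction and periodic external anchoring of the latest hash remain relevant.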

Comprehensive event coverage 93%

This codebase implements a dedicated structured audit event store (audit_log) and a central audit() helper that records action/resource events with tenant and actor attribution. Sensitive actions such as custom role create/update/delete, application create, and various admin settings operations are instrumented with audit() calls, and there is a tenant-scoped, permission-guarded API to read/paginate audit events. Overall, implementation is solid, though createAuditLog is “fire-and-forget safe” (errors are swallowed), which can reduce audit completeness guarantees.

high
Strengthen audit completeness guarantees: change createAuditLog so that failures are not silently swallowed (or at least provide an out-of-band alerting mechanism and/or a durable retry queue), since current behavior can produce gaps without an auditable signal.
- packages/server/src/services/proprietary/audit-log.ts:1-96 — createAuditLog wraps insertion in try/catch and only console.error’s errors; the main operation proceeds regardless, risking unprovable missing events.
med
Expand/verify coverage for other sensitive surfaces (especially login/logout, permission grant/revoke, and data exports) by confirming each such handler emits audit() with appropriate resourceType/resourceId and that auditLogRouter’s action enum covers them.
- packages/server/src/db/schema/audit-log.ts:1-95 — AuditAction includes login/logout but this audit dimension should confirm handlers for these actions exist and call audit().
low
Add documentation/tests that assert each security-relevant route calls audit() (coverage regression tests), ensuring future changes don’t remove audit instrumentation.
- apps/dokploy/server/api/utils/audit.ts:1-25 — The audit() helper is the single call site intended for consistent instrumentation; it can be targeted by route-level assertions.
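
The "durable retry queue" option from the first recommendation can be sketched as follows. This is an assumption-laden illustration: `writeAudit`, `insert`, and `deadLetter` are hypothetical names, with `insert` standing in for the database write and `deadLetter` for a durable fallback sink that can be alerted on and replayed.

```typescript
type AuditEntry = { action: string; resourceType: string; organizationId: string };

// Hypothetical seams: `insert` stands in for the DB write, `deadLetter` for a durable
// fallback (file, queue, separate table) that can be replayed and alerted on.
async function writeAudit(
  entry: AuditEntry,
  insert: (e: AuditEntry) => Promise<void>,
  deadLetter: (e: AuditEntry, reason: string) => Promise<void>,
): Promise<"written" | "deferred"> {
  try {
    await insert(entry);
    return "written";
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    // The failed entry is persisted elsewhere, so the gap is detectable later.
    // If the fallback also fails, the rejection propagates to the caller.
    await deadLetter(entry, reason);
    return "deferred";
  }
}
```

The main operation can still proceed when the result is `"deferred"`, but the missing event leaves a record instead of only a console line.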

Queryable, provable audit access 0%

Queryable, tenant-scoped audit access exists: there is a structured `audit_log` store and a tRPC `auditLogRouter.all` endpoint that returns paginated, filtered audit entries for the caller’s active organization. However, the “provable” half of the primitive is not met in the code reviewed: there is no export or evidence-packaging endpoint, no cryptographic or tamper-evident integrity fields, and the audit write path swallows failures, which reduces confidence in completeness.

high
Add an auditable export path for audit evidence (e.g., `auditLog.export`) that produces independently-verifiable evidence for a selected tenant scope and time range, with an export format (and integrity checks) suitable for customer/auditor verification.
- apps/dokploy/server/api/routers/proprietary/audit-log.ts:1-68 — Current router only defines a `.query` (read/list). No export endpoint is present in the observed audit router file.
high
Introduce cryptographic/tamper-evident integrity for audit records (e.g., hash-chain over entries, or signed records with verification data) and persist the integrity fields in the audit table.
- packages/server/src/db/schema/audit-log.ts:1-95 — Schema defines audit fields and `createdAt`, but no integrity/signature/hash fields are present.
med
Remove silent failure behavior for audit writes (or make failures auditable and fail-safe). Ensure audit insertion failures cannot silently cause gaps without detectable signals.
- packages/server/src/services/proprietary/audit-log.ts:1-32 — `createAuditLog` swallows errors (console.error) and does not propagate failures, which weakens provability/completeness of the audit trail.
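
The hash-chain idea from the second recommendation can be illustrated with a small sketch. All names here (`AuditRecord`, `ChainedRecord`, `appendRecord`, `verifyChain`) and the three-field record shape are hypothetical; a real schema would hash every persisted column and store `prevHash` and `hash` alongside each row.

```typescript
import { createHash } from "node:crypto";

type AuditRecord = { id: string; action: string; createdAt: string };
type ChainedRecord = AuditRecord & { prevHash: string; hash: string };

// Fixed starting value for the first record in a chain.
const GENESIS = "0".repeat(64);

function hashRecord(record: AuditRecord, prevHash: string): string {
  // A JSON array with a fixed field order keeps the digest input unambiguous.
  const payload = JSON.stringify([prevHash, record.id, record.action, record.createdAt]);
  return createHash("sha256").update(payload).digest("hex");
}

function appendRecord(chain: ChainedRecord[], record: AuditRecord): ChainedRecord[] {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS;
  return [...chain, { ...record, prevHash, hash: hashRecord(record, prevHash) }];
}

function verifyChain(chain: ChainedRecord[]): boolean {
  let prevHash = GENESIS;
  for (const entry of chain) {
    // Each entry must point at its predecessor and match its own recomputed hash.
    if (entry.prevHash !== prevHash) return false;
    if (hashRecord(entry, prevHash) !== entry.hash) return false;
    prevHash = entry.hash;
  }
  return true;
}
```

Editing or deleting any earlier row changes every later hash, so an export that includes the chain lets an auditor re-run `verifyChain` independently.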

Audit retention & separation of duties 0%

The codebase implements a structured audit-log storage model (audit_log) and provides a tenant-scoped, permission-gated read API and an audit() helper that writes audit entries. However, for the specific primitive “Audit retention & separation of duties”, there is no provable retention enforcement (no TTL/purge/retention window job/config found in the audited parts of the code), and no evidence that insiders/admins cannot shorten retention or that audit deletion/purging is itself audited. Therefore this primitive is not provably satisfied.

high
Implement and document enforced audit-log retention: add a scheduled purge/archival job (or DB-level TTL/partition drop) that deletes/archives only after the required compliance window. Ensure the retention window is stored in a protected configuration (not editable by the same role that can administer the system) and that the purge action itself emits an audited event to the audit-log evidence store.
- packages/server/src/db/schema/audit-log.ts:1-95 — Audit log schema exists but shows no TTL/retention/lifecycle enforcement fields or constraints in the observed code.
- packages/server/src/services/proprietary/audit-log.ts:1-96 — Audit-log service shows insert/read only; retention/purge/TTL enforcement code is not present in the observed implementation.
high
Add separation-of-duties controls around retention changes and audit deletion: introduce role-gated APIs/admin endpoints for retention configuration (e.g., require a dedicated 'auditLog:adminRetention' permission), and add server-side authorization checks preventing admins who can manage infrastructure from being able to shorten the audit retention window or delete audit-log rows without generating an auditable record.
- apps/dokploy/server/api/routers/proprietary/audit-log.ts:1-68 — Current router shows audit-log READ permission gating, but there is no evidenced retention/admin path or separation-of-duties enforcement for changing retention.
med
Strengthen audit-log write reliability so missing audit events are detectable: avoid silent swallowing of audit logging failures (currently errors are swallowed). Instead, route failures to an auditable fallback and/or fail fast for privileged auditing paths, or at least record an internal “audit_log_write_failed” event in a separate evidence channel.
- packages/server/src/services/proprietary/audit-log.ts:1-96 — createAuditLog() explicitly swallows errors (fire-and-forget safe) which undermines provability of completeness for governance/audit evidence.

Data residency / region pinning N/A

The codebase does not implement data residency / region pinning for tenants. While there is a `region` input in the UI for external “destination” configuration (e.g., S3 region/endpoint selection), there is no evidence of a tenant-level region attribute that drives region-keyed data placement, compute placement, or internal routing to keep all tenant data in a pinned region.

high
Introduce first-class tenant/org data residency region modeling (e.g., tenant.region / data_residency_region) and enforce it in the data-access layer and deployment/compute orchestration paths, with all write/read operations routed by that region.
- apps/dokploy/server/api/utils/audit.ts:1-32 — Audit/logging is org-scoped only (`organizationId`); no tenant region concept is present here to drive region pinning.
high
Ensure every tenant data sink (primary DB, backups/snapshots, replication, analytics/export pipelines, and any scheduled backup jobs) is region pinned and cannot fall back to a global/unpinned default. Add explicit region selection/validation to backup orchestration.
- apps/dokploy/components/dashboard/settings/destination/handle-destinations.tsx:10-60 — The only `region` surfaced so far is a destination configuration field (for external endpoints), not an internal region-pinning control for tenant data/compute placement.
med
Add region-aware configuration validation so that if a tenant is assigned to a pinned region, the system rejects or constrains configurations that would cause cross-region writes (including external destination/backup targets).
- apps/dokploy/components/dashboard/settings/destination/handle-destinations.tsx:120-180 — Destination connection strings include `region` conditionally, which suggests the system can target different regions for external storage—this should be coupled to tenant residency policy rather than remaining an ungoverned UI input.

No cross-region leakage 0%

I did not find evidence that the codebase enforces “no cross-region leakage” across backup/export/derived sinks. While destinations include a `region` attribute and backup code uploads to S3 using rclone flags derived from that destination region, there is no visible tenant-level residency constraint or validation that would block cross-region placement for backup sinks—so the primitive is not provably implemented.

high
Introduce tenant-scoped residency policy and enforce it at every data-sink configuration point (at least backup destination create/update and at job execution time). Concretely: validate `destination.region` (and any endpoint/provider settings that imply geography) against the tenant’s allowed in-region set; reject or fail jobs when they don’t match.
- packages/server/src/utils/backups/utils.ts:80-126 — rclone flags (including `--s3-region`) are derived directly from `destination` without any residency validation visible here.
- packages/server/src/utils/backups/web-server.ts:18-136 — `rclone copyto` uploads to the configured S3 destination path/region with no cross-region block shown.
- packages/server/src/db/schema/destination.ts:1-99 — `destination.region` is a free-form field in the destination config surface; add constraints tied to tenant residency.
high
Add an execution-time “region gate” to backup job runners (fail closed). Even if a bad destination slips into configuration, the job should stop before calling rclone/exports.
- packages/server/src/utils/backups/web-server.ts:18-136 — This runner is where the external upload is executed; place the region check immediately before constructing/running the upload command.
med
Perform a full sink inventory and extend the enforcement beyond object-store backups: ensure derived stores, scheduled volume backups, and any compose/restore/export paths also validate destination region/routing constraints.
- apps/dokploy/server/api/routers/backup.ts:1-260 — Backup creation/scheduling is a control point; apply the same residency validation workflow for all backup types and subsequent restore/export mechanisms.
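
The execution-time region gate could look roughly like the sketch below. It assumes a per-organization residency policy that does not exist in the audited code; `ResidencyPolicy`, `Destination`, and `assertRegionAllowed` are hypothetical names, and only the `region` field mirrors something the report describes.

```typescript
type ResidencyPolicy = { organizationId: string; allowedRegions: string[] };
type Destination = { name: string; region: string | null };

// Fails closed: an unset, blank, or disallowed region stops the job.
function assertRegionAllowed(policy: ResidencyPolicy, destination: Destination): void {
  const region = destination.region?.trim().toLowerCase();
  const allowed = new Set(policy.allowedRegions.map((r) => r.toLowerCase()));
  if (!region || !allowed.has(region)) {
    throw new Error(
      `destination "${destination.name}" (region: ${region || "unset"}) ` +
        `is outside the allowed regions for ${policy.organizationId}`,
    );
  }
}
```

Calling a check like this both when a destination is saved and immediately before the upload command is built gives two independent chances to stop a cross-region write.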

Data-subject rights (export & erase) 0%

No end-to-end data-subject rights system (DSR) for **export & erase** is provably implemented in this codebase. There is a direct admin-style `removeUserById` that deletes the `user` row, but the expected DS-rights workflow (export-all, erase-on-request with cascade to backups/derived stores, and auditable proof) is not evidenced.

high
Implement a dedicated data-subject export endpoint/handler that, for a verified subject, aggregates and returns all personal data from every relevant primary and derived store (not just one table). Ensure the export request and results are tied to an auditable DS-rights event.
- packages/server/src/services/admin.ts:92-101 — This service shows only `removeUserById` (erase-like direct delete) and does not evidence any DS export handler.
high
Replace/augment `removeUserById` with a DSR erase handler that performs cascade deletion across related personal-data tables (account/org membership, API keys, schedules, backups/derived stores) and also triggers/records erasure for backups/retention mechanisms as required by the primitive. Emit a structured, queryable, immutable audit record specifically for DS erasure.
- packages/server/src/services/admin.ts:92-101 — Current implementation is a direct `db.delete(user).where(eq(user.id, userId))` with no demonstrated cascade/backup/derived-store erasure and no demonstrated DS-erasure audit mechanism in the shown code.

Customer-controlled keys N/A

No implementation of customer-controlled encryption keys (BYOK / customer-managed keys with per-tenant import, rotation, and revocation suitable for crypto-shredding) was found. While the codebase includes customer-supplied key material for other purposes (e.g., SSH key management), it does not implement per-tenant encryption key governance: there are no tenant-managed KMS/KV references, key-import/rotation/revocation flows, or crypto-shredding mechanisms tied to a customer-controlled key lifecycle.

high
Confirm whether the platform encrypts tenant data at rest using per-tenant keys and whether any BYOK requirements exist for customers. If BYOK is required, implement a crypto-governance layer that stores per-tenant customer key references (e.g., KMS/KV URIs), supports customer key import/update, scheduled rotation, and customer-triggered revocation (crypto-shred), and wire it into all encryption/decryption paths.
- apps/dokploy/server/api/routers/ssh-key.ts:1-133 — Shows a key-management API for SSH keys (create/read/update/delete) but this is not encryption-key governance for the platform’s data-at-rest encryption (no KMS/BYOK/key-rotation/revocation semantics).
- packages/server/src/db/schema/ssh-key.ts:1-82 — Defines persistence for SSH keys (public key stored; private key explicitly not stored), which is separate from the customer-controlled encryption key primitive being audited.

Sub-processor / data-flow transparency N/A

No in-repo, versioned sub-processor / data-flow transparency artifact (e.g., a sub-processor inventory list for DPAs) was found in this codebase. While the code appears to include third-party integrations (e.g., Stripe and OpenAI SDK imports), there is no corresponding declared, auditable, versioned inventory of sub-processors/data flows that could be used to substantiate DPA governance for this primitive.

high
Add a versioned, in-repo sub-processor / data-flow inventory (e.g., SUBPROCESSORS.md or SUBPROCESSORS.json) that lists each third party, the specific data categories touched, the purpose, the data-flow direction(s), and the relevant service endpoints (e.g., Stripe webhooks; OpenAI calls). Ensure it is kept current as integrations change (PR checklist + CI validation).
high
Cross-check the inventory against the actual SDK usages (e.g., Stripe/OpenAI) and ensure the inventory entries match the code paths where data is sent to those vendors.
med
If the organization uses DPA templates, make the DPA sub-processor section dynamically reference the versioned inventory (or at minimum require an explicit, reviewable update when integrations are added/changed).

Not applicable to this codebase: Data residency / region pinning, Customer-controlled keys, Sub-processor / data-flow transparency.

T2 Execution Velocity

Performance Primitives

A caching layer, an async job runtime, connection pooling, and indexes on the columns that actually need them.

42% 11/11 scored

Redundant work in loops 0%

0/1 expected sites not present
Bounded interfaces 0%

0/1 expected sites not present
Memoization / caching 67%

1/1 expected sites
Resource reuse / pooling 33%

1/3 expected sites
Off-critical-path execution 0%

0/2 expected sites
Lookup data structures 100%

1/1 expected sites
Batching round-trips 0%

0/1 expected sites not present
Shared-state synchronization 100%

1/1 expected sites
Bounded concurrency / backpressure 50%

1/2 expected sites
Lazy / minimal computation 100%

2/2 expected sites
Streaming over buffering 17%

1/6 expected sites

Redundant work in loops 0%

I did not find any spot where this codebase correctly applies the optimization pattern for 'redundant work in loops' (e.g., batching/hoisting expensive work out of an inner loop). However, there is at least one concrete should-be site: per-iteration DB writes inside the container metrics collection loop.

high
Batch container metric persistence instead of calling cm.db.SaveContainerMetric(metric) once per iteration. For example: accumulate metrics in a slice during the loop, then insert them with a single bulk insert and/or a single transaction after the loop.
- apps/monitoring/containers/monitor.go:99-151 — The inner loop parses/filters containers and then calls cm.db.SaveContainerMetric(metric) for each metric, causing repeated DB I/O per iteration.

Bounded interfaces 0%

I found no correctly-bounded collection-returning public surface for this primitive. In particular, `fetchTemplatesList(...)` returns an unbounded array result from a JSON endpoint without any limit/cursor mechanism, indicating the bounded-interfaces primitive is not implemented at this code surface.

high
Change `fetchTemplatesList` to be bounded: add `limit` (and ideally cursor/offset) parameters, pass them to the upstream API if supported, or enforce a hard cap client-side (e.g., slice to a maximum) and document it.
- packages/server/src/templates/github.ts:38-76 — `fetchTemplatesList` returns `Promise<TemplateMetadata[]>` and returns the full `templates` response mapped to `TemplateMetadata[]` with no limit/cursor/iterator parameter.
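
If the upstream endpoint offers no paging, the client-side cap mentioned above could be a small helper like this sketch. `paginate`, `Page`, and `MAX_LIMIT` are hypothetical names, and the cap of 100 is an arbitrary assumption.

```typescript
type Page<T> = { items: T[]; nextOffset: number | null };

// Assumed hard cap; callers cannot request more than this per page.
const MAX_LIMIT = 100;

function paginate<T>(all: T[], offset = 0, limit = 50): Page<T> {
  // Clamp caller input so a bad offset or limit cannot widen the result.
  const safeOffset = Math.max(0, Math.floor(offset));
  const safeLimit = Math.min(MAX_LIMIT, Math.max(1, Math.floor(limit)));
  const items = all.slice(safeOffset, safeOffset + safeLimit);
  const next = safeOffset + items.length;
  // A null cursor tells the caller there is nothing left to fetch.
  return { items, nextOffset: next < all.length ? next : null };
}
```

A bounded `fetchTemplatesList` would accept `offset` and `limit`, apply a helper like this to the mapped templates, and return the page plus the next cursor.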

Memoization / caching 67%

The codebase contains a correct memoization/caching implementation: a module-level in-memory TTL cache for `getTrustedOrigins`. The cache is bounded and reused within a time window, but invalidation is only time-based (not immediately updated when underlying trusted origins change).

high
Consider stronger invalidation than TTL alone for correctness/freshness. For example, clear/update `trustedOriginsCache` when trusted origins are modified (e.g., in the SSO/issuer update flows), or switch to a cache keyed by a version/updatedAt timestamp from the DB.
- packages/server/src/services/admin.ts:86-140 — Cache invalidation is purely TTL-based (`expiresAt`), and there is no explicit cache reset when trusted origins are changed elsewhere.
med
Reduce stale failure risk: when `runQuery()` throws, the code returns `trustedOriginsCache?.data ?? []`. Consider preserving last known good values but only if they are still within TTL (or explicitly mark stale).
- packages/server/src/services/admin.ts:108-126 — On error, it returns `trustedOriginsCache?.data` regardless of expiry; TTL freshness is not enforced on error paths.
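
Both recommendations (explicit invalidation and a bounded stale window on errors) fit in one small cache wrapper. The sketch below is illustrative only: `TtlCache`, `maxStaleMs`, and the injected `now` clock are hypothetical and do not reflect how `trustedOriginsCache` is actually structured.

```typescript
type CacheEntry<T> = { data: T; expiresAt: number };

class TtlCache<T> {
  private entry: CacheEntry<T> | null = null;

  constructor(
    private readonly ttlMs: number,
    private readonly maxStaleMs: number,
    private readonly now: () => number = Date.now,
  ) {}

  // Call from any mutation that changes the underlying data.
  invalidate(): void {
    this.entry = null;
  }

  async get(load: () => Promise<T>, fallback: T): Promise<T> {
    const t = this.now();
    if (this.entry && t < this.entry.expiresAt) return this.entry.data;
    try {
      const data = await load();
      this.entry = { data, expiresAt: t + this.ttlMs };
      return data;
    } catch {
      // Serve the last known good value only within a bounded stale window.
      if (this.entry && t < this.entry.expiresAt + this.maxStaleMs) return this.entry.data;
      return fallback;
    }
  }
}
```

With this shape, an SSO or issuer update flow would call `invalidate()` after writing, and a database outage would serve stale origins for at most `maxStaleMs` past expiry before falling back to the empty list.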

Resource reuse / pooling 33%

The codebase has one clear, correct instance of resource reuse/pooling: a lazily-initialized, module-level singleton websocket/trpc client on the client side. Elsewhere (notably server websocket deployment log streaming), expensive handles are created per websocket connection (`new Client()` for SSH and `spawn('tail')`), which are candidates for pooling/reuse but are not implemented as such.

high
For server-side deployment log streaming, evaluate reusing SSH clients and/or maintaining a bounded pool keyed by (serverId/connection params) to avoid creating `new Client()` per websocket connection when feasible. If full connection pooling is unsafe, consider at least caching established clients within a bounded scope and reusing them for multiple exec streams.
- apps/dokploy/server/wss/listen-deployment.ts:63-165 — Creates a fresh `sshClient = new Client()` per websocket connection and immediately connects/execs.
med
For the non-cloud tailing path, consider reducing per-connection `spawn('tail')` overhead—e.g., a shared tailer per (logPath, serverId) with multiplexed websocket subscribers, or a bounded process pool.
- apps/dokploy/server/wss/listen-deployment.ts:97-143 — Spawns `tail` for each websocket connection and forwards stdout/stderr to that websocket.
low
Keep the existing client-side singleton approach, but ensure it is robust across tab lifecycle changes (e.g., reconnection semantics, correct teardown on unmount if needed).
- apps/dokploy/utils/api.ts:21-52 — Singleton websocket client is implemented; verify reconnection/close behavior aligns with expected lifecycle.

Off-critical-path execution 0%

This codebase does implement off-critical-path execution using BullMQ queues/workers for scheduled jobs and deployments (workers run `runJobs(...)` / deployment tasks asynchronously). However, some TRPC handlers (notably in the Postgres router) still perform deployment work inline via `await deployPostgres(...)`, which is exactly the anti-pattern this primitive targets.

high
Refactor the Postgres TRPC handlers to enqueue deployment work instead of calling `deployPostgres(...)` inline. Concretely: replace `await deployPostgres(input.postgresId)` in `saveExternalPort` and the `deploy` mutation with a queue `add(...)` call that a worker processes (mirroring `apps/dokploy/server/queues/deployments-queue.ts`).
- apps/dokploy/server/api/routers/postgres.ts:211-240 — Anti-pattern site: `await deployPostgres(input.postgresId)` is awaited inside the request handler.
- apps/dokploy/server/api/routers/postgres.ts:241-260 — Anti-pattern site: the `deploy` mutation returns `deployPostgres(input.postgresId)` from the TRPC handler.
med
Ensure the queued deployment job handler updates status and handles failures idempotently (retry-safe). If the existing deployment worker already provides this, reuse it; otherwise, wrap slow operations with retry/idempotency guarantees similar to the worker-based architecture.
- apps/dokploy/server/queues/deployments-queue.ts:1-97 — Worker callback performs deployment/rebuild actions and status updates, which should be the single execution path for these operations.
low
Optionally standardize background execution behavior for TRPC endpoints (e.g., always return an immediate “queued” response plus job id/state, and stream logs separately where needed) to avoid future inline regressions.
- apps/dokploy/server/api/routers/preview-deployment.ts:75-111 — Preview redeploy already follows the enqueue pattern via `myQueue.add(...)`, which can be used as a template for other routers.

Lookup data structures 100%

The codebase does apply the Lookup data structures primitive at least in authorization logic: membership is checked via `.has` on a precomputed `accessibleIds` collection (Set-like behavior). No other clearly verifiable lookup-vs-linear-scan hot loop patterns were confirmed from the evidence gathered.

high
Audit for remaining anti-patterns: in hot loops, replace repeated `array.includes(...)`, `array.find(...)`, or `array.some(...)` calls over the same collection with precomputed `Set`/`Map` lookups. Start with authorization and filtering paths that run per request or per item.
- apps/dokploy/server/api/routers/application.ts:108-133 — This is a good example to replicate (precomputed membership via `.has` for `accessibleIds`). Use it as a template for other membership checks.
med
Add/ensure helper utilities return lookup-ready structures (e.g., `getAccessibleServerIds` returning a Set) so call sites avoid converting or scanning arrays repeatedly.
- apps/dokploy/server/api/routers/application.ts:108-133 — Call site assumes `.has` exists on `accessibleIds`, indicating the upstream helper already provides a lookup structure—extend this pattern to other similar helpers.
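
As a generic illustration of the pattern, the sketch below builds a `Set` once and uses `.has` for each membership check. `Server` and `filterAccessible` are hypothetical names and are not taken from the router.

```typescript
type Server = { serverId: string; name: string };

// Builds the lookup once; each later check is O(1) on average instead of an O(n) scan.
function filterAccessible(servers: Server[], accessibleIds: Iterable<string>): Server[] {
  const allowed = new Set(accessibleIds);
  return servers.filter((s) => allowed.has(s.serverId));
}
```

The equivalent with `accessibleIds.includes(...)` inside the filter rescans the id array for every server, which is the shape worth searching for in per-request paths.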

Batching round-trips 0%

I did not find any implementation of true “batching round-trips” (i.e., coalescing many per-item I/O operations into fewer grouped/bulk calls). The main discovered hot path performs one fetch per item (event → runs) rather than batching those run lookups into a single bulk request.

high
Replace the per-event `fetchInngestRunsForEvent(ev.id)` round-trips with a bulk endpoint (if Inngest supports fetching runs for multiple events at once) or add/introduce a server-side batching layer that issues fewer grouped requests (e.g., `fetchInngestRunsForEvents(eventIds[])`).
- apps/api/src/service.ts:220-240 — This is where round-trips become linear: each loop element fetches runs for a single event id.
med
If the upstream API cannot batch natively, look for a grouped transport at the boundary: chunk event ids into groups (e.g., 10–50 ids per call) and issue one request per group. Running per-item fetches in parallel is not batching; it hides latency but does not reduce the number of round-trips.
- apps/api/src/service.ts:220-240 — Current implementation uses per-item fetches inside `Promise.all`; chunking would reduce total I/O operations if a grouped transport exists.
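
Assuming a grouped call exists or can be added, the chunking described above might look like this sketch. `fetchRunsForEvents` is hypothetical (no such Inngest call is confirmed by the report), as are `chunk`, `fetchRunsBatched`, and the default batch size of 25.

```typescript
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// `fetchRunsForEvents` is hypothetical: it stands in for a grouped upstream call.
async function fetchRunsBatched<R>(
  eventIds: string[],
  fetchRunsForEvents: (ids: string[]) => Promise<Map<string, R[]>>,
  batchSize = 25,
): Promise<Map<string, R[]>> {
  const merged = new Map<string, R[]>();
  // One round-trip per group instead of one per event id.
  for (const group of chunk(eventIds, batchSize)) {
    const partial = await fetchRunsForEvents(group);
    partial.forEach((runs, id) => merged.set(id, runs));
  }
  return merged;
}
```

For 60 event ids and a batch size of 25, this issues 3 requests rather than 60.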

Shared-state synchronization 100%

The codebase contains one clear, correct application of shared-state synchronization: a per-service volume backup lock (`flock`/fallback) wrapped around the stop/backup/upload/start workflow to prevent concurrent backup executions from interfering with shared resources.

low
Add/verify tests or runtime instrumentation around the locking behavior (e.g., ensure the lock is always released on failure paths and that lockPath naming cannot collide unexpectedly between services).
- packages/server/src/utils/volume-backups/backup.ts:65-144 — The lock is implemented inside the generated shell wrapper; improving coverage around failure/release semantics would further strengthen the correctness story.

Bounded concurrency / backpressure 50%

The codebase uses BullMQ workers and (in the schedules service) configures an explicit `concurrency` cap, which provides bounded concurrency for scheduled backup jobs. However, the deployment worker in the dokploy server does not set any explicit concurrency/backpressure options, leaving deployment work potentially unbounded at the worker level.

high
Add an explicit BullMQ worker concurrency/backpressure configuration to the deployments worker (e.g., `concurrency: <number>` and/or queue-level settings) so deployment tasks cannot run with unbounded in-flight execution.
- apps/dokploy/server/queues/deployments-queue.ts:1-97 — The `new Worker('deployments', handler, { autorun: false, connection: redisConfig })` options omit any `concurrency` cap/backpressure control.
med
Review whether the existing backup worker `concurrency: 100` is appropriate for backpressure (e.g., lower it or make it configurable per environment/resource type) to prevent resource exhaustion under load.
- apps/schedules/src/workers.ts:1-46 — The backup workers cap concurrency at 100; verifying the value provides effective backpressure under worst-case load is necessary.
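
For the BullMQ workers discussed here the cap comes from the worker's `concurrency` option, as the schedules service already does. The general mechanism behind such a cap can be sketched independently of any queue library; `mapWithConcurrency` below is a hypothetical helper, not code from the repository.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight at once.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each lane pulls the next unclaimed index until the list is exhausted.
  const lane = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  };
  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}
```

The number of lanes is the backpressure knob: work beyond the limit waits in the list instead of competing for CPU, memory, or Docker daemon capacity.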

Lazy / minimal computation 100%

The codebase applies Lazy/Minimal Computation correctly in a few key boundary spots: UI data fetching is gated by permissions and derived collections are computed lazily; server-side aggregation limits the amount of follow-up I/O by capping the event subset before fetching runs.

high
Audit other API/service functions that do fan-out I/O (fetching secondary resources per item) for early bounding (slice/limit) and conditional short-circuiting similar to `fetchDeploymentJobs`.
- apps/api/src/service.ts:220-240 — Shows the desired pattern (cap `toFetch` before `Promise.all` fetches). Replicate/standardize this across other fan-out fetch flows.
med
In UI components, prefer `enabled` gating and `useMemo` early-return patterns for any derived sort/filter logic tied to optional data (similar to `recentDeployments`).
- apps/dokploy/components/dashboard/home/show-home.tsx:90-140 — Demonstrates both gating (permission-based) and lazy derived computation (early-return `useMemo`).

Streaming over buffering 17%

The primitive exists: `readMonitoringConfig(readAll=false)` correctly uses streaming + bounded early termination for `access.log`. However, the same area has a buffering anti-pattern when `readAll=true` (whole-file `readFileSync`). Several other config/compose/template loaders read whole files/outputs into memory before parsing, which would ideally be streamed or size-limited for arbitrarily large inputs.

high
Fix the `readAll === true` path in `readMonitoringConfig` to avoid whole-file buffering: either always stream with `readline` (and collect/return via a bounded mechanism), or implement a hard size/line limit and/or return an iterator/chunked response instead of a single full string. Evidence: the `readAll=false` branch already demonstrates the preferred streaming pattern.
- packages/server/src/utils/traefik/application.ts:120-191 — The function uses streaming only when `!readAll`; otherwise it buffers the full log via `fs.readFileSync(configPath, 'utf8')`.
med
For compose/config/template loaders that currently do `readFileSync` / remote `cat` + full-string parsing, add explicit size/line guards and consider streaming parsing where feasible. At minimum, enforce maximum allowed file size before buffering/`parse()`.
- packages/server/src/utils/docker/domain.ts:52-122 — Compose loading uses whole-file reads (`readFileSync` and remote `cat`), then parses from the complete string.
- packages/server/src/utils/traefik/application.ts:20-120 — Traefik config loading reads complete YAML into memory before parsing.
- packages/server/src/templates/index.ts:75-132 — Template compose content is handled as full strings end-to-end (read/return/write), which should be guarded for large/untrusted inputs.
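
A bounded alternative to whole-file reads is to stream lines and keep only a fixed-size window in memory. The sketch below uses Node's `readline` over any readable stream; `tailLines` is a hypothetical name, and it does not reproduce the exact logic of `readMonitoringConfig`.

```typescript
import { createInterface } from "node:readline";
import { Readable } from "node:stream";

// Reads a line-oriented stream and keeps only the last `maxLines` lines in memory.
async function tailLines(input: Readable, maxLines: number): Promise<string[]> {
  const rl = createInterface({ input, crlfDelay: Infinity });
  const kept: string[] = [];
  for await (const line of rl) {
    kept.push(line);
    // Drop the oldest line once the window is full, so memory stays bounded.
    if (kept.length > maxLines) kept.shift();
  }
  return kept;
}
```

For a file, the input would be something like `fs.createReadStream(path, "utf8")`, so memory use depends on `maxLines` rather than on the size of the log.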

Reliability Primitives

Retries, circuit breakers, idempotency keys, health checks, and a runbook for each service.

23% 11/11 scored

Timeouts 0%

0/4 expected sites
Retry with backoff + jitter 0%

0/1 expected sites not present
Idempotency 0%

0/2 expected sites not present
Circuit breaking / fail-fast 0%

0/1 expected sites
Graceful degradation / fallback 100%

2/2 expected sites
Error handling & propagation 0%

0/4 expected sites
Deterministic resource cleanup 50%

1/2 expected sites
Atomicity / all-or-nothing 0%

0/3 expected sites
Input / boundary validation 57%

4/7 expected sites
Failure isolation / bulkheading 0%

0/2 expected sites not present
Graceful shutdown 44%

3/3 expected sites

Timeouts 0%

The codebase does implement timeouts for some external calls (notably `fetch` via `AbortSignal.timeout(...)` and a TCP readiness check via `socket.setTimeout` + an overall cap). However, several high-risk I/O boundaries still lack timeouts—especially external HTTP calls to the jobs service, Gitea token refresh, and SSH/remote command execution used for docker log/command streaming.

high
Add a timeout/abort signal to all `fetch(...)` calls that contact external services (e.g., wrap with `AbortSignal.timeout(<ms>)`), starting with `apps/dokploy/server/utils/backup.ts` where the jobs-service endpoints are called without any deadline.
- apps/dokploy/server/utils/backup.ts:1-55 — Unbounded `fetch` calls to `${process.env.JOBS_URL}`; no `AbortSignal` passed.
high
Bound Gitea token refresh HTTP requests with an `AbortSignal.timeout(...)` so `refreshGiteaToken` can’t hang indefinitely on a slow/unresponsive Gitea endpoint.
- packages/server/src/utils/providers/gitea.ts:60-93 — External `fetch(tokenEndpoint, ...)` without `AbortSignal.timeout`.
high
Add explicit timeouts to the SSH remote execution path: ensure the Promise rejects on connection timeout and command timeout, and always clean up (`conn.end()`) in the timeout/error path.
- packages/server/src/utils/process/execAsync.ts:141-220 — Remote SSH command runs without any timeout bound (`conn.exec` is used without a deadline).
med
For the docker log WebSocket SSH streaming path, enforce timeouts on both SSH connect and the `exec` command; ensure the WebSocket handler terminates/cleans up when the timeout triggers.
- apps/dokploy/server/wss/docker-container-logs.ts:1-180 — WebSocket SSH flow uses `client.connect(...)` + `client.exec(...)` without a timeout/abort mechanism.
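
For promise-based work that has no built-in deadline, such as an SSH `exec` wrapped in a Promise, a generic timeout wrapper is one option. `withTimeout` and its `onTimeout` cleanup hook are hypothetical names used for illustration.

```typescript
// Rejects if `work` has not settled within `ms`; `onTimeout` is the place to release
// resources (for example closing an SSH connection).
function withTimeout<T>(work: Promise<T>, ms: number, onTimeout?: () => void): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      onTimeout?.();
      reject(new Error(`timed out after ${ms} ms`));
    }, ms);
    work.then(
      (value) => {
        clearTimeout(timer); // settle normally and cancel the deadline
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}
```

For `fetch` calls the wrapper is not needed: passing `{ signal: AbortSignal.timeout(ms) }` in the request options, as the codebase already does elsewhere, aborts the request at the deadline.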

Retry with backoff + jitter 0%

The codebase does not implement the targeted reliability primitive (exponential backoff + jitter with a capped budget). A Postgres readiness loop exists and retries transient failures with a fixed sleep delay, but it does not include exponential backoff and jitter—so it remains an unmatched should-be site.

high
Replace the fixed-delay retry in wait-for-postgres.ts with exponential backoff plus jitter (and keep the existing TIMEOUT_MS as the capped budget). Also consider retrying only for transient connection errors (e.g., ECONNREFUSED/timeout), and fail fast for non-transient misconfigurations.
- apps/dokploy/wait-for-postgres.ts:45-83 — Contains the retry loop for Postgres TCP connectivity and applies a constant sleep (await sleep(RETRY_DELAY_MS)) without exponential backoff or jitter.
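
The recommended shape (exponential backoff, jitter, a total budget, and fail-fast on non-transient errors) can be sketched as below. `retryWithBackoff`, `RetryOptions`, and the injectable `sleep`, `now`, and `random` hooks are hypothetical; the "full jitter" variant shown is one common choice, not something the audited code specifies.

```typescript
type RetryOptions = {
  baseDelayMs: number; // upper bound of the first retry delay
  maxDelayMs: number; // cap for any single delay
  budgetMs: number; // total time allowed, e.g. the existing TIMEOUT_MS
  isTransient?: (err: unknown) => boolean;
  sleep?: (ms: number) => Promise<void>;
  now?: () => number;
  random?: () => number;
};

// "Full jitter": each delay is uniform in [0, min(max, base * 2^attempt)).
function backoffDelay(attempt: number, base: number, max: number, random: () => number): number {
  return Math.floor(random() * Math.min(max, base * 2 ** attempt));
}

async function retryWithBackoff<T>(op: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));
  const now = opts.now ?? Date.now;
  const random = opts.random ?? Math.random;
  const deadline = now() + opts.budgetMs;
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Non-transient errors (bad credentials, bad host) are not worth retrying.
      if (opts.isTransient && !opts.isTransient(err)) throw err;
      const delay = backoffDelay(attempt, opts.baseDelayMs, opts.maxDelayMs, random);
      // Stop once the next wait would exceed the overall budget.
      if (now() + delay > deadline) throw err;
      await sleep(delay);
    }
  }
}
```

Jitter matters most when several containers start together: without it, they all retry against Postgres at the same instants.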

Idempotency 0%

No clear idempotency/deduplication primitive or pattern (e.g., Stripe event-id guard, request-id dedup, or upsert/uniqueness-based mutation protection) was found in the code paths that perform retryable external-triggered mutations. The Stripe webhook handler is the main high-risk entry point: it mutates the database and triggers side effects but does not deduplicate webhook events.

high
Make the Stripe webhook handler idempotent by persisting processed `event.id` (or Stripe `event.type`+unique identifiers) in a DB table with a unique constraint, and return 200 immediately when an event was already processed. Ensure this check happens before any mutation (including server-status updates and notification emails).
- apps/dokploy/pages/api/stripe/webhook.ts:56-404 — This handler switches on `event.type` and performs DB updates and calls to `sendInvoiceEmail` / `sendPaymentFailedEmail` and `updateServersBasedOnQuantity`, but there is no observed dedup check for the Stripe event id before mutations.
high
Strengthen server status updates to be idempotent under replay: either (a) rely on the webhook event dedup gate above, or (b) make `activateServer`/`deactivateServer` conditional updates that only write when the status actually needs to change (and consider optimistic checks with row counts).
- apps/dokploy/pages/api/stripe/webhook.ts:330-373 — `updateServersBasedOnQuantity` repeatedly calls `activateServer`/`deactivateServer` across a loop. On webhook replay, these calls will be re-executed unless guarded.
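
The event-id dedup gate can be sketched with an interface that stands in for a table with a unique constraint on the event id. Everything here is hypothetical (`ProcessedEventStore`, `InMemoryEventStore`, `handleOnce`), and the in-memory store only exists to make the example runnable.

```typescript
// Stands in for a table with a unique constraint on the Stripe event id.
interface ProcessedEventStore {
  // Resolves true if the id was newly recorded, false if it already existed.
  claim(eventId: string): Promise<boolean>;
}

class InMemoryEventStore implements ProcessedEventStore {
  private readonly seen = new Set<string>();

  async claim(eventId: string): Promise<boolean> {
    if (this.seen.has(eventId)) return false;
    this.seen.add(eventId);
    return true;
  }
}

type WebhookEvent = { id: string; type: string };

async function handleOnce(
  event: WebhookEvent,
  store: ProcessedEventStore,
  apply: (event: WebhookEvent) => Promise<void>,
): Promise<"processed" | "duplicate"> {
  // The claim runs before any mutation, so a replayed event has no side effects.
  if (!(await store.claim(event.id))) return "duplicate";
  await apply(event);
  return "processed";
}
```

One caveat with this ordering: if `apply` fails after the claim succeeds, a retry would be skipped. With a real database, recording the event id and applying the mutations in the same transaction avoids that gap.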

Circuit breaking / fail-fast 0%

A `CircuitBreakerMiddleware` type/interface exists for Traefik configuration, but there is no evidence of an application-level circuit breaker (tracking failure rate, opening, and probing before closing) wrapping calls to unreliable dependencies. At least the Gitea token refresh flow performs external `fetch` with basic error handling but no circuit-breaker fail-fast behavior.

high
Introduce an application-level circuit breaker around external dependency calls (start with token refresh calls like Gitea/GitLab providers). Ensure it tracks failure rate, opens to fail immediately, and uses a probe/half-open state before closing; also add a timeout to bound hangs.
- packages/server/src/utils/providers/gitea.ts:20-102 — External `fetch` call to Gitea token endpoint occurs inside `refreshGiteaToken`; errors are caught and logged, but the function returns null without any circuit-breaker fail-fast mechanism.
med
If Traefik-level circuit breaking is intended, ensure the middleware is actually wired into generated Traefik dynamic config (not just typed). Add/verify production config generation that sets Traefik’s circuit breaker parameters and attaches the middleware to relevant routers/services.
- packages/server/src/utils/traefik/file-types.ts:1-220 — Only the type/interface for `CircuitBreakerMiddleware` is present; this does not confirm middleware wiring into runtime Traefik config.
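
A small sketch of an application-level breaker with closed, open, and half-open states. It is a simplification of what the recommendation asks for: it counts consecutive failures rather than a failure rate, lets every call through while half-open, and omits the timeout. `CircuitBreaker` and its method names are hypothetical.

```typescript
type BreakerState = "closed" | "open" | "half-open";

// Hypothetical breaker; counts consecutive failures instead of a failure rate.
class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // After the cooldown, an open breaker moves to half-open to allow a probe.
  currentState(): BreakerState {
    if (this.state === "open" && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = "half-open";
    }
    return this.state;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  recordFailure(): void {
    const state = this.currentState();
    this.failures += 1;
    // A failed probe, or too many consecutive failures, (re)opens the breaker.
    if (state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }

  async call<T>(op: () => Promise<T>): Promise<T> {
    if (this.currentState() === "open") {
      throw new Error("circuit open: failing fast");
    }
    try {
      const value = await op();
      this.recordSuccess();
      return value;
    } catch (err) {
      this.recordFailure();
      throw err;
    }
  }
}
```

A token-refresh helper could wrap its `fetch` in `breaker.call(...)`, so that repeated provider outages return immediately instead of waiting on each request.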

Graceful degradation / fallback 100%

The codebase does include graceful degradation/fallback. The clearest implementation is in `packages/server/src/services/admin.ts`: it caches trusted origins and, if the DB fetch fails, returns cached (stale-bounded by TTL) or an explicit empty list. A second instance safely degrades trusted providers loading by returning `[]` on error.

med
Search for other places where the system loads optional configuration/auxiliary lists (e.g., auth-related allowlists, feature metadata, dashboard-only data) and ensure their error branches return an explicit reduced default or cached value instead of throwing.
- packages/server/src/services/admin.ts:123-167 — Shows the established pattern (catch + cached/default return) that should be replicated for other non-critical dependencies.

Error handling & propagation 0%

The primitive exists (there are correct error-handling patterns), but several fallible operations still collapse failures into a bare `false` without logging or wrapping the error, notably through empty catch blocks. Where the code does handle errors in try/catch, it often logs and propagates (e.g., updateGitea and rawConfig parsing).

high
Replace empty catches that silently convert failures into false (dockerSwarmInitialized/dockerNetworkInitialized) with either (a) logging plus returning false, or (b) returning false while wrapping/propagating an error with context depending on caller expectations.
- packages/server/src/setup/setup.ts:16-24 — Empty catch in dockerSwarmInitialized drops the underlying error.
- packages/server/src/setup/setup.ts:40-47 — Empty catch in dockerNetworkInitialized drops the underlying error.
high
For containerExists, stop silently collapsing inspect failures into `false` ("does not exist"); log and/or rethrow with context (or return a structured result that distinguishes 'not found' vs 'inspect failed').
- packages/server/src/utils/docker/utils.ts:101-109 — catch {} returns false without context, losing the reason for failure.
med
For the health-check hook, log fetch errors (or propagate them into state) rather than swallowing them and returning false; at minimum include context (e.g., URL, error message) to avoid silent failures during polling.
- apps/dokploy/hooks/use-health-check-after-mutation.ts:35-44 — catch { return false } discards the underlying fetch error.
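
One way to express the structured-result option above. The sketch assumes the Docker client raises errors carrying a numeric `statusCode` (with 404 meaning the container is missing); that assumption, the `ExistsResult` type, and both function names are illustrative rather than taken from utils.ts.

```typescript
// Hypothetical result type that keeps "missing" and "could not check" apart.
type ExistsResult =
  | { kind: "found" }
  | { kind: "not-found" }
  | { kind: "inspect-failed"; cause: unknown };

// Assumption: the client library attaches an HTTP-like statusCode to errors.
function classifyInspectFailure(err: unknown): ExistsResult {
  const status = (err as { statusCode?: unknown } | null)?.statusCode;
  if (status === 404) return { kind: "not-found" };
  // Anything else (socket errors, timeouts, permissions) keeps its cause.
  return { kind: "inspect-failed", cause: err };
}

async function containerExistsDetailed(
  inspect: () => Promise<unknown>,
): Promise<ExistsResult> {
  try {
    await inspect();
    return { kind: "found" };
  } catch (err) {
    return classifyInspectFailure(err);
  }
}
```

Callers that only need a boolean can still map `found` to `true`, but they now have to decide explicitly what an `inspect-failed` result means for them.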

Deterministic resource cleanup 50%

The primitive exists in parts of the codebase: `runWebServerBackup` deterministically cleans up its temp directory via `finally`, including on error paths. However, other acquired resources are not deterministically cleaned up: the log write stream in the backup flow is only explicitly ended on the success path, and the SSH connection in `execAsyncRemote` is not ended when `conn.exec(...)` itself fails before stream ‘close’/connection error handlers fire.

high
In `runWebServerBackup`, deterministically close/destroy `writeStream` in a `finally` that covers both the inner success path and the outer `catch` path. Ensure it runs even when errors occur after the stream is created (e.g., before `writeStream.end()` currently happens only on success).
- packages/server/src/utils/backups/web-server.ts:36-129 — A `writeStream` is created and used throughout, but `writeStream.end()` is only called on the success path; the outer `catch` writes an error and calls `writeStream.end()` but there is no `finally` guaranteeing stream closure across all internal early exits.
high
In `execAsyncRemote`, ensure `conn.end()` is called when `conn.exec(command, (err, stream) => ...)` hits the immediate `if (err) { reject(...); return; }` branch (i.e., add cleanup there or use a shared `finally`/guard).
- packages/server/src/utils/process/execAsync.ts:153-173 — The `conn` is acquired, but in the `conn.exec(..., (err, stream) => { if (err) { ... reject(...); return; } ... })` branch there is no `conn.end()` call before rejecting.
med
Standardize this pattern: whenever the code acquires a handle (fs stream, SSH client, db client, temp dir), ensure release is in the same lexical scope via `try/finally` (or equivalent) at the acquisition site.
- packages/server/src/utils/backups/web-server.ts:41-123 — Use the existing correct pattern from `runWebServerBackup`’s temp dir `finally` cleanup as a template for other resources like `writeStream`.
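
The acquire-and-release pattern recommended above can be captured in a small helper. `withResource` is a hypothetical name; the sketch only shows the `try/finally` shape and does not reproduce `runWebServerBackup` or `execAsyncRemote`.

```typescript
// Hypothetical helper: release always runs, whether `use` resolves or rejects.
async function withResource<R, T>(
  acquire: () => R,
  release: (resource: R) => void | Promise<void>,
  use: (resource: R) => Promise<T>,
): Promise<T> {
  const resource = acquire();
  try {
    return await use(resource);
  } finally {
    // Runs on success, on thrown errors, and on early returns inside `use`.
    await release(resource);
  }
}
```

With this shape, a log stream could be acquired with a function that opens it and released with one that ends it, and an SSH client could be released with a call to its `end()` method, so neither depends on reaching a specific success line.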

Atomicity / all-or-nothing 0%

Atomicity is present in at least one place: notification creation/update correctly wraps related multi-table inserts in a single DB transaction. However, several deployment/backup creation flows do multi-step sequences (external log setup + DB writes) without any transaction spanning the whole operation, meaning failures can leave the system in an observable half-completed or inconsistent state.

high
For deployment creation flows, wrap the entire related DB mutation sequence in a single transaction (e.g., insert the deployment/backup row and any corresponding status updates), and ensure the catch path does not introduce a second inconsistent row without compensating/rollback logic. If external side effects (remote mkdir/echo) must remain, gate them to occur after the DB transaction commits, or use a transactional outbox / compensating action pattern.
- packages/server/src/services/deployment.ts:86-193 — Try path does external log setup + inserts deployment row; catch inserts an additional 'error' row and updates application status with no transaction boundary covering the sequence.
- packages/server/src/services/deployment.ts:220-338 — Preview deployment creation shows the same pattern: external setup + insert in try; catch inserts error row + updates preview status, without transaction.
- packages/server/src/services/deployment.ts:338-430 — Backup creation shows the same pattern: external setup + insert in try; catch inserts error row, without transaction.
med
Avoid inserting a second “error” deployment/backup record after a failure that occurs mid-flow unless it is strictly designed to be consistent. Prefer updating the originally inserted record to `error` (within the same transaction) rather than inserting a new one.
- packages/server/src/services/deployment.ts:86-193 — The code inserts a deployment row in the try block and, upon catching, inserts another deployment row with `status: "error"`.

Input / boundary validation 57%

This codebase includes boundary validation for several high-risk public API entry points—especially webhook handlers (GitHub and Stripe) and OAuth (Gitea). These entry points validate required headers/query parameters, verify cryptographic signatures, and reject invalid/unsupported event types before acting. However, other API entry points (e.g., deploy/[refreshToken] and deploy/compose/[refreshToken]) are also trust boundaries that should validate refreshToken/query-derived inputs at the boundary; based on the portions read, the handlers rely heavily on downstream DB lookups and internal branching rather than explicit “shape/range” validation for those entry inputs.

high
Add explicit boundary validation for the deploy webhook entry points’ query params (refreshToken) before using them in DB queries and downstream logic. For example: reject missing refreshToken, array refreshToken, and non-string/empty values with a clear 400 response.
- apps/dokploy/pages/api/deploy/[refreshToken].ts:1-220 — Entry point reads req.query.refreshToken and immediately uses it in a DB lookup (eq(applications.refreshToken, refreshToken as string)) without an explicit boundary check for presence/shape.
- apps/dokploy/pages/api/deploy/compose/[refreshToken].ts:1-140 — Compose webhook entry point reads req.query.refreshToken and uses it in eq(compose.refreshToken, refreshToken as string) without explicit shape validation.
med
Harden request-body derived values used in branching logic (e.g., req.body.commits, refs, repository/owner fields) with explicit type/shape checks at the boundary, not only through optional chaining and comparisons. This prevents malformed payloads from flowing into arrays (flatMap) or string operations.
- apps/dokploy/pages/api/deploy/[refreshToken].ts:1-220 — Uses req.body?.commits?.flatMap((commit:any)=>commit.modified) and other req.body nested fields to compute deployment decisions.
- apps/dokploy/pages/api/deploy/compose/[refreshToken].ts:1-140 — Similarly uses req.body?.commits?.flatMap(...) and branch/ref extraction from potentially untrusted payloads.
low
For completeness, apply a consistent error response contract for invalid webhook payload shape (400 with a non-sensitive message) across all webhook providers, so bad inputs fail fast and uniformly.
- apps/dokploy/pages/api/deploy/[refreshToken].ts:220-430 — There are multiple early returns with specific messages, but the entry-input validation for query-derived values could be brought closer to the strictness used in gitea/stripe/github handlers.
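
A sketch of the boundary checks suggested above. `parseRefreshToken` and `modifiedPaths` are hypothetical helpers, and `TOKEN_PATTERN` is an assumed token shape chosen for illustration; the real refreshToken format is not stated in this audit.

```typescript
// Query values as a Next.js-style router exposes them.
type QueryValue = string | string[] | undefined;

type ParsedToken =
  | { ok: true; value: string }
  | { ok: false; status: 400; message: string };

// Assumed token shape for illustration only.
const TOKEN_PATTERN = /^[A-Za-z0-9_-]{1,128}$/;

function parseRefreshToken(raw: QueryValue): ParsedToken {
  if (raw === undefined) {
    return { ok: false, status: 400, message: "refreshToken is required" };
  }
  if (Array.isArray(raw)) {
    return { ok: false, status: 400, message: "refreshToken must be a single value" };
  }
  if (!TOKEN_PATTERN.test(raw)) {
    return { ok: false, status: 400, message: "refreshToken is malformed" };
  }
  return { ok: true, value: raw };
}

// Collects modified paths from an untrusted webhook body, ignoring any
// commit or path entry that does not have the expected shape.
function modifiedPaths(body: unknown): string[] {
  const commits = (body as { commits?: unknown } | null)?.commits;
  if (!Array.isArray(commits)) return [];
  const paths: string[] = [];
  for (const commit of commits) {
    const modified = (commit as { modified?: unknown } | null)?.modified;
    if (!Array.isArray(modified)) continue;
    for (const path of modified) {
      if (typeof path === "string") paths.push(path);
    }
  }
  return paths;
}
```

A handler would call `parseRefreshToken(req.query.refreshToken)` first and return the 400 result before touching the database.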

Failure isolation / bulkheading 0%

The codebase does not implement clear failure-isolation/bulkheading for independent workloads sharing a common queue/worker resource. Jobs for different logical subsystems (applications vs compose vs previews; deploy vs redeploy) are handled by a single shared BullMQ queue/worker, without visible per-partition resource caps or separate isolation boundaries. Additionally, the worker error branch catches errors but only logs them, rather than ensuring isolation via bounded, partitioned execution.

high
Introduce bulkheading by splitting the shared `deployments` queue into separate queues/workers for the independent workload classes (e.g., applications vs compose vs previews, and/or deploy vs redeploy), and/or enforce per-partition concurrency limits. Ensure each queue has its own Worker instance and (ideally) its own redis/pool configuration if that’s part of the failure mode.
- apps/dokploy/server/queues/deployments-queue.ts:17-94 — Single `Worker('deployments', ...)` runs all job variants in one worker execution context; no partitioned execution/isolation boundary is visible.
med
Add explicit safeguards to prevent one partition from starving the rest: separate queue names, set BullMQ worker `concurrency` per queue, and consider per-tenant/per-application rate limiting (so a hot application can’t monopolize the shared worker).
- apps/dokploy/server/queues/queueSetup.ts:12-24 — `myQueue` is a single global queue used for multiple logical partitions (filtering by applicationId/composeId), indicating a shared resource without bulkheading.
low
Improve the worker failure path to propagate/retry in a controlled manner (with backoff and bounded retries) rather than only `console.log`. While this isn’t bulkheading by itself, it reduces the chance of repeated failures consuming the shared worker capacity.
- apps/dokploy/server/queues/deployments-queue.ts:73-77 — The catch block only logs `console.log('Error', error);` and does not trigger a controlled isolation-friendly failure strategy (e.g., bounded retries, dead-lettering, or partition-aware throttling).
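
The per-partition cap can be illustrated independently of BullMQ. The `Bulkhead` class below is a hypothetical in-process sketch: each workload class gets its own concurrency budget, so exhausting one does not block the others.

```typescript
// Hypothetical bulkhead: one concurrency budget per workload partition.
class Bulkhead {
  private readonly inFlight = new Map<string, number>();
  private readonly limits: Record<string, number>;

  constructor(limits: Record<string, number>) {
    this.limits = limits;
  }

  // Returns false when the partition is at its cap (or is unknown).
  tryAcquire(partition: string): boolean {
    const limit = this.limits[partition] ?? 0;
    const current = this.inFlight.get(partition) ?? 0;
    if (current >= limit) return false;
    this.inFlight.set(partition, current + 1);
    return true;
  }

  release(partition: string): void {
    const current = this.inFlight.get(partition) ?? 0;
    this.inFlight.set(partition, Math.max(0, current - 1));
  }
}
```

With a queue library, the equivalent is typically one queue and one worker per partition, each with its own concurrency setting, rather than a shared in-memory counter.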

Graceful shutdown 44%

The codebase contains a `gracefulShutdown` primitive in `apps/schedules/src/index.ts`, and there is at least one SIGTERM handler elsewhere (`apps/dokploy/server/queues/queueSetup.ts`). However, the shutdown logic frequently ends with `process.exit(0)` immediately after closing some resources, and the main dokploy server (`apps/dokploy/server/server.ts`) does not show any termination-signal draining/closing wiring at the process entry point—risking dropped in-flight HTTP/WS work and incomplete flushing during shutdown.

high
Replace immediate `process.exit(0)` in shutdown handlers with a proper sequence: stop accepting new requests (close/stop the HTTP listener), drain in-flight requests and WS connections, stop background job processing, await all worker/queue shutdown promises, and only then exit (optionally with a bounded timeout).
- apps/schedules/src/index.ts:67-76 — `gracefulShutdown` awaits worker `.close()` but then calls `process.exit(0)` immediately.
- apps/dokploy/server/queues/queueSetup.ts:31-36 — SIGTERM handler calls `myQueue.close()` and then immediately calls `process.exit(0)`.
high
Add a termination-signal handler at the dokploy main server entry (`apps/dokploy/server/server.ts`) that coordinates shutdown of: HTTP server (`server.close()`), all WebSocket servers, and the background workers/cron jobs started in this entry point.
- apps/dokploy/server/server.ts:1-81 — This file starts the long-running HTTP server and initializes WS and background work, but the shown code does not register SIGTERM/SIGINT graceful-drain logic.
med
For BullMQ shutdown (`queueSetup.ts`), ensure the handler also prevents new jobs from being enqueued/processed during shutdown, and waits for running jobs to finish (or cancels with a bounded grace period) rather than only closing the queue handle.
- apps/dokploy/server/queues/queueSetup.ts:31-36 — Current handler only calls `myQueue.close()` then exits; it does not demonstrate draining/waiting for active jobs.

API & Extensibility

A checked-in OpenAPI spec, versioned routes, a webhook system with retries and signing, and tenant-scoped rate limits.

0% 7/10 scored

Machine-readable API contract 0%

0/2 expected sites not present
Versioning & backward compatibility 0%

0/3 expected sites
Programmatic auth with scopes 0%

0/3 expected sites
Idempotent writes 0%

0/5 expected sites not present
Consistent pagination & filtering 0%

0/1 expected sites
Outbound events / webhooks 0%

0/3 expected sites not present
Consistent errors & status codes 0%

0/6 expected sites not present

Machine-readable API contract 0%

The repository includes logic to generate and serve an OpenAPI contract (a generation script and a Swagger UI page), but the machine-readable API contract artifact (e.g., checked-in `openapi.json`/`openapi.yaml`) is not present in the repo. Therefore, third-party consumers cannot rely on a stable, discoverable, checked-in spec without running generation.

high
Check in the generated OpenAPI artifact (e.g., commit `openapi.json` at the repo root, or `openapi.yaml` under the API module) and ensure it covers the full public TRPC surface exposed via `appRouter`. Wire CI to regenerate and fail if the committed spec drifts from the router.
- apps/dokploy/scripts/generate-openapi.ts:1-133 — Generation targets a repository file path (`../../../openapi.json`), implying a contract artifact is intended—but it is not checked in.
med
Make the contract versioning explicit in the spec (title and version already exist in the generator) and expose a stable docs URL that points to the committed artifact (not only a runtime TRPC call).
- apps/dokploy/scripts/generate-openapi.ts:1-133 — Generator hardcodes version info and baseUrl/docsUrl; this should align with the committed artifact’s published URLs.
- apps/dokploy/pages/swagger.tsx:1-116 — Swagger UI currently depends on `api.settings.getOpenApiDocument` from TRPC at runtime.

Versioning & backward compatibility 0%

The codebase generates an OpenAPI document (with version metadata), but there is no clear HTTP versioning strategy, no deprecation/sunset markers, and no version negotiation in the actual public endpoints defined in apps/api/src/index.ts (e.g., /deploy, /cancel-deployment, /jobs are unversioned). As a result, backward compatibility appears to rely on convention rather than an explicit, discoverable contract governance mechanism.

high
Introduce an explicit versioning strategy for the public HTTP service in apps/api/src/index.ts (e.g., /v1/deploy, /v1/cancel-deployment, /v1/jobs) and define deprecation/sunset headers (e.g., Deprecation, Sunset) plus migration links when changing request/response schemas or error shapes.
- apps/api/src/index.ts:77-120 — Shows unversioned public write endpoint POST /deploy with no deprecation/sunset policy.
- apps/api/src/index.ts:120-173 — Shows unversioned public write endpoint POST /cancel-deployment with no deprecation/sunset policy.
- apps/api/src/index.ts:175-211 — Shows unversioned public list endpoint GET /jobs with no deprecation/sunset policy.
med
Align contract governance with the OpenAPI generation: ensure the generated spec covers the full published HTTP surface, keep minor changes additive so that only breaking changes trigger a new version, and add contract tests to prevent silent schema drift.
- apps/dokploy/scripts/generate-openapi.ts:18-31 — OpenAPI generation includes a version field, but the existence of a versioning/back-compat policy for HTTP endpoints is not demonstrated.

Programmatic auth with scopes 0%

The codebase does have programmatic API-key authentication using an `x-api-key` header verified by `better-auth` and integrated into the shared request validation (`validateRequest`). However, the implementation does not clearly enforce per-credential scopes/permissions on the request path; it authenticates the key and associates an organization, but scope enforcement is not evident in the request context setup. As a result, a third-party integrator can authenticate but cannot rely on stable per-credential scope semantics, and revocation, rotation, and least-privilege behavior are not clearly exposed by the enforced contract.

high
Enforce per-API-key scope/permissions in `validateRequest`: load the API key’s `permissions` (or whatever scope model is intended), and map it into `ctx.user`/authorization checks so every protected endpoint consistently applies least-privilege based on the credential’s scopes.
- packages/server/src/lib/auth.ts:240-520 — Shows `validateRequest` verifying `x-api-key`, loading the `apikey` record, extracting `organizationId` from `apiKeyRecord.metadata`, loading the member, and constructing a mock `session`/`user`—but there is no visible step that applies API-key-specific permissions/scopes to the authorization model.
high
Add/confirm API key scope fields are actually written and used: ensure API-key creation (`createApiKey` + related endpoints) stores scopes/permissions in a dedicated, enforceable place (not only UI-only fields), and that `validateRequest` reads those exact fields.
- packages/server/auth-schema2.ts:60-130 — The `apikey` table includes `permissions`, `expiresAt`, and last-used/rate-limit fields, but the request validation path must explicitly use them to provide a true scoped-credential primitive.
med
Create a stable public contract for API credentials (scope model + revocation/rotation semantics + last-used + rotation cadence expectations) and expose it consistently in the API docs/spec (or the existing Swagger route page if it is intended to be consumable).
- apps/dokploy/pages/swagger.tsx:1-120 — A Swagger page exists (indicating intent for a machine-discoverable contract). The scoped-credential primitive should be discoverable there with the scope model and auth scheme, not just implemented internally.
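
A sketch of a deny-by-default scope check. It assumes the key's `permissions` column holds a JSON object mapping a resource name to a list of allowed actions; that storage format, the `ScopeMap` type, and both function names are assumptions, not confirmed details of `validateRequest`.

```typescript
// Assumed shape: resource -> allowed actions, e.g. { deployment: ["read"] }.
type ScopeMap = Record<string, string[]>;

// Parses the stored permissions; anything malformed yields no scopes.
function parseScopes(raw: string | null | undefined): ScopeMap {
  if (!raw) return {};
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      return {};
    }
    const scopes: ScopeMap = {};
    for (const [resource, actions] of Object.entries(parsed as Record<string, unknown>)) {
      if (Array.isArray(actions)) {
        scopes[resource] = actions.filter((a): a is string => typeof a === "string");
      }
    }
    return scopes;
  } catch {
    return {};
  }
}

// Deny by default: a missing resource or action is never allowed.
function hasScope(scopes: ScopeMap, resource: string, action: string): boolean {
  return (scopes[resource] ?? []).includes(action);
}
```

The request path would parse the scopes once when the key is verified, attach them to the context, and have each protected procedure call `hasScope` before doing any work.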

Per-tenant rate limiting N/A

I could not find any implementation of per-tenant/per-organization/per-consumer rate limiting in the server API edge. Searches for rate-limit/ratelimit/throttle/limiter/bucket4j-style middleware and related symbols returned no wiring, and the API routing appears to be handled via tRPC procedures without any tenant-scoped limiter middleware or standardized 429 + limit/remaining headers.

high
Add an API-edge rate limiting middleware for tRPC requests that keys the bucket by the tenant/organization/consumer identifier (e.g., ctx.session.activeOrganizationId or authenticated user/app key), and returns 429 with standardized Limit/Remaining headers plus retry guidance.
med
Implement and enforce a shared error/response contract for rate-limit rejections (machine code + correlation/request id) so integrators can programmatically detect rate limiting and back off safely.
low
Document the rate limiting policy in the generated OpenAPI document (or a dedicated API docs page), including limits, scope, and header semantics.
- apps/dokploy/pages/swagger.tsx:1-116 — Swagger spec is served dynamically from tRPC, but this file does not show any rate-limit policy or headers being modeled/enforced at the API layer.
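
A token-bucket sketch keyed by tenant. `TenantRateLimiter` is a hypothetical in-memory class; a deployed version would need a shared store so that all API instances see the same buckets.

```typescript
interface Bucket {
  tokens: number;
  updatedAt: number;
}

interface Decision {
  allowed: boolean;
  limit: number;
  remaining: number;
  retryAfterSec: number;
}

// Hypothetical per-tenant token bucket with an injectable clock.
class TenantRateLimiter {
  private readonly buckets = new Map<string, Bucket>();
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSec: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now;
  }

  check(tenantId: string): Decision {
    const t = this.now();
    const bucket = this.buckets.get(tenantId) ?? { tokens: this.capacity, updatedAt: t };
    // Refill according to elapsed time, never above capacity.
    const elapsedSec = (t - bucket.updatedAt) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSec * this.refillPerSec);
    bucket.updatedAt = t;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(tenantId, bucket);
    const retryAfterSec = allowed ? 0 : Math.ceil((1 - bucket.tokens) / this.refillPerSec);
    return { allowed, limit: this.capacity, remaining: Math.floor(bucket.tokens), retryAfterSec };
  }
}
```

A middleware would respond with 429 and a `Retry-After` header when `allowed` is false, and could copy `limit` and `remaining` into whichever limit headers the API standardizes on.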

Idempotent writes 0%

Idempotent writes are not implemented as a public, consumable contract in the examined public mutation endpoints. The code handling for `/deploy`, `/cancel-deployment`, `/create-backup`, `/update-backup`, and `/remove-job` performs side-effecting work without reading an idempotency key or providing replay/deduplication behavior.

high
Add support for an idempotency key on each public mutation endpoint (start with `/deploy` and `/create-backup`). Accept a standard header (e.g., `Idempotency-Key`) and persist the key->result mapping (or key->operation status) in a durable store scoped to the authenticated principal/org. On retry, return the original result instead of repeating side effects.
- apps/api/src/index.ts:1-215 — Shows `/deploy` enqueues work without any idempotency-key handling.
- apps/schedules/src/index.ts:1-117 — Shows backup scheduling endpoints perform side effects without any idempotency-key handling.
high
Ensure timeouts/retries produce replayable behavior: store request input hash (or canonicalized payload) alongside the idempotency key; if the same key is reused with different inputs, return a distinct conflict error (e.g., HTTP 409) rather than executing a new operation.
- apps/api/src/index.ts:1-215 — Write endpoints currently have no mechanism to detect/reject conflicting retries.
med
Extend the same idempotency contract consistently to the remaining mutation endpoints (`/cancel-deployment`, `/update-backup`, `/remove-job`) so integrators can safely retry across the whole write surface.
- apps/schedules/src/index.ts:1-117 — All shown schedules endpoints are side-effecting and currently retry-unsafe.
- apps/api/src/index.ts:1-215 — Cancellation endpoint currently has no idempotency behavior.
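
The idempotency-key contract described above, sketched with an in-memory map. `IdempotencyStore` is a hypothetical name, and `JSON.stringify` is used as a simple payload fingerprint; it is sensitive to property order, so a real implementation would want a canonical form or a hash.

```typescript
interface StoredResult<T> {
  fingerprint: string;
  result: T;
}

type Outcome<T> =
  | { status: 200; result: T; replayed: boolean }
  | { status: 409; error: string };

// Hypothetical store; keys are scoped to the authenticated principal/org.
class IdempotencyStore<T> {
  private readonly entries = new Map<string, StoredResult<T>>();

  execute(principal: string, key: string, payload: unknown, run: () => T): Outcome<T> {
    const scopedKey = `${principal}:${key}`;
    // Simplistic fingerprint; property order affects the result.
    const fingerprint = JSON.stringify(payload);
    const existing = this.entries.get(scopedKey);
    if (existing) {
      if (existing.fingerprint !== fingerprint) {
        return { status: 409, error: "idempotency key reused with a different payload" };
      }
      // Same key and payload: return the stored result, no new side effects.
      return { status: 200, result: existing.result, replayed: true };
    }
    const result = run();
    this.entries.set(scopedKey, { fingerprint, result });
    return { status: 200, result, replayed: false };
  }
}
```

A durable version would also need to cope with concurrent requests carrying the same key, for example by inserting a pending row under a unique constraint before running the operation.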

Consistent pagination & filtering 0%

Pagination/filtering exists for at least one collection (audit logs) using bounded `limit` and `offset` plus multiple filters, but the required cursor-based convention is not implemented there. No list endpoint was found to correctly and consistently apply cursor pagination + a shared filter convention.

high
Introduce cursor-based pagination for the audit-log list endpoint (and any other list endpoints), using a shared convention for cursor parameter name (e.g., `cursor` / `nextCursor`) and returning a deterministic `nextCursor` (or `hasMore`) alongside results.
- apps/dokploy/server/api/routers/proprietary/audit-log.ts:1-68 — The list query currently accepts `limit` and `offset` only; no cursor fields are part of the contract.
- packages/server/src/services/proprietary/audit-log.ts:1-96 — The implementation paginates with `limit` + `offset` (`findMany` with `limit, offset`) instead of cursor conditions (e.g., createdAt < lastSeenCreatedAt).
med
Define and reuse a common filtering contract for list endpoints (same query parameter names and semantics across collections), and ensure all list endpoints interpret filters consistently (e.g., date range boundaries, string match mode).
- packages/server/src/services/proprietary/audit-log.ts:1-96 — Audit logs demonstrate filter support (userId/userEmail/resourceName/action/resourceType/from/to), but the pagination mechanism and contract are not aligned with the cursor convention expected by this primitive.
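
Cursor (keyset) pagination over `createdAt` and `id` can be sketched as below. The in-memory array stands in for the audit-log table, and the cursor encoding, the `AuditLogRow` shape, and `listPage` are illustrative assumptions rather than the existing service code.

```typescript
interface AuditLogRow {
  id: string;
  createdAt: number; // epoch milliseconds
}

interface LogPage {
  items: AuditLogRow[];
  nextCursor: string | null;
}

// Hypothetical cursor format: "<createdAt>:<id>" of the last row returned.
function encodeCursor(row: AuditLogRow): string {
  return `${row.createdAt}:${row.id}`;
}

function decodeCursor(cursor: string): { createdAt: number; id: string } | null {
  const sep = cursor.indexOf(":");
  if (sep <= 0) return null;
  const createdAt = Number(cursor.slice(0, sep));
  if (!Number.isFinite(createdAt)) return null;
  return { createdAt, id: cursor.slice(sep + 1) };
}

function listPage(rows: AuditLogRow[], limit: number, cursor?: string): LogPage {
  const size = Math.max(1, Math.min(limit, 100));
  // Newest first; id breaks ties so the order is deterministic.
  let remaining = [...rows].sort(
    (a, b) => b.createdAt - a.createdAt || (a.id < b.id ? 1 : a.id > b.id ? -1 : 0),
  );
  if (cursor !== undefined) {
    const after = decodeCursor(cursor);
    if (after === null) throw new Error("invalid cursor");
    // Keep only rows strictly after the cursor position in sort order.
    remaining = remaining.filter(
      (r) => r.createdAt < after.createdAt || (r.createdAt === after.createdAt && r.id < after.id),
    );
  }
  const items = remaining.slice(0, size);
  const nextCursor = remaining.length > size ? encodeCursor(items[items.length - 1]) : null;
  return { items, nextCursor };
}
```

In SQL the filter becomes a condition on the `(createdAt, id)` pair, which stays stable when new rows arrive, unlike `offset`.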

Outbound events / webhooks 0%

No general-purpose outbound events/webhooks delivery primitive (subscription store + delivery worker + HMAC-signed, versioned payloads + exponential-backoff retry capped with alerting + idempotent delivery + documented event catalog) was found. The codebase does perform outbound POSTs for notifications to third-party services (e.g., Discord/Slack/custom endpoints), but these calls are not implemented as a webhook event delivery primitive with signing, retries, idempotency, and a stable public event contract.

high
Implement a dedicated outbound webhook/event system: (1) subscription store (per tenant/credential), (2) background delivery worker, (3) HMAC signing of every payload with versioned schema, (4) exponential-backoff retries with a max attempt limit then flag-and-alert, (5) idempotent delivery using a persisted delivery id/key, and (6) an event catalog + redelivery semantics in the public API.
- packages/server/src/utils/notifications/utils.ts:1-394 — Outbound notification delivery is done via direct fetch POST calls to webhook URLs, without any shared webhook event contract, signing, versioning, retry/backoff, or idempotency guarantees visible in the delivery utility.
med
Add consistent integrity and failure-handling around outbound webhook delivery: include an HMAC signature header, payload version field, retry policy, and correlation/delivery ids returned/stored per attempt.
- packages/server/src/utils/notifications/utils.ts:80-170 — Multiple notification senders call fetch() directly and either throw or log errors; there is no common retry/backoff/idempotency/signed-payload mechanism evident in the utility.
low
Document a public, stable webhook event catalog (events, payload schemas, signature algorithm/header, retry semantics, and redelivery rules) and ensure the entire exported route inventory is covered by the spec.
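
For the signing part of the recommendations above, a minimal HMAC sketch using Node's built-in `crypto` module. The header format (`t=<unix seconds>,v1=<hex digest>`), the function names, and the 300-second tolerance are assumptions chosen for illustration; they are not an existing Dokploy contract.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical signature header: "t=<unix seconds>,v1=<hex hmac-sha256>".
function signPayload(secret: string, body: string, timestampSec: number): string {
  // The timestamp is part of the signed content to limit replay.
  const mac = createHmac("sha256", secret).update(`${timestampSec}.${body}`).digest("hex");
  return `t=${timestampSec},v1=${mac}`;
}

function verifySignature(
  secret: string,
  body: string,
  header: string,
  nowSec: number,
  toleranceSec = 300,
): boolean {
  const match = /^t=(\d+),v1=([0-9a-f]{64})$/.exec(header);
  if (!match) return false;
  const timestampSec = Number(match[1]);
  // Reject deliveries whose timestamp is too far from the receiver's clock.
  if (Math.abs(nowSec - timestampSec) > toleranceSec) return false;
  const expected = Buffer.from(signPayload(secret, body, timestampSec));
  const received = Buffer.from(header);
  // Constant-time comparison; lengths must match before calling it.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

The sender would attach the header to each delivery attempt, and the event catalog would document the algorithm, the header name, and the tolerance window so receivers can verify deliveries the same way.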

Consistent errors & status codes 0%

A consistent, machine-parseable error envelope with uniform status-code semantics and a correlation id on every error does not appear to be implemented as a shared response-layer primitive. API endpoints return endpoint-specific ad-hoc JSON error shapes (e.g., Unauthorized 401, invalid API key 403, various 4xx/5xx bodies) without a common error contract that clients can integrate against without per-endpoint handling.

high
Introduce a single, shared error response envelope for all API surfaces (tRPC/Next API route handler and the Hono service). Include: HTTP status, a stable machine error code, a human message, and a correlation/request id on every error response.
- apps/dokploy/pages/api/[...trpc].ts:1-31 — Returns ad-hoc `{ message: "Unauthorized" }` for auth failures; no shared envelope or correlation id.
- apps/api/src/index.ts:52-63 — Returns ad-hoc `{ message: "Invalid API Key" }` with 403; no machine code or correlation id.
high
Standardize status-code mapping across the API: 400 for malformed input, 401/403 for auth, 409 for idempotency conflicts (if used), 422 for semantic/validation failures, 429 for throttling, and restrict 5xx to true server faults (no 200 fallbacks on error cases).
- apps/api/src/index.ts:165-203 — `GET /jobs` mixes 400/503 and returns `[]` with HTTP 200 on failure, which undermines consistent client behavior.
med
Ensure correlation id propagation: generate a correlation id at the top of request handling (middleware) and attach it to all error responses; optionally also log it.
- apps/api/src/index.ts:1-215 — Multiple endpoints return errors directly via `c.json(...)` without a request-scoped correlation id.
med
Update the OpenAPI generation/error mapping to reflect the shared error envelope (so integrators can rely on documented error schemas and codes).
- apps/dokploy/pages/api/[...trpc].ts:1-31 — OpenAPI handler is created, but auth errors are returned ad-hoc before the tRPC/OpenAPI error mapping.
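
A shared envelope along the lines recommended above might look like this. The code names, the envelope shape, and `toErrorResponse` are hypothetical; the status mapping follows the list in the second recommendation.

```typescript
// Hypothetical stable machine codes, one per error class.
type ErrorCode =
  | "BAD_REQUEST"
  | "UNAUTHORIZED"
  | "FORBIDDEN"
  | "CONFLICT"
  | "UNPROCESSABLE"
  | "RATE_LIMITED"
  | "INTERNAL";

// Single source of truth for the code-to-status mapping.
const STATUS_BY_CODE: Record<ErrorCode, number> = {
  BAD_REQUEST: 400,
  UNAUTHORIZED: 401,
  FORBIDDEN: 403,
  CONFLICT: 409,
  UNPROCESSABLE: 422,
  RATE_LIMITED: 429,
  INTERNAL: 500,
};

interface ErrorEnvelope {
  error: { code: ErrorCode; message: string; requestId: string };
}

function toErrorResponse(
  code: ErrorCode,
  message: string,
  requestId: string,
): { status: number; body: ErrorEnvelope } {
  return {
    status: STATUS_BY_CODE[code],
    body: { error: { code, message, requestId } },
  };
}
```

Both the Next.js API routes and the Hono service could build their error responses through one such function, with `requestId` generated once per request in middleware.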

Sandbox / test mode N/A

No consumer-facing Sandbox / test-mode contract (i.e., a documented sandbox base URL plus test keys and isolated test data) was found in the codebase. The only obvious “test keys” are for the repo’s own unit/integration tests (Vitest), not for external integrators.

high
Add a documented sandbox/test environment contract for integrators: publish a sandbox base URL, create dedicated test credentials/keys with least-privilege scopes, and ensure sandbox data is isolated from production (with clear lifecycle/reset behavior).
- apps/dokploy/__test__/vitest.config.ts:1-35 — The current 'test' credentials are only wired into Vitest (project test environment), not into any public sandbox configuration.
med
Add explicit configuration entries for sandbox mode (e.g., SANDBOX_BASE_URL / SANDBOX_API_KEY) and ensure runtime code selects sandbox endpoints/keys when SANDBOX_MODE is enabled, with documentation in the repo (README/docs).
- apps/dokploy/__test__/vitest.config.ts:1-35 — Currently, the only configuration clearly labeled 'test' is test-runner configuration; no equivalent consumer-facing sandbox wiring is evidenced.

Extension points / plugins N/A

No stable, documented, versioned extension/plugin interface (with a registry and isolation from core) was found. The codebase appears to support configurable integrations (e.g., registries/providers) and internal Next.js API routes/webhooks, but not a third-party plugin contract that an external developer can implement and register without forking.

Not applicable to this codebase: Per-tenant rate limiting, Sandbox / test mode, Extension points / plugins.

Integration Depth

Per-system adapters behind one shared interface with bi-directional sync — not per-customer scripts held together with spreadsheets.

33% 9/10 scored

Shared integration abstraction 0%

0/6 expected sites not present
Metadata-driven mappings 67%

3/4 expected sites
Per-integration reliability 0%

0/2 expected sites not present
Sync state & reconciliation 0%

0/1 expected sites not present
Inbound validation & normalization 50%

3/4 expected sites
Per-tenant integration credentials 33%

3/3 expected sites
Per-integration observability 0%

0/6 expected sites not present
Connector breadth for the category 61%

5/6 expected sites
Build-vs-buy posture 83%

4/4 expected sites

Shared integration abstraction 0%

The codebase has provider-specific integration implementations and provider-specific TypeScript interfaces/types (e.g., `gitea-utils.ts`, and separate token/clone/branch/repo logic for Gitea/GitHub/GitLab). However, there is no clearly shared integration abstraction: no single common adapter interface backed by a canonical data model that multiple distinct provider integrations implement consistently. As a result, the integration surface appears architected as separate per-provider “snowflakes” rather than a shared interface + canonical entities.

high
Introduce a shared integration abstraction layer: define a common adapter interface (per provider) that covers at least (1) OAuth credential lifecycle (authorize/callback handling + token refresh), (2) canonical repo/branch retrieval, and (3) canonical actions (e.g., clone/checkout inputs).
- packages/server/src/utils/providers/gitea.ts:1-240 — Current Gitea implementation bundles auth + cloning + connection checks; these behaviors should be expressed through a shared contract.
- packages/server/src/utils/providers/github.ts:1-226 — Current GitHub implementation bundles auth + cloning + permission checks; should implement the shared adapter interface.
- packages/server/src/utils/providers/gitlab.ts:1-260 — Current GitLab implementation bundles auth + cloning + repo/branch listing; should implement the shared adapter interface.
high
Define and enforce canonical domain entities used by all adapters (at minimum: ProviderIdentity, Credential/Token state, RepositoryRef, BranchRef). Add validation/normalization at the adapter boundary so each provider maps external representations into the canonical model.
- apps/dokploy/utils/gitea-utils.ts:1-88 — Gitea-specific response/entity interfaces exist, but the absence of corresponding shared canonical entities across other providers indicates the canonical-model layer is missing.
med
Refactor provider HTTP surfaces (e.g., `/api/providers/*/authorize` and `/callback`, webhook endpoints) to delegate to the shared integration abstraction instead of duplicating provider-specific OAuth/event logic in route handlers.
- apps/dokploy/pages/api/providers/gitea/authorize.ts:1-42 — Provider-specific authorization flow logic should be routed through the shared OAuth adapter contract.
- apps/dokploy/pages/api/providers/gitea/callback.ts:1-96 — Provider-specific token exchange + credential persistence should be standardized through the shared adapter.
- apps/dokploy/pages/api/providers/github/webhook.ts:1-20 — Webhook ingestion should map provider webhook payloads into canonical integration events via a shared adapter.
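
The shared contract recommended above could take roughly the following shape. This is a minimal sketch, not code from the repository: `GitProviderAdapter`, `RepositoryRef`, `TokenState`, and the two normalizers are hypothetical names, and the payload interfaces are simplified subsets of what the Gitea and GitLab APIs return.

```typescript
// Canonical entities shared by all adapters (hypothetical names).
type ProviderKind = "github" | "gitlab" | "gitea";

interface RepositoryRef {
  provider: ProviderKind;
  owner: string;
  name: string;
  cloneUrl: string;
}

interface TokenState {
  accessToken: string;
  refreshToken?: string;
  expiresAt?: number; // epoch seconds
}

// Hypothetical shared contract each provider module would implement.
interface GitProviderAdapter {
  readonly kind: ProviderKind;
  refreshToken(current: TokenState): Promise<TokenState>;
  listRepositories(token: TokenState): Promise<RepositoryRef[]>;
  listBranches(token: TokenState, repo: RepositoryRef): Promise<string[]>;
}

// Simplified subsets of provider payloads, assumed for illustration.
interface GiteaRepoPayload {
  name: string;
  clone_url: string;
  owner: { login: string };
}

interface GitlabProjectPayload {
  path: string;
  http_url_to_repo: string;
  namespace: { full_path: string };
}

// Normalization happens once, at the adapter boundary.
function fromGiteaRepo(raw: GiteaRepoPayload): RepositoryRef {
  return {
    provider: "gitea",
    owner: raw.owner.login,
    name: raw.name,
    cloneUrl: raw.clone_url,
  };
}

function fromGitlabProject(raw: GitlabProjectPayload): RepositoryRef {
  return {
    provider: "gitlab",
    owner: raw.namespace.full_path,
    name: raw.path,
    cloneUrl: raw.http_url_to_repo,
  };
}
```

With this split, route handlers and deployment code would depend only on `RepositoryRef` and `GitProviderAdapter`, and provider-specific field names would stay inside each normalizer.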

Bidirectional sync N/A

No bidirectional sync primitive (read+write synchronization of external system entities with sync state/reconciliation) was found. The provider endpoints observed are for OAuth/setup and webhook routing/redirects, not for maintaining two-way synchronization with external systems.

high
If the product requires two-way synchronization with external systems (e.g., mirroring issues/PRs/resources back into Dokploy, or updating Dokploy state from webhook events), add an explicit bidirectional sync adapter layer per external provider (shared interface, canonical entities, write-back handlers).
- apps/dokploy/pages/api/providers/github/webhook.ts:1-20 — Webhook path currently performs only a redirect; it’s a likely place where bidirectional sync/event-driven writes should start if the use case requires it.
med
Implement a persistent sync state mechanism (cursor/watermark) and idempotent upserts for incremental sync, plus drift handling and failure handling per provider/adapter.
- apps/dokploy/pages/api/providers/gitea/callback.ts:1-96 — Current token callback lacks any sync-state/cursor/reconciliation logic, indicating setup-only behavior.

Metadata-driven mappings 67%

Metadata-driven mappings appear in the SSO provider configuration flow: the UI constructs per-provider `mapping` objects inside `oidcConfig`/`samlConfig`, the server schema validates them, and the SSO router forwards them as config updates. However, the shared runtime interpreter for applying these mappings during authentication is not evident in the visible shared SSO service layer (only generic helpers are present), so implementation depth looks partial.

high
Locate and verify the runtime interpreter that applies `oidcConfig.mapping` / `samlConfig.mapping` to identity claims during SSO login. Ensure mapping evaluation is centralized (one canonical function/service) and not reimplemented per connector/flow.
- packages/server/src/services/proprietary/sso.ts:1-47 — Current shared SSO service code shows no mapping application logic—only generic provider/header/origin helpers—so the runtime mapping interpreter likely lives elsewhere or is missing.
med
Confirm end-to-end persistence and versioning: ensure `mapping` is stored exactly as validated by `ssoProviderBodySchema` and that updates are migration-safe (e.g., handle mapping schema changes gracefully).
- packages/server/src/db/schema/sso.ts:1-134 — Schema defines mapping sub-objects under `oidcConfig.mapping` and `samlConfig.mapping` with optional/required fields; validate that the actual persisted data matches and is used consistently.
low
Reduce UI-time conditional mapping logic (e.g., Azure vs non-Azure) by expressing differences as config variants or defaults computed server-side, so mapping rules remain metadata-driven.
- apps/dokploy/components/proprietary/sso/register-oidc-dialog.tsx:120-205 — OIDC mapping object is selected via an `isAzure` condition and then embedded into config; this could be refactored to defaults/metadata if a server-side interpreter exists.
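
A centralized mapping interpreter of the kind this finding looks for might resemble the sketch below. It is illustrative only: the `ClaimMapping` fields (`id`, `email`, `name`) and the sample claim names are assumptions, not the actual shape of `oidcConfig.mapping` in the repository.

```typescript
// Hypothetical mapping shape: canonical field -> name of the claim to read.
interface ClaimMapping {
  id: string;
  email: string;
  name?: string;
}

interface MappedIdentity {
  id: string;
  email: string;
  name?: string;
}

// Returns the claim only if it is a non-empty string.
function readStringClaim(claims: Record<string, unknown>, key: string): string | undefined {
  const value = claims[key];
  return typeof value === "string" && value.length > 0 ? value : undefined;
}

// Single place where mapping metadata is applied to raw identity claims.
function applyClaimMapping(mapping: ClaimMapping, claims: Record<string, unknown>): MappedIdentity {
  const id = readStringClaim(claims, mapping.id);
  if (id === undefined) {
    throw new Error(`Missing required claim: ${mapping.id}`);
  }
  const email = readStringClaim(claims, mapping.email);
  if (email === undefined) {
    throw new Error(`Missing required claim: ${mapping.email}`);
  }
  const identity: MappedIdentity = { id, email: email.toLowerCase() };
  if (mapping.name !== undefined) {
    const name = readStringClaim(claims, mapping.name);
    if (name !== undefined) {
      identity.name = name;
    }
  }
  return identity;
}
```

Under this design, a provider-specific difference such as the Azure case becomes a different `ClaimMapping` value rather than a branch in UI or login code.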

Per-integration reliability 0%

No correctly implemented per-integration reliability pattern was found. While the codebase uses BullMQ workers for deployment tasks and configures Stripe SDK network retries, the integration execution paths do not show the required combination of (1) retry with backoff, (2) a dead-letter queue/parking area for items that still fail, and (3) alerting/visibility per integration when retries are exhausted.

high
Add retry-with-backoff + DLQ/parking for failures inside the BullMQ deployment worker. Concretely: configure BullMQ job attempts/backoff and ensure failed jobs are routed to a dedicated dead-letter queue (or are persisted and re-processed), and emit alerting/metrics when retries are exhausted.
- apps/dokploy/server/queues/deployments-queue.ts:1-97 — Worker catches all errors and only logs, without any visible retry policy or dead-letter parking.
med
For Stripe webhook processing, implement an event processing failure strategy: on handler failures, enqueue the webhook event payload/id to a DLQ (or separate retry queue) and ensure the system reports alerting when max attempts are exceeded. Consider idempotency so replays are safe.
- apps/dokploy/pages/api/stripe/webhook.ts:1-240 — Only `maxNetworkRetries: 3` is configured on the Stripe client; no dead-letter/parking behavior is shown for downstream processing failures.
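
The three required pieces (backoff, dead-letter parking, alerting on exhaustion) can be sketched independently of any queue library. The code below is a hedged illustration with hypothetical names (`RetryPolicy`, `FailureHandler`, `onExhausted`); in a BullMQ setup the retry half would normally be expressed through the job options `attempts` and `backoff`, while the dead-letter and alert half still needs custom wiring.

```typescript
interface RetryPolicy {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
}

interface DeadLetterEntry {
  jobId: string;
  integration: string;
  attempts: number;
  reason: string;
}

type FailureDecision =
  | { action: "retry"; delayMs: number }
  | { action: "dead-letter"; entry: DeadLetterEntry };

// Exponential backoff, capped. attemptsMade is 1-based:
// 1 -> base, 2 -> 2x base, 3 -> 4x base, and so on.
function backoffDelay(policy: RetryPolicy, attemptsMade: number): number {
  const raw = policy.baseDelayMs * Math.pow(2, attemptsMade - 1);
  return Math.min(policy.maxDelayMs, raw);
}

class FailureHandler {
  readonly deadLetters: DeadLetterEntry[] = [];
  private readonly policy: RetryPolicy;
  private readonly onExhausted: (entry: DeadLetterEntry) => void;

  constructor(policy: RetryPolicy, onExhausted: (entry: DeadLetterEntry) => void) {
    this.policy = policy;
    this.onExhausted = onExhausted;
  }

  // Called after a job attempt fails; decides between retry and parking.
  handle(jobId: string, integration: string, attemptsMade: number, error: Error): FailureDecision {
    if (attemptsMade < this.policy.maxAttempts) {
      return { action: "retry", delayMs: backoffDelay(this.policy, attemptsMade) };
    }
    const entry: DeadLetterEntry = { jobId, integration, attempts: attemptsMade, reason: error.message };
    this.deadLetters.push(entry); // park for inspection or replay
    this.onExhausted(entry); // alert hook, tagged with the integration name
    return { action: "dead-letter", entry };
  }
}
```

Parked entries keep the integration name and failure reason, so an operator can replay them or alert per provider instead of finding failures in logs.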

Sync state & reconciliation 0%

No implementation of a reusable “Sync state & reconciliation” primitive (stored cursor/watermark + idempotent upserts + drift detection between systems) was found. The codebase includes external-system fetching and event pagination, but it does not persist integration sync state or perform reconciliation across runs.

high
Add durable per-integration sync state (cursor/watermark) and reconciliation for the external event polling path: persist the last processed position (e.g., Inngest cursor/internal_id or receivedAt) in DB, use it on the next run for incremental fetch, and implement idempotent upserts into your internal job/event tables. Optionally add drift detection (e.g., compare expected vs observed counts/status transitions, or detect missing/changed runs and repair).
- apps/api/src/service.ts:64-116 — Pagination uses a local variable `cursor` and fetches events/runs without any durable stored watermark or cross-run reconciliation logic.
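
As a rough illustration of the recommendation, the sketch below combines a per-integration watermark with idempotent upserts. All names (`ExternalEvent`, `SyncCheckpoint`, `applyBatch`) are hypothetical, and an in-memory `Map` stands in for the database table; a real implementation would persist both the table and the checkpoint.

```typescript
interface ExternalEvent {
  id: string;
  receivedAt: number; // epoch milliseconds
  status: string;
}

interface SyncCheckpoint {
  integrationId: string;
  watermark: number; // highest receivedAt processed so far
}

interface SyncResult {
  inserted: number;
  updated: number;
  unchanged: number;
  checkpoint: SyncCheckpoint;
}

// Applies one fetched page to the local table and returns the new checkpoint.
// Re-running the same batch is safe: rows are keyed by event id.
function applyBatch(
  table: Map<string, ExternalEvent>,
  checkpoint: SyncCheckpoint,
  batch: readonly ExternalEvent[],
): SyncResult {
  let inserted = 0;
  let updated = 0;
  let unchanged = 0;
  let watermark = checkpoint.watermark;

  for (const event of batch) {
    if (event.receivedAt < checkpoint.watermark) {
      unchanged += 1; // strictly older than the checkpoint: already processed
      continue;
    }
    const existing = table.get(event.id);
    if (existing === undefined) {
      table.set(event.id, event);
      inserted += 1;
    } else if (existing.status !== event.status) {
      table.set(event.id, event); // drift between systems: repair local row
      updated += 1;
    } else {
      unchanged += 1;
    }
    watermark = Math.max(watermark, event.receivedAt);
  }

  return {
    inserted,
    updated,
    unchanged,
    checkpoint: { integrationId: checkpoint.integrationId, watermark },
  };
}
```

Events whose timestamp equals the watermark are re-examined rather than skipped, which avoids losing events that share a timestamp at a page boundary; the idempotent upsert makes that re-examination harmless.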

Inbound validation & normalization 50%

The codebase contains a validation/normalization pattern (Zod schemas with constraints and transforms) and some integration-boundary validation for external inputs (Stripe webhook signature + event-type allowlist; GitLab/Gitea OAuth callbacks validate query parameters and token exchange results before persisting). However, the full inbound validation & normalization primitive (canonical modeling + dedup + quarantining of bad records) does not appear consistently at the integration boundaries—e.g., GitHub’s webhook endpoint is effectively a redirect-only handler with no payload validation/canonicalization.

high
Introduce a consistent inbound boundary contract for external integration endpoints: (1) validate payload/query via schema (e.g., Zod), (2) normalize into a canonical internal input DTO, (3) add idempotency/dedup (event id / OAuth replay protection), and (4) quarantine bad records (store failures with reason, do not silently redirect/throw away). Start by standardizing Stripe webhook, GitHub webhook, and the OAuth callbacks into the same pattern.
- apps/dokploy/pages/api/stripe/webhook.ts:1-120 — Already validates signature and event type, but no schema-driven canonical normalization and no dedup/quarantine visible in the inspected boundary logic.
- apps/dokploy/pages/api/providers/github/webhook.ts:1-20 — Endpoint performs no real validation/normalization of inbound request besides method check; it redirects regardless of input.
- apps/dokploy/pages/api/providers/gitlab/callback.ts:1-65 — Validates `code` and token presence, but no visible dedup/idempotency or quarantine mechanism at the boundary.
med
For each integration boundary, add explicit idempotency: Stripe should key off Stripe event id; OAuth callbacks should add replay protection (e.g., one-time state storage / nonce verification) so repeated callbacks don’t re-write tokens or cause inconsistent state.
- apps/dokploy/pages/api/stripe/webhook.ts:1-120 — Processes events directly after signature verification; no dedup/idempotency key handling is visible in the shown handler.
- apps/dokploy/pages/api/providers/gitea/callback.ts:1-96 — Parses `state` and exchanges `code` for tokens, but no replay/idempotency protection is visible in the shown logic.
low
Leverage existing Zod schemas in `packages/server/src/db/validations/*` as the canonical normalization layer (or create integration-specific schemas) so inbound external data is consistently normalized before persistence.
- packages/server/src/db/validations/domain.ts:1-129 — Demonstrates Zod validation + normalization via transforms; extending this approach to integration inbound DTOs would improve consistency.
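
The four-step boundary contract (validate, normalize, dedup, quarantine) can be sketched as one small class. This is an assumption-laden illustration: `InboundBoundary`, `CanonicalEvent`, and the raw field names (`id`, `type`, `data`) are hypothetical, and the hand-written checks stand in for what a Zod schema would do in the actual codebase.

```typescript
interface CanonicalEvent {
  id: string;
  type: string;
  payload: Record<string, unknown>;
}

interface QuarantinedRecord {
  raw: unknown;
  reason: string;
}

type IngestResult =
  | { outcome: "accepted"; event: CanonicalEvent }
  | { outcome: "duplicate"; id: string }
  | { outcome: "quarantined"; reason: string };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

class InboundBoundary {
  readonly quarantine: QuarantinedRecord[] = [];
  private readonly allowedTypes: Set<string>;
  private readonly seenIds = new Set<string>();

  constructor(allowedTypes: readonly string[]) {
    this.allowedTypes = new Set(allowedTypes);
  }

  ingest(raw: unknown): IngestResult {
    // 1. Validate the shape of the inbound record.
    if (!isRecord(raw)) return this.reject(raw, "payload is not an object");
    const id = raw["id"];
    const type = raw["type"];
    const data = raw["data"];
    if (typeof id !== "string" || id.length === 0) return this.reject(raw, "missing id");
    if (typeof type !== "string") return this.reject(raw, "missing type");

    // 2. Normalize into the canonical form.
    const normalizedType = type.trim().toLowerCase();
    if (!this.allowedTypes.has(normalizedType)) return this.reject(raw, `type not allowed: ${normalizedType}`);

    // 3. Deduplicate on the provider's event id.
    if (this.seenIds.has(id)) return { outcome: "duplicate", id };
    this.seenIds.add(id);

    return { outcome: "accepted", event: { id, type: normalizedType, payload: isRecord(data) ? data : {} } };
  }

  // 4. Quarantine: keep the bad record and the reason instead of dropping it.
  private reject(raw: unknown, reason: string): IngestResult {
    this.quarantine.push({ raw, reason });
    return { outcome: "quarantined", reason };
  }
}
```

In production the `seenIds` set and the quarantine list would need durable storage so that dedup survives restarts and rejected records remain reviewable.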

Per-tenant integration credentials 33%

This codebase does perform per-provider/per-tenant credential refresh for GitHub/GitLab/Gitea integrations (token refresh is keyed by a specific provider id and persisted back to the corresponding credential row). However, the implementation appears to store sensitive OAuth/app credentials directly in database columns (e.g., `refreshToken`, `accessToken`, `clientSecret`, `privateKey`) rather than retrieving them from a dedicated secret manager per tenant, so the “secret-store + revocable per tenant” portion of the primitive is not demonstrated.

high
Move integration credential material (OAuth client secrets, refresh tokens, GitHub private keys) out of plain DB columns into a per-tenant secret manager (e.g., Vault/AWS Secrets Manager/GCP Secret Manager). Change refresh flows to fetch credentials from the secret manager on-demand and ensure rotation/revocation operates per tenant/integration.
- packages/server/src/db/schema/gitea.ts:1-42 — Credentials are stored as DB columns (`clientSecret`, `accessToken`, `refreshToken`, `expiresAt`) rather than secret-manager references.
- packages/server/src/db/schema/gitlab.ts:1-38 — Credentials are stored as DB columns (`secret`, `accessToken`, `refreshToken`, `expiresAt`) rather than secret-manager references.
high
Update the token refresh functions (`refreshGiteaToken`, `refreshGitlabToken`) to use secret-manager lookups keyed by organizationId + providerId, and ensure that a failed refresh revokes or disables the invalid tokens for that tenant/provider.
- packages/server/src/utils/providers/gitea.ts:19-89 — Refresh uses stored DB `refreshToken` and persists updated tokens back to DB; no secret-manager abstraction is involved.
- packages/server/src/utils/providers/gitlab.ts:1-44 — Refresh uses stored DB `refreshToken` and persists updated tokens back to DB; no secret-manager abstraction is involved.
med
Confirm/strengthen revocation semantics when a tenant removes or rotates an integration: ensure secret-manager deletion/disablement is performed and that subsequent token refresh attempts fail safely for that tenant/provider.
- packages/server/src/db/schema/git-provider.ts:1-75 — The tenant boundary (`organizationId`) exists, but credential revocation guarantees are not shown; credentials appear to live in integration-specific tables.
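
One way to picture the recommended secret-manager boundary is an interface scoped by organization and provider. The sketch below is hypothetical throughout (`SecretStore`, `CredentialScope`, `InMemorySecretStore`); the in-memory class exists only to make the scoping and revocation semantics concrete, and a real backend such as Vault or a cloud secret manager would be asynchronous.

```typescript
interface CredentialScope {
  organizationId: string;
  providerId: string;
}

// Hypothetical abstraction over a secret manager.
interface SecretStore {
  put(scope: CredentialScope, name: string, value: string): void;
  get(scope: CredentialScope, name: string): string | undefined;
  revokeAll(scope: CredentialScope): number;
}

class InMemorySecretStore implements SecretStore {
  private readonly byScope = new Map<string, Map<string, string>>();

  // JSON encoding keeps the two ids unambiguous even if they contain separators.
  private static scopeKey(scope: CredentialScope): string {
    return JSON.stringify([scope.organizationId, scope.providerId]);
  }

  put(scope: CredentialScope, name: string, value: string): void {
    const key = InMemorySecretStore.scopeKey(scope);
    let secrets = this.byScope.get(key);
    if (secrets === undefined) {
      secrets = new Map<string, string>();
      this.byScope.set(key, secrets);
    }
    secrets.set(name, value);
  }

  get(scope: CredentialScope, name: string): string | undefined {
    const secrets = this.byScope.get(InMemorySecretStore.scopeKey(scope));
    return secrets === undefined ? undefined : secrets.get(name);
  }

  // Removes every secret for one tenant/provider pair; other scopes are untouched.
  revokeAll(scope: CredentialScope): number {
    const key = InMemorySecretStore.scopeKey(scope);
    const secrets = this.byScope.get(key);
    if (secrets === undefined) return 0;
    const removed = secrets.size;
    this.byScope.delete(key);
    return removed;
  }
}
```

With this boundary, the provider tables would hold only a scope reference, and a refresh flow that finds no `refreshToken` for its scope would fail closed for that tenant alone.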

Per-integration observability 0%

No per-integration observability (per-provider metrics/status/last-sync) was found in the code paths related to external integrations (GitHub/Gitea/Stripe). The integration handlers and utilities rely primarily on redirects and console.log/console.error/console.warn calls; they do not emit structured per-integration metrics, update a stored last-sync/status record, or otherwise surface health, throughput, or failures to ops.

high
Introduce a shared per-integration telemetry contract (e.g., IntegrationHealth/SyncStatus store + metrics emitter) and wire it into each integration entrypoint and adapter operation (webhook handlers, OAuth flows, token refresh, provider API calls).
- apps/dokploy/pages/api/providers/github/webhook.ts:1-20 — Shows an integration entrypoint that currently has no success/failure/latency instrumentation.
- apps/dokploy/pages/api/providers/gitea/callback.ts:1-96 — Shows provider callback failure handling via console.error without per-integration telemetry.
high
Persist per-integration last attempt + last success + last failure reason (and timestamps) in the database (or monitoring store), and ensure ops/customer UIs can read it.
- apps/dokploy/pages/api/providers/gitea/callback.ts:1-96 — Token exchange and DB update failures currently only redirect/log; add persisted status updates around these steps.
med
Add metrics emission per integration operation: counters for success/failure by provider + latency histograms by operation (e.g., token refresh, webhook processing, repo listing).
- packages/server/src/utils/providers/github.ts:1-226 — All GitHub API interactions are present, but there is no metric/timer emission around them.
med
Standardize error handling so failure reasons are structured (error codes) and are included in telemetry (instead of only console.warn/error strings).
- packages/server/src/utils/providers/github.ts:1-226 — Errors are swallowed into console.warn/returns or throws; restructure to record typed failure reasons.
- apps/dokploy/pages/api/stripe/webhook.ts:1-220 — Stripe webhook processing uses console/logging and HTTP responses; extend to structured telemetry per event type and processing stage.
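
The telemetry contract suggested above could start as small as the following sketch. `IntegrationHealthStore` and its fields are hypothetical names; the in-memory map is a stand-in for a database table or a metrics backend, and the failure codes are examples rather than codes the repository defines.

```typescript
type Provider = "github" | "gitlab" | "gitea" | "stripe";

interface IntegrationHealth {
  provider: Provider;
  successCount: number;
  failureCount: number;
  lastAttemptAt?: number;
  lastSuccessAt?: number;
  lastFailureAt?: number;
  lastFailureCode?: string; // structured code, not a free-form log string
}

class IntegrationHealthStore {
  private readonly records = new Map<Provider, IntegrationHealth>();

  private ensure(provider: Provider): IntegrationHealth {
    let record = this.records.get(provider);
    if (record === undefined) {
      record = { provider, successCount: 0, failureCount: 0 };
      this.records.set(provider, record);
    }
    return record;
  }

  recordSuccess(provider: Provider, at: number): void {
    const record = this.ensure(provider);
    record.successCount += 1;
    record.lastAttemptAt = at;
    record.lastSuccessAt = at;
  }

  recordFailure(provider: Provider, code: string, at: number): void {
    const record = this.ensure(provider);
    record.failureCount += 1;
    record.lastAttemptAt = at;
    record.lastFailureAt = at;
    record.lastFailureCode = code;
  }

  // Returns a copy so callers cannot mutate stored state.
  snapshot(provider: Provider): IntegrationHealth {
    return { ...this.ensure(provider) };
  }
}
```

Each entrypoint (webhook handler, OAuth callback, token refresh) would call `recordSuccess` or `recordFailure` once per operation, which is enough to answer "when did this integration last work, and why did it last fail" without reading logs.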

Connector breadth for the category 61%

This codebase does maintain connector breadth via explicit, enumerated provider catalogs (notably S3-compatible destination providers) and via dedicated runtime provider entrypoints under /pages/api/providers/ (e.g., GitHub webhook, Gitea authorize/callback, GitLab callback). However, the breadth story across “vertical table-stakes” categories (identity/CRM/data warehouse/etc.) is not directly confirmable from the inspected evidence; it should be validated by building a full connector inventory beyond Git providers and storage/billing surfaces.

high
Create a connector inventory report (for the audit denominator): enumerate all distinct external systems supported by runtime connector entrypoints (all /pages/api/providers/** routes, plus any server-side provider modules) and compare against vertical table-stakes relevant to this product (identity, CRM, data warehouse, etc.).
- apps/dokploy/components/dashboard/settings/destination/constants.ts:1-134 — Shows one category catalog (S3-compatible destinations) exists; use this pattern to locate other breadth catalogs/inventories and ensure table-stakes are covered.
med
For each supported external system, confirm breadth is consistent between UI catalogs and runtime onboarding surfaces (authorize/callback/webhook). Missing parity (catalog lists provider but runtime endpoints absent) is a common breadth gap.
- apps/dokploy/pages/api/providers/github/webhook.ts:1-20 — Example of a runtime provider surface that should be cross-checked against any UI catalog/list for GitHub.
med
Ask the team for (and document) known “table-stakes gaps” explicitly (e.g., whether identity/CRM/data-warehouse connectors are intentionally out-of-scope). Treat this primitive as a follow-up item: the code evidence for breadth is thin and needs product context to complete it.
- apps/dokploy/components/dashboard/settings/destination/constants.ts:1-134 — Provider coverage is provably broad for storage destinations, but this alone does not establish coverage for other table-stakes categories.

Build-vs-buy posture 83%

The codebase clearly implements owned (first-party) integration depth for external Git providers (GitHub, Gitea, GitLab) via custom OAuth/token and API logic in per-provider modules. Evidence does not indicate reliance on an embedded third-party iPaaS/connectors platform for these integrations (supporting a build posture rather than buy/rented depth).

high
Confirm whether there is (or should be) a shared canonical adapter interface across providers (e.g., a GitProvider contract implemented by GitHub/Gitea/GitLab). Right now, the build posture is present, but cross-provider consistency of the abstraction/interface is not fully evidenced from the inspected files.
- apps/dokploy/utils/gitea-utils.ts:1-88 — Defines provider interfaces/types (e.g., GitProvider interface and related types), which should be checked for reuse/implementation consistency across other providers.
- packages/server/src/utils/providers/github.ts:1-60 — GitHub connector logic is implemented directly in provider utils (Octokit/GitHub App auth) without evidence of implementing a common interface in the lines inspected.
med
Do a quick inventory check for any embedded third-party integration platforms (Nango/Zapier/Workato/etc.) or generic connector outsourcing. If none exist, document the build-vs-buy intent and acceptable scope/boundaries in an integration strategy note for future maintainers.
- packages/server/src/utils/providers/github.ts:1-60 — Owned implementation is visible here (direct Octokit/CreateAppAuth + token acquisition).
- packages/server/src/utils/providers/gitea.ts:1-120 — Owned implementation is visible here (direct fetch to token endpoint + persistence via updateGitea).
- packages/server/src/utils/providers/gitlab.ts:1-95 — Owned implementation is visible here (direct fetch to /oauth/token + persistence via updateGitlab).

Not applicable to this codebase: Bidirectional sync.

Deployability

CI/CD as code, infrastructure as code, per-environment isolation, and a one-command local boot.

38% 11/11 scored

Reproducible one-command build 0%

0/2 expected sites not present
Automated CI pipeline 50%

1/2 expected sites
Automated deployment (CD) 100%

4/4 expected sites
Infrastructure as code 0%

0/2 expected sites not present
Environment isolation 0%

0/3 expected sites not present
Local/production parity 44%

2/3 expected sites
Config & secrets externalized per env 0%

0/3 expected sites
Decouple deploy from release 0%

0/4 expected sites not present
Reversibility / rollback 67%

3/3 expected sites
Delivery cadence (DORA proxy) 89%

3/3 expected sites
Deploy-tooling ownership 67%

3/3 expected sites

Reproducible one-command build 0%

A deterministic dependency mechanism exists (pnpm lockfile + `pnpm install --frozen-lockfile` in the Docker build). However, the repository does not provide the core “reproducible one-command build” primitive: there is no root-level documented single command / bootstrap script that a developer can run from a clean clone to build and boot locally. Onboarding currently emphasizes a VPS curl|bash install.

high
Add a root bootstrap entry point that supports a clean clone and one-command local build+boot (e.g., `make dev` / `./setup.sh` / `just up`) and document it in README. Prefer using the existing Dockerfile (or a docker-compose/devcontainer) so the command is truly one-shot.
- README.md:18-34 — README currently documents a VPS install curl|bash workflow instead of a clean-clone local one-command build+boot.
high
Ensure the one-command path includes env + required dependencies setup in a reproducible way (generate/populate `.env` from templates, and start required services like Postgres/Redis via versioned configuration). If you use the Dockerfile, pair it with a versioned `docker-compose`/dev orchestration or scripts that bring up dependencies automatically.
- package.json:1-27 — Root scripts expose build/start, but there is no single documented command that performs full bootstrap (env + services) for a clean clone.
- apps/dokploy/.env.example:1-4 — An env template exists, but the current repo onboarding does not wire it into a reproducible one-command bootstrap.
med
Add/confirm CI gating for a “local build” equivalent (not just Docker image builds): run the build steps in CI (install with frozen lockfile + compile/test) so the one-command process has green evidence on main.
- .github/workflows/dokploy.yml:1-80 — The existing workflow focuses on building/pushing Docker images; it does not demonstrate a developer-equivalent clean-clone local build+boot gate in code.

Automated CI pipeline 50%

An automated CI pipeline exists for pull requests: `.github/workflows/pull-request.yml` runs build/test/typecheck automatically on every PR targeting main/canary. However, the push-triggered workflow (`.github/workflows/deploy.yml`) appears to only build and push Docker images and does not run tests or typecheck, so nothing in CI verifies that main and canary stay green after a direct push or merge. That leaves the primitive only partially covered.

high
Add a push-triggered CI workflow (or extend `deploy.yml`) that runs the same `pnpm build`, `pnpm test`, and `pnpm typecheck` steps on every push to `main` and `canary` (and make it an explicit required check for merge).
- .github/workflows/deploy.yml:1-109 — Triggered on push to main/canary, but the jobs shown only build and push Docker images (no test/typecheck steps).
med
Ensure the CI checks produced by the PR workflow are actually required for merge protection in repository settings (so merges cannot proceed when build/test/typecheck fails).
- .github/workflows/pull-request.yml:1-52 — This workflow defines the checks (`pr-check` matrix with build/test/typecheck), but merge gating depends on branch protection configuration outside the YAML.

Automated deployment (CD) 100%

Automated deployment (CD) exists and is implemented as a code-driven event pipeline: GitHub/SCM webhooks validate incoming events and automatically create/enqueue deployment jobs; an Inngest-backed deployment service executes the `deploy(...)` function and emits completion/failure events. Additionally, GitHub Actions workflows automate build/release artifacts on main, supporting an automated release-to-deploy workflow.

high
Add/confirm a documented end-to-end “release → webhook → Inngest deploy → production” path for all supported trigger types (push vs tag vs docker/git source types), including required environment variables and how to test the pipeline safely in staging.
- apps/dokploy/pages/api/deploy/github.ts:120-230 — This is the automatic GitHub trigger + enqueue surface; documentation should explicitly cover its expectations (autoDeploy/triggerType/branch/tag matching).
- apps/api/src/index.ts:1-195 — This is the execution stage; users need a clear, repeatable test harness so deploys are confidently automated.
med
Ensure the CD pipeline has an explicit rollback/revert strategy per deployment job type (e.g., database/media/image rollbacks) and that failures produce actionable remediation (not just events).
- apps/api/src/index.ts:1-115 — The pipeline emits `deployment/failed`, but rollback semantics are not shown in the audited slices; confirm and codify rollback behavior.

Infrastructure as code 0%

No infrastructure-as-code (IaC) implementation was found in this repository: the git artifact scan shows no Terraform/CloudFormation/Pulumi/Helm/wrangler/serverless/K8s-manifest IaC files. The GitHub workflows present are focused on building/pushing Docker images and release automation, but they do not include or reference reproducible, versioned infrastructure definitions to provision/manage environments.

high
Add versioned IaC for the production environment (choose the stack your target platform expects—e.g., Terraform for cloud resources, or Helm/Kubernetes manifests for cluster workloads) and make it the single source of truth for environment provisioning.
- .github/workflows/deploy.yml:1-109 — Deploy workflow currently only builds and pushes Docker images; no IaC-driven environment provisioning/reconciliation is present.
high
Wire the deploy/CD workflow to run IaC changes (plan/apply) for each environment with a reproducible path from a clean checkout; ensure prod is reproducible from the same IaC and supports drift control.
- .github/workflows/dokploy.yml:1-242 — Workflow automates image build/release/sync, but does not execute or reference any IaC that would provision/manage environments.
med
Create environment directory structure and golden templates for dev/staging/prod (separate state backends/secrets per env), so the infrastructure path to production is code-reviewed and reusable.
- .github/workflows/deploy.yml:1-109 — Current automation lacks any environment-specific IaC entrypoints; introducing env-scoped IaC directories would address reproducibility.

Environment isolation 0%

No clear implementation of 'Environment isolation' (separate dev/staging/prod with isolated data + credentials for the platform itself) was found. While the project supports user-defined 'environments' and has a compose-level 'isolatedDeployment' option to prevent Docker collisions, the repository does not show stage-specific deployment configuration/templates or a dev/staging/prod isolation model for credentials/data at the application/infra boundary.

high
Introduce explicit stage separation for the platform deployment (dokploy itself): add stage-specific env templates (e.g., apps/dokploy/.env.staging.example and .env.production.example), ensure secrets/endpoints are injected per stage (no localhost/prod defaults committed), and wire stage selection into the startup/deployment flow.
- apps/dokploy/.env.example:1-4 — Current config example is single-environment/local-focused (DATABASE_URL points to localhost, NODE_ENV=development) and does not demonstrate stage isolation.
high
Clarify and enforce the mapping between 'dokploy environments' and deployment stages (dev/staging/prod). If they are meant to represent stages, implement first-class stage fields and restrict sharing of underlying accounts/volumes/data across stages; remove/adjust the hard block that prevents a 'production' named environment.
- apps/dokploy/server/api/routers/environment.ts:1-260 — The API blocks creating an environment named 'production', suggesting stage isolation is not implemented as a dev/staging/prod model.
med
Extend isolation beyond Docker collision avoidance: ensure stage selection affects credentials/accounts/state used during deployments (e.g., per-stage docker registry credentials, per-stage database endpoints, per-stage backup/restore targets), rather than only compose network/volume naming.
- packages/server/src/utils/builders/compose.ts:1-146 — isolatedDeployment affects docker network connect and env file content generation, but it does not demonstrate dev/staging/prod credential/data isolation for the overall platform.

Local/production parity 44%

Local/production parity mechanisms do exist: the repo includes a VS Code devcontainer that forwards the expected runtime service ports and uses pinned Node and PNPM versions. However, the devcontainer’s Dockerfile is not the same as (or obviously derived from) the production Dockerfile—there is likely some runtime drift (e.g., different Node image variants and production-specific tooling/config expectations).

high
Make the devcontainer build from the same production Dockerfile (or a shared base) so the runtime truly matches. Concretely, point devcontainer.json to the production Dockerfile (or extract a common “runtime base” Dockerfile used by both).
- .devcontainer/devcontainer.json:1-54 — devcontainer.json builds from Dockerfile='Dockerfile' in the .devcontainer folder, not the root production Dockerfile; this is the main parity decision point.
- .devcontainer/Dockerfile:1-21 — Local base image is node:24.4.0-bullseye-slim and installs only a minimal toolchain; production Dockerfile is different and includes additional setup.
med
Ensure local config loading mirrors production behavior: document and align which env files/variables are used at runtime (names and semantics), and avoid relying on untracked local/manual setup.
- Dockerfile:1-73 — Production sets NODE_ENV=production and copies .env.production into the image; local should use the same variable expectations and behavior.
- apps/dokploy/.env.example:1-4 — Local example config currently hardcodes a localhost DATABASE_URL, which may not reflect production deployment configuration patterns.
low
Add a quick parity check script/README entry (e.g., 'start-dev' that runs the same container command used in production plus migrations/seed steps) to make it easy to recreate the same runtime locally.
- .devcontainer/devcontainer.json:1-54 — The devcontainer exists, but the parity guarantees would be strengthened by an explicit “same command as prod” developer workflow.

Config & secrets externalized per env 0%

The codebase shows partial adoption of environment-driven configuration (e.g., Traefik port configuration via process.env and presence of .env.example templates). However, there are key anti-patterns for this primitive: at least one production/cloud URL is hardcoded in code (https://app.dokploy.com), and deployment-version defaults (e.g., TRAEFIK_VERSION) are only partly externalized: the value can be overridden through the environment, but the fallback version is hardcoded, so changing the default requires a code change.

high
Externalize the cloud base URL used in getDokployUrl into configuration (env var / per-env config), and remove the hardcoded https://app.dokploy.com literal.
- packages/server/src/services/admin.ts:1-167 — getDokployUrl returns a hardcoded production cloud URL when IS_CLOUD is true.
high
Make TRAEFIK_VERSION fully configurable per environment (remove or minimize hardcoded defaults). Prefer requiring TRAEFIK_VERSION (or a validated config struct) to be provided by env/config rather than defaulting to "3.6.7" in code.
- packages/server/src/setup/traefik-setup.ts:1-260 — Ties the deployment Traefik image tag to process.env.TRAEFIK_VERSION with a hardcoded fallback "3.6.7".
med
Reduce embedded environment-dependent endpoint defaults in createDefaultServerTraefikConfig by moving internal hostnames/URL templates into env/config (or clearly documenting and parameterizing them through the existing config layer).
- packages/server/src/setup/traefik-setup.ts:1-260 — Creates serviceURLDefault using a fixed internal hostname and only PORT from env, which can lead to env-specific URL behavior being driven by code defaults.
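
The two high-priority items above amount to reading both values from the environment and failing fast when they are absent. A minimal sketch follows; `loadPlatformConfig` and the `DOKPLOY_CLOUD_URL` variable name are hypothetical (only `TRAEFIK_VERSION` appears in the audited code), and the function takes the environment as a plain object so it can be exercised without touching `process.env`.

```typescript
interface PlatformConfig {
  cloudBaseUrl: string;
  traefikVersion: string;
}

type Env = Readonly<Record<string, string | undefined>>;

// Reads one variable and records a problem if it is missing or malformed.
function requireMatch(env: Env, name: string, pattern: RegExp, hint: string, problems: string[]): string {
  const value = env[name];
  if (value === undefined || !pattern.test(value)) {
    problems.push(`${name} ${hint}`);
    return "";
  }
  return value;
}

// No hardcoded fallbacks: a missing value is a startup error, not a silent default.
function loadPlatformConfig(env: Env): PlatformConfig {
  const problems: string[] = [];
  const cloudBaseUrl = requireMatch(env, "DOKPLOY_CLOUD_URL", /^https?:\/\/\S+$/, "must be an http(s) URL", problems);
  const traefikVersion = requireMatch(env, "TRAEFIK_VERSION", /^\d+\.\d+\.\d+$/, "must look like 1.2.3", problems);
  if (problems.length > 0) {
    throw new Error(`Invalid configuration: ${problems.join("; ")}`);
  }
  return { cloudBaseUrl, traefikVersion };
}
```

Collecting every problem before throwing means an operator sees all missing variables in one startup failure instead of fixing them one at a time.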

Decouple deploy from release 0%

No implementation of decouple-deploy-from-release was found. While the codebase includes an `isEnabled` concept, it appears to be permission/environment gating for UI rather than a feature-flag/rollout system that separates deployment from activation. The GitHub workflows build/publish images and create releases without showing flag-based or percentage/canary rollout control for production activation.

high
Introduce a real feature-flag/rollout mechanism (library + server-side gating) and wire it into production activation points. For example, guard new/changed backup scheduling behavior in `backupRouter.create` behind a flag checked at request time, with percentage/canary support.
- apps/dokploy/server/api/routers/backup.ts:1-120 — Production mutation logic executes directly without any rollout/flag gate, so deploy would effectively activate the behavior immediately.
high
Add rollout-aware gating to user-facing UI routes/entries that should only become available after rollout. Replace/augment permission-only gating with flag-based activation (e.g., show/enable buttons and navigation only when the rollout flag is active).
- apps/dokploy/components/layouts/side.tsx:40-120 — `isEnabled` here is driven by permissions and `isCloud`, not release activation control.
med
Connect CI/CD releases to progressive activation. If you keep publishing images for `main`, ensure production traffic/users only see new behavior via flags or canary routing rather than all-or-nothing exposure tied to `latest`.
- .github/workflows/deploy.yml:1-109 — Workflow builds/pushes images on branch pushes; no progressive activation controls are shown.
- .github/workflows/dokploy.yml:1-242 — Workflow generates a release on `main` and produces `latest` images; no flag-based rollout/activation controls are shown.
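The percentage/canary gating recommended above can be sketched as a deterministic bucket check evaluated at request time. Everything here is hypothetical (the `RolloutFlag` shape, `isFlagActive`, and the choice of an FNV-1a-style hash); it shows the general technique, not an existing Dokploy API.

```typescript
// Hypothetical flag shape; not an existing Dokploy type.
interface RolloutFlag {
  key: string;
  enabled: boolean;   // global kill switch
  percentage: number; // share of subjects that see the behavior, 0..100
}

// FNV-1a-style 32-bit hash over UTF-16 code units; sufficient for bucketing.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned
}

// The same subject always lands in the same bucket for a given flag,
// so raising the percentage only ever adds subjects.
function isFlagActive(flag: RolloutFlag, subjectId: string): boolean {
  if (!flag.enabled) return false;
  const pct = Math.min(100, Math.max(0, flag.percentage));
  const bucket = fnv1a(`${flag.key}:${subjectId}`) % 100;
  return bucket < pct;
}
```

A mutation such as `backupRouter.create` could call a check like this with the organization or user ID as the subject, so that shipping the code and activating the behavior become separate steps.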

Reversibility / rollback 67%

The codebase implements a production rollback path for deployments (API plus a service-layer Swarm service update using the stored `fullContext` and an image tag), with healthcheck/rollback config wired into the Swarm TaskTemplate. Migration reversibility is not demonstrated, however: the migration script runs forward migrations only and provides no rollback/down/reverse support, which undermines full “reversibility without corruption” for data/schema changes.

high
Add reversible, backward-compatible migrations and a documented rollback workflow (down/reverse migrations or a strategy like expand/contract with versioned compatibility). Ensure the rollback process (or deploy job) runs the matching reverse step when rolling back a release.
- apps/dokploy/server/db/migration.ts:1-21 — Forward-only migration runner (`migrate(...)`) with no reverse/down path.
med
Strengthen rollback readiness in the rollback API/executor: add/trigger canary or explicit verification steps (e.g., wait for service health/replica readiness, run smoke checks, capture metrics/log correlation IDs) before marking rollback successful.
- apps/dokploy/server/api/routers/rollbacks.ts:1-70 — Rollback entrypoint currently triggers rollback execution and audit, but no explicit verification/canary gate is visible in this layer.
low
Ensure failed rollback attempts are handled transactionally and surfaced clearly (e.g., if `service.update`/createService fails, return a structured error and preserve rollback state for investigation).
- packages/server/src/services/rollbacks.ts:260-323 — Rollback Swarm update is wrapped in try/catch but primarily logs `console.error` and then creates a service; the higher layers don’t show a structured rollback status/verification contract.
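The verification and structured-status recommendations above could take the form of a small decision function that runs after the Swarm update. This is a sketch under assumptions: `RollbackOutcome`, `verifyRollback`, and the idea of sampling running-replica counts are hypothetical and do not reflect the current `rollbacks.ts` code.

```typescript
// Hypothetical result contract for a rollback attempt.
type RollbackOutcome =
  | { status: "verified"; minRunningReplicas: number }
  | { status: "failed"; reason: string };

// `samples` holds running-replica counts observed after the rollback, oldest
// first. The rollback counts as verified only if the most recent
// `requiredStable` samples all meet the desired replica count.
function verifyRollback(
  desiredReplicas: number,
  samples: number[],
  requiredStable: number = 3,
): RollbackOutcome {
  const needed = Math.max(1, requiredStable);
  if (samples.length < needed) {
    return { status: "failed", reason: `only ${samples.length} health samples, need ${needed}` };
  }
  const tail = samples.slice(-needed);
  const lowest = Math.min(...tail);
  if (lowest < desiredReplicas) {
    return { status: "failed", reason: `replicas dipped to ${lowest}, expected ${desiredReplicas}` };
  }
  return { status: "verified", minRunningReplicas: lowest };
}
```

Returning a discriminated union rather than logging and continuing gives the API layer something concrete to persist and surface when a rollback does not settle.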

Delivery cadence (DORA proxy) 89%

Delivery cadence appears strong. Git history shows frequent commits/merges and steady tagging. In the repo, CI automation for building/pushing container images is triggered on pushes to main and canary, and the canary→main promotion PR flow is automated when the version changes. Together these indicate a mature, incremental release process rather than occasional big-bang releases.

high
Add/verify an explicit CD step that deploys to a staging/preview environment on each main/canary change (or PR) rather than only building/publishing images and creating releases. Ensure this is automated and reversible (e.g., environment-per-PR or preview deployments).
- .github/workflows/dokploy.yml:1-242 — Current workflow content shown includes building/pushing images, combining manifests, and creating releases; it does not (in the observed excerpt) demonstrate an automated staging/preview deployment step.
med
Instrument the release workflow to tighten the small-batch feedback loop (e.g., build/test gates + link deploy artifacts to runs) so that the time-to-production remains consistently short, not just the commit cadence.
- .github/workflows/deploy.yml:1-109 — Workflow builds and pushes images but (in the shown excerpt) does not clearly show test gates or downstream production/staging rollout wiring.

Deploy-tooling ownership 67%

The single-engineer CI/CD time-bomb risk is only partly mitigated. Git history over deploy/CI paths shows 15 distinct authors, so this is not a strict single-owner failure mode, but the top author's share is 0.89, which means pipeline ownership remains meaningfully concentrated.

high
Reduce concentration of ownership for CI/CD workflows by explicitly assigning codeowners/review rotation for .github/workflows/* (especially deploy.yml and dokploy.yml).
- .github/workflows/dokploy.yml:1-242 — Main release/build/publish automation lives here; ownership concentration increases operational risk despite multi-author involvement overall.
med
Add lightweight workflow-level smoke checks (or reuse existing PR quality signals) that validate the deploy pipeline itself (e.g., verify required secrets/metadata are present in non-publishing mode) so more contributors can safely make changes.
- .github/workflows/deploy.yml:1-109 — Primary image build/push automation; adding guardrails makes pipeline changes more approachable for a wider set of maintainers.
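For reference, the two signals quoted in this section (distinct authors and top-author share) are simple to recompute. The sketch below assumes the share is measured in commits, which the report does not state explicitly; the function name and input shape are hypothetical.

```typescript
// Hypothetical input: one entry per author with their commit count
// over the deploy/CI paths being measured.
interface AuthorCommits {
  author: string;
  commits: number;
}

function ownershipConcentration(entries: AuthorCommits[]): {
  distinctAuthors: number;
  topAuthorShare: number;
} {
  const total = entries.reduce((sum, e) => sum + e.commits, 0);
  const top = entries.reduce((max, e) => Math.max(max, e.commits), 0);
  return {
    distinctAuthors: entries.length,
    // Guard against an empty history to avoid dividing by zero.
    topAuthorShare: total === 0 ? 0 : top / total,
  };
}
```

Tracking the share over time, not only the author count, is what shows whether CODEOWNERS and review-rotation changes are actually spreading pipeline knowledge.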

T3 Exit Cleanliness

Engineering Org Resilience

No single-author critical paths: git-blame concentration, CODEOWNERS coverage, and reviewer diversity across the codebase.

19% 6/10 scored

Critical-path bus factor 0%

0/5 expected sites
Ownership clarity 0%

0/1 expected sites
Documentation density ("why") 0%

0/6 expected sites not present
Operational runbooks 0%

0/3 expected sites not present
Onboarding reproducibility 111%

4/3 expected sites
Decision history legibility 0%

0/2 expected sites not present

Critical-path bus factor 0%

The repo shows partial safeguards against critical-path bus-factor risk: there is substantial executable knowledge in deployment tests, but ownership is centralized by default in CODEOWNERS (single default owner) and there are no organizational runbooks/ADRs present (per org-artefacts scan). The deployment pipeline itself (queue worker, deploy/cancel utilities, control-plane router, GitHub webhook trigger) is critical-path; however, I could not confirm distributed co-ownership for each specific critical component from code-only evidence, so only the presence of strong deployment tests was confidently identified as a durability mechanism.

high
Add explicit multi-owner coverage for critical-path directories (deployment queue, deploy/cancel utilities, deployment router, GitHub webhook trigger) in CODEOWNERS, ensuring at least 2 humans (>=3 if possible for the most critical surfaces like deploy + incident response).
- .github/CODEOWNERS:1-3 — Default owners are centralized to a single user, which undermines critical-path co-ownership guarantees.
high
Introduce operational runbooks for the deployment pipeline (what to check when deployments fail, where to find logs, common failure modes, rollback/cancel procedures).
- apps/dokploy/__test__/deploy/application.real.test.ts:1-220 — While tests exist, operational procedures are still needed; executable knowledge alone doesn’t answer “what to do during an incident” when the system is in an unexpected state.
med
Map each critical-path code site to a corresponding test suite and requirement checklist (e.g., webhook->queue job creation->worker execution->status updates->log tailing), and ensure tests assert the most failure-prone branches and error messages.
- apps/dokploy/server/queues/deployments-queue.ts:1-97 — Worker execution logic has many branches (application/compose/preview; deploy/redeploy; status updates) that should be pinned by targeted tests.
- apps/dokploy/server/utils/deploy.ts:1-84 — Deploy/cancel utilities handle remote calls and error mapping; these branches should be covered to reduce single-author knowledge dependence.

Single-author hotspots N/A

No single-author hotspots were detected. In the last 12 months, the git-history hotspots scan returned an empty `danger_files` list (i.e., no high-churn files were simultaneously limited to ≤2 lifetime distinct authors). Therefore, there were no concrete file sites to verify via code inspection.

med
Re-run the hotspots scan for a different window (e.g., 6 months and 24 months) to confirm no emergence of new single/dual-owner gravity wells, then inspect any newly flagged danger files with `code_read` to ensure intent is captured in tests/docs.

Review diversity N/A

This primitive is about *review/merge process diversity* (i.e., whether work lands via PRs and is integrated by multiple people). In this repo, there is evidence from git-history signals that PRs exist and are merged by many humans (distinct_mergers_human=27), but there are no corresponding, codebase-locatable artifacts/config files (e.g., .github/CODEOWNERS or workflow files) that implement or enforce review diversity. Since the audit requires file+line evidence for “present in the codebase,” the primitive is treated as absent for the purposes of this report.

high
Add/verify process artifacts that make review diversity enforceable at the repo level (e.g., branch protection rules requiring PRs, CODEOWNERS to spread ownership, and required status checks). Then ensure PR merges involve multiple human integrators rather than a single gatekeeper.
med
If review diversity is already happening in practice, capture it in repo configuration so it is auditable from the codebase (e.g., required reviewers, CODEOWNERS, and CI checks).

Ownership clarity 0%

An ownership manifest exists (.github/CODEOWNERS), but it does not provide ownership clarity as defined here: it assigns all paths to one default owner and defines no per-critical-path ownership groups, so it cannot satisfy the >=2-people requirement. No correctly applied ownership-clarity sites were found.

high
Replace the single default CODEOWNERS entry with explicit ownership mappings for the repo’s critical path segments (at least apps/* and packages/* subtrees), and ensure each critical mapping lists 2+ people (or a team handle) so knowledge is not concentrated behind one individual.
- .github/CODEOWNERS:1-3 — Current state: only a wildcard default owner for everything, with a single handle; no per-critical-path or multi-owner mapping.
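The ">=2 people or a team handle" rule can be checked mechanically, for example in CI. The sketch below is a simplified, hypothetical checker (`findSingleOwnerRules` is not an existing tool); it treats any `@org/team` handle as multi-person and does not handle escaped `#` characters in patterns.

```typescript
interface OwnershipGap {
  pattern: string;
  owners: string[];
}

// Returns every CODEOWNERS rule that names fewer than two individual owners
// and no team handle.
function findSingleOwnerRules(codeowners: string): OwnershipGap[] {
  const gaps: OwnershipGap[] = [];
  for (const rawLine of codeowners.split("\n")) {
    // Drop comments and surrounding whitespace (simplified comment handling).
    const line = rawLine.replace(/#.*$/, "").trim();
    if (line === "") continue;
    const parts = line.split(/\s+/);
    const pattern = parts[0] ?? "";
    const owners = parts.slice(1);
    // Assumption: a handle containing "/" is a team and counts as multi-owner.
    const hasTeam = owners.some((owner) => owner.includes("/"));
    if (owners.length < 2 && !hasTeam) {
      gaps.push({ pattern, owners });
    }
  }
  return gaps;
}
```

Run against a file whose only rule is a wildcard with one handle, a check like this reports that rule as a gap, which matches the finding above.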

Retained vs. departed knowledge N/A

This primitive is not implemented as an artifact/mechanism in the codebase. While git-history signals indicate a non-trivial departed-authorship share (recency-based proxy), there is no code/runbook/ownership process artifact here that specifically captures “retained vs departed knowledge” (i.e., ensures critical knowledge remains after authors leave).

high
Create and maintain knowledge-capture artifacts for any critical areas with elevated departed authorship risk: add/expand ownership manifest (more than one owner per area), and add runbook/ADR-style rationale for operational and architectural decisions so knowledge is not tied to single authorship history.
- .github/CODEOWNERS:1-3 — Only one default owner is listed; without additional owners/knowledge artifacts, the project is vulnerable to knowledge concentration when that person becomes unavailable.
med
Add onboarding checklists and service setup/run instructions for critical paths that explicitly include: where the authoritative operational understanding lives, what tests/commands validate behavior, and who the current co-owners are.
- CONTRIBUTING.md:1-197 — The contributing guide contains setup/build/test expectations, but does not provide critical-path operational/runbook knowledge capture or a retained-vs-departed continuity plan.

Documentation density ("why") 0%

Across the repo’s tracked documentation artifacts, the content is primarily “how to run/configure/contribute” (setup steps, endpoints, environment variables, minimal READMEs). I did not find durable architecture/design rationale documentation that explains the system’s decisions (“why”), so the documentation density primitive does not appear to be correctly implemented anywhere in the codebase.

high
Create durable architecture/decision “why” docs (e.g., an ADR folder and at least one architecture/design overview) and link them from the root README and CONTRIBUTING. Ensure they cover major components (API, scheduler, monitoring, server setup) and explain tradeoffs, not just instructions.
- README.md:1-65 — Root README lacks architecture/design rationale and only points to external docs.
- CONTRIBUTING.md:1-120 — Contribution guide is process-focused and does not point to durable architecture/decision rationale.
high
Augment each critical service README (API, schedules, monitoring) with a short “Architecture / Why this design” section describing core design choices, constraints, and integration rationale (e.g., callback/threshold design, scheduling semantics, API structure).
- apps/api/README.md:1-9 — Only local dev/run instructions; no design rationale.
- apps/schedules/README.md:1-9 — Only local dev/run instructions; no design rationale.
- apps/monitoring/README.md:1-155 — Documents configuration/endpoints but not the rationale behind key behavioral choices.
med
Add a minimal “documentation map” (what docs exist, where to look for ‘why’, how to update them when changing architecture). Put it in README/CONTRIBUTING so new contributors learn the durable rationale locations.
- CONTRIBUTING.md:1-120 — No section describes where architecture rationale lives or how to maintain it.

Operational runbooks 0%

Operational runbooks are not present as tracked artifacts anywhere in the repository (the scan found nothing in the "runbook" category). While the code contains critical operational workflows (deployment webhook/queueing, restore pipelines, and the Compose service layer), there are no corresponding written runbooks to guide deployment, incident response, or recovery.

high
Create runbooks for each critical service/workflow that can be incident-triggered: (1) Compose deployment webhook/queueing (include replay/retry procedures, watch-path/branch mismatch handling, and queue/job inspection), (2) Compose/database restore procedure (include rclone pipeline expectations, DB-type-specific credentials/verification, and remote-vs-local execution notes), and (3) the Compose service lifecycle (how to validate compose/service loading and diagnose remote exec/compose spec issues).
- apps/dokploy/pages/api/deploy/compose/[refreshToken].ts:1-210 — Webhook/API handler that gates auto-deploy and enqueues deployments; should be matched with an operational runbook for deploy/incident/recovery.
- packages/server/src/utils/restore/compose.ts:1-103 — Implements backup restore execution; runbook should document exact operational recovery steps.
- packages/server/src/services/compose.ts:1-220 — Compose service layer; runbook should document operational checks for compose lifecycle failures.
med
Add an ownership/coverage manifest that names runbook owners (and backup owners) for the runbook-covered workflows, and ensure those owners actually participate in maintaining the docs (to mitigate the gravity-well risk of a single person “who just knows”).
- .github/CODEOWNERS:1-3 — A CODEOWNERS file exists; use it (or extend it) to cover runbook responsibilities rather than relying on implicit knowledge.

Onboarding reproducibility 111%

Onboarding reproducibility is partially present: there is real written onboarding material (CONTRIBUTING.md) that includes a runnable setup entrypoint and the commands to reach a local dev server, and the referenced setup script exists in code. However, the 'clean clone to productive' flow is not purely one-command (it requires at least `dokploy:setup`, plus additional commands like `server:script` and `dokploy:dev`), so ramp-up may still rely on knowing which follow-up commands/options matter most.

high
Add a single canonical 'from clean clone to productive' command in the onboarding docs (e.g., `pnpm run dokploy:up` that internally runs setup + migrations + starts the dev server, or clearly document that the experience is inherently multi-step and why).
- CONTRIBUTING.md:103-123 — Docs require multiple commands after `pnpm run dokploy:setup` (`server:script`, then `dokploy:dev`), so it’s not strictly one-command reproducible.
med
Link the docs to the exact setup semantics (what the setup script does/doesn’t cover). For example, clarify in CONTRIBUTING.md which parts are Docker swarm/network/traefik/Redis/Postgres initialization vs. which parts are app boot/migrations, so new engineers don’t need a person to infer gaps.
- apps/dokploy/setup.ts:1-40 — The setup script performs infrastructure initialization and pulls traefik, but the docs still call extra commands afterwards—clarifying the division of responsibility will improve reproducibility.
low
Add a short 'known-good' verification checklist to onboarding (e.g., expected log lines, health checks, or a smoke test endpoint) to make the doc path objectively verifiable without narration.
- CONTRIBUTING.md:108-123 — Docs provide the access URL but not objective verification signals beyond 'go to localhost:3000'.

Tests as executable knowledge N/A

The codebase has a substantial and meaningful test suite under `apps/dokploy/__test__`, and tests act as executable knowledge: they include detailed assertions about key business logic (environment variable resolution, template processing including secret/JWT generation, and deployment workflow behavior). Based on the sampled files read, test intent is captured in runnable form rather than only smoke-level checks.

med
Pick the highest-business-risk flows (e.g., deployment/template processing endpoints) and ensure each has at least one focused regression test with clear “inputs → expected outputs” assertions (similar to the env/template suites) plus one integration-style test (like the deploy real test) that covers the workflow wiring.
- apps/dokploy/__test__/env/environment.test.ts:1-120 — Shows the desirable pattern for executable knowledge: precise expectations for correctness and edge cases.
- apps/dokploy/__test__/templates/config.template.test.ts:1-220 — Shows the desirable pattern for executable knowledge: multiple correctness dimensions asserted (structure + key values).
low
For any remaining high-level ‘real’ tests (that rely on Docker/filesystem/exec), standardize naming and comments to document what is intentionally mocked vs. executed for real, to keep the executable knowledge durable over time.
- apps/dokploy/__test__/deploy/application.real.test.ts:1-120 — This file already documents what is mocked vs executed for real; extending this convention helps preserve test intent.

Decision history legibility 0%

The repo shows a convention for commit message formatting (CONTRIBUTING), but durable decision records (ADRs) are absent, and the key setup/infrastructure code paths do not carry decision rationale that could be reliably reconstructed after the author leaves. This primitive is therefore treated as not applied at any concrete site in the codebase.

high
Create ADRs (or equivalent durable decision records) for the major infrastructure/setup decisions: (1) why swarm/network are initialized with the specific address/network settings, (2) why the setup order in apps/dokploy/setup.ts is the chosen dependency order.
- apps/dokploy/setup.ts:1-40 — Orchestration sequence should be backed by explicit decision rationale so changes can be safely made without relying on author memory.
- packages/server/src/setup/setup.ts:1-48 — Hard-coded addresses/names/drivers and idempotency strategy are decision points that need durable explanation.
med
Ensure commit history consistently carries WHY (not only WHAT) for setup/infrastructure changes: require explanatory bodies for PR commits touching docker/swarm/network/traefik/dependencies.
- apps/dokploy/setup.ts:1-40 — This file is the likely target for future infrastructure changes; without strong decision-history legibility, edits become risky.
low
Add brief in-code rationale comments at the decision points (e.g., address choice, network driver choice, idempotency approach) as a fallback to complement history/ADRs.
- packages/server/src/setup/setup.ts:1-48 — Currently, the functions implement behavior without capturing the reasoning behind the choices.

Not applicable to this codebase: Single-author hotspots, Review diversity, Retained vs. departed knowledge, Tests as executable knowledge.

IP & OSS License Hygiene

An SBOM in CI, no AGPL/GPLv3 in the dependency tree, CVEs triaged by severity, and no outside-contributor commits without IP assignment.

27% 11/12 scored

Software bill of materials 0%

0/3 expected sites not present
License compliance 17%

1/2 expected sites
Known-vulnerability scan 0%

0/2 expected sites not present
Known-exploited CVEs 0%

0/2 expected sites
Dependency usage & reachability 50%

1/2 expected sites
Dependency freshness 22%

2/3 expected sites
Upstream maintenance 0%

0/3 expected sites not present
Remediation velocity 0%

0/3 expected sites not present
Supply-chain integrity 108%

5/4 expected sites
Dependency-confusion resistance 100%

4/3 expected sites
IP ownership / provenance 0%

0/2 expected sites not present

Software bill of materials 0%

I did not find any SBOM generation practice wired into the codebase (no syft/cyclonedx/spdx-style generation script referenced in package.json, and no SBOM-related filenames like cyclonedx/syft/spdx were detectable in the code graph). The repo does have committed dependency manifests/lockfiles (pnpm-lock.yaml and Go go.mod), but the primitive (producing and publishing an SBOM as a release artifact) appears absent.

high
Add an SBOM generation script and wire it into CI/release. For example: generate CycloneDX or SPDX using syft (or equivalent) for the pnpm workspace and the Go module(s), then publish the resulting SBOM artifact (e.g., sbom.json / sbom.spdx.json) per release.
- package.json:1-79 — No SBOM generation script or release hook exists in the root workspace scripts.
high
Ensure the SBOM is based on the exact pinned dependency graphs: pnpm-lock.yaml for the npm ecosystem and the Go module graph for apps/monitoring. Validate that the SBOM covers direct + transitive dependencies and matches the resolved dependency inventory from lockfiles.
- pnpm-lock.yaml:1-40 — Committed pnpm lockfile enables deterministic transitive inventory for SBOM generation.
- apps/monitoring/go.mod:1-35 — Go module declares direct/indirect requirements; SBOM generation should include Go transitive deps.
med
Add a CI check that fails the build if SBOM artifact generation is missing or empty, and (optionally) compare SBOM contents against the repository’s lockfile-based inventory to prevent drift.
- package.json:1-79 — There is no existing SBOM validation/generation step to hook into CI.
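The "fail if the SBOM is missing or empty" check can be very small. The sketch below assumes the SBOM is emitted as CycloneDX JSON, which uses a top-level `bomFormat` of `"CycloneDX"` and a `components` array; the function name and result shape are hypothetical.

```typescript
interface SbomCheck {
  ok: boolean;
  reason: string;
}

// Minimal sanity check for a CycloneDX JSON document. It does not validate
// the full schema, only that the file parses and lists at least one component.
function checkCycloneDxSbom(jsonText: string): SbomCheck {
  let doc: unknown;
  try {
    doc = JSON.parse(jsonText);
  } catch (_e) {
    return { ok: false, reason: "SBOM is not valid JSON" };
  }
  if (typeof doc !== "object" || doc === null) {
    return { ok: false, reason: "SBOM root is not an object" };
  }
  const bom = doc as { bomFormat?: unknown; components?: unknown };
  if (bom.bomFormat !== "CycloneDX") {
    return { ok: false, reason: "bomFormat is not CycloneDX" };
  }
  if (!Array.isArray(bom.components) || bom.components.length === 0) {
    return { ok: false, reason: "SBOM lists no components" };
  }
  return { ok: true, reason: `${bom.components.length} components listed` };
}
```

A CI step could read the generated file, call this function, and exit non-zero when `ok` is false; comparing the component list against the lockfile inventory would be a separate, stricter check.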

License compliance 17%

License compliance is partially present (lockfiles exist for both npm and Go), but the dependency license scan placed node-forge@1.3.3, declared as 'BSD-3-Clause OR GPL-2.0', in the strong-copyleft tier. Because that expression is a disjunction, a licensee can normally elect the BSD-3-Clause option, so the flag should be confirmed and the election recorded before it is treated as a deal-terms risk for a proprietary SaaS; attribution/NOTICE obligations apply either way. Additionally, repository-level LICENSE/NOTICE files were not found at the repo root in this audit run (evidence gap).

high
Resolve the node-forge@1.3.3 flag: either document the election of the BSD-3-Clause option of its dual license, or replace it with a dependency under a single permissive license. Re-run the license scan to confirm the strong-copyleft tier disappears.
- pnpm-lock.yaml:1-16685 — Strong-copyleft flagged dependency: node-forge@1.3.3 detected as 'BSD-3-Clause OR GPL-2.0' (tier: strong-copyleft).
high
Verify and add/restore release attribution artifacts: ensure a NOTICE file and/or a complete third-party licenses bundle exists (and is current with lockfile changes). This should cover all transitive dependencies, including any that require attribution.
- pnpm-lock.yaml:1-16685 — Attribution/NOTICE obligations are required for shipping proprietary SaaS using third-party components; no root LICENSE/NOTICE files were available in this run (evidence gap).
med
Document the license compliance process in CI/release (e.g., fail builds on strong/network copyleft tiers; generate an SBOM + third-party licenses report during release).
- pnpm-lock.yaml:1-16685 — Lockfile is present, but enforcement/documentation evidence was not found in this run; add CI gates around the same license scan logic.
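A CI license gate needs to distinguish "copyleft only" from "copyleft offered as one alternative", as in the 'BSD-3-Clause OR GPL-2.0' expression flagged above. The sketch below is a deliberately simplified evaluator: the allowlist is illustrative, the function name is hypothetical, and it handles only flat expressions (no `WITH` exceptions, and parentheses are correct only when they wrap the whole expression or an `AND` group).

```typescript
// Illustrative allowlist; a real policy would be set by counsel.
const PERMISSIVE = new Set(["MIT", "ISC", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0"]);

// True if at least one OR-alternative consists solely of allowed licenses.
// In SPDX expressions AND binds tighter than OR, so splitting on OR first
// and then on AND is correct for flat expressions.
function canElectPermissive(
  expression: string,
  allowed: Set<string> = PERMISSIVE,
): boolean {
  const flat = expression.replace(/[()]/g, " ").trim();
  return flat
    .split(/\s+OR\s+/)
    .some((alternative) =>
      alternative.split(/\s+AND\s+/).every((id) => allowed.has(id.trim())),
    );
}
```

With a helper like this, a gate can fail the build only when no permissive election exists, and log the dual-licensed packages so that the chosen option is recorded.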

Known-vulnerability scan 0%

I did not find any wiring that actually performs a known-vulnerability scan as part of the repo’s automation. The root package.json lacks a vulnerability-scan script/command hook, and (separately) the dependency lockfiles contain a large number of HIGH/CRITICAL OSV findings according to osv-scanner—meaning that a scan would be meaningful, but the repo does not appear to apply the primitive.

high
Add a CI job (e.g., GitHub Actions) that runs a lockfile-based vulnerability scan (osv-scanner/osv) over all relevant manifests/lockfiles (pnpm-lock.yaml and apps/monitoring/go.mod), and fails the build when there are un-triaged HIGH/CRITICAL findings (plus produce a SARIF/artifact report).
- package.json:1-79 — No vulnerability-scan script exists to invoke from CI.
- pnpm-lock.yaml:1-40 — pnpm lockfile is present and is the correct scan input for dependency-pinned CVEs.
med
Create/standardize a dedicated script (e.g., `pnpm run security:vuln-scan`) that executes osv-scanner against the repo lockfiles, and document the triage workflow for each HIGH/CRITICAL finding (remediate vs. exception with justification).
- package.json:1-79 — Root scripts are currently focused on build/test/lint only; add a dedicated security script to ensure consistent execution.
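The "fail on un-triaged HIGH/CRITICAL findings" rule reduces to a filter over scan results plus an exception list. The sketch below uses a hypothetical `Finding` shape and does not mirror osv-scanner's actual output schema; a real job would first map the scanner's report into this form.

```typescript
type Severity = "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";

// Hypothetical normalized finding; not the scanner's native format.
interface Finding {
  id: string; // advisory identifier, e.g. a GHSA or CVE ID
  packageName: string;
  severity: Severity;
}

// Returns the findings that should block the build: HIGH or CRITICAL
// severity with no recorded triage decision.
function untriagedBlockers(
  findings: Finding[],
  triagedIds: ReadonlySet<string>,
): Finding[] {
  return findings.filter(
    (f) =>
      (f.severity === "HIGH" || f.severity === "CRITICAL") &&
      !triagedIds.has(f.id),
  );
}
```

Keeping the triaged IDs in a checked-in file with a justification per entry makes every exception reviewable in a PR, which is the triage workflow the recommendation asks for.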

Known-exploited CVEs 0%

The 'known-exploited CVEs' hygiene primitive is applicable and was effectively executed at scan time: osv-scanner’s known-exploited detector returned known_exploited_count=0. However, I did not find a concrete in-repo implementation of this primitive (e.g., a CI step/config) to cite—only the off-graph scan results and the presence of lockfiles/manifests.

high
Add/confirm a CI gate that runs a known-exploited check (osv-scanner findings cross-referenced against a known-exploited list such as the CISA KEV catalog, or equivalent) on every PR and fails the build if known_exploited=true findings appear; ensure it covers pnpm-lock.yaml and Go modules.
- pnpm-lock.yaml:1-120 — This lockfile is the required anchor for the known-exploited check; wire a CI job to scan it.
- apps/monitoring/go.mod:1-35 — The Go manifest is another required dependency anchor; wire it into the same known-exploited scan gate.
- osv_dep_scan(mode=vulns):n/a — Scan result indicates known_exploited_count=0, but without a cited CI implementation, this is not guaranteed to be enforced over time.

Dependency usage & reachability 50%

The codebase does show correct, concrete on-graph reachability for at least one major dependency surface: drizzle-orm is imported and directly used to construct the application DB handle (packages/server/src/db/index.ts). Beyond that, the remaining dependency usage/reachability checks for unused/phantom deps and call-site reachability could not be fully enumerated within this run because the call-site API (receiver→callee resolution) queries returned no results for specific receivers (likely due to graph modeling/receiver-binding behavior), so the audit coverage is partial rather than a clean, comprehensive mapping.

high
Extend reachability validation beyond drizzle-orm by running call-site-based queries per frequently imported external library (e.g., axios, hono, vitest, protobufjs) and then code_read the highest-importance call sites to confirm vulnerable-function reachability (not just import presence).
- packages/server/src/db/index.ts:1-41 — Positive example of correct reachability mapping; use it as the template for the next libraries.
med
Create an explicit “declared-but-never-imported” and “imported-but-undeclared/phantom” review list by diffing manifest deps (pnpm lock/importers + go.mod) against virgil_query raw_import for each external package family, then confirm removals/manifest corrections with code_read of the relevant module boundaries.
- apps/monitoring/go.mod:1-21 — Go manifest declares fiber and other modules; ensure virgil_query raw_import shows actual imports in the monitoring app code paths (currently only reachability for TS/ORM was confirmed).
- pnpm-lock.yaml:1-60 — pnpm lock declares app/API dependencies; reachability diff requires comparing these declared deps to raw_import usage in source.
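The declared-versus-imported review list described above is a two-way set difference. In the sketch below the inputs are assumed to have been extracted already (declared names from the manifests, imported names from source); `diffDependencies` is a hypothetical helper, not one of the audit tools.

```typescript
interface DependencyDiff {
  unused: string[];  // declared in a manifest but never imported
  phantom: string[]; // imported in source but not declared
}

function diffDependencies(declared: string[], imported: string[]): DependencyDiff {
  const declaredSet = new Set(declared);
  const importedSet = new Set(imported);
  return {
    unused: Array.from(declaredSet).filter((name) => !importedSet.has(name)).sort(),
    phantom: Array.from(importedSet).filter((name) => !declaredSet.has(name)).sort(),
  };
}
```

Entries in `unused` still need a manual look before removal, since packages used only through CLIs, config files, or type declarations will not appear as source imports.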

Dependency freshness 22%

Dependency freshness is only partially implemented: the repo commits lockfiles/go.mod with pinned versions (good for determinism), but it lacks evidence of an active dependency-update mechanism (update bot not configured) and OSV findings show critical/high vulnerabilities tied to pinned versions (notably Fiber v2.52.6 in apps/monitoring/go.mod). Overall, freshness hygiene exists as “pinning,” but not as “ongoing remediation,” so it’s weak.

high
Upgrade the pinned runtime dependency github.com/gofiber/fiber/v2 from v2.52.6 to a fixed version (per OSV/GHSA advisories), and re-lock. Start with Fiber because the vuln scan flags CRITICAL issues on the exact pinned version.
- apps/monitoring/go.mod:1-10 — fiber/v2 is pinned to v2.52.6; OSV vuln scan reports CRITICAL/HIGH findings for this exact version.
high
Enable and operationalize an automated dependency update mechanism (dependabot/renovate) and ensure update PRs actually merge (remediation velocity).
- git history (tool output):N/A — git_dep_provenance shows update_bot_configured=false (no bot configured). Without a mechanism, lockfile freshness tends to decay.
med
Add/verify CI steps that generate and publish SBOM/CVE/License reports on each release and (ideally) on PRs, so freshness is continuously measured—not just pinned.
- pnpm-lock.yaml:1-30 — pnpm lockfile is present, but CI freshness/SBOM generation could not be confirmed from the provided evidence set; introduce explicit CI gates for freshness.

Upstream maintenance 0%

Upstream-maintenance hygiene (i.e., actively detecting and replacing deprecated/abandoned upstream dependencies) is not evidenced anywhere in the codebase in a concrete way. While dependencies are pinned via go.mod and pnpm-lock.yaml files, there is no demonstrable upstream-maintenance control/verification wired into these dependency sources.

high
Add an upstream-maintenance gate to CI that fails the build (or opens an automated PR) when dependencies are deprecated/abandoned upstream or no longer maintained. Concretely, run an OSV/deprecation/abandonment check over the resolved lockfile(s) (Go + pnpm) and block merges until replacements are proposed.
- apps/monitoring/go.mod:1-35 — Go dependency pins should be covered by the upstream-maintenance gate.
- pnpm-lock.yaml:1-60 — Root pnpm lockfile should be scanned for deprecated/abandoned upstream dependencies.
- packages/server/src/emails/pnpm-lock.yaml:1-80 — Nested pnpm lockfile should also be scanned and gated.
med
Ensure remediation velocity is actionable: enable and configure Renovate/Dependabot (or equivalent) and require successful lockfile-upgrade PRs for any deprecated/abandoned upstream hits.
- N/A (tooling evidence):N/A — Off-graph signal indicates update-bot configuration is not present (update_bot_configured=false).

Remediation velocity 0%

Remediation velocity is not clearly implemented as an automated dependency-update mechanism in this codebase: automated bot configuration evidence is missing (no dependabot/renovate config files found in the repo), so the primitive cannot be verified as an operating mechanism even though the repository appears to have dependency-update activity in git history.

high
Add/enable an automated dependency-update bot (Dependabot and/or Renovate) with CI/workflow configuration, and ensure it is actually triggered (config present in-repo).
- package.json:1-79 — Current root configuration does not include any bot/workflow mechanism for dependency updates.
high
Verify end-to-end velocity: ensure dependency-update PRs are created and merged regularly (especially within the last 90 days), then confirm the number of merged dependency-update commits remains non-zero.
- package.json:1-79 — No evidence of an update-automation pipeline is present in repository configuration files read so far.
med
Ensure CI includes a dependency scanning/SBOM generation step to keep upgrade blast-radius and CVE backlog remediation timely (precondition for meaningful velocity).
- package.json:1-79 — No CI/SBOM/scan steps are defined in package.json; rely on CI workflows to implement this.

Supply-chain integrity 108%

Supply-chain integrity is present: the JS ecosystem uses committed pnpm lockfiles with explicit integrity hashes (`resolution.integrity`), and the Go monitoring service uses pinned go.mod versions backed by go.sum content hashes. Lockfile integrity verification is good overall. The remaining caveat is likely that the codebase uses multiple lockfiles (root + nested) and this pass did not confirm a single, unified CI enforcement point.

high
Confirm CI/build uses the committed lockfiles for installation (e.g., `pnpm install --frozen-lockfile` for both root and nested lockfile scopes, and `go mod download` with go.sum verification).
- pnpm-lock.yaml:1-60 — Evidence of committed pinning/integrity-capable lockfile; ensure CI actually enforces frozen usage.
med
For multi-lockfile setups (root pnpm + nested pnpm under packages/server/src/emails), document and standardize which workflows/commands target which lockfile to avoid drift or accidental installs that bypass one lockfile.
- packages/server/src/emails/pnpm-lock.yaml:1-120 — Evidence that this scope has its own integrity-hash lockfile; governance is needed so installs use it consistently.

Dependency-confusion resistance 100%

Dependency-confusion resistance appears implemented primarily via committed, pinned lockfiles (root pnpm-lock.yaml and a package-specific pnpm-lock.yaml for server emails) plus explicit, fully-qualified Go module dependencies in apps/monitoring/go.mod. I did not find evidence of unscoped private names/typo-squatted package specs in the lockfile sections reviewed, and the workspace dependency is correctly treated as a local link rather than a registry package.

high
Also read each package.json (and any .npmrc/pnpmrc specifying registries) for unscoped private dependencies, typo-similar names, or floating version ranges (e.g., ^/*) that could allow resolution drift beyond the lockfile’s guarantees.
- pnpm-lock.yaml:1-120 — Lockfile pinning is present, but the primitive’s full verification requires confirming the corresponding manifests (package.json) don’t contain unscoped/private or ambiguous dependency specs.
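
The manifest review described above can be partly automated. The sketch below is illustrative only: it assumes the caller supplies the list of internal package names (`internalNames`), and it flags two of the patterns named in the recommendation, floating version specs and internal names published without a scope. `auditManifest` and the sample package names are hypothetical, and the range check is not exhaustive (it ignores forms such as `1.x`).

```typescript
// Hypothetical helper; not part of the audited repository.
interface Manifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

interface ManifestFinding {
  name: string;
  spec: string;
  issue: "floating-range" | "unscoped-internal";
}

function auditManifest(
  manifest: Manifest,
  internalNames: ReadonlySet<string>,
): ManifestFinding[] {
  const findings: ManifestFinding[] = [];
  const all = { ...manifest.dependencies, ...manifest.devDependencies };
  for (const [name, spec] of Object.entries(all)) {
    // Workspace links resolve locally and never hit a registry.
    if (spec.startsWith("workspace:")) continue;
    if (spec === "*" || spec === "latest" || /^[\^~><]/.test(spec)) {
      findings.push({ name, spec, issue: "floating-range" });
    }
    // An internal package without a scope can be shadowed by a public one.
    if (internalNames.has(name) && !name.startsWith("@")) {
      findings.push({ name, spec, issue: "unscoped-internal" });
    }
  }
  return findings;
}
```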

IP ownership / provenance 0%

I did not find any explicit contributor IP assignment/provenance mechanism (e.g., CLA/contributor agreement) in the repository documentation. The Contributing Guide provides contribution workflow guidance but does not include any CLA/assignment terms, so the IP ownership / provenance primitive is not demonstrably implemented in a durable way in this codebase.

high
Add/enable a concrete contributor IP assignment mechanism (e.g., CLA assistant or DCO + explicit IP license/assignment), and document it in CONTRIBUTING.md (requirements, how to sign, enforcement on PRs).
- CONTRIBUTING.md:1-197 — No CLA/contributor IP assignment process is described in the contribution guidance.
high
Create a dedicated legal artifact (e.g., CLA.md / Contributor Agreement) referenced from CONTRIBUTING.md, including terms for inbound IP assignment/license and how exceptions are handled.
- CONTRIBUTING.md:1-197 — Currently contains workflow/setup/build guidance but no links or statements about contributor agreement/IP assignment.
med
Add a short README section that points contributors to the CLA/contributor agreement page (or a link to it from the contributing section).
- README.md:1-65 — README does not include any pointer to contributor IP assignment/provenance terms.

AI-coding-tool provenance N/A

AI features exist in the codebase (e.g., AI provider selection and UI wiring), but there is no evidence that AI-coding provenance is tracked or that AI-generated code is labeled/attributed via a documented convention (no AI-usage/provenance policy or generated-code markers observed). Per this primitive’s rubric, this should be treated as N/A for actionable site matching because the required provenance-tracking machinery is not present.

high
Add an explicit AI-coding provenance policy and convention (e.g., required PR description and/or file header/trailer for AI-generated/assisted code; include how to record prompts, model/provider, and review sign-off).
- CONTRIBUTING.md:1-197 — No AI-coding provenance requirements exist today in the contribution guidance.
med
Introduce repository-level generated-code/provenance markers (examples: standardized comment header for files or block-level markers indicating AI assistance; optionally enforce via lint/CI checks).
- packages/server/src/utils/ai/select-ai-provider.ts:1-160 — AI-related code has no provenance markers; adding a convention would make such code auditable.

Not applicable to this codebase: AI-coding-tool provenance.

Implementation & Customization

Configuration over per-customer branches: no "if customer_id == 12345", no pricing literals scattered outside the billing module.

76% 8/10 scored

Configuration over code branches 100%

3/3 expected sites
Centralized pricing/plan logic 33%

1/3 expected sites
Metering decoupled from pricing model 0%

0/4 expected sites not present
Feature gating via flags, not forks 100%

6/6 expected sites
Customization isolation & upgrade safety 100%

5/4 expected sites
Theming / white-label as config 100%

7/7 expected sites
Tenant-configurable behavior surface 100%

3/3 expected sites
Onboarding-by-configuration cost 75%

4/4 expected sites

Configuration over code branches 100%

This codebase applies the “configuration over code branches” primitive for at least one concrete variation surface: whitelabeling/branding. Branding differences (meta title, favicon, and custom CSS) are stored in a structured whitelabelingConfig JSON configuration, served via API endpoints, and injected into the client via a provider component—supporting different instances/customizations without creating divergent code paths.

high
Audit other variation surfaces beyond whitelabeling (e.g., billing/entitlements, feature availability, deployment templates) and refactor any remaining plan/customer-specific behavior to be driven from configuration/state models similar to whitelabelingConfig.
- apps/dokploy/components/proprietary/whitelabeling/whitelabeling-provider.tsx:1-32 — Good reference implementation for config-driven customization; use it as a pattern for other tenant/instance variation.
med
If more customization knobs are expected, extend the existing webServerSettings.whitelabelingConfig JSON schema (and its TRPC input/output validation) rather than introducing new UI branches.
- packages/server/src/db/schema/web-server-settings.ts:1-245 — Schema-centralization is what keeps the customization dimension from causing code divergence.

No hardcoded customer branching N/A

No hardcoded customer/tenant/org/account ID branching (e.g., `if customerId === 123`) was found. Where tenant identity is used, it is for data scoping/authorization via variable values from session/context (e.g., `orgId`, `ctx.session.activeOrganizationId`), which is the correct approach.

low
Keep validating future changes by searching for direct comparisons of identity fields against literals (e.g., `customerId === <number|string>`, `tenantId === '<literal>'`, `orgId === '<literal>'`) in business-logic/business-layer code.
- packages/server/src/services/deployment.ts:22-60 — Current pattern demonstrates the desired approach: variable-based identity scoping rather than literal ID branching.
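
The recurring search suggested above can be sketched as a small line scanner. This is a rough heuristic under stated assumptions, not an existing lint rule: the field names in the pattern are guesses at common identity fields, `findLiteralIdentityBranches` is a hypothetical name, and the regex only catches the `field === literal` ordering (not `literal === field`).

```typescript
// Matches an identity field compared against a numeric or quoted string literal.
const LITERAL_IDENTITY_COMPARISON =
  /\b(?:customerId|tenantId|orgId|organizationId|accountId)\s*(?:===|==|!==|!=)\s*(?:\d+|'[^']*'|"[^"]*")/;

interface LintHit {
  line: number; // 1-based line number within the scanned source
  text: string;
}

function findLiteralIdentityBranches(source: string): LintHit[] {
  const hits: LintHit[] = [];
  source.split("\n").forEach((text, index) => {
    if (LITERAL_IDENTITY_COMPARISON.test(text)) {
      hits.push({ line: index + 1, text: text.trim() });
    }
  });
  return hits;
}
```

Comparisons against session or context variables, such as `ctx.session.activeOrganizationId`, do not match, so the scoping pattern cited above passes cleanly.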

Centralized pricing/plan logic 33%

A centralized pricing/plan module exists at `apps/dokploy/server/utils/stripe.ts` (tier definitions + `getStripeItems`). However, pricing/plan rules are still duplicated elsewhere: the billing UI re-implements price calculations, and the Stripe webhook re-encodes Startup included-server rules with local constants. The router’s checkout flow is correctly wired to the centralized module, but overall pricing logic is not fully centralized end-to-end.

high
Remove/replace the duplicated client-side pricing math in `show-billing.tsx` with calls to the centralized pricing module (or expose a shared “pricing preview” helper from `server/utils/stripe.ts` and reuse it on the client).
- apps/dokploy/components/dashboard/settings/billing/show-billing.tsx:1-70 — Contains duplicated pricing rules (`calculatePrice*`, `STARTUP_SERVERS_INCLUDED`) that should instead be sourced from the centralized module.
high
Update `pages/api/stripe/webhook.ts` to compute included server quantity using the centralized tier/price mapping (e.g., reuse Startup base price IDs and included-server quantity from `server/utils/stripe.ts`).
- apps/dokploy/pages/api/stripe/webhook.ts:1-90 — Defines local Startup price ID list and `STARTUP_SERVERS_INCLUDED = 3`, then uses them to derive `serversQuantity`.
med
Ensure all tier identification/detection paths rely on centralized constants (expand the set of exported constants/rules from `server/utils/stripe.ts` as needed, and remove any local tier-identification logic elsewhere).
- apps/dokploy/server/api/routers/stripe.ts:1-140 — Tier detection in `getCurrentPlan` is a core pricing/plan rule surface; it should remain centralized and consistent with webhook/UI expectations.
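
To illustrate the consolidation recommended above, here is a minimal sketch of a single pricing module that the webhook, router, and billing UI could all import. Everything except `STARTUP_SERVERS_INCLUDED = 3` (quoted from the webhook finding) is an assumption: the price IDs are placeholders, the hobby allowance of 0 is invented for the example, and `detectTier`/`includedServersFor` are hypothetical names rather than exports of `server/utils/stripe.ts`.

```typescript
// Value quoted from the webhook finding; everything else here is a placeholder.
const STARTUP_SERVERS_INCLUDED = 3;

type Tier = "hobby" | "startup";

interface TierDefinition {
  basePriceIds: readonly string[];
  includedServers: number;
}

// Single source of truth for tier identification and included quantities.
const TIERS: Record<Tier, TierDefinition> = {
  hobby: { basePriceIds: ["price_hobby_placeholder"], includedServers: 0 },
  startup: {
    basePriceIds: ["price_startup_placeholder"],
    includedServers: STARTUP_SERVERS_INCLUDED,
  },
};

function detectTier(priceIds: readonly string[]): Tier | undefined {
  const tiers = Object.keys(TIERS) as Tier[];
  return tiers.find((tier) =>
    TIERS[tier].basePriceIds.some((id) => priceIds.includes(id)),
  );
}

function includedServersFor(priceIds: readonly string[]): number {
  const tier = detectTier(priceIds);
  return tier === undefined ? 0 : TIERS[tier].includedServers;
}
```

With this shape, the local constant in the webhook and the `calculatePrice*` helpers in the UI would become calls into one module, so a plan change touches a single file.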

Metering decoupled from pricing model 0%

The codebase does not implement a metering layer that captures usage generically and maps it to charges in a separate billing layer. Instead, Stripe subscription price IDs and webhook events are used to compute entitlement quantities (serversQuantity) and immediately drive core behavior (server activation/inactivation and user entitlement fields). This is indicative of pricing/plan mechanics being coupled to core product logic rather than decoupled via a generic metering subsystem.

high
Introduce a generic metering/usage capture component (e.g., record server usage events or current usage counters) that writes usage records without any Stripe pricing knowledge. Then implement a billing/mapping component that converts usage meters to charges/entitlements, and finally have core apply entitlements from that billing result.
- apps/dokploy/pages/api/stripe/webhook.ts:14-49 — getSubscriptionServersQuantity currently maps Stripe price IDs to entitlement quantity; replace this with a usage-meter ingestion path + separate mapping.
- apps/dokploy/pages/api/stripe/webhook.ts:275-346 — updateServersBasedOnQuantity drives core server activation from billing-derived quantity; refactor so core consumes entitlements produced by a billing mapping layer, not Stripe-derived computations.
med
Move plan determination away from inline Stripe price-id checks in core/routers. Instead, compute entitlements centrally (from billing mapping) and expose a stable entitlement interface (e.g., maxServers) to the rest of the app.
- apps/dokploy/server/api/routers/stripe.ts:1-120 — getCurrentPlan inspects Stripe subscription price IDs and returns 'startup'/'hobby'/'legacy' for feature gating; replace with entitlement retrieval.
low
Add tests around the separation boundary: (a) metering correctness independent of Stripe, (b) billing mapping correctness independent of core activation logic, and (c) core entitlement application behavior independent of Stripe pricing structures.
- apps/dokploy/pages/api/stripe/webhook.ts:160-205 — Webhook currently performs multiple responsibilities (Stripe retrieval, quantity computation, entitlement updates, server activation). Tests should enforce a cleaner separation.
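
The separation described in these recommendations can be sketched as three small layers. This is an illustration under assumptions, not the project's design: the plan limits are placeholder numbers, and `UsageRecord`, `entitlementsFor`, and `applyEntitlements` are hypothetical names. The point is only that the metering and core layers never see a Stripe price ID.

```typescript
// Layer 1: metering. Records usage with no knowledge of Stripe or plans.
interface UsageRecord {
  organizationId: string;
  meter: "active_servers";
  quantity: number;
}

// Layer 2: billing mapping. The only layer that knows how plans become limits.
interface Entitlements {
  maxServers: number;
}

type PlanId = "hobby" | "startup";

// Placeholder limits for illustration only.
const PLAN_ENTITLEMENTS: Record<PlanId, Entitlements> = {
  hobby: { maxServers: 1 },
  startup: { maxServers: 3 },
};

function entitlementsFor(plan: PlanId, purchasedExtraServers: number): Entitlements {
  return { maxServers: PLAN_ENTITLEMENTS[plan].maxServers + purchasedExtraServers };
}

// Layer 3: core. Consumes entitlements only; never inspects pricing data.
interface ServerState {
  serverId: string;
  active: boolean;
}

function applyEntitlements(
  serverIds: readonly string[],
  entitlements: Entitlements,
): ServerState[] {
  return serverIds.map((serverId, index) => ({
    serverId,
    active: index < entitlements.maxServers,
  }));
}

function isOverLimit(usage: UsageRecord, entitlements: Entitlements): boolean {
  return usage.quantity > entitlements.maxServers;
}
```

Each layer is testable on its own, which matches the three test boundaries listed in the low-priority recommendation.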

Feature gating via flags, not forks 100%

The codebase has a strong, centralized entitlement-gating approach for enterprise features: a reusable `EnterpriseFeatureGate` component for the UI, and `enterpriseProcedure` plus router-level checks for backend enforcement. Enterprise-locked pages and modules (whitelabeling, SSO, audit logs) consistently use the gate rather than introducing forked per-plan/per-customer logic.

med
For proprietary routers, prefer consistently using `enterpriseProcedure` (or a single shared server-side guard) where feasible, to reduce duplicated `hasValidLicense(...)` checks (e.g., compare `audit-log.ts` with the `enterpriseProcedure` pattern).
- apps/dokploy/server/api/routers/proprietary/audit-log.ts:1-68 — Router performs its own `hasValidLicense` check instead of relying solely on `enterpriseProcedure`—functional but slightly less uniform.
- apps/dokploy/server/api/trpc.ts:160-226 — Defines the centralized server-side entitlement guard (`enterpriseProcedure`) that could standardize usage across proprietary routers.

Documented extension interface N/A

No documented extension/plugin interface (a stable, versioned contract for customer/partner extension isolated from core) was found. The codebase instead handles variation (providers/source types and webhook behavior) via core conditional logic and hardcoded component registrations, which implies extension requires code changes rather than config-driven plugin registration.

high
Introduce a documented extension contract (interfaces + registration mechanism) for deploy/webhook handling and provider implementations, so new providers or webhook behaviors can be added by registering an implementation instead of editing core `if/else` logic.
- apps/dokploy/pages/api/deploy/[refreshToken].ts:70-170 — Core webhook/deploy flow branches on `sourceType` and `provider` directly; this is the primary seam where a plugin contract would be expected.
med
Refactor the provider UI (and any provider-specific server logic) to consume a registry of provider definitions/components rather than hardcoded imports + union types.
- apps/dokploy/components/dashboard/application/general/generic/show.tsx:1-80 — The UI is hardwired to a fixed provider set via `TabState` and explicit imports; this should become registry-driven to enable extensions without core edits.
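
A registry-based contract of the kind recommended above might look like the following. It is a sketch under assumptions: `SourceProvider`, `ProviderRegistry`, and the `contractVersion` field are hypothetical and do not exist in the audited code, and a real contract would carry more methods than the single `shouldDeploy` hook shown here.

```typescript
interface DeployRequest {
  applicationId: string;
  payload: unknown;
}

// The versioned contract an extension implements.
interface SourceProvider {
  readonly id: string;
  readonly contractVersion: 1;
  shouldDeploy(request: DeployRequest): boolean;
}

class ProviderRegistry {
  private readonly providers = new Map<string, SourceProvider>();

  register(provider: SourceProvider): void {
    if (this.providers.has(provider.id)) {
      throw new Error(`provider already registered: ${provider.id}`);
    }
    this.providers.set(provider.id, provider);
  }

  // Core code resolves by id instead of branching on sourceType/provider.
  resolve(id: string): SourceProvider {
    const provider = this.providers.get(id);
    if (provider === undefined) {
      throw new Error(`unknown provider: ${id}`);
    }
    return provider;
  }
}
```

The deploy handler would then call `registry.resolve(sourceType).shouldDeploy(request)`, and adding a provider becomes a `register` call rather than a new `if/else` arm.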

Customization isolation & upgrade safety 100%

This codebase implements customization isolation for whitelabeling in a largely upgrade-safe, config-driven way. Whitelabeling is centralized behind a dedicated API router and applied through a provider that injects branding/meta/CSS from persisted configuration, rather than forking core UI logic per customer.

high
Add/verify sanitization and safety controls around customCss since it is injected via dangerouslySetInnerHTML. Ensure the contract clearly defines what CSS is allowed so upgrades and security posture remain consistent across versions.
- apps/dokploy/components/proprietary/whitelabeling/whitelabeling-provider.tsx:1-32 — Uses dangerouslySetInnerHTML to inject config.customCss into a <style> tag, making CSS injection part of the customization contract.
med
Document the whitelabelingConfig schema as a stable, versioned contract (fields, expected formats, backward-compat rules). This reduces the chance that future core upgrades break older saved customer configurations.
- apps/dokploy/components/proprietary/whitelabeling/whitelabeling-settings.tsx:1-260 — The form schema defines the configuration model (appName/appDescription/logoUrl/customCss/etc.), but the project-level stability/versioning rules aren’t shown in the audited slices.
low
If whitelabeling is intended to be tenant-scoped (multiple tenants), confirm the persistence layer (getWebServerSettings/updateWebServerSettings) truly scopes config per tenant rather than using a single global value.
- apps/dokploy/server/api/routers/proprietary/whitelabeling.ts:1-107 — Router delegates to getWebServerSettings/updateWebServerSettings; the audited code shows the API boundary, but not tenant scoping semantics.
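
For the `customCss` safety control, one possible starting point is a server-side check applied before the value is persisted. The sketch below is a deny-list heuristic, not a complete CSS sanitizer, and nothing in it is taken from the repository: `checkCustomCss`, the rule set, and the length cap are all assumptions. A production control would more likely parse the CSS and allow only known-safe constructs.

```typescript
interface CssCheckResult {
  ok: boolean;
  reasons: string[];
}

// Illustrative deny rules; the list is intentionally short and not exhaustive.
const CSS_DENY_RULES: ReadonlyArray<{ pattern: RegExp; reason: string }> = [
  { pattern: /<\/?\s*style/i, reason: "style tag delimiter can break out of the injected <style> element" },
  { pattern: /<\s*script/i, reason: "script tag" },
  { pattern: /@import/i, reason: "@import pulls in external stylesheets" },
  { pattern: /expression\s*\(/i, reason: "legacy expression() syntax" },
  { pattern: /javascript\s*:/i, reason: "javascript: URL" },
];

// Arbitrary cap chosen for the example.
const MAX_CUSTOM_CSS_LENGTH = 50000;

function checkCustomCss(css: string): CssCheckResult {
  const reasons: string[] = [];
  if (css.length > MAX_CUSTOM_CSS_LENGTH) {
    reasons.push(`longer than ${MAX_CUSTOM_CSS_LENGTH} characters`);
  }
  for (const rule of CSS_DENY_RULES) {
    if (rule.pattern.test(css)) {
      reasons.push(rule.reason);
    }
  }
  return { ok: reasons.length === 0, reasons };
}
```

Rejecting at write time (in the update mutation) keeps the stored configuration clean, so the provider that injects the value does not have to re-validate on every render.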

Theming / white-label as config 100%

The codebase implements white-label theming as a persisted, data-driven configuration (whitelabelingConfig) with TRPC endpoints and React hooks. Branding is applied via a dedicated WhitelabelingProvider that injects runtime CSS and document metadata, and key public/error auth surfaces read config via useWhitelabelingPublic(). This supports onboarding new partners by updating configuration rather than forking builds.

low
Consider adding a small set of automated checks/tests to ensure all required themable fields (metaTitle, faviconUrl, customCss, loginLogoUrl, errorPageTitle/Description, footerText) are consistently read on each branded surface after UI refactors.
- apps/dokploy/components/proprietary/whitelabeling/whitelabeling-provider.tsx:1-32 — Central provider applies document-level title/favicon and customCss; tests can validate these bindings remain intact.

Tenant-configurable behavior surface 100%

Tenant-configurable behavior surface exists and is implemented as a configuration model for whitelabeling/branding. The system provides an owner-gated mutation to update persisted `whitelabelingConfig`, a public read to expose branding fields to unauthenticated pages, and UI components that render onboarding branding from that configuration.

high
Audit other customer-requested variation areas (e.g., workspace limits, feature rules, workflow fields) for the same pattern: a persisted settings/rules model + centralized update/read APIs + consumption points in UI/business logic. Whitelabeling is present; verify whether other behavior requests are still hardcoded in code paths.
- apps/dokploy/server/api/routers/proprietary/whitelabeling.ts:1-107 — Shows the desired pattern for config-driven behavior; use this as the benchmark while checking other variation features.
med
Confirm that non-public read paths and all rendering entrypoints consistently use the configuration model (avoid any lingering hardcoded defaults that require code edits for further customization).
- apps/dokploy/components/layouts/onboarding-layout.tsx:1-83 — Demonstrates consumption of config with defaults; ensure other pages follow the same approach.

Onboarding-by-configuration cost 75%

The codebase supports low-touch onboarding by relying on generic, self-serve flows (register + invitation acceptance) and tenant provisioning via data-layer mutations (organization creation inserts DB records). Additionally, onboarding UI branding is driven by whitelabeling configuration rather than per-customer code forks. Overall, this aligns with “onboarding-by-configuration cost” as new customers/orgs appear to be onboarded by creating/updating data, not editing or forking code.

high
Add/confirm a documented, self-serve onboarding runbook that explicitly states: (1) how a new org/tenant is created (API/DB provisioning flow), (2) how invitations are issued and accepted, and (3) what whitelabeling configuration options are required for brand onboarding—so onboarding is operational/config, not engineering.
- apps/dokploy/server/api/routers/organization.ts:1-79 — Shows the tenant/org provisioning mechanism that should be the basis of the runbook.
- apps/dokploy/pages/invitation.tsx:60-140 — Shows the invite-based onboarding mechanism that should be documented.
med
Verify and centralize the whitelabeling configuration source-of-truth (used by onboarding UI) and ensure it supports all onboarding-relevant brand fields without requiring code changes.
- apps/dokploy/components/layouts/onboarding-layout.tsx:1-27 — Onboarding UI reads whitelabeling config for app name/description/logo—expand/document these fields to prevent future per-customer code edits.

Not applicable to this codebase: No hardcoded customer branching, Documented extension interface.

Procurement Code Readiness

Data-export and data-subject erase/export endpoints, region pinning, and DPA-mapped controls that survive enterprise procurement.

0% 7/10 scored

Self-serve trust documentation 0%

0/1 expected sites
Controls-to-contract mapping 0%

0/1 expected sites not present
Data export mechanism 0%

0/4 expected sites not present
Deletion / erase-on-request 0%

0/3 expected sites not present
Data residency commitment 0%

0/3 expected sites not present
Enterprise access controls 0%

0/2 expected sites not present
Sub-processor transparency 0%

0/3 expected sites not present

Self-serve trust documentation 0%

A single committed doc exists (SECURITY.md), but it is limited to vulnerability disclosure expectations. The repository does not provide a self-serve trust documentation set suitable for procurement deal-closing (certifications/attestations, DPA/contract commitments, versioned sub-processor transparency artifact, pen-test summaries, and operational control/status evidence are not packaged in the trust docs).

high
Create or expand a prospect-facing trust-center doc set (e.g., docs/trust/ + a trust landing page) that self-serves the standard procurement artifacts: current SOC 2/ISO attestations (with version/date), DPA/contract commitments (or the current DPA/terms), a maintained versioned sub-processor list that prospect reviewers can reconcile to the service’s actual integrations, pen-test/security assessment summaries, and control/status evidence (and how it is kept current).
- SECURITY.md:1-29 — Current content covers vulnerability reporting but does not package the required trust artifacts for self-serve procurement diligence.
med
Add a dedicated maintained sub-processor transparency document (in docs/trust or similar) that is explicitly prospect-consumable (not just code templates) and includes versioning/date, and ensure the entries match the third-party integrations actually used.
- packages/server/src/templates/processors.ts:1-20 — A ‘processors’ template file is present, but it is implementation code and not a prospect-consumable, maintained sub-processor inventory doc; a real trust-list doc should exist alongside this.

Questionnaire response library N/A

No questionnaire response library (CAIQ/SIG/VSA response bank) is present in the repository. Per the primitive’s definition, this is a DATA-ROOM artifact; its absence here is expected. Request the current, versioned questionnaire response set from the seller for procurement diligence.

high
Ask the seller’s GC / R&W underwriter for the current, versioned security questionnaire response library (e.g., CAIQ/SIG/VSA) mapped to the relevant frameworks/controls and aligned to the system versions in production.
- N/A (tooling evidence):N/A — git_artifact_scan: `questionnaire` category count=0 (absent). This is a DATA-ROOM follow-up; do not treat as a code gap.

Controls-to-contract mapping 0%

The controls-to-contract mapping primitive is not packaged in this codebase. The only relevant doc-adjacent security artifact found (SECURITY.md) does not include any DPA/MSA mapping of commitments to implemented controls and audit evidence. No DPA/MSA/legal mapping artifact was found, so there is nothing to grade for deal-closing traceability.

high
Create/locate the seller’s DPA/MSA controls-to-contract mapping document (controls mapping table) that explicitly maps each DPA commitment (e.g., encryption, retention, breach notice, data residency) to (a) the implemented system mechanism(s) and (b) the audit evidence artifact(s) (e.g., SOC 2 Type II control tests, configuration evidence, logs/reports). Ensure it is versioned and cross-references any code-visible enforcement points.
- SECURITY.md:1-29 — Current doc content does not contain the required mapping/traceability statements; serves as evidence of what is currently packaged.
high
Add a packaged traceability section to the trust/security documentation set that names the DPA/MSA commitments and points reviewers to the controls mapping document and its evidence sources (e.g., SOC 2 report version, audit evidence index).
- SECURITY.md:1-29 — Shows the repository’s only trust/security doc and indicates where the mapping should be integrated or linked.

Data export mechanism 0%

No complete tenant-scoped 'export all my data on request' mechanism was found in the codebase. The only 'download/export' behaviors observed are partial: feature/user downloads (2FA backup codes), view-specific downloads (Docker logs), and backup/restore tooling for specific artifacts (volume backups, web-server backup jobs) rather than a packaged, tenant-scoped export endpoint/job covering ALL tenant data in a portable format.

high
Add a tenant-scoped 'export all data' request flow: (1) a protected API/TRPC endpoint that initiates an async export job for the tenant; (2) job execution that gathers all tenant-owned data across products/modules; (3) a portable output format (e.g., tenant data JSON/CSV + media bundles) packaged into a downloadable archive; (4) completion status + secure download link; (5) explicit pagination/streaming and size limits.
- apps/dokploy/components/dashboard/docker/logs/docker-logs-id.tsx:1-240 — Existing downloads are view-specific (logs only), demonstrating the gap vs. 'all tenant data' packaging.
high
Ensure tenant scoping and permission checks are explicit in the export job scheduler and data retrieval layer (e.g., require tenantId/orgId context and enforce membership).
- packages/server/src/utils/volume-backups/restore.ts:1-127 — Existing download logic is keyed by specific resource identifiers (volumeName/backupFileName), not by tenant-wide export scope.
med
Wire the export mechanism into the UI as a single 'Download my data' action that triggers the tenant export job rather than providing multiple partial feature downloads.
- apps/dokploy/components/dashboard/settings/profile/enable-2fa.tsx:200-330 — Current UX supports partial downloads (2FA backup codes) rather than the consolidated tenant export required by procurement portability.
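
The core of the export job described above, gathering every tenant-owned domain and refusing rows from other tenants, can be sketched as follows. This is a simplified, synchronous illustration: a real job would be asynchronous, streamed, and size-limited as the recommendation says. `ExportSource`, `buildTenantExport`, and the assumption that every exported row carries an `organizationId` are hypothetical.

```typescript
interface TenantRow {
  organizationId: string;
  [field: string]: unknown;
}

// One source per data domain (projects, deployments, ...), keyed by tenant.
type ExportSource = (organizationId: string) => TenantRow[];

interface TenantExport {
  organizationId: string;
  generatedAt: string;
  data: Record<string, TenantRow[]>;
  counts: Record<string, number>;
}

function buildTenantExport(
  organizationId: string,
  sources: Record<string, ExportSource>,
  now: Date = new Date(),
): TenantExport {
  const data: Record<string, TenantRow[]> = {};
  const counts: Record<string, number> = {};
  for (const [domain, source] of Object.entries(sources)) {
    const rows = source(organizationId);
    // Defense in depth: fail the whole export if a source leaks another tenant's row.
    const foreign = rows.find((row) => row.organizationId !== organizationId);
    if (foreign !== undefined) {
      throw new Error(`export source "${domain}" returned a row outside tenant ${organizationId}`);
    }
    data[domain] = rows;
    counts[domain] = rows.length;
  }
  return { organizationId, generatedAt: now.toISOString(), data, counts };
}
```

The `counts` map doubles as a manifest, so the archive can state what it contains before the reviewer opens it.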

Deletion / erase-on-request 0%

The codebase contains many generic “delete/remove” operations (e.g., deleting projects and backup records), but there is no verifiable, tenant/subject-scoped erase-on-request implementation that demonstrably cascades through backups/derived data and ties the deletion to an auditable erase request. The primary evidence found is consistent with row deletions rather than an erase-on-request primitive.

high
Implement a dedicated, customer/data-subject-initiated erase workflow (e.g., `eraseOnRequest(subjectId|tenantId, requestId)`) that (1) validates authorization, (2) determines all data domains/derived stores/backups to remove for that tenant/subject, (3) executes a verified cascade (including backup/object-store cleanup), and (4) records an auditable, end-to-end deletion status report keyed by the erase request ID.
- packages/server/src/services/project.ts:77-90 — Current deletion for projects is a direct DB delete without code-visible cascade-to-backups/derived evidence in this function.
- apps/dokploy/server/api/routers/project.ts:679-713 — The API calls `deleteProject()` to handle deletion, but the evidence shown points to primary-row deletion rather than an erase-on-request cascade.
high
Augment deletion handlers to explicitly remove associated backup artifacts in storage (or create a job that does so) and link that action to the erase request/audit log; do not rely on deleting only DB metadata for backups.
- packages/server/src/services/backup.ts:70-77 — Backup record removal is a DB delete; it does not show verified backup artifact deletion as part of an erase-on-request cascade.
med
Add/extend automated tests that prove cascade behavior: when an erase request is executed for a tenant/subject, derived tables and backup/storage artifacts are absent afterward (or marked as deleted) and the final deletion result is recorded against the request ID.
- apps/dokploy/server/api/routers/project.ts:679-713 — Deletion endpoint exists, but the system needs tests demonstrating erase-on-request cascade + verifiability beyond primary-row removal.
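
The erase workflow recommended above, including the per-request status report, might be shaped like this. It is a minimal sketch with assumptions stated up front: authorization is left to the caller, steps run synchronously for brevity, and `EraseStep`, `EraseReport`, and the signature of `eraseOnRequest` are hypothetical (the recommendation only names the function loosely).

```typescript
type StepStatus = "deleted" | "failed";

// One step per data domain: primary rows, derived tables, backup artifacts, ...
interface EraseStep {
  domain: string;
  run: (organizationId: string) => void;
}

interface EraseStepResult {
  domain: string;
  status: StepStatus;
  error?: string;
}

interface EraseReport {
  requestId: string;
  organizationId: string;
  complete: boolean;
  steps: EraseStepResult[];
}

function eraseOnRequest(
  requestId: string,
  organizationId: string,
  steps: readonly EraseStep[],
): EraseReport {
  const results: EraseStepResult[] = [];
  for (const step of steps) {
    try {
      step.run(organizationId);
      results.push({ domain: step.domain, status: "deleted" });
    } catch (error) {
      // Keep going so the report shows every domain, not just the first failure.
      const message = error instanceof Error ? error.message : String(error);
      results.push({ domain: step.domain, status: "failed", error: message });
    }
  }
  return {
    requestId,
    organizationId,
    complete: results.every((result) => result.status === "deleted"),
    steps: results,
  };
}
```

Persisting the returned report against `requestId` is what turns a set of row deletions into an auditable erase request.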

Data residency commitment 0%

No end-to-end “data residency commitment” mechanism was found. While the codebase contains a `region` field, it is used only for configuring S3 backup/destination connectivity (passed to `rclone` as `--s3-region`). There is no evidence of tenant/org region pinning plus region-keyed routing that enforces where tenant data and compute run.

high
Add a tenant-scoped residency attribute (org/tenant “data residency region”) and enforce it end-to-end: data placement (storage location/bucket/DB/cluster) and request routing should be keyed off this tenant residency value, with checks at boundaries (API/middleware/router) so cross-region writes/reads are blocked.
- apps/dokploy/components/dashboard/settings/destination/handle-destinations.tsx:1-220 — Current `region` usage is destination connectivity (rclone connection string), not tenancy residency enforcement.
high
Update backup/export logic to enforce residency rather than only target S3 region. If backups must also be residency-bound, derive the required region from the tenant residency setting (not only from a per-destination credential field) and validate compliance at backup upload time.
- packages/server/src/utils/backups/utils.ts:1-260 — `destination.region` is used to build S3 flags; no tenant residency enforcement/routing is evidenced.
med
Add explicit audit/traceability for residency enforcement (e.g., log the tenant residency region used for placement/routing decisions and persist it with job/run metadata) so procurement reviewers can verify enforceability.
- apps/dokploy/server/api/routers/destination.ts:1-183 — Currently only logs/audits destination create/update/delete; residency enforcement (tenant-region driven routing/placement) is not evidenced in this API surface.
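
A boundary check of the kind these recommendations describe can be reduced to one pure function that also yields the audit record. This is a sketch under assumptions: the `residencyRegion` attribute on the organization does not exist in the audited schema, and `checkResidency` and its record shape are hypothetical. Only `destination.region` corresponds to a field the audit actually observed.

```typescript
// `residencyRegion` is a hypothetical tenant attribute proposed by the recommendation.
interface OrganizationResidency {
  organizationId: string;
  residencyRegion: string;
}

// Mirrors the per-destination region field the audit observed.
interface BackupDestination {
  destinationId: string;
  region: string;
}

interface ResidencyDecision {
  allowed: boolean;
  organizationId: string;
  destinationId: string;
  requiredRegion: string;
  targetRegion: string;
}

// Returns the decision instead of throwing so the caller can log it either way.
function checkResidency(
  org: OrganizationResidency,
  destination: BackupDestination,
): ResidencyDecision {
  return {
    allowed: destination.region === org.residencyRegion,
    organizationId: org.organizationId,
    destinationId: destination.destinationId,
    requiredRegion: org.residencyRegion,
    targetRegion: destination.region,
  };
}
```

A backup upload path would call this before invoking `rclone`, persist the decision with the job metadata, and abort when `allowed` is false.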

Enterprise access controls 0%

The codebase shows partial IP-related primitives (an IP-in-CIDR helper and Traefik whitelist middleware typing), but there is no evidence that a tenant-configurable IP allowlist is enforced at the request boundary with a corresponding admin surface. Router middleware wiring appears to cover auth and path/redirect behaviors, not network restriction allowlisting.

high
Implement tenant-scoped IP allowlist enforcement at the Traefik edge boundary: add middleware generation that writes a Traefik `ipWhiteList`/equivalent allowlist middleware based on tenant configuration, and attach it to the router during `createRouterConfig` (or an equivalent per-tenant router builder).
- packages/server/src/utils/traefik/domain.ts:74-115 — Router middleware composition is the correct boundary hook; currently it conditionally adds auth/path/redirect, but not IP allowlist middleware.
high
Add an admin surface for managing the allowlist (per tenant) that persists the CIDRs and triggers propagation to the dynamic Traefik config (local and remote/serverId paths), so procurement can obtain an auditable control mapping.
- packages/server/src/utils/traefik/middleware.ts:1-88 — Middleware config is loaded/written to `middlewares.yml` and can be updated remotely; this is the mechanism the allowlist manager should update, once CIDR settings are stored.
med
If CDN IP handling is intended to be part of access control, wire it into the tenant allowlist enforcement path (instead of only providing helper functions and hardcoded CDN ranges). Ensure the allowlist logic is actually used to accept/deny requests at the edge.
- packages/server/src/services/cdn.ts:1-71 — IP-in-CIDR and CDN ranges exist, but there is no evidence they are enforced as tenant network restrictions at the router boundary.
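
The middleware-generation step could be sketched as a function that turns stored CIDRs into a Traefik dynamic-configuration fragment. Assumptions are labeled: the middleware key is `ipWhiteList` in Traefik v2 and `ipAllowList` in v3, so it is a parameter here; validation covers IPv4 only; and `buildAllowlistMiddleware`, `isValidIpv4Cidr`, and the middleware naming are hypothetical rather than functions from `packages/server/src/utils/traefik`.

```typescript
const IPV4_CIDR = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\/(\d{1,2})$/;

function isValidIpv4Cidr(cidr: string): boolean {
  const match = IPV4_CIDR.exec(cidr);
  if (match === null) return false;
  const numbers = match.slice(1).map(Number);
  const prefix = numbers.pop(); // last capture group is the prefix length
  return prefix !== undefined && prefix <= 32 && numbers.every((octet) => octet <= 255);
}

// Traefik v2 names the middleware ipWhiteList; v3 names it ipAllowList.
type AllowlistKey = "ipWhiteList" | "ipAllowList";

type AllowlistMiddleware = Partial<Record<AllowlistKey, { sourceRange: string[] }>>;

interface TraefikDynamicConfig {
  http: { middlewares: Record<string, AllowlistMiddleware> };
}

function buildAllowlistMiddleware(
  name: string,
  cidrs: readonly string[],
  key: AllowlistKey = "ipAllowList",
): TraefikDynamicConfig {
  if (cidrs.length === 0) {
    throw new Error("refusing to emit an empty allowlist");
  }
  const invalid = cidrs.filter((cidr) => !isValidIpv4Cidr(cidr));
  if (invalid.length > 0) {
    throw new Error(`invalid CIDR(s): ${invalid.join(", ")}`);
  }
  const middleware: AllowlistMiddleware = {};
  middleware[key] = { sourceRange: [...cidrs] };
  return { http: { middlewares: { [name]: middleware } } };
}
```

The resulting object would be merged into the middleware file and the middleware name appended to the router's list, which is the hook the first recommendation points at.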

Sub-processor transparency 0%

The repository does not provide evidentiary, versioned sub-processor transparency artifacts suitable for closing a DPA procurement clause. The only repo artifact surfaced under “subprocessors” is `packages/server/src/templates/processors.ts`, but its contents are unrelated to maintaining a sub-processor list (it’s a template processing utility). Code does include third-party integrations (e.g., Stripe and AI provider SDKs), but there is no corresponding packaged, versioned sub-processor inventory available in the repo to match those integrations to a DPA clause.

high
Create/restore the expected maintained, versioned sub-processor inventory artifact (e.g., under `docs/subprocessors/` or a committed `SUBPROCESSORS.md`) and make it explicitly DPA-backed (including: version/date, named sub-processors, and what they do / data categories as applicable). Ensure new third-party processor additions trigger an update to this list.
- packages/server/src/templates/processors.ts:1-330 — Current “subprocessors” bucket artifact is not a sub-processor list; it cannot be used as DPA evidence.
high
Cross-check the new/updated sub-processor list against actual third-party SDK usage in code (at minimum: Stripe integrations, and the AI-provider selection/invocation utilities). Add any missing third parties to the inventory and bump the version/date.
- apps/dokploy/server/api/routers/stripe.ts:1-200 — Stripe SDK is instantiated and called, so Stripe must be represented in the declared sub-processor inventory.
- apps/dokploy/pages/api/stripe/webhook.ts:1-200 — Stripe webhook handler is implemented, another concrete third-party integration to be covered by the inventory.
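
The cross-check described above could run as a small CI step that compares installed third-party SDK packages against the declared inventory. A minimal sketch under stated assumptions: the inventory shape, `findUndeclaredProcessors`, and the package patterns are illustrative and were not read from the repository's `package.json`.

```typescript
// Hypothetical inventory format for a committed sub-processor list.
interface SubProcessorEntry {
  name: string;
  purpose: string;
  packages: string[]; // npm packages that imply use of this sub-processor
}

interface SubProcessorInventory {
  version: string;
  updatedAt: string;
  entries: SubProcessorEntry[];
}

// Illustrative patterns for packages that send data to a third party.
const PROCESSOR_PACKAGE_PATTERNS: RegExp[] = [/^stripe$/, /^@ai-sdk\//, /^openai$/];

// Returns processor-like dependencies that no inventory entry declares.
function findUndeclaredProcessors(
  dependencies: string[],
  inventory: SubProcessorInventory,
): string[] {
  const declared = new Set<string>();
  for (const entry of inventory.entries) {
    for (const pkg of entry.packages) declared.add(pkg);
  }
  return dependencies
    .filter((dep) => PROCESSOR_PACKAGE_PATTERNS.some((p) => p.test(dep)))
    .filter((dep) => !declared.has(dep))
    .sort();
}
```

A non-empty result would fail the build until the inventory gains an entry and its version/date is bumped.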

Compliance attestation readiness N/A

The scan found no procurement data-room artifact for this primitive in the codebase: a current compliance attestation readiness package (e.g., SOC 2 Type II report + control-to-code traceability/control mapping). This is expected because the artifact should be supplied from the seller’s data room, not derived from source code.

high

Request the current SOC 2 Type II report (or an equivalent such as ISO 27001, plus any required pen-test/assurance materials) and the corresponding control-to-code traceability package (Dim 5 audit evidence) from the seller/data owner. Ensure it is current (latest report period) and includes a clear mapping of each relevant control to implemented mechanisms and evidence artifacts.
med

Ask for versioned documentation that ties the attestation to the specific product/release scope being procured (e.g., what services/tenants are in-scope for the Type II report, and how the control mapping corresponds to that scope).

Reliability / SLA evidence N/A

No packaged Reliability/SLA evidence artifacts (e.g., status page configuration, published SLA terms, or incident postmortems/runbooks) were found in the repository. The git-based evidence scan reports the `status_sla` category as absent. While the codebase contains health-check/monitoring logic (operational mechanics), that does not constitute deal-ready procurement evidence for uptime/SLA track record.

high

Request the seller’s current, published SLA terms and status/uptime reporting artifacts (status page URL or repo config, uptime/availability metrics definition, and any incident/postmortem write-ups) so procurement can map the operational track record.
med

Ask for runbooks/post-incident reports showing reliability handling (e.g., how incidents are detected, triaged, communicated, and resolved) and versioned evidence that these practices are maintained.

Not applicable to this codebase: Questionnaire response library, Compliance attestation readiness, Reliability / SLA evidence.

Reporting & Data Export

Customer-accessible export endpoints (CSV, Parquet, JSON), scheduled exports, and a documented map of emitted events.

36% 6/10 scored

On-demand data export 0% · 0/2 expected sites, not present
Scheduled / recurring exports 71% · 6/7 expected sites
In-product reporting / analytics 67% · 3/4 expected sites
Documented export / event schema 78% · 3/3 expected sites
Export access control & audit 0% · 0/2 expected sites
Exit portability / no lock-in 0% · 0/4 expected sites, not present

On-demand data export 0%

No on-demand tenant data export primitive was found. The codebase exposes backup/volume-backup APIs (with permission checks and audit logging), but these are infrastructure/data-backup workflows (create/schedule/run/restore), not a tenant-scoped “download/export my data” takeout in portable formats suitable for customer analytics/warehouse/exit.

high
Implement a true tenant-scoped on-demand export/download handler (e.g., TRPC/HTTP route) that exports the full tenant dataset (all customer-relevant entities) into portable formats (tabular/columnar/structured), with streaming or chunking for large exports.
- apps/dokploy/server/api/routers/backup.ts:1-120 — Current “export-like” functionality in this router is backup scheduling/creation, not a customer takeout export endpoint.
high
Ensure the export endpoint is authorization-gated at tenant scope (not only service-level), and write an audit log entry on each export request/result (including export scope and output metadata).
- apps/dokploy/server/api/routers/backup.ts:1-120 — Backup endpoints do audit and check service permission, but there is no tenant-scoped “export my data” takeout path evident here.
med
Add/standardize a portable export format contract (e.g., versioned schema + manifests) and validate completeness coverage against the tenant’s data model before allowing downloads.
- apps/dokploy/server/api/routers/volume-backups.ts:1-170 — Volume-backup router provides operational backup outputs, but there is no evidence of a manifest/schema-driven tenant export completeness mechanism.
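
The three recommendations above (tenant-scoped authorization, an audit entry per request, and a versioned portable format) can be sketched together in one handler core. This is an illustration only: `ExportContext`, the `export:create` permission string, and `exportTenantData` are hypothetical names, and the in-memory `tables` argument stands in for real database queries.

```typescript
// All names here are hypothetical; this is not Dokploy's API.
interface ExportContext {
  userId: string;
  organizationId: string;
  permissions: Set<string>;
}

interface ExportAuditEntry {
  action: "export.created" | "export.denied";
  userId: string;
  organizationId: string;
  metadata: Record<string, unknown>;
}

interface TenantRow {
  organizationId: string;
  [key: string]: unknown;
}

interface ExportBundle {
  schemaVersion: string; // bump when the exported shape changes
  organizationId: string;
  entities: Record<string, TenantRow[]>;
}

function exportTenantData(
  ctx: ExportContext,
  tables: Record<string, TenantRow[]>,
  auditLog: ExportAuditEntry[],
): ExportBundle {
  const who = { userId: ctx.userId, organizationId: ctx.organizationId };
  if (!ctx.permissions.has("export:create")) {
    auditLog.push({ action: "export.denied", ...who, metadata: {} });
    throw new Error("FORBIDDEN");
  }
  const entities: Record<string, TenantRow[]> = {};
  const rowCounts: Record<string, number> = {};
  for (const name of Object.keys(tables)) {
    // Tenant scoping: only rows owned by the caller's organization leave.
    const own = (tables[name] ?? []).filter(
      (row) => row.organizationId === ctx.organizationId,
    );
    entities[name] = own;
    rowCounts[name] = own.length;
  }
  // Record scope and output metadata so the export is reviewable later.
  auditLog.push({ action: "export.created", ...who, metadata: { rowCounts } });
  return { schemaVersion: "1", organizationId: ctx.organizationId, entities };
}
```

Writing the per-entity row counts into the audit entry gives a reviewer the export scope without storing the exported data itself.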

Export completeness & fidelity N/A

No code-visible primitive for “Export completeness & fidelity” was found. The repository appears to implement scheduled/manual backups (database + filesystem/volume backup artifacts) with S3 uploads and restore flows, but it does not include a customer-facing, tenant-scoped export endpoint/job that exports the customer’s complete data model with correct types/relationships for round-trippable analytics/warehouse ingestion.

high
Add a true tenant-scoped “data export” mechanism (export endpoint + export job) that serializes ALL exportable entities/fields from the customer’s data model (customer, financial, operational, config, permissions/accounts, integration specs, and historical analytics if applicable) into a portable format, with explicit inclusion/exclusion and stable typing/relations.
- apps/dokploy/server/api/routers/backup.ts:1-220 — Backups exist, but they target backup artifacts (create/schedule/restore), not a complete portable export of the product’s data model.
high
Ensure the export path is permission-gated and tenant-scoped, and includes auditing of export initiation/completion plus integrity checks (e.g., row counts/hashes) to prevent silent truncation.
- packages/server/src/utils/backups/web-server.ts:1-136 — Current backup code focuses on producing an archive and uploading it; it is not a verified, complete data-model export with integrity/coverage guarantees.
med
Define and test an export coverage matrix (entity/field → export output → schema) and add regression tests that diff the export output against the expected data model so that adding/removing entities doesn’t silently break export completeness.
- packages/server/src/utils/backups/compose.ts:1-97 — Compose backup produces a backup artifact and status notifications; it does not establish a general export schema/coverage contract for all product entities.
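
The coverage matrix and integrity checks recommended above can be expressed as two pure functions that a regression test calls against every export. A minimal sketch, assuming a hand-maintained list of entity specs; `EntitySpec`, `findCoverageGaps`, and `findRowCountMismatches` are hypothetical names.

```typescript
// Hypothetical description of one entity in the product's data model.
interface EntitySpec {
  name: string;
  fields: string[];
}

interface CoverageGap {
  entity: string;
  missingFields: string[];
  missingEntirely: boolean;
}

type ExportedEntities = Record<string, Array<Record<string, unknown>>>;

// Reports entities absent from the export and fields absent from rows.
function findCoverageGaps(
  dataModel: EntitySpec[],
  exported: ExportedEntities,
): CoverageGap[] {
  const gaps: CoverageGap[] = [];
  for (const spec of dataModel) {
    const rows = exported[spec.name];
    if (rows === undefined) {
      gaps.push({ entity: spec.name, missingFields: [...spec.fields], missingEntirely: true });
      continue;
    }
    const first = rows[0];
    if (first === undefined) continue; // entity present but empty
    const missingFields = spec.fields.filter((f) => !(f in first));
    if (missingFields.length > 0) {
      gaps.push({ entity: spec.name, missingFields, missingEntirely: false });
    }
  }
  return gaps;
}

// Compares source-of-truth row counts with exported counts to catch
// silent truncation.
function findRowCountMismatches(
  expected: Record<string, number>,
  exported: ExportedEntities,
): string[] {
  return Object.keys(expected)
    .filter((name) => (exported[name]?.length ?? 0) !== expected[name])
    .sort();
}
```

Adding an entity to the data model without adding it to the export then shows up as a `missingEntirely` gap rather than going unnoticed.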

Large / async export handling N/A

No code-visible mechanism matching the “Large / async export handling” primitive (i.e., an async export job that handles large tenant datasets with streamed output/download and progress/notification for data portability) was found. The repository primarily implements backup/restore and job scheduling for operational volume/database backups, including some streaming of restore logs, but this does not constitute the customer-facing bulk dataset export primitive this audit targets.

high
Confirm whether the product has a customer data export/takeout feature at all (tenant-scoped, permissions + audit), and if so, locate its implementation. If it exists outside the codebase (or under a different term than export/backup/takeout), update the search accordingly (e.g., “takeout”, “data export”, “dump”, “export file”, “download job”).
- apps/dokploy/server/api/routers/volume-backups.ts:220-341 — Current async/streaming work visible here is restoration/log streaming for volume backups, not dataset export to the customer.
med
If you intend backups to satisfy this primitive, replace/extend the backup workflow with an export-job pipeline that (1) covers all tenant dataset categories in a portable format, (2) runs asynchronously (job/queue), (3) streams the output (no buffering the whole dataset in-request), and (4) provides progress/notification and a downloadable artifact.
- packages/server/src/utils/backups/web-server.ts:1-136 — This demonstrates operational backup mechanics (zip/rclone) but does not demonstrate a streamed, large dataset export for customer analytics/exit.
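
Points (3) and (4) above, streaming without buffering and reporting progress, can be illustrated with an async generator that pages through a dataset and yields newline-delimited JSON. This is a sketch under stated assumptions: `PageFetcher` stands in for a paginated, tenant-scoped query, and `streamNdjson` is a hypothetical name, not an existing Dokploy function.

```typescript
type ExportRow = Record<string, unknown>;

// Stand-in for a paginated, tenant-scoped database query.
type PageFetcher = (offset: number, limit: number) => Promise<ExportRow[]>;

// Yields one NDJSON line per row while holding at most one page in memory.
async function* streamNdjson(
  fetchPage: PageFetcher,
  pageSize: number,
  onProgress?: (rowsSoFar: number) => void,
): AsyncGenerator<string> {
  let offset = 0;
  for (;;) {
    const page = await fetchPage(offset, pageSize);
    if (page.length === 0) return;
    for (const row of page) {
      yield JSON.stringify(row) + "\n";
    }
    offset += page.length;
    if (onProgress) onProgress(offset); // e.g. update job progress
    if (page.length < pageSize) return; // short page means last page
  }
}
```

A background worker could write each yielded line to an object-store upload and report `onProgress` values as job progress, so the request that triggered the export returns immediately.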

Scheduled / recurring exports 71%

Scheduled/recurring execution exists and is implemented via a persistent schedule model (`cronExpression`, `enabled`, `organizationId`), a runner that initializes enabled schedules into recurring queue jobs, and background BullMQ workers that execute them. For “scheduled exports”, the concrete recurring export delivery path is the backup scheduling logic: enabled backup schedules are cron-triggered and run backup routines that use configured destinations (S3 credentials are constructed). Schedule creation/update/delete is permission-gated and audited.

high
Add/verify retry policy and a dead-letter queue (DLQ) on the recurring export queue jobs. Current queue setup removes jobs on completion/failure, but no explicit retry/backoff/DLQ handling is shown in the scheduler/worker wiring.
- apps/schedules/src/queue.ts:1-111 — Shows `defaultJobOptions` with `removeOnComplete/removeOnFail` and repeat scheduling, but no visible retry/backoff/DLQ configuration.
med
Confirm tenant isolation end-to-end for scheduled execution. The runner queries enabled schedules broadly (no organization filter in the shown bootstrapping), so verify the underlying DB/data model and job payload always confine execution to the owning organization (and that backups/destinations are organization-scoped).
- apps/schedules/src/utils.ts:1-220 — Bootstraps enabled schedules/backups via DB queries without an explicit `organizationId` filter in the shown code.
- packages/server/src/db/schema/schedule.ts:1-86 — `organizationId` exists in the schedule schema, but scheduled job runner filtering/tenant enforcement needs verification.
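
The retry/DLQ and tenant-isolation recommendations above can be sketched without the queue library itself. The option names (`attempts`, `backoff`, `removeOnComplete`, `removeOnFail`) follow the shape of BullMQ job options, but everything else is an assumption: the concrete values, the backoff formula (intended to approximate BullMQ's built-in exponential strategy), and the helpers `shouldDeadLetter` and `assertSameTenant`, which are hypothetical.

```typescript
// Shape modelled on BullMQ job options; declared locally so the sketch
// runs without the library.
interface RetryJobOptions {
  attempts: number;
  backoff: { type: "exponential" | "fixed"; delay: number };
  removeOnComplete: boolean;
  removeOnFail: boolean;
}

// Illustrative values, not taken from apps/schedules/src/queue.ts.
const scheduledExportJobOptions: RetryJobOptions = {
  attempts: 5,
  backoff: { type: "exponential", delay: 1000 },
  removeOnComplete: true,
  removeOnFail: false, // keep failed jobs so they can be inspected
};

// Delay before the next retry after `attemptsMade` failed attempts.
function backoffDelayMs(opts: RetryJobOptions, attemptsMade: number): number {
  if (opts.backoff.type === "fixed") return opts.backoff.delay;
  return opts.backoff.delay * Math.pow(2, attemptsMade - 1);
}

// True once retries are exhausted; a failure handler could then copy the
// job into a separate dead-letter queue.
function shouldDeadLetter(opts: RetryJobOptions, attemptsMade: number): boolean {
  return attemptsMade >= opts.attempts;
}

interface ScheduleRecord {
  scheduleId: string;
  organizationId: string;
  cronExpression: string;
  enabled: boolean;
}

// Guard for the worker: the destination must belong to the same tenant
// as the schedule that triggered the job.
function assertSameTenant(
  schedule: ScheduleRecord,
  destinationOrganizationId: string,
): void {
  if (schedule.organizationId !== destinationOrganizationId) {
    throw new Error(`Tenant mismatch for schedule ${schedule.scheduleId}`);
  }
}
```

Running `assertSameTenant` inside the worker, rather than only at schedule creation, covers the case where the bootstrapping query loads schedules for every organization at once.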

Warehouse sync / reverse-ETL N/A

No warehouse-sync / reverse-ETL primitive was found. The repo contains no off-graph warehouse connector configuration artifacts (e.g., dbt/airbyte/fivetran/singer/meltano configs) in the expected `warehouse_sync_config` category, and the codebase appears to focus on infrastructure orchestration plus backups/restores rather than exporting customer data into external BI/warehouse destinations via incremental sync.

high

Add a warehouse-sync reverse-ETL layer with maintained, tenant-scoped connector configs (dbt/airbyte/fivetran/singer/meltano or an equivalent internal sync service), including incremental sync state storage and documented target support.
high

Implement and document a complete data export contract for warehouse sync: supported destinations, sync frequency/incremental semantics, and the exported schema so customers can rely on portable results.
med

Ensure the sync/export path is permission-gated and audited (tenant scope + audit logs for export actions and failures).

In-product reporting / analytics 67%

The repo contains a real in-product reporting/analytics module: customer dashboard pages (e.g., Docker container dashboard) backed by tenant-scoped, permission-gated TRPC queries. The UI supports typical analytics interactions (filter/sort/paginate) and is not just a static admin chart. However, evidence of broader portable “data-out” reporting exports is not present in the discovered reporting paths for this primitive (the audit scope here is limited to in-product reporting itself).

high
Extend the reporting surfaces with explicit customer data-export paths (tenant-scoped, permission-gated, auditable) so “insight out” is possible beyond in-product views.
- apps/dokploy/components/dashboard/docker/show/show-containers.tsx:1-241 — This confirms reporting is interactive and data-backed, but it only shows the in-product fetch/render loop; no export/download handler was evidenced in the reporting wiring.
med
Confirm tenant-scoping and permission enforcement consistently across all reporting dashboards (not only Docker). For the monitoring dashboard page, verify the server-side permission checks complete the flow through to the metric source endpoints.
- apps/dokploy/pages/dashboard/monitoring.tsx:1-134 — This page is another customer dashboard surface and includes getServerSideProps auth/permission logic; ensure the rest of the monitoring data pipeline is similarly tenant-scoped.

Event stream completeness N/A

N/A for this primitive in this codebase. While there are runtime event-emission usages (e.g., WebSocket/EventEmitter style 'emit' for log streaming), the repository does not expose a complete, documented internal event catalog for an export/reporting event stream that can be diffed against internal event emit/publish/dispatch/track sites. As a result, there is no implementable 'emitted-vs-documented' completeness loop to audit for drift.

high
Add (and maintain) a documented internal event catalog for the reporting/export event stream (event names + payload schema + versioning) in a doc-adjacent artifact that matches what the backend actually emits (e.g., an AsyncAPI document, an EVENTS.md file, or equivalent), then wire backend emission to that catalog.
- apps/api/src/service.ts:1-120 — Current 'events' usage is external fetching, not an internal documented event stream; this blocks drift checking.
med
Implement an internal event emission layer with a single dispatch function (e.g., publish/track wrapper) and ensure all product event occurrences flow through it, so the emitted set can be deterministically compared to the documented catalog.
- apps/dokploy/server/wss/listen-deployment.ts:1-194 — Eventing is currently scattered across websocket/server handlers; it is not centralized for catalog-based completeness verification.
low
Create an automated CI check that extracts emitted event names from the code (emit/publish/track/dispatch sites) and diffs against the documented catalog to detect drift.
- apps/dokploy/scripts/generate-openapi.ts:1-133 — The repo already generates OpenAPI docs; a similar approach can be used for event catalogs, but currently it is not present as a completeness target.
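
The CI drift check described above reduces to two steps: extract event names from emit-style call sites, then diff them against the documented catalog. A minimal sketch, assuming event names are passed as string literals in the first argument; the regex and the function names `extractEmittedEvents` and `diffCatalog` are illustrative, and dynamically built event names would not be caught.

```typescript
// Matches emit("name"), publish('name'), track(`name`), dispatch("name").
const EMIT_CALL = /\b(?:emit|publish|track|dispatch)\(\s*["'`]([\w.:-]+)["'`]/g;

// Returns the sorted, de-duplicated event names found in one source file.
function extractEmittedEvents(source: string): string[] {
  const names = new Set<string>();
  EMIT_CALL.lastIndex = 0; // the regex is global, so reset between calls
  let match: RegExpExecArray | null = EMIT_CALL.exec(source);
  while (match !== null) {
    const name = match[1];
    if (name !== undefined) names.add(name);
    match = EMIT_CALL.exec(source);
  }
  return Array.from(names).sort();
}

interface CatalogDrift {
  undocumented: string[]; // emitted in code, missing from the catalog
  stale: string[]; // listed in the catalog, never emitted
}

function diffCatalog(emitted: string[], documented: string[]): CatalogDrift {
  const emittedSet = new Set(emitted);
  const documentedSet = new Set(documented);
  return {
    undocumented: emitted.filter((e) => !documentedSet.has(e)).sort(),
    stale: documented.filter((d) => !emittedSet.has(d)).sort(),
  };
}
```

A CI job would fail when either list is non-empty, which only becomes reliable once emission goes through a single wrapper as the previous recommendation suggests.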

Documented export / event schema 78%

This codebase has documented schema artifacts for consumer integration via a maintained, versioned OpenAPI specification (openapi.json) generated from the live API router and synced through CI. However, this appears to be API contract documentation rather than an explicit async/event-catalog schema (e.g., asyncapi-style event catalog) specifically for export/event payloads.

high
Add/maintain an explicit export/event schema catalog (e.g., asyncapi or EVENTS.md-style documentation) that enumerates exported event names and their payload shapes, and version it alongside the OpenAPI contract.
- openapi.json:1-120 — Current documented schema evidence is the OpenAPI spec artifact; no separate, explicitly versioned export/event payload catalog was identified in the doc-adjacent scan output.
med
Ensure the documented schema clearly marks which endpoints correspond to bulk/export or event streams, and include response/payload examples for the egress surfaces.
- apps/dokploy/scripts/generate-openapi.ts:1-133 — OpenAPI generation is centralized; enhancing tagging/descriptions for export/event surfaces would make the documented contract more actionable for portability.

Export access control & audit 0%

The codebase does include an export-adjacent access control + audit mechanism for backup lifecycle operations: backup create/update/delete and manual backup runs are permission-checked (tenant/service scoping) and consistently written to the audit trail using audit(ctx, ...). However, the backup file listing (rclone lsjson) and the restore-with-logs streaming endpoint appear to be permission/tenant-checked but do not show audit log writes on those specific data-movement endpoints, which is a gap for an “export access control & audit” primitive.

high
Add audit log writes to the listBackupFiles handler on the actual data-egress/export-like operation path (after permission + org checks, before/after rclone listing). Include action (e.g., 'list'), resourceType (e.g., 'backupFile' or 'destination'), and resourceId (destinationId/serverId if applicable).
- apps/dokploy/server/api/routers/backup.ts:466-520 — listBackupFiles is permission-gated via withPermission('backup','read') and performs orgId checks, but the handler body shown contains no audit(ctx, ...) call.
high
Add audit log writes to restoreBackupWithLogs for the restore action, using the same audit(ctx, ...) utility. Audit should occur once per restore request (and optionally include destinationId/source identifiers) after permission checks succeed.
- apps/dokploy/server/api/routers/backup.ts:522-614 — restoreBackupWithLogs performs checkServicePermissionAndAccess for input.databaseId but the handler body shown does not call audit(ctx, ...).
med
Create a small internal convention helper (e.g., auditBackupAction(ctx, action, backupId, destinationId)) and use it across all backup/export-like endpoints (create/update/delete/run/list/restore) to reduce drift and ensure every portable data-access path is auditable.
- apps/dokploy/server/api/routers/backup.ts:18-474 — Lifecycle operations already use audit(ctx, ...) in multiple places; introducing a shared helper will standardize missing endpoints like list/restore.
- apps/dokploy/server/api/utils/audit.ts:1-26 — audit(ctx, ...) already provides consistent tenant/user context; centralizing action/resource selection is straightforward.
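
The shared helper suggested above could be a pure builder that every backup/export-like handler calls before passing the result to the existing audit utility. This is a sketch only: the exact signature of `audit(ctx, ...)` is not shown in the audited excerpts, so the event shape here (`action`, `resourceType`, `resourceId`, `metadata`) and the name `buildBackupAuditEvent` are assumptions.

```typescript
type BackupAuditAction = "create" | "update" | "delete" | "run" | "list" | "restore";

// Assumed event shape; adjust to whatever audit(ctx, ...) really accepts.
interface BackupAuditEvent {
  action: BackupAuditAction;
  resourceType: "backup" | "backupFile";
  resourceId: string;
  metadata: Record<string, string>;
}

// Centralizes action/resource selection so list and restore endpoints
// cannot drift from the lifecycle endpoints.
function buildBackupAuditEvent(
  action: BackupAuditAction,
  resourceId: string,
  destinationId?: string,
): BackupAuditEvent {
  const metadata: Record<string, string> = {};
  if (destinationId !== undefined) metadata["destinationId"] = destinationId;
  return {
    action,
    // Listing touches files at a destination; other actions touch a backup.
    resourceType: action === "list" ? "backupFile" : "backup",
    resourceId,
    metadata,
  };
}
```

In a handler this would be called once after the permission and organization checks succeed, for example `buildBackupAuditEvent("restore", input.databaseId)`, with the result handed to the audit utility.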

Exit portability / no lock-in 0%

Exit portability (no lock-in) is not implemented as a complete, customer-accessible full-account data export/takeout mechanism. The codebase has backup/retention automation (DB/compose dumps to destinations) and schedule management with permission checks, but there is no evidence of a tenant-scoped full-account export endpoint/job that guarantees complete, portable data extraction for exit. The available terms file also does not contain an explicit data portability / termination export-rights clause.

high
Add a tenant-scoped, permissioned “full account export / data takeout” API endpoint that triggers a job to export ALL tenant-relevant data (at minimum: services/instances, configurations, volumes/metadata needed to restore, and historical operational data that the product uses). Ensure completeness is verified against the tenant data model (no silent truncation) and export output is in a portable format (e.g., versioned JSON/CSV bundles).
- packages/server/src/utils/backups/index.ts:1-157 — Backups are scheduled and limited to backup types; this is not an account-wide, complete export workflow.
high
Extend/introduce an export-job pipeline that is auditable and tenant-scoped end-to-end: (1) authz + tenant validation at job creation, (2) export execution with streaming/packaging, (3) audit-log writes on job start/completion/failure, and (4) a secure download link or user-notified artifact location.
- apps/dokploy/server/api/routers/schedule.ts:1-220 — Schedules show how permissions/auditing are done for scheduled tasks, but there is no equivalent export/job route for exit portability.
med
Add an export completeness checklist + tests that assert all tenant entities are included in the export bundle. Use this to prevent drift as the schema evolves.
- packages/server/src/utils/backups/compose.ts:1-97 — Current backups focus on DB dump commands for specific backup types; they do not demonstrate tenant-wide completeness guarantees.
med
Contract hand-off: confirm with the buyer’s general counsel (GC) that the MSA/terms include a termination/off-boarding data portability clause that grants export rights at exit. The current repository terms file does not include such language.
- TERMS_AND_CONDITIONS.md:1-23 — No data portability / termination export-rights clause is present in the provided terms file.

Not applicable to this codebase: Export completeness & fidelity, Large / async export handling, Warehouse sync / reverse-ETL, Event stream completeness.