AI / Data Foundation
Versioned data pipelines, pinned model versions, and a real vector or feature store — not scattered cron jobs and model="latest".
Declarative, tested transformations 33%
The codebase includes declarative, version-controlled transformation logic with unit tests, most clearly for Traefik routing label generation via `createDomainLabels`, which has extensive Vitest coverage including regression and edge cases. However, the broader “compose spec augmentation” pipeline (e.g., `addDomainToCompose`, compose YAML parsing, and the write/serialization steps) does not appear to have comparable contract-driven transform tests, so the primitive is only partially implemented end-to-end.
- high
Add dedicated unit/integration tests for `addDomainToCompose` covering boundary conditions: (a) empty `domains`, (b) missing target service, (c) composeType docker-compose vs stack/swarm, (d) isolatedDeployment/randomize branches, and (e) presence/absence of existing labels (ensuring idempotent unshift behavior and network injection).
- packages/server/src/utils/docker/domain.ts:113-170 — This is where domains are merged into the parsed compose spec (high-risk transformation logic that should be protected by per-transform tests).
- med
Add tests for YAML parsing and remote compose loading behavior (`loadDockerCompose`/`loadDockerComposeRemote`) including invalid YAML, missing paths, and stderr/empty stdout cases to ensure the transformation fails closed (returns null) rather than producing incorrect outputs.
- packages/server/src/utils/docker/domain.ts:70-112 — Parsing/loading governs the correctness of downstream transformations; boundary tests are needed.
- low
Consider extracting a small “transformation module” surface for compose augmentation (pure functions returning transformed specs, with serialization/writes at the edges) so that the tested transformations are more clearly separated from IO concerns and are easier to audit as a transformation layer.
- packages/server/src/utils/docker/domain.ts:214-252 — `createDomainLabels` is already pure and testable; the same style can be propagated to spec augmentation while keeping IO (read/write/remote) at boundaries.
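A minimal sketch of what a pure, idempotent augmentation step could look like, so the boundary conditions listed above (missing target service, existing labels, network injection) can be unit-tested without IO. The `ComposeSpec` and `ServiceSpec` shapes and the `addLabelsToService` name are hypothetical simplifications, not the repo's actual types or API.

```typescript
// Hypothetical, simplified shapes; the real compose spec types are richer.
interface ServiceSpec {
  labels?: string[];
  networks?: string[];
}

interface ComposeSpec {
  services: Record<string, ServiceSpec>;
}

// Pure transform: returns a new spec and never mutates the input.
// Returns null when the target service is missing (fail closed).
// Labels that already exist are not added again, so re-running is a no-op.
function addLabelsToService(
  spec: ComposeSpec,
  serviceName: string,
  newLabels: string[],
  network: string,
): ComposeSpec | null {
  const service = spec.services[serviceName];
  if (!service) return null;

  const existing = service.labels ?? [];
  const missing = newLabels.filter((label) => !existing.includes(label));
  const networks = service.networks ?? [];

  return {
    ...spec,
    services: {
      ...spec.services,
      [serviceName]: {
        ...service,
        // New labels go first, mirroring the unshift behavior described above.
        labels: [...missing, ...existing],
        networks: networks.includes(network) ? networks : [...networks, network],
      },
    },
  };
}
```

Because the function takes and returns plain data, serialization and file writes can stay in a thin wrapper at the edge, and each branch can be covered by a small Vitest case.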
Orchestrated pipelines 0%
This repo includes an orchestration layer for scheduled/recurring operations using BullMQ: schedules are represented as typed job payloads (Zod), jobs are scheduled centrally with cron repeat patterns, and workers execute them via a dedicated job runner. However, the pipeline layer does not clearly declare explicit retry/attempt/backoff behavior at the queue level, and the job runner catches errors and only logs (which can prevent the orchestrator from reliably marking runs as failed for retries).
- high
Add explicit retry behavior to BullMQ job options (`attempts` and `backoff`) in `jobQueue`/`defaultJobOptions`, review the existing `removeOnFail` setting so failed runs stay inspectable, and ensure failure states propagate (rethrow instead of swallowing errors).
- apps/schedules/src/queue.ts:1-80 — Queue/job options currently only set `removeOnComplete` and `removeOnFail`—no retry/attempt/backoff configuration is declared.
- apps/schedules/src/utils.ts:1-273 — `runJobs` wraps execution in `try/catch` and logs errors via `logger.error(error)` without rethrowing; this can inhibit BullMQ from treating the run as failed for retry purposes.
- med
Improve run observability by recording structured metadata with each job execution (inputs, schedule/job ids, timezone/cron expression, and any script/command version), and include it in both success/failure logs.
- apps/schedules/src/queue.ts:1-80 — Job creation passes `job` as data and sets repeat pattern, but there is no visible mechanism for attaching versioned execution metadata beyond the job payload.
- apps/schedules/src/workers.ts:1-46 — Worker logs include `job.data`, but there is no evident structured/complete run context (versions/inputs captured for determinism).
- low
If you intend true dependency-graph orchestration (DAG), model task dependencies explicitly (e.g., separate job types with `dependsOn` or a workflow engine), rather than using only repeatable single-step jobs.
- apps/schedules/src/queue.ts:1-80 — Scheduling maps directly to a single queue job per cron tick per job type; there is no explicit DAG/dependency structure.
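The retry and failure-propagation recommendations above can be sketched as follows. The option names (`attempts`, `backoff`, `removeOnComplete`, `removeOnFail`) follow BullMQ's documented job options, but the values shown are illustrative assumptions, and `runJobWithFailurePropagation` and `Logger` are hypothetical names rather than existing code in `apps/schedules`.

```typescript
// Option names mirror BullMQ job options; values are illustrative assumptions.
const defaultJobOptions = {
  attempts: 3,
  backoff: { type: "exponential", delay: 5000 },
  removeOnComplete: true,
  removeOnFail: false, // keep failed jobs so they can be inspected
};

interface Logger {
  error: (message: string, context: Record<string, unknown>) => void;
}

// Hypothetical wrapper around a job handler: it logs structured context and
// then rethrows, so the queue can mark the run as failed and apply retries.
async function runJobWithFailurePropagation<T>(
  jobId: string,
  handler: () => Promise<T>,
  logger: Logger,
): Promise<T> {
  try {
    return await handler();
  } catch (error) {
    logger.error("job failed", {
      jobId,
      error: error instanceof Error ? error.message : String(error),
    });
    throw error; // do not swallow: the orchestrator needs the failure signal
  }
}
```

The key design point is the `throw` after logging: without it, a worker reports success to the queue and the configured `attempts` are never used.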
Data quality validation / contracts 100%
This codebase does have data quality validation/contracts: it externalizes input validation into runnable Zod schemas and DB insert schemas (via `drizzle-zod`), and applies them at TRPC ingestion boundaries (router `.input(...)` gates) to reject invalid requests before side effects and persistence.
- high
Confirm that every ingestion boundary that leads to persistence/side-effects has a schema gate (TRPC `.input(...)` or equivalent) and not only runtime checks. Add/extend schemas where gaps exist (e.g., ensure all operations that accept ids/paths/files use a shared Zod contract module).
- apps/dokploy/server/api/routers/docker.ts:1-220 — Representative example of correct schema-gated ingestion; use this pattern to audit and fill any missing `.input(...)` gates elsewhere.
- med
Add explicit negative-path tests (unit/integration) that assert invalid inputs are rejected (400/typed TRPC error), and that no DB rows are created for invalid payloads.
- packages/server/src/db/schema/notification.ts:220-320 — There are strong schema contracts here; tests would ensure the contracts remain enforced as the API evolves.
- low
Where schemas exist (e.g., `uploadFileToContainerSchema`), standardize error messages and error codes so diagnostics clearly indicate which contract failed and why (field-level errors from Zod).
- apps/dokploy/server/api/routers/docker.ts:220-302 — This site already validates and type-checks the file; improving diagnostic consistency would further harden the contract workflow.
Raw / immutable source layer 0%
No immutable/raw landing layer was found. The raw compose handling deletes and recreates the on-disk compose location and writes `docker-compose.yml` directly, which makes the original source non-recoverable for later audits/reprocessing.
- high
Introduce an immutable/raw landing directory for raw compose inputs (e.g., store `composeFile` as `raw/docker-compose.yml` or with a content-hash/run-id). Do not delete it on subsequent renders/deploys; only create new versions when raw input changes.
- packages/server/src/utils/providers/raw.ts:1-20 — Current implementation removes and recreates `${outputPath}` before writing `docker-compose.yml`, eliminating any recoverable original source.
- high
Update the raw-to-workdir wiring so downstream steps (randomizeSpecificationFile, domain injection, etc.) write into a separate mutable working directory, while the immutable/raw source remains untouched.
- packages/server/src/utils/docker/domain.ts:1-220 — `cloneCompose()` routes `sourceType === 'raw'` through `getCreateComposeFileCommand()` which writes directly to the compose workdir path; immutable raw storage should be added alongside it.
- med
Add a manifest/metadata record (at least content hash, timestamp, and sourceType) for each raw compose input version to support reproducibility and audit trails.
- packages/server/src/utils/providers/raw.ts:1-20 — The current raw pipeline only base64-encodes and writes content; there is no evidence of versioned metadata or immutable preservation for audit/replay.
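One way to picture the immutable raw layer and its manifest is a content-addressed store: each raw compose input is kept under its hash and never overwritten. The sketch below uses an in-memory map purely for illustration; `RawComposeStore` and `RawVersion` are hypothetical names, and a real implementation would persist to disk or object storage instead.

```typescript
import { createHash } from "node:crypto";

// Manifest record for one raw input version (fields follow the recommendation above).
interface RawVersion {
  contentHash: string;
  sourceType: "raw";
  storedAt: string; // ISO-8601 timestamp
  content: string;
}

// In-memory stand-in for an append-only raw directory (hypothetical).
class RawComposeStore {
  private readonly versions = new Map<string, RawVersion>();

  // Stores content under its SHA-256 digest. Re-submitting identical content
  // returns the existing record, so nothing is ever deleted or overwritten.
  put(content: string, now: Date = new Date()): RawVersion {
    const contentHash = createHash("sha256").update(content, "utf8").digest("hex");
    const existing = this.versions.get(contentHash);
    if (existing) return existing;

    const version: RawVersion = {
      contentHash,
      sourceType: "raw",
      storedAt: now.toISOString(),
      content,
    };
    this.versions.set(contentHash, version);
    return version;
  }

  get(contentHash: string): RawVersion | undefined {
    return this.versions.get(contentHash);
  }

  get size(): number {
    return this.versions.size;
  }
}
```

Downstream steps would then copy a chosen version into a separate mutable working directory and record the `contentHash` with the deployment, leaving the stored original untouched.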
Data + pipeline versioning 0%
No governed “data + pipeline versioning” primitive was found. The codebase implements deployment/job orchestration and records deployment logs and (some) commit-derived identity, but there is no evidence of a versioning system that snapshots/governs both the pipeline logic and the data/state it runs on in a reproducible, traceable way.
- high
Introduce a reproducibility contract for each deployment run: (1) persist a pipeline version manifest (build config/toolchain versions + repository code revision/commit) and (2) persist immutable data/state artifact identifiers (dataset snapshot/version or input artifact digests). Store both in the deployment/job database record and propagate them through the queue and execution paths.
- apps/dokploy/pages/api/deploy/[refreshToken].ts:260-310 — Deployment job is the run boundary where the system should externalize pipeline+data versions, but it currently only includes `deploymentHash` (commit-derived) as a log description.
- high
Add explicit, queryable linkage between runs, pipeline versions, and data snapshots: create dedicated DB tables/fields (or an external system like DVC/lakeFS) for data artifact versions and pipeline manifests, and reference them from deployments.
- packages/server/src/services/deployment.ts:60-115 — Deployment creation writes deployment status/logPath but does not attach a data snapshot/version or pipeline manifest reference, limiting traceability and reproducibility.
Data lineage / provenance 0%
No data lineage/provenance primitive is implemented as a governed, queryable mechanism. The codebase contains an audit log schema (action/resource tracking) and an Inngest-to-UI mapping that reconstructs some source relationships (events → runs), but there is no automatic, complete lineage of datasets and their transformations, nor any explicit lineage emission that can be validated during change management.
- high
Introduce a dedicated lineage/provenance data model and emission points in the pipeline/application layer: for every derived dataset/row returned by transformations (e.g., event/run aggregation), persist (1) source identifiers, (2) transformation/mapping version, (3) schema/version of output, and (4) timestamps and operator/job metadata. Expose it via API so it is queryable and change-management verifiable.
- apps/api/src/service.ts:1-240 — This is where derived job rows are produced from fetched event/run sources; it should be the primary lineage emission site.
- med
Extend or complement the existing audit-log mechanism only if it is repurposed for data lineage: add dataset/derivation identifiers and explicit transformation edges. Otherwise, keep audit-log as security/compliance and implement lineage in a separate governed store.
- packages/server/src/db/schema/audit-log.ts:1-95 — Current audit log schema is action/resource oriented and lacks dataset derivation/transform-history semantics.
Feature management 0%
No implementation of the requested “Feature management” primitive exists in this codebase. The only related concept found is an enterprise license-based UI gate, which does not externalize or unify feature definitions for training vs. serving.
- high
Confirm the intended scope: if you actually need ML feature-store style management (e.g., Feast/Tecton) for train/serve parity, introduce a dedicated feature layer with a single versioned definition source, and ensure both training and serving read the same compiled feature definitions.
- apps/dokploy/components/proprietary/enterprise-feature-gate.tsx:1-115 — Current “feature” handling is limited to an enterprise license gate; it does not represent shared train/serve feature definitions.
- med
If the real requirement is product feature flags (not ML features), rename/re-scope this primitive in your audit rubric (feature flags vs. feature store/feature definitions).
- apps/dokploy/components/proprietary/enterprise-feature-gate.tsx:40-94 — This is explicitly a license-gating component, not a feature definition system.
Vector / embedding store N/A
No managed vector/embedding store primitive is present or wired in this codebase. Although there is an AI text-generation service with structured output validation, the implementation does not compute embeddings, does not persist vectors, and does not query any dedicated vector database.
- high
If the product intends to support RAG/semantic search over stored project/app data, introduce a dedicated vector embedding store layer (with explicit schema, indexing, and a model+content-version linkage) and wire it into the request path where retrieval is needed.
- packages/server/src/services/ai.ts:1-258 — Current AI path is generation-only (`generateText`) with structured output validation; it provides a clear contrast point for where a retrieval step (embed→upsert/search→grounded generation) should be inserted.
Model version pinning 0%
Model calls exist (via `generateText`), but there is no evidence of “model version pinning” being enforced: the code passes through a model identifier chosen at runtime (`aiSettings.model` / `input.model`) without ensuring it is an explicit, pinned version (e.g., rejecting `latest`/`stable`-style floating aliases or requiring versioned IDs).
- high
Enforce pinned, explicit model IDs at the boundary where models are selected: validate `aiSettings.model` (and `input.model` in `testConnection`) against a policy (reject `latest`/`stable` and require versioned IDs), and/or map friendly names to pinned model IDs from a single config file.
- packages/server/src/services/ai.ts:70-170 — Model ID flows directly from `aiSettings.model` into `generateText` (`provider(aiSettings.model)` → `generateText({ model })`) with no pinning enforcement.
- apps/dokploy/server/api/routers/ai.ts:240-380 — `testConnection` uses `provider(input.model)` and then `generateText` with no validation/pinning.
- med
Add an internal “model registry” (versioned config) that stores supported pinned model IDs per provider, and ensure both runtime suggestions and connection tests draw from this registry.
- packages/server/src/services/ai.ts:70-170 — Central invocation point for AI suggestions; best place to route through a pinned model registry.
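A pinning policy at the model-selection boundary could be as small as the check below. This is a heuristic sketch under stated assumptions: the alias list, the "must contain a date-like run of digits" rule, and the `checkPinnedModelId` name are all hypothetical, and real provider naming schemes may need a per-provider registry instead.

```typescript
// Hypothetical policy: aliases that float to whatever the provider currently serves.
const FLOATING_ALIASES = ["latest", "stable", "default"];

type PinCheck = { ok: true; model: string } | { ok: false; reason: string };

// Accepts a model ID only if it contains no floating alias segment and
// carries an explicit version or date-like suffix.
function checkPinnedModelId(model: string): PinCheck {
  const id = model.trim();
  if (id.length === 0) return { ok: false, reason: "model id is empty" };

  const segments = id.toLowerCase().split(/[-:@/]/);
  const floating = segments.find((segment) => FLOATING_ALIASES.includes(segment));
  if (floating !== undefined) {
    return { ok: false, reason: `model id uses floating alias "${floating}"` };
  }

  // Assumed rule: require a run of 4+ digits, e.g. "2024-08-06" or "20240806".
  if (!/\d{4}/.test(id)) {
    return { ok: false, reason: "model id has no explicit version or date suffix" };
  }
  return { ok: true, model: id };
}
```

Calling this on `aiSettings.model` before constructing the provider model, and on `input.model` in `testConnection`, would make an unpinned ID fail fast with a reason instead of silently tracking a moving alias.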
Prompt / model-call management N/A
No centralized prompt/model-call management was found in this codebase: there are no prompt/config artifacts (no prompt/prompts directories or prompt-management libraries) acting as a single source of truth. This check also did not detect LLM model-call sites tied to chat/completions/generate-style SDK usage, although the Model version pinning and AI output validation sections identify `generateText` calls in `packages/server/src/services/ai.ts` and `apps/dokploy/server/api/routers/ai.ts`. The N/A rating therefore reflects what this check detected, not a confirmed absence of LLM features.
- high
If/when LLM features are added, introduce a managed, versioned prompt/config layer (single source of truth) and route all model calls through it; otherwise, do not force this primitive into a non-LLM stack.
Reproducibility / determinism 0%
The codebase does not provide a reproducibility/determinism primitive that would allow recreating runs exactly from pinned inputs (e.g., captured seeds, environment/dependency versions, and deterministic execution controls). A clear non-deterministic component exists: password generation uses `Math.random()` without any seed management or run-input capture.
- high
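Introduce a deterministic randomness mechanism for replayable runs: replace `Math.random()` with a seeded PRNG (e.g., `seedrandom` or a custom xorshift) for non-secret randomized values, plumb a `seed` value from run boundaries into the generator, and persist the seed alongside other run inputs. Because `generateRandomPassword` produces credentials, it should draw from a cryptographically secure source (e.g., Node's `crypto.randomBytes`) rather than a replayable seed, since anyone holding the seed could regenerate the password.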
- packages/server/src/auth/random-password.ts:1-21 — Current implementation uses `Math.random()` and has no parameterization or seed capture, blocking exact replay.
- med
Add run-boundary metadata capture for determinism: at the entry points that trigger nondeterministic operations (auth flows, any “randomize” features, job/worker execution), record (a) seed(s) used, (b) environment variables relevant to behavior, and (c) code/dependency versions (e.g., git SHA + lockfile hash).
- packages/server/src/auth/random-password.ts:1-21 — The absence of any seed/environment capture in the randomness-producing function is the blocker; without run-boundary capture, determinism cannot be achieved.
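For non-secret randomized values (for example, randomized suffixes), a seeded generator makes a run replayable from its recorded seed. The sketch below uses a small mulberry32-style PRNG as an assumed stand-in for a library such as `seedrandom`; `createSeededRandom` and `randomSuffix` are hypothetical names. It is not cryptographically secure and is not suitable for passwords or other secrets.

```typescript
// Small seeded PRNG (mulberry32-style). Deterministic for a given seed.
// NOT cryptographically secure: use only for non-secret values.
function createSeededRandom(seed: number): () => number {
  let state = seed | 0;
  return () => {
    state = (state + 0x6d2b79f5) | 0;
    let t = Math.imul(state ^ (state >>> 15), 1 | state);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    // Map the 32-bit result into [0, 1).
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

// Builds a suffix from an injected random source, so callers can pass a
// seeded generator for replay or a different source in production.
function randomSuffix(length: number, random: () => number): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += ALPHABET.charAt(Math.floor(random() * ALPHABET.length));
  }
  return out;
}
```

Recording the seed with the run metadata (next to the git SHA and lockfile hash) is what allows the same suffix sequence to be regenerated later.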
AI output validation 0%
AI output validation is partially implemented. For `suggestVariants`, the code uses a strict Zod schema at the model boundary via the AI SDK’s structured-output facility (`Output.object({ schema })`), so invalid outputs are rejected before use. However, other model call paths have no strict schema gate: `analyzeLogs` returns raw model text, and `testConnection` only checks that the text is non-empty. There is also no demonstrated retry/self-correction loop that reuses the same schema and reports consistent error messages.
- high
Add strict structured output validation (Zod + `Output.object({ schema })`) to the `analyzeLogs` path so `result.text` is replaced by a schema-gated object (e.g., `{ summary, issues_found, root_cause, suggested_fix }`), rejecting non-conforming outputs instead of passing raw text through.
- apps/dokploy/server/api/routers/ai.ts:251-277 — Model output is returned as raw text (`return { analysis: result.text }`) without schema validation.
- high
Tighten `testConnection` output validation to an explicit contract (e.g., require exact `text === 'ok'` or use structured output with a schema) rather than only checking that `result.text` is non-empty.
- apps/dokploy/server/api/routers/ai.ts:302-350 — Only checks `if (!result.text)` before returning success; does not validate expected content/shape.
- med
Implement a closed-loop retry around schema validation failures for the structured-output calls (including reusing the same schema and returning a consistent validation error message), rather than failing immediately or only logging.
- packages/server/src/services/ai.ts:170-218 — Schema validation exists, but there is no visible retry/self-correction loop that re-prompts on validation failure with consistent messaging.
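To illustrate what a strict gate on the `analyzeLogs` and `testConnection` outputs would enforce, here is a dependency-free sketch. In the repo the natural tool would be Zod with `Output.object({ schema })`; the hand-written guard below is only an assumed stand-in for that schema, and `parseLogAnalysis`, `isOkReply`, and the field set (taken from the example shape in the recommendation) are hypothetical.

```typescript
// Field names follow the example shape suggested in the recommendation above.
interface LogAnalysis {
  summary: string;
  issues_found: string[];
  root_cause: string;
  suggested_fix: string;
}

type Parsed<T> = { ok: true; value: T } | { ok: false; error: string };

// Hand-written stand-in for a Zod schema: rejects anything that is not a
// JSON object with exactly the expected field types.
function parseLogAnalysis(raw: string): Parsed<LogAnalysis> {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "output is not valid JSON" };
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return { ok: false, error: "output is not a JSON object" };
  }
  const record = data as Record<string, unknown>;
  const summary = record["summary"];
  const rootCause = record["root_cause"];
  const suggestedFix = record["suggested_fix"];
  const issues = record["issues_found"];

  if (typeof summary !== "string") return { ok: false, error: "summary: expected string" };
  if (typeof rootCause !== "string") return { ok: false, error: "root_cause: expected string" };
  if (typeof suggestedFix !== "string") return { ok: false, error: "suggested_fix: expected string" };
  if (!Array.isArray(issues) || !issues.every((item) => typeof item === "string")) {
    return { ok: false, error: "issues_found: expected string[]" };
  }
  return {
    ok: true,
    value: { summary, issues_found: issues as string[], root_cause: rootCause, suggested_fix: suggestedFix },
  };
}

// Explicit contract for the connection test: the trimmed reply must be exactly "ok".
function isOkReply(text: string): boolean {
  return text.trim() === "ok";
}
```

The field-level error strings are what a later retry step or API error response would surface, which is why the parser returns them instead of throwing.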
Grounding / wrongness check 0%
Grounding / wrongness checking for LLM outputs does not appear to be implemented as a reusable primitive. The `aiRouter` endpoints call `generateText` and directly return `result.text` (or only check it is non-empty) without any judge/validator pass that verifies claims against the provided context/logs.
- high
Add a centralized grounding/wrongness-check wrapper for LLM outputs (e.g., `groundingWrongnessCheck({context: input.logs, outputText})`) that either (a) produces a structured verdict per claim supported by the context, or (b) rejects and triggers a bounded retry with an evidence-only prompt and strict schema validation. Wire it into `analyzeLogs` so the endpoint never returns unverified claims.
- apps/dokploy/server/api/routers/ai.ts:180-320 — Model output from `generateText` is returned directly (`analysis: result.text`) with no grounding/judge step.
- high
Harden the `testConnection` endpoint with explicit contract validation (e.g., require trimmed `result.text` equals `ok`) and treat mismatch as a failure. This is a minimum wrongness check for even simple outputs.
- apps/dokploy/server/api/routers/ai.ts:180-320 — Prompt says "Reply with 'ok'", but code only checks `if (!result.text)` and does not verify the actual content equals `ok`.
- med
Add an automated eval/verification harness (golden tests) for the `analyzeLogs` behavior: a set of log snippets with expected grounded findings, and a CI pass/fail that enforces that the grounding check gates the output.
- apps/dokploy/server/api/routers/ai.ts:180-320 — This endpoint defines the behavior that would require eval gating; currently no evidence of such a harness is present in the code paths shown.
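As a deliberately simple illustration of option (a) above, the sketch below treats a claim as grounded only if the evidence it cites appears verbatim (after whitespace normalization) in the supplied logs. This is a quote-matching check, not a judge model, and it assumes the model is prompted to return each claim with a quoted evidence string; `Claim`, `GroundingVerdict`, and `checkGrounding` are hypothetical names.

```typescript
// Assumed output shape: every claim carries the log excerpt it relies on.
interface Claim {
  statement: string;
  evidence: string;
}

interface GroundingVerdict {
  grounded: Claim[];
  ungrounded: Claim[];
}

// Collapse whitespace so line wrapping differences do not cause false negatives.
function normalizeWhitespace(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// A claim counts as grounded only when its evidence is non-empty and
// appears verbatim in the context. Everything else is flagged.
function checkGrounding(context: string, claims: Claim[]): GroundingVerdict {
  const haystack = normalizeWhitespace(context);
  const verdict: GroundingVerdict = { grounded: [], ungrounded: [] };
  for (const claim of claims) {
    const needle = normalizeWhitespace(claim.evidence);
    const supported = needle.length > 0 && haystack.includes(needle);
    (supported ? verdict.grounded : verdict.ungrounded).push(claim);
  }
  return verdict;
}
```

An endpoint could refuse to return an analysis while `ungrounded` is non-empty, or use that list to drive a bounded retry with an evidence-only prompt. A check like this catches fabricated quotes but not wrong conclusions drawn from real quotes, so it complements rather than replaces a stronger judge.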
Self-correction / feedback loop 0%
No self-correction / feedback loop exists. While the project uses structured output validation for the main AI suggestion generator (`suggestVariants`) via `Output.object({ schema: fullSchema })`, failures are not used to drive a bounded retry where the specific validation error is fed back into the model prompt and re-checked.
- high
Implement a closed self-correction loop inside `suggestVariants`: catch schema/parse/validation errors from the AI SDK/Zod gate, extract the specific error message/path, append it to the prompt (e.g., “Your previous output failed because: <error>. Fix and retry.”), and re-run `generateText` for a bounded number of attempts (e.g., 2-3) before returning a safe fallback (like a deterministic error payload).
- packages/server/src/services/ai.ts:70-230 — Model call + strict schema gate exist, but on failure it only throws (open loop). This is the exact call site where the primitive should close the loop.
- med
Ensure the API layer (`apps/dokploy/server/api/routers/ai.ts`) does not simply propagate raw errors: when the retry loop ultimately fails, return a structured error that includes the last validation error and the attempt count.
- apps/dokploy/server/api/routers/ai.ts:260-330 — `suggest` mutation calls `suggestVariants(...)` and propagates failures without any retry-with-feedback behavior at the boundary.
- low
Add a unit/integration test that forces an invalid model output (or mocks `generateText` to return invalid JSON) and asserts that the second attempt includes the validation error and that attempts are bounded.
- packages/server/src/services/ai.ts:70-230 — The critical behavior to test is the retry-with-error-feedback around the structured output gate.
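The bounded retry-with-feedback loop described above can be sketched independently of any model SDK. Here `generate` is a hypothetical stand-in for the model call (for example, a wrapper around `generateText`) and `validate` stands in for the schema gate; `generateWithFeedback`, `Validation`, and `LoopResult` are illustrative names.

```typescript
type Validation<T> = { ok: true; value: T } | { ok: false; error: string };

interface LoopResult<T> {
  value: T | null; // null means every attempt failed validation
  attempts: number;
  lastError: string | null;
}

// Calls the model, validates the output, and on failure re-prompts with the
// specific validation error appended. Stops after maxAttempts.
async function generateWithFeedback<T>(
  basePrompt: string,
  generate: (prompt: string) => Promise<string>, // hypothetical model call
  validate: (raw: string) => Validation<T>, // hypothetical schema gate
  maxAttempts = 3,
): Promise<LoopResult<T>> {
  let prompt = basePrompt;
  let lastError: string | null = null;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await generate(prompt);
    const result = validate(raw);
    if (result.ok) {
      return { value: result.value, attempts: attempt, lastError };
    }
    lastError = result.error;
    // Feed the concrete failure back so the next attempt can correct it.
    prompt = `${basePrompt}\n\nYour previous output failed because: ${result.error}. Fix and retry.`;
  }
  return { value: null, attempts: maxAttempts, lastError };
}
```

Returning `attempts` and `lastError` alongside the value gives the API layer what it needs to build a structured error when the loop is exhausted, and injecting `generate` makes the loop testable with a mocked model.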
Evaluation harness + scoring 0%
I could not find any evaluation harness + scoring implementation for AI output quality (no `eval/`, `golden/`, `testset/`, or evaluation runner artifacts, and no code that logs model I/O into a scored, recurring eval process). The codebase does have runtime schema validation and AI calls for suggestion generation and log analysis, but the offline golden-set evaluation/scoring layer described by the primitive is missing.
- high
Add an offline evaluation harness for the AI features (starting with `suggestVariants` and `analyzeLogs`): create a versioned golden dataset (inputs like user requests/log samples + expected properties), run model generations in batch, score outputs with deterministic judges, and store results per run (including model/version, prompt version, and seed/params).
- packages/server/src/services/ai.ts:74-258 — Production model call path (`suggestVariants`) that should be evaluated against golden cases and scored; currently only runtime schema gating is visible.
- apps/dokploy/server/api/routers/ai.ts:240-335 — Production model call path (`analyzeLogs`) that should feed into the eval harness + scoring and have run outputs logged for recurring evals.
- high
Implement production logging specifically for eval readiness: persist (a) the feature name, (b) prompt/prompt-template version, (c) model identifier + provider, and (d) the structured model output (or model text), along with the input payload. Wire logs to the eval runner so the next recurring eval can replay and score them.
- packages/server/src/services/ai.ts:74-220 — The model invocation (`generateText`) occurs here; this is the natural hook point for logging inputs/outputs/model+prompt versions.
- med
Create a documented one-command check/CI job that runs the eval suite, reports pass/fail with clear thresholds, and fails the build on regressions.
- apps/dokploy/server/api/routers/ai.ts:240-335 — A concrete second feature endpoint to include in CI evals once the harness exists (ensures broader coverage beyond just compose suggestions).
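A minimal sketch of the scoring core of such a harness, assuming a deterministic "must include these terms" judge. The `GoldenCase` shape, the judge itself, and the `runEval` name are hypothetical; a real harness would also record model, prompt version, and parameters per run, and `produce` would replay recorded or freshly generated model outputs.

```typescript
interface GoldenCase {
  id: string;
  input: string;
  mustInclude: string[]; // expected properties, here as required terms
}

interface CaseScore {
  id: string;
  score: number; // fraction of required terms present, in [0, 1]
  missing: string[];
}

interface EvalReport {
  scores: CaseScore[];
  mean: number;
  passed: boolean;
}

// Deterministic judge: case-insensitive check that each required term appears.
function scoreCase(golden: GoldenCase, output: string): CaseScore {
  const lower = output.toLowerCase();
  const missing = golden.mustInclude.filter((term) => !lower.includes(term.toLowerCase()));
  const total = golden.mustInclude.length;
  const score = total === 0 ? 1 : (total - missing.length) / total;
  return { id: golden.id, score, missing };
}

// Runs every golden case through `produce` and gates on the mean score.
function runEval(
  cases: GoldenCase[],
  produce: (input: string) => string,
  threshold: number,
): EvalReport {
  const scores = cases.map((c) => scoreCase(c, produce(c.input)));
  const mean =
    scores.length === 0 ? 0 : scores.reduce((sum, c) => sum + c.score, 0) / scores.length;
  return { scores, mean, passed: mean >= threshold };
}
```

A CI job would call `runEval` over the versioned golden set and exit non-zero when `passed` is false, which turns the report into a pass/fail gate.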
Runnable correctness checks 0%
The codebase contains a Vitest configuration and many unit tests, but I could not find a documented one-command “correctness check” entrypoint (or CI workflow) that an agent can run to get an unambiguous pass/fail verdict. Therefore, runnable correctness checks are not externally governed/documented in a way this primitive requires.
- high
Add a root-level, documented command that runs the repository’s correctness suite and returns a clear exit code (e.g., a `test` script like `vitest run --config apps/dokploy/__test__/vitest.config.ts`), and ensure CI uses the same command.
- apps/dokploy/__test__/vitest.config.ts:1-35 — This is the existing test configuration that should be invoked by a single runnable command.
- med
Add/confirm a CI workflow that runs the same command and fails the build on test failure, providing the positive confirmation needed for this primitive.
- apps/dokploy/__test__/setup.ts:1-44 — The test setup indicates tests are intended to run in CI without a real DB; a CI runner should invoke them reliably.
- low
Document the command in a contributor-facing file (README/CONTRIBUTING) alongside expected duration and how to run locally (same command as CI).
- apps/dokploy/__test__/vitest.config.ts:1-35 — Because the test runner behavior is defined here, documentation should point users/agents to the exact command that uses it.
Actionable diagnostics 25%
The codebase does include actionable diagnostics for core authorization failures (structured `TRPCError` with error `code` and `message`). However, some upstream API failures in `apps/api/src/service.ts` are handled primarily via logging and then returning empty/truncated data, which reduces actionability for callers.
- high
Standardize upstream API failure handling in `apps/api/src/service.ts`: when `fetchInngestEvents` / `fetchInngestRunsForEvent` encounter non-OK responses, propagate a structured error to callers (with a code/type and fix hint), rather than only logging and returning empty/partial results.
- apps/api/src/service.ts:44-83 — Non-OK response logs `status` and `body` but then `break`s and returns partial data without throwing an actionable error.
- apps/api/src/service.ts:93-125 — Non-OK response logs details but then returns `[]`, leaving the caller without a diagnostic failure signal.
- med
Add actionable diagnostics around `resolveRole()` failure modes (especially `JSON.parse(entry.permission)`) so that malformed role permissions produce a clear message and stable error code rather than failing indirectly later.
- packages/server/src/services/permission.ts:73-119 — Role resolution parses `entry.permission` and can fail in ways not shown to be wrapped in structured, user-actionable errors.
- low
Ensure environment-variable errors (e.g., missing `INNGEST_BASE_URL`) use the same structured diagnostic pattern (stable code/type) as other API errors, not just a raw `Error` string.
- apps/api/src/service.ts:175-193 — Throws generic `Error("INNGEST_BASE_URL is required...")` when config is missing; improve consistency with stable error codes/types.
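The structured-diagnostics pattern recommended above could look like the sketch below: one error type with a stable code, a human-readable message, and a fix hint, used for both upstream failures and missing configuration. `ServiceError`, its codes, `requireEnv`, and `assertUpstreamOk` are hypothetical names chosen for illustration, not existing code in `apps/api`.

```typescript
// Hypothetical stable error codes for this service.
type ServiceErrorCode = "UPSTREAM_UNAVAILABLE" | "UPSTREAM_BAD_RESPONSE" | "CONFIG_MISSING";

class ServiceError extends Error {
  readonly code: ServiceErrorCode;
  readonly hint: string;
  readonly details: Record<string, unknown>;

  constructor(
    code: ServiceErrorCode,
    message: string,
    hint: string,
    details: Record<string, unknown> = {},
  ) {
    super(message);
    this.name = "ServiceError";
    this.code = code;
    this.hint = hint;
    this.details = details;
  }
}

// Missing configuration uses the same structured error as API failures.
function requireEnv(name: string, env: Record<string, string | undefined>): string {
  const value = env[name];
  if (!value) {
    throw new ServiceError("CONFIG_MISSING", `${name} is required`, `Set ${name} in the service environment.`);
  }
  return value;
}

// Throws instead of returning empty or partial data on a non-OK response.
function assertUpstreamOk(status: number, body: string, operation: string): void {
  if (status >= 200 && status < 300) return;
  const code: ServiceErrorCode = status >= 500 ? "UPSTREAM_UNAVAILABLE" : "UPSTREAM_BAD_RESPONSE";
  throw new ServiceError(
    code,
    `${operation} failed with HTTP ${status}`,
    "Check the upstream service URL, credentials, and availability.",
    { status, body: body.slice(0, 500) }, // truncate to keep logs bounded
  );
}
```

Callers can then branch on `code` and surface `hint` to the user, rather than receiving an empty list that is indistinguishable from "no data".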
Positive confirmation 0%
No explicit “positive confirmation” success contract (e.g., a CI workflow or other governed pass/fail gate that clearly indicates correctness/safe-to-stop) was found in the repository. While the project uses Vitest and includes test files, I did not find any CI/build definition that externally and unambiguously signals test success.
- high
Add a GitHub Actions workflow (or equivalent CI) that runs the repo’s canonical test command (e.g., `vitest run` / `pnpm test`) and treats a passing run as the explicit positive-confirmation signal (green status). Ensure the workflow is the single source of truth for the one-command check.
- apps/dokploy/__test__/vitest.config.ts:1-35 — Shows tests are configured via Vitest, so CI can safely rely on `vitest run` exit code as the success/failure signal.
- med
Create/verify a single documented root-level script for the “green check” (e.g., `pnpm test`), and ensure it runs the full relevant test suite with deterministic settings (no watch mode).
- apps/dokploy/__test__/vitest.config.ts:1-35 — Vitest configuration exists, but no explicit positive-confirmation wiring (like a documented script/CI gate) was found.
Machine-readable contracts 92%
Machine-readable contracts are present and well externalized: the repo generates and commits an OpenAPI specification (`openapi.json`) from the server router, and the Swagger UI consumes the generated document. Additionally, Chatwoot integration expectations are expressed via a `.d.ts` declaration file.
- high
Add/ensure a CI check that re-runs apps/dokploy/scripts/generate-openapi.ts and fails if the generated output differs from the committed openapi.json (keeps contract in strict sync with implementation).
- apps/dokploy/scripts/generate-openapi.ts:1-133 — The generator writes openapi.json (source-of-truth pipeline), but without a visible enforced diff check, contract drift is possible.
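The comparison at the heart of such a drift check can be sketched as a pure function: parse both documents, sort object keys recursively, and compare the canonical forms so that key ordering alone never triggers a failure. `canonicalize` and `specsMatch` are hypothetical helper names; a CI step would read the committed and freshly generated files and exit non-zero when `specsMatch` returns false.

```typescript
// Recursively rebuilds a JSON value with object keys in sorted order.
// Array order is preserved because it can be semantically meaningful.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => canonicalize(item));
  }
  if (typeof value === "object" && value !== null) {
    const record = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(record).sort()) {
      sorted[key] = canonicalize(record[key]);
    }
    return sorted;
  }
  return value;
}

// True when the two JSON documents are structurally identical,
// ignoring whitespace and object key order.
function specsMatch(committedJson: string, generatedJson: string): boolean {
  const committed = JSON.stringify(canonicalize(JSON.parse(committedJson)));
  const generated = JSON.stringify(canonicalize(JSON.parse(generatedJson)));
  return committed === generated;
}
```

If the generator already emits keys in a stable order, a plain byte-for-byte diff of the two files is a simpler alternative with the same effect.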
Not applicable to this codebase: Vector / embedding store, Prompt / model-call management.