AI / Data Foundation
Versioned data pipelines, pinned model versions, and a real vector or feature store — not scattered cron jobs and model="latest".
Declarative, tested transformations 140%
A clear instance of declarative, tested transformations exists in the workflow template import path: `templateTransforms.ts` provides centralized transformation functions for credential overriding and resourceLocator scrubbing, and `templateTransforms.test.ts` contains unit tests (including boundary/empty and non-mutation checks). The import orchestration (`templateActions.ts`) correctly delegates to this transformation layer.
- high
Extend this transformation-layer pattern to any other template-related data reshaping currently done inline (if present). As a check, ensure any future transformation helpers have adjacent unit tests and are invoked by the import orchestration rather than reimplemented at call sites.
- packages/frontend/editor-ui/src/features/workflows/templates/utils/templateActions.ts:33-74 — Shows the preferred orchestration style: call into the transformation layer from the template import path.
- med
If additional transformation responsibilities emerge (e.g., further template normalization beyond credentials and resourceLocator), consider expanding the existing `templateTransforms.ts` module rather than creating one-off helpers next to orchestration logic, to keep transforms declarative and testable as a single governed layer.
- packages/frontend/editor-ui/src/features/workflows/templates/utils/templateTransforms.ts:1-173 — Currently centralizes multiple related transformations (credential normalization/replacement and resourceLocator clearing) in one tested module.
Orchestrated pipelines N/A
No codebase-wide “Orchestrated pipelines” primitive was found in the sense of a governed DAG/asset definition with explicit declared dependencies, retry behavior, and reproducible, observable pipeline runs. What was found related to “orchestration” is worker-status polling in the frontend, which does not constitute a pipeline orchestration layer with a dependency graph and execution governance.
- high
Identify where data/AI pipelines are executed (backend worker/services layer) and ensure they are defined declaratively as a DAG (assets/manifests) with explicit dependencies and retry/success/failure handling surfaced as structured run records (reproducible inputs + run metadata).
- packages/frontend/editor-ui/src/features/settings/orchestration.ee/orchestration.store.ts:1-82 — Current “orchestration” visibility is limited to worker status polling; the primitive audit criteria (DAG dependencies, retries, reproducible runs) are not present here.
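To make the audit criteria above concrete, here is a minimal sketch of a declaratively defined pipeline with explicit dependencies, bounded retries, and structured run records. Every name in it (`PipelineStep`, `topoOrder`, `runPipeline`) is hypothetical and does not exist in the audited repo; real steps would be asynchronous and would persist their run records rather than return them.

```typescript
// Hypothetical sketch: none of these names exist in the audited repo.
interface PipelineStep {
  id: string;
  dependsOn: string[];
  maxRetries: number;
  run: () => void; // throws on failure
}

interface StepRunRecord {
  stepId: string;
  attempts: number;
  outcome: "success" | "failed" | "skipped";
}

// Depth-first topological sort: every step is placed after its declared dependencies.
function topoOrder(steps: PipelineStep[]): PipelineStep[] {
  const byId = new Map<string, PipelineStep>();
  for (const s of steps) byId.set(s.id, s);
  const ordered: PipelineStep[] = [];
  const visitState = new Map<string, "visiting" | "done">();
  const visit = (id: string): void => {
    if (visitState.get(id) === "done") return;
    if (visitState.get(id) === "visiting") throw new Error(`Cycle detected at step "${id}"`);
    const step = byId.get(id);
    if (!step) throw new Error(`Unknown step "${id}"`);
    visitState.set(id, "visiting");
    for (const dep of step.dependsOn) visit(dep);
    visitState.set(id, "done");
    ordered.push(step);
  };
  for (const s of steps) visit(s.id);
  return ordered;
}

// Runs steps in dependency order, retrying each up to maxRetries times.
// A step whose dependency did not succeed is recorded as skipped, not run.
function runPipeline(steps: PipelineStep[]): StepRunRecord[] {
  const records: StepRunRecord[] = [];
  const succeeded = new Set<string>();
  for (const step of topoOrder(steps)) {
    if (!step.dependsOn.every((d) => succeeded.has(d))) {
      records.push({ stepId: step.id, attempts: 0, outcome: "skipped" });
      continue;
    }
    let attempts = 0;
    let ok = false;
    while (!ok && attempts <= step.maxRetries) {
      attempts++;
      try {
        step.run();
        ok = true;
      } catch {
        // swallow and retry until the retry budget is used up
      }
    }
    if (ok) succeeded.add(step.id);
    records.push({ stepId: step.id, attempts, outcome: ok ? "success" : "failed" });
  }
  return records;
}
```

The run records are the part that matters for governance: each one states which step ran, how many attempts it took, and how it ended, which is the information worker-status polling alone does not provide.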
Data quality validation / contracts 50%
The repo externalizes data/contract validation using Zod-based schemas at tool boundaries (notably in the agents integration tool layer) and also provides a reusable schema-resolution/validation helper in the workflow SDK. This indicates a real data-quality/contract primitive, though only a small number of ingestion-boundary sites were confirmed in this audit slice.
- high
Extend this audit to all ingestion boundaries (HTTP handlers/DTO parsing, workflow input ingestion, node parameter parsing, webhook payloads). For each, require an explicit schema gate (Zod/JSON-schema-to-Zod) with a quarantine/error path instead of only downstream type assumptions.
- packages/cli/src/modules/agents/integrations/integration-tools.ts:805-920 — Shows the expected pattern (Tool `.input(...)` schema gate) at one ingestion boundary; replicate/verify across other ingestion surfaces.
- med
Add/verify standardized error routing for schema-validation failures (e.g., consistent error codes, capture invalid payloads, and ensure invalid inputs never propagate into execution).
- packages/@n8n/workflow-sdk/src/validation/resolve-schema.ts:1-227 — The helper produces descriptive validation messages but this audit did not confirm the end-to-end quarantine/routing behavior for invalid datasets at every boundary.
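The schema gate with a quarantine path recommended above can be sketched as follows. The repo uses Zod for this; the example uses a hand-written validator only so that it runs without dependencies. The payload shape, the `SCHEMA_VALIDATION_FAILED` code, and all function names are assumptions made for illustration.

```typescript
// Dependency-free stand-in for a Zod-style schema gate. All names are hypothetical.
interface IncomingPayload {
  workflowId: string;
  items: number[];
}

type GateResult =
  | { kind: "valid"; value: IncomingPayload }
  | { kind: "invalid"; code: "SCHEMA_VALIDATION_FAILED"; issues: string[] };

function validatePayload(input: unknown): GateResult {
  if (typeof input !== "object" || input === null) {
    return { kind: "invalid", code: "SCHEMA_VALIDATION_FAILED", issues: ["payload must be an object"] };
  }
  const record = input as Record<string, unknown>;
  const workflowId = record["workflowId"];
  const items = record["items"];
  const issues: string[] = [];
  if (typeof workflowId !== "string" || workflowId.length === 0) {
    issues.push("workflowId must be a non-empty string");
  }
  if (!Array.isArray(items) || !items.every((i) => typeof i === "number")) {
    issues.push("items must be an array of numbers");
  }
  if (issues.length > 0) {
    return { kind: "invalid", code: "SCHEMA_VALIDATION_FAILED", issues };
  }
  return { kind: "valid", value: { workflowId: workflowId as string, items: items as number[] } };
}

// Rejected payloads are kept for inspection instead of being dropped or passed on.
const quarantinedPayloads: Array<{ payload: unknown; issues: string[] }> = [];

// Only validated payloads reach `execute`; invalid ones are quarantined.
function ingestPayload(input: unknown, execute: (payload: IncomingPayload) => void): GateResult {
  const result = validatePayload(input);
  if (result.kind === "invalid") {
    quarantinedPayloads.push({ payload: input, issues: result.issues });
  } else {
    execute(result.value);
  }
  return result;
}
```

Returning a single error code with a list of issues gives every boundary the same failure shape, which is what makes consistent error routing possible.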
Raw / immutable source layer 0%
An explicit raw landing-layer concept exists (`InsightsRaw`), and compaction stages read from it. However, the compaction process deletes the processed raw rows from the source table, so the raw layer is not immutable and is not safely recoverable for auditing/reprocessing after transforms.
- high
Make `insights_raw` append-only/immutable by removing the destructive delete. Replace `DELETE FROM ${sourceTableName} ...` with an approach that preserves raw rows (e.g., mark-processed without deletion, move to an archive table, or store immutable raw snapshots keyed by run/batch).
- packages/cli/src/modules/insights/database/repositories/insights-by-period.repository.ts:220-290 — Compaction currently deletes processed rows from the source table, which violates the 'raw / immutable source layer' requirement.
- med
Add explicit auditability guarantees: store lineage metadata for each compaction run (source batch identifiers, time window, and counts) so auditors can reproduce aggregates from immutable raw data.
- packages/cli/src/modules/insights/insights-compaction.service.ts:1-220 — Compaction is staged and batch-oriented, but there is no evidence (in the inspected code) of lineage metadata that preserves the exact immutable raw inputs used per aggregate output.
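One of the options named above, marking rows as processed instead of deleting them, can be sketched with an in-memory model. This is not the actual `InsightsRaw` entity or repository code; `RawRow`, `CompactionRun`, `compactBatch`, and `replayTotal` are hypothetical names, and a real implementation would do this in SQL inside a transaction.

```typescript
// Hypothetical in-memory model; not the actual InsightsRaw entity or repository.
interface RawRow {
  id: number;
  value: number;
  compactedInRun: string | null; // null = not yet compacted
}

// Lineage metadata recorded once per compaction run.
interface CompactionRun {
  runId: string;
  sourceRowIds: number[];
  rowCount: number;
  total: number;
}

// Aggregates the next unprocessed batch and tags each row with the run ID.
// Raw rows keep their values and are never removed.
function compactBatch(rawRows: RawRow[], runId: string, batchSize: number): CompactionRun {
  const batch = rawRows.filter((r) => r.compactedInRun === null).slice(0, batchSize);
  let total = 0;
  for (const row of batch) {
    total += row.value;
    row.compactedInRun = runId; // mark as processed instead of deleting
  }
  return { runId, sourceRowIds: batch.map((r) => r.id), rowCount: batch.length, total };
}

// An auditor can recompute any aggregate from the preserved raw rows.
function replayTotal(rawRows: RawRow[], runId: string): number {
  return rawRows
    .filter((r) => r.compactedInRun === runId)
    .reduce((sum, r) => sum + r.value, 0);
}
```

Because the raw values survive, `replayTotal` should always agree with the `total` stored for the same run, which is the reprocessing guarantee the delete-based compaction cannot give.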
Data + pipeline versioning 0%
No clear implementation of “Data + pipeline versioning” (data state captured in immutable, versioned snapshots and linked to specific pipeline logic releases) was found. The codebase has evaluation dataset syncing and mock/pin-data generation, but dataset state is updated (and new examples get random UUIDs) rather than being snapshot-versioned and tied to a specific pipeline release for guaranteed reproducibility.
- high
Add explicit, release-tied dataset snapshotting for evaluation inputs (e.g., store generated/synced scenario inputs + splits as immutable artifacts/versions, then reference the snapshot id in evaluation runs). Ensure the snapshot is created deterministically from repo content + pipeline version/commit, not only by diffing current filesystem state.
- packages/@n8n/instance-ai/evaluations/langsmith/dataset-sync.ts:55-210 — Current sync logic updates/creates examples based on derived inputs and uses randomUUID for new examples; there is no release-specific snapshot id or immutable dataset artifact coupling.
- med
Record and persist pin-data generation provenance/versioning (generator code version/commit + schema resolution strategy + input workflow hash) and store the resulting pin data as versioned artifacts (or ensure the evaluation can fetch a prior immutable pin-data version).
- packages/@n8n/ai-workflow-builder.ee/evaluations/support/pin-data-generator.ts:1-489 — Pin data generator resolves schemas using typeVersion and constructs prompts, but there is no evident mechanism for immutable, release-specific storage/versioning of the generated data outputs.
- low
If using an external system (e.g., LangSmith) as the data store, introduce a governed “dataset/model/pipeline version” field in example metadata and enforce that evaluation runs pin to a specific dataset snapshot/version rather than relying on “current sync”.
- packages/@n8n/instance-ai/evaluations/langsmith/dataset-sync.ts:1-55 — Metadata and sync behavior are present, but there is no shown linkage to a pipeline release/version that would make re-runs reproducible against a specific data state.
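A deterministic, release-tied snapshot ID of the kind recommended above could be derived by hashing canonicalized example content together with the pipeline commit. The sketch below assumes a simplified example shape (`EvalExample`) and a hypothetical `computeSnapshotId` helper; the real LangSmith example payloads may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape; the real LangSmith example payloads may differ.
interface EvalExample {
  inputs: Record<string, string>;
  split: string;
}

// Serialize one example with sorted keys so property order cannot change the hash.
function canonicalExample(example: EvalExample): string {
  const sortedInputs = Object.keys(example.inputs)
    .sort()
    .map((key) => [key, example.inputs[key]]);
  return JSON.stringify({ inputs: sortedInputs, split: example.split });
}

// The same examples and the same pipeline commit always yield the same snapshot ID,
// regardless of the order in which examples were read from disk.
function computeSnapshotId(examples: EvalExample[], pipelineCommit: string): string {
  const canonical = examples.map(canonicalExample).sort();
  return createHash("sha256")
    .update(JSON.stringify({ pipelineCommit, examples: canonical }))
    .digest("hex");
}
```

An evaluation run that stores this ID can later be matched to the exact dataset state and pipeline release it ran against, which a random UUID per example cannot provide.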
Data lineage / provenance N/A
No explicit data lineage / provenance primitive (e.g., OpenLineage/Marquez/DataHub/Amundsen, or equivalent lineage/provenance emission + governance artifacts) was found in this codebase via repository-wide searches. No schema/config/artifact for provenance emission and no lineage/provenance implementation points were located.
- high
Add an explicit, machine-queryable lineage/provenance emission layer in the pipeline/execution path (record dataset identifiers, source dataset(s), and transformation/derivation edges with timestamps + run identifiers). Ensure it is persisted and queryable (DB tables/events) and covered by automated tests that validate lineage correctness end-to-end.
- med
Adopt or integrate a standard lineage model (e.g., OpenLineage) or define an equivalent internal schema and publish a machine-readable contract (schema) for lineage events.
Feature management N/A
I did not find any feature-management primitive (e.g., a centralized, versioned feature definition/feature store used as a single source of truth by both training and serving). The codebase appears to define feature-like schemas inline in application code rather than externalizing them into a governed feature layer suitable for avoiding training/serving skew.
- high
Introduce a centralized feature-definition artifact (feature store + versioned contracts/feature manifests) and route both training and serving to read from the same generated/compiled feature definitions.
- packages/frontend/editor-ui/src/features/agents/agent.types.ts:1-39 — Current schemas are defined inline via TypeScript interfaces, indicating an implicit (code-local) source of truth rather than an externalized feature layer.
Vector / embedding store 17%
This codebase clearly implements a vector/embedding-store primitive via n8n LangChain vector store nodes (including external providers like PGVector) and a shared createVectorStoreNode dispatcher. However, the audited implementation shown for the in-memory store is explicitly ephemeral (lost on restart) and the memory manager metadata does not demonstrate any governance tying vectors to the embeddings model version and embedded content version.
- high
Add explicit vector versioning governance: store metadata/namespace keyed by (1) embeddings model identifier/version and (2) a hash/version of the embedded content (e.g., per-document content hash or dataset manifest). Ensure both are written alongside vectors and used to route queries to the correct index.
- packages/@n8n/ai-utilities/src/utils/vector-store/MemoryManager/MemoryVectorStoreManager.ts:120-200 — Metadata tracks size/createdAt/lastAccessed but not embeddings model version or content version.
- packages/@n8n/ai-utilities/src/utils/vector-store/MemoryManager/MemoryVectorStoreManager.ts:200-320 — addDocuments persists vectors into the in-memory store without recording model/content version linkage.
- high
Enforce model/content version linkage at the insertion call path: have createVectorStoreNode/insertOperation derive a content version (or accept one from upstream) and pass it into populateVectorStore so providers can persist vectors into a versioned namespace/index.
- packages/@n8n/ai-utilities/src/utils/vector-store/createVectorStoreNode/operations/insertOperation.ts:1-82 — Insertion delegates persistence to args.populateVectorStore, but no versioning contract is enforced in this flow.
- med
For each external vector store provider (e.g., PGVector), implement and validate a consistent schema for storing: embeddings_model_id, embeddings_model_version, content_version/hash, and a retrieval filter to prevent mixing indexes from different versions.
- packages/@n8n/nodes-langchain/nodes/vector_store/VectorStorePGVector/VectorStorePGVector.node.ts:1-260 — A persistent vector-store provider exists; the remaining requirement is to ensure it persists version linkage and uses it during retrieval.
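The version linkage described in these recommendations can be sketched as a namespace keyed by embeddings model and version, with a content hash stored next to each vector. `VersionedVectorIndex` and the other names here are hypothetical and are not part of `createVectorStoreNode` or any provider node; similarity search is omitted to keep the focus on routing.

```typescript
import { createHash } from "node:crypto";

// Hypothetical metadata contract; field names follow the recommendation above.
interface VectorVersion {
  embeddingsModelId: string;
  embeddingsModelVersion: string;
  contentHash: string;
}

interface StoredVector {
  values: number[];
  version: VectorVersion;
}

function hashContent(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// Vectors from different models or model versions never share a namespace.
function namespaceFor(modelId: string, modelVersion: string): string {
  return `${modelId}@${modelVersion}`;
}

class VersionedVectorIndex {
  private readonly namespaces = new Map<string, StoredVector[]>();

  insert(values: number[], version: VectorVersion): void {
    const key = namespaceFor(version.embeddingsModelId, version.embeddingsModelVersion);
    const bucket = this.namespaces.get(key) ?? [];
    bucket.push({ values, version });
    this.namespaces.set(key, bucket);
  }

  // Returns only vectors produced by the same embeddings model and version as the query.
  candidatesFor(modelId: string, modelVersion: string): StoredVector[] {
    return this.namespaces.get(namespaceFor(modelId, modelVersion)) ?? [];
  }
}
```

Routing by namespace before any similarity search prevents a query embedded with one model from being compared against vectors produced by another, and the stored content hash shows which document revision each vector represents.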
Model version pinning 100%
Model version pinning exists and is implemented well in `packages/@n8n/ai-workflow-builder.ee/src/llm-config.ts`, where model factories construct LangChain chat models using explicit versioned model IDs for OpenAI and Anthropic. Additionally, integration-test fixtures include explicit versioned model identifiers to keep test behavior deterministic. However, general runtime node templates (e.g., the OpenAI-compatible example node and the LMChatOpenAi node) appear to accept model IDs from node parameters without enforcing version pinning.
- high
Add governance to user-facing model selection nodes (e.g., OpenAI-compatible example node and LMChatOpenAi): validate that `model` is a pinned, versioned ID (or provide a controlled dropdown sourced from a pinned registry), and reject/warn on floating identifiers like `latest`/`stable` or unversioned names.
- packages/@n8n/node-cli/src/template/templates/programmatic/ai/model-openai-compatible/template/nodes/ExampleChatModel/ExampleChatModel.node.ts:36-66 — The node forwards `model: modelName` directly from node parameters into `supplyModel` without pin enforcement.
- packages/@n8n/nodes-langchain/nodes/llms/LMChatOpenAi/LmChatOpenAi.node.ts:300-360 — The node exposes a user-provided `model` resource locator and routes it to the provider; this is a runtime invocation surface where pinning should be governed.
- med
Extend `llm-config.ts` model registry coverage (if needed) to also cover any remaining generation/evaluation stages so all production model invocations go through pinned factories rather than partially relying on user-entered model strings.
- packages/@n8n/ai-workflow-builder.ee/src/llm-config.ts:18-134 — Currently, pinning is strong for the factories defined in this file; ensuring every relevant production path uses these factories reduces drift risk.
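A pin check for user-entered model IDs might look like the sketch below. It rests on one stated assumption: a pinned ID ends in a date-like suffix such as `-2024-08-06` or `-20241022`. That holds for many OpenAI and Anthropic IDs but not for every provider, so a production rule would more likely consult a pinned registry. `checkModelPin` is a hypothetical helper, not an existing n8n function.

```typescript
// Hypothetical validator. Assumption: a pinned ID ends in a date-like suffix
// such as "-2024-08-06" or "-20241022"; adjust the pattern per provider.
const FLOATING_ALIASES = new Set(["latest", "stable", "default"]);
const DATE_SUFFIX = /-(\d{4}-\d{2}-\d{2}|\d{8})$/;

type PinCheck =
  | { kind: "pinned" }
  | { kind: "floating"; reason: string };

function checkModelPin(modelId: string): PinCheck {
  const segments = modelId.toLowerCase().split("-");
  const lastSegment = segments[segments.length - 1];
  // Reject explicit floating aliases first so the reason is specific.
  if (FLOATING_ALIASES.has(lastSegment)) {
    return { kind: "floating", reason: `"${modelId}" ends in a floating alias` };
  }
  // Anything without a date-like suffix is treated as unversioned.
  if (!DATE_SUFFIX.test(modelId)) {
    return { kind: "floating", reason: `"${modelId}" has no version suffix` };
  }
  return { kind: "pinned" };
}
```

Whether a floating ID is rejected or only produces a warning is a product decision; returning a reason string supports either behavior.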
Prompt / model-call management 89%
The codebase does have a managed/centralized prompt layer for at least the AI Workflow Builder and related evaluators (e.g., packages/@n8n/ai-workflow-builder.ee/src/prompts/* with re-exported builders). Core LLM calls use these prompt builders and, in key flows like the Planner Agent, enforce a structured output schema with validation and a bounded retry loop—matching the intended prompt/model-call management primitive.
- high
Ensure every other agent/evaluator model call site in this repo (outside the planner/responder examples) follows the same pattern: (1) prompt text built from the centralized prompts module, and (2) output parsed/validated against an explicit schema gate before the result is used.
- packages/@n8n/ai-workflow-builder.ee/src/prompts/index.ts:1-82 — Centralized prompts exist; this is the desired enforcement target for remaining call sites.
- packages/@n8n/ai-workflow-builder.ee/src/agents/planner.agent.ts:121-176 — Demonstrates the correct pattern (validation + bounded retry). Other call sites should be checked for equivalent governance.
- med
Add/strengthen automated tests that fail if a call site inlines a prompt literal or bypasses the centralized prompt builders (e.g., unit tests asserting the prompt builder function is used, or snapshot tests tied to prompt builder outputs).
- packages/@n8n/ai-workflow-builder.ee/src/agents/planner.agent.ts:55-105 — Planner prompt governance is already testable via prompt builder outputs + schema parsing; extending this approach can prevent prompt drift elsewhere.
Reproducibility / determinism 0%
The repo contains at least one strong determinism pattern: deterministic workflow-builder node ID generation that is explicitly implemented and unit-tested. However, at higher-level evaluation/execution run boundaries (the parts that would need exact reproducibility of datasets/prompting/LLM sampling), the observed harness/config wiring does not show explicit capture of determinism controls such as RNG seed or pinned model sampling parameters; the execution flow also uses randomUUID for run IDs.
- high
Add a determinism configuration object at evaluation-run boundaries (e.g., in the harness runner config): capture and persist (1) RNG/seed value(s), (2) LLM sampling parameters (temperature/top_p/max_tokens), (3) pinned model IDs/versions, and (4) relevant environment/config versions. Persist it alongside the evaluation run artifacts (transcript/output/score).
- packages/@n8n/ai-workflow-builder.ee/evaluations/harness/runner.ts:30-120 — The run config type is the natural run boundary but contains no explicit seed/determinism controls in the observed portion.
- high
Eliminate or quarantine non-deterministic identifiers used inside evaluation execution, or ensure they are explicitly recorded as non-reproducible metadata while the actual determinism controls are captured separately (e.g., record the seed + model parameters instead of relying on deterministic outputs only).
- packages/cli/src/modules/instance-ai/eval/execution.service.ts:1-120 — Uses randomUUID when returning error results. This affects only run identity, not the evaluation logic itself, but without an explicitly captured determinism config the underlying workflow evaluation still cannot be recreated deterministically.
- med
Ensure CI/provenance capture includes the determinism-critical items, not just CI source (ci/local) and GH run IDs. Either embed commit SHA/branch in the metadata artifact or ensure the run artifact stores them directly (instead of relying on LangSmith auto-tracking).
- packages/@n8n/ai-workflow-builder.ee/evaluations/cli/ci-metadata.ts:1-39 — Comments explicitly say commit SHA/branch are not included here; reproducibility of exact code prompts/configs would be stronger if persisted alongside run artifacts.
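A determinism config captured at the run boundary could take roughly the following form. The field names are illustrative and are not the harness's actual run config type. The seeded generator is mulberry32, a small public-domain PRNG used here in place of `Math.random()` so that any sampling of evaluation examples can be replayed from the stored seed.

```typescript
// Hypothetical run-boundary config; field names are illustrative, not the harness's actual type.
interface DeterminismConfig {
  seed: number;
  modelId: string;
  temperature: number;
  topP: number;
  maxTokens: number;
  commitSha: string;
}

// mulberry32: a small seeded PRNG, used here in place of Math.random().
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Seeded Fisher-Yates shuffle on a copy, then take the first `count` items.
function pickExamples<T>(items: T[], count: number, config: DeterminismConfig): T[] {
  const rand = mulberry32(config.seed);
  const copy = items.slice();
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    const tmp = copy[i];
    copy[i] = copy[j];
    copy[j] = tmp;
  }
  return copy.slice(0, count);
}

// Persist this string next to the run's transcript/output/score artifacts.
function serializeConfig(config: DeterminismConfig): string {
  return JSON.stringify(config);
}
```

A seed and fixed sampling parameters make the harness side repeatable. They do not by themselves guarantee identical LLM outputs, so the persisted config is best read as the record of what was controlled.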
AI output validation 100%
The codebase contains a strong, schema-governed AI output validation primitive for structured outputs. LLM text is parsed and validated against a declared Zod schema before any result is accepted, and when auto-fix is enabled, failures trigger a bounded retry loop that re-checks the corrected output against the same schema.
- high
Search for any other LLM call sites whose outputs are consumed without going through a structured output parser (e.g., raw `content`/string outputs returned to workflow execution). Add/route them through a schema gate like `N8nStructuredOutputParser` to ensure consistent rejection/retry behavior.
- packages/@n8n/nodes-langchain/utils/output_parsers/N8nStructuredOutputParser.ts:1-179 — Demonstrates the intended pattern: schema-validated parsing with rejection on mismatch; this can be used as the standard routing target.
- med
If additional structured output formats exist beyond the current Zod-from-JSON-schema path, factor them into the same parsing/validation interface so all formats share identical error messages and retry semantics.
- packages/@n8n/nodes-langchain/utils/output_parsers/N8nOutputFixingParser.ts:1-96 — Shows the retry loop depends on `this.outputParser` (same schema gate); ensuring format parity will preserve this guarantee.
Grounding / wrongness check 100%
The codebase contains a concrete grounding/wrongness check primitive implemented as LLM-as-judge correctness evaluation plus robust judge-output parsing (with multiple output-format fallbacks). This enables verdicts (pass/fail) derived from comparing generated output against expected context, rather than surfacing raw model text without verification.
- high
Extend the wrongness-check coverage from offline evaluations to any production paths where AI outputs are acted upon (e.g., auto-executed workflow edits or direct user-facing factual claims). Ensure there is a deterministic verdict gate (schema-validated verdict + bounded retries/fallback) between generation and action.
- packages/@n8n/agents/src/evals/correctness.ts:1-31 — Currently provides grounding/wrongness checks in the eval framework; confirm whether analogous gating exists in runtime production decision points.
- med
Add explicit tests asserting end-to-end that judge verdict parsing fails safely (e.g., returns undefined/throws) and cannot be interpreted as a pass when parsing fails.
- packages/@n8n/agents/src/evals/parse-judge-response.ts:1-33 — Parsing has fallbacks; add regression tests for malformed/ambiguous judge outputs to ensure the system doesn’t accept an incorrect verdict.
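The fail-safe property asked for above, that an unparseable judge response can never count as a pass, can be illustrated with a three-valued verdict. This is a hypothetical parser written for the example and not the logic in `parse-judge-response.ts`; the two accepted formats (a JSON `verdict` field and a `Verdict: pass|fail` line) are assumptions.

```typescript
// Hypothetical parser; not the actual parse-judge-response.ts logic.
type JudgeVerdict = "pass" | "fail" | "unparseable";

function parseJudgeVerdict(raw: string): JudgeVerdict {
  // Format 1 (assumed): a JSON object with a "verdict" field.
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null) {
      const verdict = (parsed as Record<string, unknown>)["verdict"];
      if (verdict === "pass" || verdict === "fail") return verdict;
    }
  } catch {
    // not JSON; fall through to the line-based fallback
  }
  // Format 2 (assumed): a dedicated "Verdict: pass" or "Verdict: fail" line.
  const found = new Set<string>();
  for (const line of raw.split("\n")) {
    const match = /^\s*verdict\s*:\s*(pass|fail)\s*$/i.exec(line);
    if (match && match[1]) found.add(match[1].toLowerCase());
  }
  // Exactly one distinct verdict is required; conflicting lines are ambiguous.
  if (found.size === 1) return found.has("pass") ? "pass" : "fail";
  return "unparseable";
}

// The gate accepts only an explicit pass; "unparseable" is treated like a failure.
function isAccepted(raw: string): boolean {
  return parseJudgeVerdict(raw) === "pass";
}
```

Keeping `unparseable` as its own value, rather than mapping it to `undefined` or `false` inside the parser, lets callers log and count parse failures separately from genuine fail verdicts.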
Self-correction / feedback loop 0%
I did not find a closed self-correction feedback loop that takes a specific check/validation error, injects it into the next model attempt, and re-checks with bounded retries. The closest pattern is a bounded retry loop in the checklist verifier, but it retries without feeding back the failure details into the prompt.
- high
Implement a closed feedback loop in the checklist verifier: when agent.generate throws or when parsed.results is missing/empty, capture the exact error (e.g., exception message, structuredOutput parsing failure reason, or validation mismatch) and append it to the next attempt’s user message/instructions (e.g., an additional section like “Previous attempt failure: … Fix accordingly”). Keep MAX_VERIFY_ATTEMPTS and ensure a safe fallback still returns an empty result if all attempts fail.
- packages/@n8n/instance-ai/evaluations/checklist/verifier.ts:52-111 — Bounded retry exists, but failure information is not injected into subsequent prompts; it only logs/warns and re-runs with the same message construction.
- med
Add a targeted test that simulates structuredOutput/JSON schema failures and asserts the next attempt includes the prior failure message and that parsing succeeds on retry (or safely falls back after MAX_VERIFY_ATTEMPTS).
- packages/@n8n/instance-ai/evaluations/checklist/verifier.ts:52-111 — There is no shown mechanism or test coverage here that validates prompt-level feedback on specific failures.
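The closed loop described in this section can be sketched as a bounded retry that appends the previous failure to the next prompt. The model call is represented by a synchronous function so the example stays runnable on its own; the real verifier calls an async agent. `generateWithFeedback`, `ModelCall`, and `OutputCheck` are hypothetical names.

```typescript
// Hypothetical sketch; the real verifier calls an async agent, modeled here as a sync function.
type ModelCall = (prompt: string) => string;
type OutputCheck = (raw: string) => string | null; // error message, or null when valid

type LoopResult =
  | { kind: "ok"; output: string; attempts: number }
  | { kind: "exhausted"; attempts: number; lastError: string };

function generateWithFeedback(
  basePrompt: string,
  callModel: ModelCall,
  check: OutputCheck,
  maxAttempts: number,
): LoopResult {
  let prompt = basePrompt;
  let lastError = "no attempts made";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    let raw = "";
    let failure: string | null;
    try {
      raw = callModel(prompt);
      failure = check(raw);
    } catch (error) {
      // A thrown error is treated the same way as a failed check.
      failure = error instanceof Error ? error.message : String(error);
    }
    if (failure === null) return { kind: "ok", output: raw, attempts: attempt };
    lastError = failure;
    // Feed the specific failure back into the next attempt's prompt.
    prompt = `${basePrompt}\n\nPrevious attempt failure: ${lastError}\nFix accordingly.`;
  }
  // Safe fallback: the caller decides what an exhausted loop maps to (e.g. an empty result).
  return { kind: "exhausted", attempts: maxAttempts, lastError };
}
```

The prompt is rebuilt from `basePrompt` on every attempt, so only the most recent failure is included and the prompt does not grow with each retry.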
Evaluation harness + scoring 100%
The repo includes a full evaluation harness with scoring and artifact output: evaluators run via a central harness runner, harness-level score aggregation is implemented in a dedicated score-calculator module, and evaluation results are persisted to disk (including summary.json) for offline regression measurement. Implementation quality is strong and appears to support both local and LangSmith modes with reusable, testable components.
- med
Verify and document the recurring golden-set comparison workflow end-to-end (e.g., how outputDir artifacts map to golden datasets and how diffs/regression thresholds are computed in CI), and ensure failing scores route back to the operator/CI gate.
- packages/@n8n/ai-workflow-builder.ee/evaluations/harness/output.ts:1-220 — Artifacts are saved to disk, but the code slices reviewed did not confirm the specific CI/recurring comparison against a golden baseline.
- packages/@n8n/ai-workflow-builder.ee/evaluations/harness/runner.ts:1-200 — Runner supports scoring feedback collection, but the reviewed slices did not show the full gate/threshold comparison loop that enforces quality on recurring runs.
Runnable correctness checks 67%
The repository contains a runnable correctness-check primitive in the form of the `packages/testing/code-health` CLI, which runs rule checks and returns clear process exit codes (0 on pass; 1 on violations; 2 on internal errors). I did not find evidence of a root-level documented single command or CI workflow entrypoint wiring it, so the completeness of the “one entrypoint” story is only partial.
- high
Add/confirm a top-level, documentation-backed command (e.g., a repo root `pnpm` script or Make/Just target) that runs `packages/testing/code-health/src/cli.ts` with the intended arguments, so agents can discover a standard one-command pass/fail check without spelunking.
- packages/testing/code-health/src/cli.ts:1-105 — While the CLI itself has the pass/fail semantics, I did not locate (via code-graph queries) a root-level `package.json`/CI workflow entrypoint in this audit environment to confirm the standard command wiring.
- med
Document the exact CLI invocation contract (supported commands/flags, expected env vars like `CODE_HEALTH_CHANGED_FILES`, and what constitutes pass/fail) in a README at `packages/testing/code-health/` so the correctness signal is externally governed.
- packages/testing/code-health/src/cli.ts:1-105 — The behavior and exit codes are present in code, but there was no evidence in the audited slices of accompanying documentation explaining how to run it.
Actionable diagnostics N/A
The codebase contains a governed diagnostics primitive via custom ESLint rules in `packages/@n8n/eslint-config`. The rules emit structured, rule-specific messages and often include autofixers, turning lint failures into actionable “what/where/how to fix” diagnostics.
- high
Document a repo-level runnable lint check (e.g., `npm run lint` / `pnpm lint`), or verify that one exists, and ensure it surfaces ESLint rule IDs + file/line locations in CI logs, so diagnostics are actionable outside of local development.
- packages/@n8n/eslint-config/src/plugin.ts:1-33 — Confirms the diagnostics are wired into an ESLint plugin and enabled in a recommended config, but does not itself demonstrate how the repo runs lint in CI.
- med
For the most important rules, ensure each rule uses `meta.messages`/`messageId` consistently (and includes an autofix where safe), extending the existing pattern shown by `no-plain-errors` and `no-json-parse-json-stringify`.
- packages/@n8n/eslint-config/src/rules/no-plain-errors.ts:1-50 — Shows the desired pattern: named messageId and fix implementation.
Positive confirmation 67%
The codebase has an explicit positive confirmation mechanism in the AI workflow evaluation CLI: successful completion ends with exit code 0, while exceptions end with exit code 1. However, the inline comment indicates pass/fail is treated as informational rather than mapped to the exit code, limiting the strength of “correctness” signaling.
- high
If the intended primitive is “confirm correct (green) vs wrong (fail) so agents/CI can safely stop,” update the CLI to map evaluation pass/fail (based on the computed `summary` / score vs threshold) to the process exit code (e.g., exit 0 only when pass, exit 2 or 1 when fail) instead of always exiting 0 on successful completion.
- packages/@n8n/ai-workflow-builder.ee/evaluations/cli/index.ts:520-690 — Currently: always `process.exit(0)` after `runEvaluation`, with pass/fail described as informational (comment). This can be insufficient if correctness must gate CI/agent stopping.
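Mapping the evaluation result to the exit code, as recommended above, could be as small as the function below. The `EvalSummary` shape, the `averageScore` field, and the threshold are assumptions; the real `summary` object may expose different fields.

```typescript
// Hypothetical summary shape; the real `summary` object may expose different fields.
interface EvalSummary {
  averageScore: number; // assumed to be in the range 0..1
}

const EXIT_PASS = 0;
const EXIT_FAIL = 1;
const EXIT_HARNESS_ERROR = 2;

// Green only when a summary exists and meets the threshold.
// A missing or NaN score is a harness problem, reported separately from a failing score.
function exitCodeFor(summary: EvalSummary | null, passThreshold: number): number {
  if (summary === null || Number.isNaN(summary.averageScore)) return EXIT_HARNESS_ERROR;
  return summary.averageScore >= passThreshold ? EXIT_PASS : EXIT_FAIL;
}
```

The CLI would then end with something like `process.exit(exitCodeFor(summary, 0.8))` instead of an unconditional `process.exit(0)`. Keeping a distinct code for harness errors lets CI tell a bad score apart from a broken run.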
Machine-readable contracts 83%
This codebase contains strong machine-readable contract artifacts, primarily via exported Zod schemas (extension and package manifests) and explicit JSON-schema-driven tool input validation in the agent layer. These contracts are treated as source-of-truth for validation and are supported by automated tests.
- high
Identify the production call sites that consume the exported schemas (beyond the schema definitions/tests) and ensure there is an explicit, documented path for agents/tools to retrieve the contract artifacts (e.g., stable imports/entrypoints or generated schema outputs).
- packages/@n8n/extension-sdk/src/schema.ts:1-97 — Schema exists, but this audit slice only confirms the contract artifact; it does not yet confirm every consuming surface uses the exported schema as the gate.
- packages/cli/src/modules/n8n-packages/spec/manifest.schema.ts:1-38 — Schema exists with refinements, but we did not yet read the handler/loader code paths that ingest manifests using this schema.
- med
For workspace manifests, consider migrating/adding an explicit Zod schema (or equivalent JSON-schema artifact) alongside the parser to make the contract more uniformly machine-readable for downstream tooling.
- packages/@n8n/instance-ai/src/workspace/workspace-manifest.ts:1-40 — Current contract is enforced via parsing logic, but it is not exported as a standalone schema artifact equivalent to the Zod-based manifests.
Not applicable to this codebase: Orchestrated pipelines, Data lineage / provenance, Feature management, Actionable diagnostics.