Building a Document Intelligence Platform with NestJS, n8n, and PostgreSQL

A document question-answering system is often reduced to a short pipeline: extract text, split it into chunks, create embeddings, retrieve relevant chunks, and generate an answer.

That description is technically correct, but architecturally incomplete.

Once I had the first end-to-end flow working, embeddings were no longer the main concern. The harder questions were about ownership and state:

When does an uploaded document become searchable?
Which component owns the document lifecycle?
What happens if indexing succeeds only partially?
Where should retrieval logic live?
How can a poor answer be investigated?
How should a failed operation be retried without corrupting state?

These are familiar software architecture problems. Similar questions appear in payment processing, order fulfillment, and other systems where work crosses service and transaction boundaries.

I built this platform to examine those concerns in a concrete implementation. NestJS provides the application boundary, n8n coordinates multi-step workflows, PostgreSQL with pgvector stores durable state and performs retrieval, and external model APIs provide embeddings and responses.

What surprised me during implementation was how quickly the basic retrieval path came together compared with the time spent deciding when data should become visible and which component was allowed to change it.

The resulting system is intentionally small. Its value as a case study comes from making ownership, state transitions, and failure modes visible.

The Problem Behind Document Intelligence

Document processing becomes complex when an upload stops being a single request and becomes a lifecycle.

A file must be validated, parsed, normalized, divided into chunks, embedded, and persisted before it can participate in retrieval. A question follows another multi-stage path: it is recorded, embedded, matched against document chunks, submitted with context, and completed with an answer and its sources.

Each stage introduces an architectural concern.

State ownership: Multiple components participate in processing, but only one should define the authoritative document and question state.

Lifecycle management: A stored document is not necessarily a searchable document. The system needs explicit transitions between pending, indexing, indexed, and failed.

Traceability: An answer alone is insufficient for diagnosis. The system must retain references and metadata for the chunks and documents that influenced it.

Failure recovery: Network calls, workflow executions, and database writes can fail independently. Recovery must account for partial progress and duplicate execution.

The retrieval algorithm matters, but it sits inside this larger operational model. The primary design problem is deciding which component owns each responsibility and what happens when work crosses those boundaries.

Architecture Overview

The platform divides its runtime responsibilities among four components:

NestJS exposes public and internal APIs, validates uploads, extracts text, owns application state, performs retrieval, and persists results.
n8n sequences the indexing and question-answering workflows.
PostgreSQL with pgvector stores documents, chunks, questions, answers, sources, and vectors.
OpenAI APIs create embeddings and return responses from supplied document context.

Docker Compose runs NestJS, n8n, and PostgreSQL locally. The application and n8n use separate databases on the same PostgreSQL server, so workflow metadata does not share a schema with application data.

Figure 1: The service boundary keeps application state in NestJS and PostgreSQL while n8n coordinates calls to external APIs.

The central boundary is between application ownership and workflow coordination.

NestJS is the only component that directly owns application records. n8n can request searches and state changes, but it does so through protected internal endpoints. It does not write directly to the application tables.

This arrangement keeps the public contract independent of workflow implementation details. It also prevents the workflow engine from becoming a second backend with its own interpretation of document and question state.

Architecture Decisions

Why NestJS Owns Application State

The backend owns validation, persistence, status transitions, and retrieval because those operations enforce application rules.

I initially considered allowing n8n to write chunks and statuses directly to PostgreSQL. It would have removed an internal HTTP call and reduced the number of steps in each workflow. After tracing a few failure scenarios, however, it became unclear whether the workflow or the backend was responsible for enforcing lifecycle rules. That ambiguity was more costly than the extra request.

If n8n wrote directly to PostgreSQL, business rules would be split between TypeScript services, workflow nodes, and SQL embedded in workflow definitions. Changes to the schema or lifecycle would then require coordinated updates across multiple ownership points.

Keeping writes behind NestJS costs an additional internal HTTP hop. In return, the system gets one application boundary for validation, authorization, persistence, and future transactional changes.

Why n8n Is Used for Workflow Orchestration

Indexing and question answering both require ordered calls across the backend and external APIs. n8n makes those sequences visible and allows each step to be inspected during development.

The trade-off is that workflow definitions become deployable artifacts with their own versioning, credentials, execution history, and failure behavior. They must be treated as code, not as an informal collection of nodes.

For that reason, n8n coordinates work but does not define the durable state model.

Figure 2: The indexing workflow sequences validation, chunking, embedding requests, and the callback that persists the completed chunk set.

Why PostgreSQL with pgvector

The platform already needed relational storage for documents, questions, statuses, and source metadata. pgvector allowed vectors and relational records to remain in the same database.

I also evaluated whether the retrieval layer should begin with a dedicated vector database. At this stage, that would have introduced another service, another consistency boundary, and a synchronization path for metadata already held in PostgreSQL. Keeping relational and vector data together was the more practical choice until scale or query requirements justify the additional moving parts.

Keeping both kinds of data in one database also keeps retrieval filters close to application state. For example, the search query can exclude chunks whose parent documents are not indexed.

A dedicated vector store may become appropriate at a different scale or with different query requirements. The current design defers that operational complexity until it is needed.

Why Source References Are Persisted

Every completed question stores the document and chunk metadata used to produce its answer. This turns source identification into part of the application record rather than leaving it only in temporary workflow context.

The additional storage is small compared with the diagnostic value. It allows an engineer to identify which records the retrieval stage returned. It does not preserve a snapshot of the retrieved text, so exact reconstruction would require storing the chunk content or version alongside the answer.

Why Synchronous Processing Was Chosen Initially

The public upload and question requests currently wait for their n8n workflows to finish.

This is not the preferred production model for long-running work. It was chosen because it makes state transitions and failures easy to observe while validating the design. A request either returns the completed resource or exposes the failure path immediately.

The cost is direct coupling between HTTP latency and workflow duration. Moving to asynchronous processing is therefore an expected architectural change, not an optional optimization.

Document Lifecycle Management

Document States

The backend records four document states:

pending: the document record exists but text extraction has not yet completed;
indexing: text extraction succeeded and the indexing workflow is running;
indexed: the complete chunk set has been stored and can be searched;
failed: extraction or workflow execution did not complete.

These states define visibility. Retrieval includes chunks only when their parent document is indexed.
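One way to make these transitions explicit is a small transition table that the backend consults before every status change. The following is a minimal sketch rather than the repository's code: the names ALLOWED_TRANSITIONS, canTransition, and isSearchable are hypothetical, and the indexed-to-indexing (re-index) and failed-to-pending (retry) edges are assumptions that the description above does not confirm.

```typescript
// Hypothetical names for illustration; not taken from the repository.
type DocumentStatus = "pending" | "indexing" | "indexed" | "failed";

// Each status maps to the statuses it may move to.
// Assumptions: "indexed" -> "indexing" models re-indexing, and
// "failed" -> "pending" models a retry that restarts from extraction.
const ALLOWED_TRANSITIONS: Record<DocumentStatus, readonly DocumentStatus[]> = {
  pending: ["indexing", "failed"],
  indexing: ["indexed", "failed"],
  indexed: ["indexing"],
  failed: ["pending"],
};

// Returns true when the lifecycle allows moving from one status to another.
function canTransition(from: DocumentStatus, to: DocumentStatus): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}

// Only indexed documents may contribute chunks to retrieval.
function isSearchable(status: DocumentStatus): boolean {
  return status === "indexed";
}
```

With a table like this, a direct jump from pending to indexed is rejected in one place instead of being checked separately in each service method.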

Upload Flow

The upload path is:

POST /documents validates the MIME type and enforces the 10 MB size limit.
NestJS creates a pending document record.
The backend extracts and normalizes text.
The document moves to indexing.
NestJS invokes the n8n indexing webhook.
n8n splits the text into overlapping chunks.
An embedding is requested for each chunk.
n8n submits the complete chunk collection to an internal endpoint.
NestJS stores the chunks and marks the document as indexed.

Figure 3: The indexing sequence shows how document ownership remains with NestJS while processing crosses workflow and API boundaries.

Figure 4: A completed indexing request returns the persisted document only after its chunks have been stored and its status has changed to indexed.

Re-Indexing Strategy

The internal chunk endpoint deletes the existing chunks for a document before saving the replacement collection. The unique constraint on (document_id, chunk_index) prevents duplicate chunk positions within one document.

Replacement is preferable to appending because it avoids mixing chunks produced by different extraction or chunking configurations.

Consistency Considerations

The indexed status is the gate between incomplete and searchable data. This prevents partially processed documents from entering retrieval under normal execution.

However, status checks alone do not make the write sequence atomic.

Transaction Boundaries

Deleting old chunks, inserting the replacement collection, and setting the document to indexed currently occur as separate database operations.

If insertion fails after deletion, the previous index has already been removed. If the status update fails after insertion, valid chunks may exist while the document remains unavailable to retrieval.

These operations should run in one database transaction. A stronger re-indexing design could also build a versioned chunk set and switch the active version only after the complete replacement is ready, preserving the previous searchable version during processing.
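A single-transaction version of the replacement could look like the sketch below. It is written under stated assumptions: SqlClient is a hypothetical minimal interface standing in for whatever database client the backend uses, the chunk ID is assumed to be generated by a column default, and the table and column names are taken from the search query shown later in this article. The statement list is built separately so that the ordering can be inspected without a database.

```typescript
// Hypothetical shapes for illustration; none of these names come from the repository.
interface Statement {
  text: string;
  params: unknown[];
}

interface NewChunk {
  chunkIndex: number;
  content: string;
  embedding: number[];
}

// Minimal client abstraction (assumption): anything with query(text, params).
interface SqlClient {
  query(text: string, params?: unknown[]): Promise<unknown>;
}

// Builds the ordered statements for one replacement: delete, insert, mark indexed.
function buildReplaceStatements(documentId: string, chunks: NewChunk[]): Statement[] {
  const inserts = chunks.map((chunk) => ({
    text:
      "INSERT INTO document_chunks (document_id, chunk_index, content, embedding) " +
      "VALUES ($1, $2, $3, $4::vector)",
    // pgvector accepts the text form "[x,y,z]" for vector values.
    params: [documentId, chunk.chunkIndex, chunk.content, `[${chunk.embedding.join(",")}]`],
  }));
  return [
    { text: "DELETE FROM document_chunks WHERE document_id = $1", params: [documentId] },
    ...inserts,
    { text: "UPDATE documents SET status = $2 WHERE id = $1", params: [documentId, "indexed"] },
  ];
}

// Runs every statement on one connection between BEGIN and COMMIT,
// and issues ROLLBACK if any statement fails.
async function runInTransaction(client: SqlClient, statements: Statement[]): Promise<void> {
  await client.query("BEGIN");
  try {
    for (const statement of statements) {
      await client.query(statement.text, statement.params);
    }
    await client.query("COMMIT");
  } catch (error) {
    await client.query("ROLLBACK");
    throw error;
  }
}
```

If an insert fails, the rollback restores the previous chunks, so the earlier index stays searchable. This only holds when every statement runs on the same connection, which a pooled client does not guarantee by default.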

Retrieval Architecture

Chunk Storage Strategy

The indexing workflow splits normalized text into chunks of up to 1,200 characters with a 200-character overlap. It prefers a paragraph, sentence, or whitespace boundary when one appears late enough in the candidate chunk.
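The chunking rule can be sketched as a sliding window that looks backward for a boundary. This is an illustrative approximation, not the workflow's actual code: the 1,200 and 200 values come from the description above, while MIN_BOUNDARY_RATIO (what counts as "late enough") and the specific separators are assumptions.

```typescript
const MAX_CHARS = 1200;
const OVERLAP_CHARS = 200;
// Assumption: a boundary is "late enough" when it falls in the last 40%
// of the candidate window. The real threshold is not stated in the article.
const MIN_BOUNDARY_RATIO = 0.6;

// Returns the cut position inside the window, preferring a paragraph break,
// then a sentence end (only ". " is detected here), then any space.
function findBreak(window: string): number {
  const minIndex = Math.floor(window.length * MIN_BOUNDARY_RATIO);
  for (const separator of ["\n\n", ". ", " "]) {
    const index = window.lastIndexOf(separator);
    if (index >= minIndex) return index + separator.length;
  }
  return window.length; // no suitable boundary: cut at the size limit
}

function chunkText(text: string, maxChars = MAX_CHARS, overlap = OVERLAP_CHARS): string[] {
  if (maxChars <= 0 || overlap < 0 || overlap >= maxChars) {
    throw new Error("Require maxChars > 0 and 0 <= overlap < maxChars");
  }
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    const window = text.slice(start, start + maxChars);
    const isLast = start + window.length >= text.length;
    const end = start + (isLast ? window.length : findBreak(window));
    const chunk = text.slice(start, end).trim();
    if (chunk.length > 0) chunks.push(chunk);
    if (isLast) break;
    // Step back by the overlap, but always advance so the loop terminates.
    start = Math.max(end - overlap, start + 1);
  }
  return chunks;
}
```

Because sizes are measured in characters, the number of tokens per chunk varies with the text, which is one reason the constraints listed later mention token-based sizing.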

Each stored chunk includes:

A generated chunk ID.
Its parent document ID.
A zero-based chunk index.
The extracted text.
Its embedding.

The chunk index provides stable ordering within a document and participates in the uniqueness constraint.

Embeddings and Vector Storage

Embeddings are stored in a vector(1536) column. The dimension matches the configured text-embedding-3-small model.

This creates an intentional schema dependency. Changing to a model with a different vector size requires a migration and re-indexing of existing documents.
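Because the column size is fixed, it can help to check the vector length before a write reaches the database. The helper below is a hypothetical sketch (toVectorLiteral is not a name from the repository); it validates the dimension and produces the bracketed text form that pgvector accepts for vector values.

```typescript
// Matches the vector(1536) column described above.
const EMBEDDING_DIMENSIONS = 1536;

// Validates an embedding and serializes it as a pgvector text literal, e.g. "[0.1,0.2]".
function toVectorLiteral(
  embedding: readonly number[],
  dimensions: number = EMBEDDING_DIMENSIONS,
): string {
  if (embedding.length !== dimensions) {
    throw new Error(`Expected ${dimensions} dimensions, received ${embedding.length}`);
  }
  if (!embedding.every((value) => Number.isFinite(value))) {
    throw new Error("Embedding contains a non-finite value");
  }
  return `[${embedding.join(",")}]`;
}
```

Failing early in the application gives a clearer error than a database rejection, and it surfaces a model change before any rows are written.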

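The migration creates an HNSW index with vector_cosine_ops over non-null embeddings. That operator class corresponds to the cosine distance operator (<=>) used by the search query, so the ORDER BY clause can use the index.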

Figure 5: Document chunks retain their parent document, ordering, content, and vector representation inside the application database.

Similarity Search

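For each question, the backend ranks the available chunks by cosine distance and returns the closest ones, up to the requested limit (currently five). In the query, $1 is the question embedding, $2 is the limit, and $3 is the required document status: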

SELECT
  chunk.id AS "chunkId",
  chunk.document_id AS "documentId",
  chunk.chunk_index AS "chunkIndex",
  document.file_name AS "fileName",
  chunk.content,
  1 - (chunk.embedding <=> $1::vector) AS similarity
FROM document_chunks chunk
INNER JOIN documents document
  ON document.id = chunk.document_id
WHERE chunk.embedding IS NOT NULL
  AND document.status = $3
ORDER BY chunk.embedding <=> $1::vector
LIMIT $2;

The workflow currently supplies a limit of five, while the backend query remains parameterized and accepts limits from 1 to 20. There is no minimum similarity threshold, so the query ranks the available chunks and returns the nearest results even when their similarity is weak. That is acceptable for the current implementation, but thresholding or post-retrieval filtering would be worth evaluating before production use.
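A post-retrieval filter is the simplest form of thresholding to evaluate. The sketch below is hypothetical: filterBySimilarity is an illustrative name, and any threshold value would need to be chosen from measured results rather than assumed. Since the query computes similarity as 1 minus cosine distance, scores fall between -1 and 1.

```typescript
// Keeps only rows whose similarity meets the threshold, preserving rank order.
// The generic constraint lets this work on any row that carries a similarity score.
function filterBySimilarity<T extends { similarity: number }>(
  rows: readonly T[],
  minSimilarity: number,
): T[] {
  if (minSimilarity < -1 || minSimilarity > 1) {
    throw new Error("minSimilarity must be between -1 and 1");
  }
  return rows.filter((row) => row.similarity >= minSimilarity);
}
```

If the filter removes every row, the workflow can skip response generation and report that the documents do not contain enough information.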

Why Retrieval Logic Lives in NestJS

Retrieval depends on application rules, not only vector distance. It must understand document status today and will need to understand tenant, collection, and authorization scope in a shared deployment.

Keeping the query in NestJS places those filters inside the same boundary that owns the schema and access rules. n8n receives retrieval results without gaining direct database access.

Why PostgreSQL Was Sufficient

The current workload benefits from joins between vectors and relational metadata. PostgreSQL can perform that work without synchronizing document state into a separate retrieval service.

This decision should be revisited based on measured volume, latency, filtering requirements, and operational constraints. It is not a claim that one storage design fits every retrieval workload.

Question Processing Workflow

Question Creation

POST /questions creates a question with pending status, an empty source collection, and no answer.

Question Embedding

NestJS invokes the n8n question workflow. The workflow validates the request and creates an embedding from the question text.

Retrieval Process

n8n sends the question embedding and a limit of five to the protected search endpoint. NestJS searches chunks belonging to indexed documents and returns their content and source metadata.

Response Generation

The workflow labels the retrieved chunks as sources and builds a request containing the question and document context. Its instructions require the response to use only the supplied context, cite source labels, and state when the documents do not contain enough information.
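The context construction step can be illustrated with a small pure function. This is a sketch of the idea, not the workflow's node code: the [S1] label format, the instruction wording, and the names buildContext and buildRequest are assumptions.

```typescript
interface RetrievedChunk {
  fileName: string;
  chunkIndex: number;
  content: string;
}

// Paraphrases the three rules described above; the exact wording is an assumption.
const INSTRUCTIONS = [
  "Answer using only the supplied context.",
  "Cite the source labels that support each statement, for example [S1].",
  "If the context does not contain enough information, say so.",
].join(" ");

// Labels each chunk so the response can cite it by position.
function buildContext(chunks: readonly RetrievedChunk[]): string {
  return chunks
    .map(
      (chunk, index) =>
        `[S${index + 1}] ${chunk.fileName} (chunk ${chunk.chunkIndex})\n${chunk.content}`,
    )
    .join("\n\n");
}

// Combines the instructions, labeled context, and question into one request payload.
function buildRequest(
  question: string,
  chunks: readonly RetrievedChunk[],
): { instructions: string; input: string } {
  return {
    instructions: INSTRUCTIONS,
    input: `Context:\n${buildContext(chunks)}\n\nQuestion: ${question}`,
  };
}
```

Keeping the labels positional means a cited label can be mapped back to the stored source at the same position in the retrieval result.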

Answer Persistence and Status Updates

After receiving a response, n8n calls the backend completion endpoint with the answer and source metadata. NestJS stores both and changes the question status to completed.

If the workflow fails before completion, the question is marked failed and retains an error message.

Figure 6: The question sequence separates question state, retrieval, response generation, and final persistence across their owning components.

Figure 7: The n8n workflow coordinates question embedding, protected retrieval, context construction, response generation, and completion.

Traceability and Source References

The completed question record stores:

Chunk ID.
Document ID.
Chunk index.
File name.
Similarity score.

Figure 8: The completed question stores the answer together with the document and chunk references returned by retrieval.

This metadata supports both user-facing verification and operational diagnosis.
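In type terms, a stored source is a search result without its text. The sketch below uses the field names from the search query's column aliases; the interface and function names are hypothetical.

```typescript
// Shape of one row returned by the similarity search.
interface SearchResult {
  chunkId: string;
  documentId: string;
  chunkIndex: number;
  fileName: string;
  content: string;
  similarity: number;
}

// What is persisted with a completed question: everything except the chunk text.
type SourceReference = Omit<SearchResult, "content">;

// Copies the reference fields explicitly so that chunk content is never stored by accident.
function toSourceReference(result: SearchResult): SourceReference {
  return {
    chunkId: result.chunkId,
    documentId: result.documentId,
    chunkIndex: result.chunkIndex,
    fileName: result.fileName,
    similarity: result.similarity,
  };
}
```

Omitting the content keeps the record small, at the cost noted earlier: after a re-index, the referenced chunk may no longer hold the text that was retrieved.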

When an answer is incorrect, the first thing to investigate is retrieval: did the search return the relevant chunks? If it did not, the likely causes include extraction, chunk boundaries, embedding quality, filtering, or ranking.

If the correct evidence was retrieved but the answer was still wrong, the investigation moves to context construction and response generation.

Persisting sources separates these failure classes. Without them, an engineer sees only the final text and must reconstruct transient workflow state from logs, assuming those logs still exist.

This is a limited form of observability. A production system should also record model identifiers, workflow versions, timing, retrieval parameters, and correlation IDs so that a result can be reproduced more reliably.

Security Boundaries

Public APIs

Clients use /documents and /questions. In the current local implementation, these routes do not require user authentication.

Internal APIs

n8n uses /internal/* to store chunks, search them, and complete questions. Saving a document's chunks also changes its status to indexed. Public clients are not expected to call these routes.

Webhook Authentication

Calls from NestJS to n8n include x-webhook-secret. Each workflow validates that value before processing its payload.

Internal API Protection

Calls from n8n to NestJS include x-internal-api-key. A guard protects the internal controller.

These shared secrets make the trust boundary explicit and prevent accidental access during local development. They are not a complete security design for a public deployment.
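A shared-secret check of this kind can be sketched in a few lines. The code below is an assumption about how such a guard might compare values, not the repository's guard: it uses Node's crypto module to compare digests in constant time, and the function names are hypothetical. Only the header name comes from the description above.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

const INTERNAL_KEY_HEADER = "x-internal-api-key";

// Compares two secrets without leaking where they differ.
// Hashing first gives both buffers the same length, which timingSafeEqual requires.
function isValidSharedSecret(provided: string | undefined, expected: string): boolean {
  if (provided === undefined || provided.length === 0 || expected.length === 0) {
    return false;
  }
  const providedDigest = createHash("sha256").update(provided).digest();
  const expectedDigest = createHash("sha256").update(expected).digest();
  return timingSafeEqual(providedDigest, expectedDigest);
}

// Node lowercases incoming header names, so the lookup uses the lowercase form.
function isAuthorizedInternalRequest(
  headers: Record<string, string | string[] | undefined>,
  expectedKey: string,
): boolean {
  const value = headers[INTERNAL_KEY_HEADER];
  return isValidSharedSecret(Array.isArray(value) ? value[0] : value, expectedKey);
}
```

Rejecting an empty expected key matters in practice: a missing environment variable should close the route, not open it.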

Additional Production Requirements

An internet-facing version would require:

User authentication and authorization.
Tenant and document-level access checks.
Private networking for internal routes.
Managed secret storage and rotation.
HTTPS and request rate limits.
Malware scanning for uploaded files.
Audit records.
Resource and cost limits.

The current search covers every indexed document. In a multi-user system, retrieval must enforce scope in the database query rather than relying on workflow input to filter results.

Failure Handling

Failure handling is where the service boundaries are tested.

My first implementation concentrated on successful workflow execution. Once I started considering lost webhook responses, repeated requests, and failures between chunk deletion and insertion, recovery began to shape the design as much as retrieval did. The difficult part was not detecting that something failed; it was determining whether the operation had already completed and what could be repeated safely.

Failed Document Processing

The backend creates the document record before extraction and indexing. If extraction fails or the n8n call throws an error, NestJS marks the document as failed.

There is one defensive check: if the workflow completed and set the document to indexed but the webhook response was lost, the backend reads the current status and returns the indexed document instead of overwriting it with failed.

This handles one ambiguous network outcome, but it is not a general recovery mechanism.

Failed Question Processing

Questions follow a similar pattern. If the workflow call fails, NestJS checks whether the question was already completed. A completed result is returned; otherwise, the question becomes failed with an error message.

Again, this protects against a lost response after successful completion, but not against every partial failure.
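The lost-response check used for both documents and questions reduces to one decision: after the workflow call throws, re-read the record and look at its status before writing anything. A minimal sketch of that decision follows, with hypothetical names (WorkflowOutcome, decideAfterWorkflowError) and the assumption that the caller has already reloaded the current record from the database.

```typescript
type WorkflowOutcome<T> =
  | { kind: "already-completed"; record: T }
  | { kind: "mark-failed"; errorMessage: string };

// Decides what to do after a workflow call failed from the caller's point of view.
// `currentRecord` must be a fresh read, not the copy held before the call.
function decideAfterWorkflowError<T extends { status: string }>(
  currentRecord: T,
  successStatus: string,
  error: unknown,
): WorkflowOutcome<T> {
  if (currentRecord.status === successStatus) {
    // The callback already succeeded; only the webhook response was lost.
    return { kind: "already-completed", record: currentRecord };
  }
  const errorMessage = error instanceof Error ? error.message : String(error);
  return { kind: "mark-failed", errorMessage };
}
```

The same function covers both cases by passing "indexed" for documents and "completed" for questions. It does not help when the workflow is still running at the moment of the re-read, which is why it is not a general recovery mechanism.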

Retry Considerations

External API calls are not retried. A production retry policy should distinguish transient failures from permanent failures, apply bounded exponential backoff, and cap total processing time.
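A bounded retry helper along these lines might look like the sketch below. It is not part of the current system. The names backoffDelays and retryTransient are hypothetical, the caller supplies the rule for what counts as transient, and the sketch omits jitter and a total-time deadline for brevity.

```typescript
// Delays between attempts: base, 2x base, 4x base, ... capped at capMs.
// maxAttempts includes the first try, so there are maxAttempts - 1 delays.
function backoffDelays(maxAttempts: number, baseMs: number, capMs: number): number[] {
  const count = Math.max(maxAttempts - 1, 0);
  return Array.from({ length: count }, (_, index) => Math.min(baseMs * 2 ** index, capMs));
}

// Retries only errors the caller classifies as transient, waiting between attempts.
// Permanent errors and exhausted attempts are rethrown unchanged.
async function retryTransient<T>(
  operation: () => Promise<T>,
  isTransient: (error: unknown) => boolean,
  delaysMs: readonly number[],
): Promise<T> {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await operation();
    } catch (error) {
      if (attempt >= delaysMs.length || !isTransient(error)) throw error;
      const delay = delaysMs[attempt];
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

For example, backoffDelays(5, 200, 1000) yields waits of 200, 400, 800, and 1,000 milliseconds. Retrying is only safe when the operation itself is idempotent.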

Retries also need operation-level visibility. An execution record should identify the document or question, attempt number, failure stage, timestamps, and final disposition.

Idempotency

Retrying is unsafe unless each operation is idempotent.

For indexing, (document_id, chunk_index) provides a stable logical identity, but deleting and recreating the complete set still needs a transaction or versioned replacement strategy.

For question completion, the backend should reject or conditionally apply late updates based on the current status, attempt ID, or workflow version. Otherwise, a delayed execution can overwrite a newer result.
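A conditional completion can be expressed as a pure function over the current record. This is a hedged sketch of the rule, not existing behavior: the attemptId field and the applyCompletion name are hypothetical, and the question statuses are the ones described earlier.

```typescript
type QuestionStatus = "pending" | "completed" | "failed";

interface QuestionRecord {
  id: string;
  status: QuestionStatus;
  attemptId: string; // hypothetical: identifies the attempt currently allowed to complete
  answer: string | null;
}

interface CompletionCallback {
  attemptId: string;
  answer: string;
}

// Applies a completion only when the question is still pending and the callback
// belongs to the current attempt. Late or duplicate callbacks leave the record unchanged.
function applyCompletion(
  question: QuestionRecord,
  callback: CompletionCallback,
): { applied: boolean; question: QuestionRecord } {
  const isCurrentAttempt = callback.attemptId === question.attemptId;
  if (question.status !== "pending" || !isCurrentAttempt) {
    return { applied: false, question };
  }
  return {
    applied: true,
    question: { ...question, status: "completed", answer: callback.answer },
  };
}
```

In the database, the same rule would typically be a single UPDATE whose WHERE clause includes the expected status and attempt, with the affected row count indicating whether the callback was applied.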

Duplicate Workflow Executions

A timeout can cause a caller to repeat a webhook request while the first execution is still running. The current system does not assign an idempotency key or execution token, so duplicate workflows may perform the same external calls and writes.

An execution identifier should be created by the application and propagated through every workflow call and callback. The backend can then accept each state transition once.

Partial Failures

The main partial-failure window is chunk replacement: old chunks are deleted, new chunks are inserted, and document status is updated in separate operations.

Other windows include a generated answer that is not persisted, a persisted completion whose response is lost, and an external request that succeeds after the local caller has timed out.

These are normal distributed-system outcomes. They should be represented in the design rather than treated as exceptional edge cases.

Recovery Strategies

A production design should combine:

Transactional state changes within PostgreSQL.
Idempotency keys across HTTP and workflow boundaries.
Durable execution records.
Bounded retries for transient failures.
Dead-letter handling for exhausted attempts.
Replay tooling for failed operations.
Conditional state transitions.
Correlation IDs across backend, workflow, and external calls.

The goal is not to prevent every failure. It is to make each operation recoverable without guessing whether it already completed.

Production Considerations

Current System Constraints

The present implementation has deliberate limits:

Uploads are limited to 10 MB.
Scanned PDFs are unsupported because OCR is not included.
TXT and JSON files are decoded as UTF-8, and JSON structure is not validated.
Chunks are sized by character count rather than token count.
Embeddings are fixed at 1,536 dimensions.
Each question retrieves up to five chunks.
Searches include all indexed documents.
Uploads and questions wait synchronously for workflows.
Workflow files must be imported and published manually.
Automated end-to-end tests are not present.

These constraints define the current operating envelope. They should remain visible because they affect correctness, latency, isolation, and capacity.

Production Improvements

The first production change would be to separate request acceptance from processing. Upload and question endpoints should return 202 Accepted with an operation identifier, while workers process durable jobs from a queue.

Queues would provide buffering and controlled concurrency, but they would not remove the need for idempotency. At-least-once delivery means duplicate execution must be expected.
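The acceptance step can be sketched as follows. Everything here is illustrative: the in-memory queue stands in for a durable one, the job kinds and response shape are assumptions, and the ID generator is injected so the sketch stays deterministic (randomUUID from node:crypto would be a typical choice).

```typescript
type OperationKind = "index-document" | "answer-question";

interface Job {
  operationId: string;
  kind: OperationKind;
  resourceId: string;
}

interface JobQueue {
  enqueue(job: Job): void;
}

// Stand-in for a durable queue; jobs held in memory are lost on restart.
class InMemoryQueue implements JobQueue {
  readonly jobs: Job[] = [];

  enqueue(job: Job): void {
    this.jobs.push(job);
  }
}

interface AcceptedResponse {
  statusCode: 202;
  body: { operationId: string; resourceId: string; status: "pending" };
}

// Records the job and returns immediately; a worker processes it later.
function acceptOperation(
  queue: JobQueue,
  kind: OperationKind,
  resourceId: string,
  newId: () => string,
): AcceptedResponse {
  const operationId = newId();
  queue.enqueue({ operationId, kind, resourceId });
  return { statusCode: 202, body: { operationId, resourceId, status: "pending" } };
}
```

The operation identifier returned to the client can double as the idempotency key that workers and callbacks carry, so a redelivered job is recognized as one already seen.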

Database transactions should protect chunk replacement and lifecycle transitions. Metrics and traces should cover extraction time, chunk counts, embedding calls, retrieval latency, workflow duration, retry counts, and failure stage.

Tenant isolation must be enforced in storage and retrieval queries. Monitoring should detect stuck indexing or pending records, repeated failures, queue backlog, external API latency, and abnormal processing cost.

OCR should be introduced as a separate extraction path with its own status and diagnostics rather than hidden inside generic document parsing.

These changes address scalability and operations, but more importantly, they make ownership and recovery explicit as concurrency increases.

Lessons Learned

What Surprised Me Most

Retrieval worked earlier than I expected. Once text extraction, chunking, embeddings, and the pgvector query were connected, the system could return relevant document sections without much architectural complexity.

The larger effort went into ownership and lifecycle management. I spent more time reasoning about state transitions, duplicate execution, and recovery after ambiguous network outcomes than about embeddings. In hindsight, model behavior was only one dependency inside a system whose correctness depended on application boundaries.

State Ownership Matters More Than Expected

The system became easier to reason about once NestJS was treated as the authority for application state. Workflow visibility is useful, but workflow execution history is not a substitute for an application state model.

Lifecycle Management Is a First-Class Concern

An uploaded document, an extracted document, and a searchable document are different states. Making those differences explicit prevents incomplete data from leaking into retrieval and gives operators a meaningful view of progress.

Retrieval Is Only Part of the Problem

Similarity search can return useful context while the surrounding system remains unreliable or impossible to diagnose. Retrieval quality matters, but so do access scope, source retention, state transitions, and reproducibility.

Failure Recovery Shapes Architecture

Retries, idempotency, and transaction boundaries cannot be added cleanly after ownership is fragmented. The expected failure modes influence API contracts, database constraints, workflow payloads, and status transitions from the beginning.

Workflow Engines Should Not Become Application Backends

n8n is effective for visible, multi-step coordination. It becomes harder to govern when it also owns business rules, persistence, and authorization. Keeping workflows behind an application boundary preserves their value without creating a second source of truth.

Conclusion

Building this platform reinforced that document processing is primarily a state and ownership problem.

NestJS owns the application contract and durable state transitions. n8n coordinates work across service boundaries. PostgreSQL stores relational data and vectors in one consistency domain. External model APIs remain replaceable dependencies rather than owners of application behavior.

The current synchronous implementation makes the lifecycle easy to inspect, but it also exposes the changes required for production: asynchronous jobs, transactional updates, idempotent callbacks, tenant-scoped retrieval, and stronger operational visibility.

Before building it, I expected retrieval quality to dominate the design. During implementation, ownership and recoverability became the more consequential concerns because they determined whether any result could be trusted, explained, or reproduced after a failure.

The project also changed how I evaluate workflow-based systems. I now look first at who owns durable state, how transitions are guarded, and what happens when a callback succeeds but its response is lost. Those questions reveal more about the reliability of the design than the successful path does.

The central lesson is not tied to a particular framework or model. Systems that cross process and network boundaries need explicit ownership, observable state transitions, and recovery paths designed for partial failure.

GitHub Repository

https://github.com/Rumman90/document-intelligence-platform
