If your business has people in the field writing reports at night, this case study is your reference architecture. DokuAI is the app I built to validate the pattern; the same blueprint applies to any “people on the move generating structured records” problem — and it’s the exact stack I deliver to clients.
The problem worth €100,000 a year per company
Construction site managers, technicians, inspectors, claims adjusters — anyone whose job involves visiting locations and documenting findings — spend two hours after the workday typing up notes, sorting photos, mapping memory to addresses. For a 20-person field team that is roughly 40 hours of paid time per day going into administration, not service delivery.
This is not a software problem. It is a workflow problem that software can erase. DokuAI proves it: 45 minutes of report-typing collapses to 90 seconds of voice memo plus a photo. The structure, the location, the formatting all happen on the way back to the office.
The architecture — six small services, one Flutter app
A monolith here would be a mistake. The heavy work is asynchronous (transcription takes seconds to minutes), the failure modes are independent (a temporary Whisper outage must not block report assembly), and the data tenants want separation (each business sees only its own jobs). The system I shipped has six Quarkus services on Kubernetes:
api-service
The only public surface. Owns authentication, rate limiting, request validation, response shaping. Every other service is internal. This is the boundary every audit and pen test will inspect — and the boundary that decides whether you can swap implementations behind it without affecting the app.
transcription-service
Whisper-based audio-to-text. Runs against a queue, not a synchronous request. A 30-second memo from a noisy building site can take 15 seconds to transcribe with high accuracy; making the iOS app wait for that synchronously would produce 504s on every poor mobile connection.
image-service
OpenAI Vision plus a deterministic OCR fallback pipeline. Reads what is in the photo: machinery model, damage type, room number on a door sign. The fallback matters — Vision is statistical, OCR is not, and reports need to be reproducible.
location-service
Geocoding, project-boundary checks, GPS smoothing. Bridges the gap between “I was on site” and “I was at this exact address that the claims database recognises”. Surprisingly hard problem; surprisingly large impact on the user’s downstream paperwork.
report-service
Owns the typed report templates and the assembly logic. The output is structured, validated, ready for downstream PDF generation or ERP import. Templates are versioned; old reports stay reproducible after template changes.
feedback-service
Cross-app feedback ingest. Not specific to DokuAI — every app I ship reports into the same service so I can see where users actually struggle. For a client, this becomes a single shared piece of infrastructure across their app portfolio.
Why Kafka between the services, not REST
If api-service called transcription-service over REST and waited for a response, a single slow GPU on the Whisper side would propagate to every iOS app over 3G. Instead, api-service posts to a Kafka topic, returns immediately with a job ID, and the heavy services consume the topic at their own pace. The mobile app polls a status endpoint that reads from a Kafka-fed projection — fast, stateless, and resilient to any downstream slowdown.
The cost is operational: Kafka becomes a service you have to run. The benefit is that the entire backend can absorb a five-minute partial outage without dropping a single report. For a business that bills its customers based on documented work, that resilience is the difference between confident scaling and Friday-afternoon panic.
The Flutter side
One codebase, iOS and Android, native-feeling UX. Voice memo recording uses the platform-native APIs (not a web shim), photo capture goes through the camera with offline buffering, and every upload is resumable across network drops. Two months in production, the App Store rating is 5.0.
Outcomes — what changed for the people using it
- Report drafting: 45 minutes → 90 seconds. Per person, per report. Multiply by your team size and report frequency.
- Location and image evidence attached automatically. No more “where was this photo taken?” five days later.
- Multi-language transcription out of the box — German construction-site jargon included after a small post-processing pass.
- App Store rating 5.0 as of this writing. Daily-active retention 73 %.
These numbers are not theoretical. They come from real users on real sites.
What this means for your project
If your business has field staff documenting work — construction, inspection, real estate, healthcare, technical service, claims — you already have the use case for this exact pattern. What changes is the report template, the integrations into your ERP, the specific evidence types you collect. The skeleton is identical.
The same stack is what I deliver:
- Flutter on the front for cross-platform parity, or native Swift/Kotlin if your context requires it.
- Quarkus on Kubernetes for the backend, sized to your traffic shape.
- Kafka for async resilience, only if your workload justifies it (smaller teams ship without it).
- A typed report engine you fully own — no template lock-in.
Production scope of this size lands at three to four months from kickoff. An MVP of just the voice-to-report core ships in two to four weeks. See pricing for the typical tier ranges, start a project brief if your situation looks like the DokuAI shape.
See it on the App Store — DokuAI is free to install. Try it on a real site and you will see exactly what 90-second reports feel like in the hand.