Engineering Documentation

This document is the public engineering reference for the Police Narratives AI platform. It is the document referenced by the MILLIONHARI LLC Secure Development Policy as the "Engineering Documentation Link." It describes the high-level architecture, environments, secure-by-design principles, privacy-by-design principles, coding standards, and security testing practices that govern engineering work at MILLIONHARI LLC.

1. SYSTEM OVERVIEW

Police Narratives AI is an AI-backed police reporting platform that turns transcribed audio into professional incident reports through AI-assisted narrative generation. The platform is delivered as a monorepo composed of three applications and a set of supporting serverless functions:

Web client (client/) — React (Create React App) + TypeScript, TipTap rich-text editor, TailwindCSS, and DaisyUI.
API server (server/) — Express.js + TypeScript, Passport.js authentication, Redis-backed sessions, Drizzle ORM over PostgreSQL.
Mobile (PoliceNarrativesMobile/) — React Native via Expo.
Serverless functions — AWS Lambda functions for Whisper-based transcription, Amazon Transcribe with speaker diarization, transcription completion handling, and AWS Bedrock narrative generation.

2. CLOUD AND INFRASTRUCTURE

Production runs in AWS GovCloud (us-gov-west-1). The platform is built around the following services:

Amazon ECR + ECS — Container images for the client and server are pushed to ECR and run as ECS services in the PoliceNarrativesAIProd cluster.
Amazon S3 — Object storage for narrative audio, interaction audio, transcription artifacts, and form templates. Buckets are scoped per workload.
AWS Lambda — Asynchronous audio processing, transcription, and narrative generation.
Amazon Transcribe — Speaker-diarized transcription for interaction audio.
AWS Bedrock — Foundation-model access for AI narrative generation.
PostgreSQL — Primary relational data store, with schema managed by Drizzle ORM migrations.
Redis — Session store for both individual and agency-admin authentication.
Stripe — Subscription billing for individual users.
Sentry — Error monitoring and release tracking.

3. ENVIRONMENTS

MILLIONHARI LLC maintains three logically segregated environments covering the entire system development life cycle:

Development — Engineer workstations running local PostgreSQL and Redis. Only synthetic or anonymized data is used.
Test / Staging — Used for integration testing, user acceptance, and migration validation prior to production.
Production — The customer-facing environment. AWS credentials, Stripe keys, and Bedrock access are scoped exclusively to this environment and are not shared with development or test.

Customer (Confidential) data must never be copied into development or test environments without the explicit written permission of the data owner and the Founder, in accordance with the Secure Development Policy.

4. SECURE-BY-DESIGN PRINCIPLES

The following principles, drawn from the Secure Development Policy, are applied to every engineering decision. They are non-negotiable and reviewers verify them during code review.

4.1 Minimize attack surface area

Endpoints, IAM permissions, and S3 bucket policies are scoped to the minimum required. The interaction audio bucket has no S3 event notifications because Lambdas are invoked directly by the server, which removes an unnecessary trigger path.

4.2 Establish secure defaults

Routes default to authenticated and authorized; opt-in is required to expose anything publicly. The Helmet middleware is enabled system-wide. CJIS session limits (12-hour maximum, 30-minute inactivity timeout) are enforced server-side.
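
One way to make the authenticated-by-default rule hard to get wrong is to require an explicit flag before a route can be served without a session. The sketch below is illustrative only: RouteDef, isPublic, canServe, and the /cases path are hypothetical names chosen for the example, not identifiers taken from the codebase.

```typescript
// Hypothetical route descriptor: a route is protected unless it opts in to being public.
interface RouteDef {
  path: string;
  isPublic?: boolean; // omitted means "requires an authenticated session"
}

interface RequestContext {
  authenticated: boolean;
}

// Returns true only when the request may reach the route handler.
function canServe(route: RouteDef, ctx: RequestContext): boolean {
  if (route.isPublic === true) {
    return true; // explicit opt-in to public exposure
  }
  return ctx.authenticated; // secure default: everything else needs a session
}

const routes: RouteDef[] = [
  { path: "/login", isPublic: true },
  { path: "/cases" }, // no flag, so the route is protected
];
```

Because the flag has to be set to true explicitly, forgetting it produces a protected route rather than an exposed one.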

4.3 Principle of least privilege

Officers see only their own cases; Records-role users see all agency cases; agency admins manage their own agency. Access checks are enforced server-side in caseAccess rather than relying on the client to hide UI.
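
A minimal sketch of the kind of server-side check described above, assuming every user and case carries an agency identifier. The types and the canAccessCase function are hypothetical and are not the actual caseAccess implementation.

```typescript
// Assumed roles for case visibility; agency-admin management rights are out of scope here.
type Role = "officer" | "records";

interface AgencyUser {
  id: string;
  agencyId: string;
  role: Role;
}

interface CaseRecord {
  id: string;
  ownerId: string;
  agencyId: string;
}

function canAccessCase(user: AgencyUser, record: CaseRecord): boolean {
  // Cases never cross agency boundaries, whatever the role.
  if (user.agencyId !== record.agencyId) {
    return false;
  }
  switch (user.role) {
    case "records":
      return true; // Records-role users see all cases in their agency
    case "officer":
      return record.ownerId === user.id; // officers see only their own cases
    default:
      return false; // unknown roles are denied
  }
}
```

The agency comparison comes first so that no role-specific branch can accidentally grant cross-agency access.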

4.4 Defense in depth

Authentication, authorization, MFA, rate limiting, Helmet headers, and input validation all run independently. A failure in any single layer does not expose data, because the next layer must also authorize the request.
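
The layering can be pictured as a chain of independent guards, each of which can deny a request on its own. This is a simplified sketch under assumed names (RequestState, Guard, firstDenial); the real middleware stack is not reproduced here.

```typescript
// Hypothetical summary of what each layer has established about a request.
interface RequestState {
  authenticated: boolean;
  mfaVerified: boolean;
  authorizedForResource: boolean;
  payloadValid: boolean;
}

// A guard returns a denial reason, or null when the request passes that layer.
type Guard = (req: RequestState) => string | null;

const guards: Guard[] = [
  (req) => (req.authenticated ? null : "not authenticated"),
  (req) => (req.mfaVerified ? null : "MFA not verified"),
  (req) => (req.authorizedForResource ? null : "not authorized"),
  (req) => (req.payloadValid ? null : "invalid input"),
];

// Every guard must pass; the first failure short-circuits with its reason.
function firstDenial(req: RequestState, chain: Guard[]): string | null {
  for (const guard of chain) {
    const reason = guard(req);
    if (reason !== null) {
      return reason;
    }
  }
  return null;
}
```

No guard consults the result of another, so a bug that lets one layer pass incorrectly still leaves the remaining layers in place.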

4.5 Fail securely

Error paths default to denial. When transcription processing is cancelled, the server atomically aborts in-progress S3 multipart uploads, deletes completed files, cancels Amazon Transcribe jobs, and marks case files as failed before discarding the case. No state is left in a partially trusted condition.

4.6 Don't trust services

Lambda callbacks (transcription complete, narrative complete) are treated as untrusted input: case identifiers are re-validated, the server checks that the case still exists and has not been discarded, and the file completion guard prevents wasted Bedrock invocations on cancelled cases.
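
The callback checks described above might look roughly like the following. Everything here is an assumption made for illustration: the payload shape, the identifier pattern, the status values, and the validateCallback name are not taken from the codebase.

```typescript
type CaseStatus = "processing" | "discarded" | "failed" | "complete";

interface CaseRow {
  id: string;
  status: CaseStatus;
}

type CallbackDecision =
  | { accept: true; caseId: string }
  | { accept: false; reason: string };

// Hypothetical identifier format (36 characters of lowercase hex and hyphens).
const CASE_ID_PATTERN = /^[0-9a-f-]{36}$/;

function validateCallback(
  payload: unknown,
  lookup: (id: string) => CaseRow | undefined
): CallbackDecision {
  // Treat the callback body as untrusted: check its shape before using it.
  if (typeof payload !== "object" || payload === null) {
    return { accept: false, reason: "malformed payload" };
  }
  const caseId = (payload as { caseId?: unknown }).caseId;
  if (typeof caseId !== "string" || !CASE_ID_PATTERN.test(caseId)) {
    return { accept: false, reason: "invalid case id" };
  }
  // Re-check server-side state instead of trusting the caller.
  const row = lookup(caseId);
  if (row === undefined) {
    return { accept: false, reason: "unknown case" };
  }
  if (row.status === "discarded") {
    return { accept: false, reason: "case discarded" };
  }
  return { accept: true, caseId: caseId };
}
```

Rejecting callbacks for discarded cases at this point is what keeps later, more expensive steps from running on work that was already cancelled.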

4.7 Separation of duties

Development, review, and deployment are not performed end-to-end by a single individual. The pull-request approval requirement and the dual-auth admin model (individual users vs. agency admins) both embody this principle.

4.8 Avoid security by obscurity

Security controls are documented and reviewable. The protection of the system depends on cryptographic keys and access controls, not on the secrecy of paths, parameter names, or API shapes.

4.9 Keep security simple

Security primitives are centralized: a single MFA service, a single session-timeout middleware, a single case-access helper. Engineers reuse these rather than implementing parallel logic.

4.10 Fix security issues correctly

Security findings are fixed at the root cause, regression-tested, and tracked through the standard change-control process. Patches that materially impact security are deployed within 90 days of discovery, in line with the Secure Development Policy.

5. PRIVACY-BY-DESIGN PRINCIPLES

5.1 Proactive not reactive

Privacy considerations are evaluated during design — before any code is written — rather than as a remediation step.

5.2 Privacy as the default

New features default to the most privacy-preserving setting. Audio recordings, narrative content, and case data are never shared outside the agency that owns them without an explicit user action.

5.3 Privacy embedded into design

Multi-tenancy, role-based access control, and agency scoping are first-class concepts in the data model rather than bolt-on filters.

5.4 Full functionality — positive sum

Privacy and usability are designed together. The "Keep original video file" toggle, video audio extraction, and the cancellation flow each preserve full feature value while minimizing the data the system retains.

5.5 End-to-end security — full lifecycle protection

Audio is transmitted over TLS, stored in private S3 buckets, and processed by Lambdas under scoped IAM roles. Stale uploads are cleaned up by hourly cron jobs.

5.6 Visibility and transparency

The Privacy Policy and this Engineering Documentation are public. Customers can see what is collected, how it is processed, and the third-party AI service providers that participate.

5.7 Respect for user privacy

Users may review, update, or delete their data through the data subject access request workflow described in the Privacy Policy.

6. AUTHENTICATION AND ACCESS CONTROL

Police Narratives AI uses a dual-authentication model. Both authentication paths are based on Passport.js with Redis-backed sessions:

Individual users authenticate at /login with email/password or OAuth providers (Google, Apple).
Agency administrators authenticate at /agency/login and may use SAML 2.0 or OIDC single-sign-on configured per agency.

Multi-factor authentication is configurable per agency for CJIS compliance and supports TOTP authenticator apps, WebAuthn (Touch ID, Face ID, security keys), and Email OTP. Trusted-device support reduces re-authentication friction within a configurable window. Backup codes are issued at MFA enrollment and are stored hashed.

Server-side session enforcement caps session lifetime at 12 hours and ends sessions after 30 minutes of inactivity. Routes that handle protected data are guarded by middleware that rejects requests from sessions that have not completed MFA verification.
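
The two session limits reduce to a small time comparison. The sketch below assumes timestamps are stored as epoch milliseconds and that a session expires exactly at the limit; SessionTimes and isSessionExpired are hypothetical names, and the boundary behavior is an assumption rather than a documented rule.

```typescript
const MAX_SESSION_MS = 12 * 60 * 60 * 1000; // 12-hour absolute cap
const MAX_IDLE_MS = 30 * 60 * 1000; // 30-minute inactivity cap

interface SessionTimes {
  createdAt: number; // epoch milliseconds when the session was created
  lastActivityAt: number; // epoch milliseconds of the most recent request
}

// A session is expired when either limit has been reached.
function isSessionExpired(session: SessionTimes, now: number): boolean {
  const tooOld = now - session.createdAt >= MAX_SESSION_MS;
  const tooIdle = now - session.lastActivityAt >= MAX_IDLE_MS;
  return tooOld || tooIdle;
}
```

Checking the absolute cap separately from the idle cap means that continuous activity cannot extend a session past 12 hours.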

Authorization is role-based:

Officer — Access limited to cases the officer owns.
Records — Access to all cases within the officer's agency.
Agency Admin — Manages agency configuration, invites, SSO, and MFA policies.

7. DATA PROTECTION

In transit — All client/server traffic is served over TLS. AWS service traffic uses AWS-managed TLS endpoints.
At rest — S3 buckets and the PostgreSQL database use AWS-managed encryption at rest. Sessions in Redis are scoped and protected by network policy.
Secrets — Credentials are supplied via environment variables and are never committed to source control. The .env.sample file documents required keys without values.
Test data — Production customer data is not used for development or test purposes without explicit written permission from the data owner and the Founder.
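
A startup check along the following lines can confirm that every required key is present without ever logging a value. The key names are placeholders invented for the example; the authoritative list is whatever .env.sample documents. The missingKeys and assertConfigured helpers are likewise hypothetical.

```typescript
// Placeholder key names for illustration only.
const REQUIRED_KEYS: string[] = ["DATABASE_URL", "REDIS_URL", "SESSION_SECRET"];

type Env = Record<string, string | undefined>;

// Returns the names of required keys that are absent or blank.
function missingKeys(env: Env, required: string[]): string[] {
  return required.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// Fails fast at startup; the message lists key names only, never values.
function assertConfigured(env: Env, required: string[]): void {
  const missing = missingKeys(env, required);
  if (missing.length > 0) {
    throw new Error("Missing required environment variables: " + missing.join(", "));
  }
}
```

In a Node.js server the env argument would typically be process.env, passed in once from the entry point so the check stays easy to test.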

8. AUDIO PROCESSING PIPELINE

Audio is processed asynchronously through a pipeline of AWS Lambda functions. The pipeline is split into two flows depending on the audio type:

Narrative audio — Uploaded to a dedicated S3 bucket, transcribed by an OpenAI Whisper Lambda, and then handed off to the Bedrock Lambda for narrative generation.
Interaction audio — Uploaded to a dedicated S3 bucket, transcribed with speaker diarization by Amazon Transcribe, handled by a completion-handler Lambda, and then handed off to the Bedrock Lambda. Video files are processed through ffmpeg-based audio extraction within the Lambda before being sent to Transcribe.

Failure handling is explicit: Lambdas report errors back to the API server, which marks the affected case files and cases as failed. Stale uploads are cleaned up by an hourly cron job. Cancellation is atomic and removes all in-flight S3 objects, multipart uploads, and Transcribe jobs associated with the case.
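
The cancellation sequence can be sketched as a function that turns the known state of each case file into an ordered list of cleanup steps. This models the ordering only, not the atomicity guarantee or the actual AWS calls, and every type and field name below is an assumption made for the example.

```typescript
// Hypothetical snapshot of what the server knows about one case file.
interface CaseFileState {
  fileId: string;
  uploadId?: string; // set while a multipart upload is still open
  objectKey?: string; // set once a completed object exists in S3
  transcribeJobName?: string; // set while a Transcribe job is running
}

type CancelStep =
  | { kind: "abortMultipartUpload"; fileId: string; uploadId: string }
  | { kind: "deleteObject"; fileId: string; objectKey: string }
  | { kind: "cancelTranscribeJob"; fileId: string; jobName: string }
  | { kind: "markFileFailed"; fileId: string }
  | { kind: "discardCase"; caseId: string };

function planCancellation(caseId: string, files: CaseFileState[]): CancelStep[] {
  const steps: CancelStep[] = [];
  for (const file of files) {
    if (file.uploadId !== undefined) {
      steps.push({ kind: "abortMultipartUpload", fileId: file.fileId, uploadId: file.uploadId });
    }
    if (file.objectKey !== undefined) {
      steps.push({ kind: "deleteObject", fileId: file.fileId, objectKey: file.objectKey });
    }
    if (file.transcribeJobName !== undefined) {
      steps.push({ kind: "cancelTranscribeJob", fileId: file.fileId, jobName: file.transcribeJobName });
    }
    // Each file is marked failed once its external resources are queued for cleanup.
    steps.push({ kind: "markFileFailed", fileId: file.fileId });
  }
  // The case itself is discarded only after every file has been handled.
  steps.push({ kind: "discardCase", caseId: caseId });
  return steps;
}
```

Separating the plan from its execution makes the ordering easy to unit test without touching S3 or Transcribe.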

9. DATABASE AND SCHEMA MANAGEMENT

All schema is defined in server/src/db/schema.ts using Drizzle ORM. The schema is the single source of truth and is auto-synced to the client and mobile applications during migration.
New schema changes are introduced by running yarn generate to produce a new migration file. Once committed and merged, migration files are immutable; corrections land as new migrations.
Migrations are applied with yarn migrate. The yarn push command is reserved for development use and is not run against production.
All database access is parameterized through the ORM; raw string-concatenated SQL is prohibited.
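
To illustrate why parameterization matters, the sketch below uses a small tagged template that keeps user input out of the SQL text and passes it as a separate value. This is a standalone illustration of the idea, not the ORM's API; the parameterize helper and the cases and agency_id names are hypothetical.

```typescript
interface ParameterizedQuery {
  text: string; // SQL with numbered placeholders
  values: unknown[]; // values sent separately from the SQL text
}

// Tagged template: each interpolated value becomes a $n placeholder.
function parameterize(strings: TemplateStringsArray, ...values: unknown[]): ParameterizedQuery {
  let text = strings[0];
  for (let i = 0; i < values.length; i++) {
    text += "$" + (i + 1) + strings[i + 1];
  }
  return { text: text, values: values };
}

// Hostile input stays in the values array and never becomes part of the SQL text.
const hostileInput = "a-1'; DROP TABLE cases; --";
const exampleQuery = parameterize`SELECT id FROM cases WHERE agency_id = ${hostileInput}`;
// exampleQuery.text is "SELECT id FROM cases WHERE agency_id = $1"
```

With string concatenation the same input would have been spliced directly into the statement, which is the pattern the rule above prohibits.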

10. APPLICATION SECURITY CONTROLS

Helmet — Adds standard HTTP security headers to every response from the API server.
Rate limiting — Applied to authentication and other sensitive endpoints to mitigate brute-force and abuse.
Sentry — Initialized as the first import in the server entry point so that startup errors are captured.
Input validation — Endpoints validate request payloads and reject malformed input before it reaches business logic.
CORS — Restricted to the configured front-end origin(s).
OWASP Top 10 awareness — Engineers are responsible for designing against the OWASP Top 10 categories, including injection, broken access control, cryptographic failures, and vulnerable components.
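
As an illustration of the rate-limiting control, the following is a minimal fixed-window limiter keyed by an arbitrary string such as an account or client address. The algorithm, limits, and class name are assumptions made for the example; this document does not specify which limiter or thresholds the platform uses.

```typescript
interface WindowCounter {
  windowStart: number; // epoch milliseconds when the current window opened
  count: number; // attempts seen in the current window
}

class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly counters: Record<string, WindowCounter> = {};

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the attempt is allowed, false if it should be rejected.
  allow(key: string, now: number): boolean {
    const current = this.counters[key];
    if (current === undefined || now - current.windowStart >= this.windowMs) {
      this.counters[key] = { windowStart: now, count: 1 }; // start a new window
      return true;
    }
    if (current.count >= this.limit) {
      return false; // limit reached for this window
    }
    current.count += 1;
    return true;
  }
}
```

An in-memory counter like this only works for a single process; a deployment with several server instances would need a shared store.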

11. CODING STANDARDS

Language and tooling — TypeScript across client, server, and mobile, with strict type checking enabled. ESLint and Prettier enforce formatting and basic correctness.
Quality — Functions are kept small and focused. Reusable logic is extracted into hooks (client) or services (server) rather than duplicated.
Commenting — Comments document the "why" behind non-obvious decisions. Self-explanatory code does not require commentary.
Security — Engineers follow the OWASP-aligned checklist documented in the Development Process Documentation and never bypass authentication, authorization, or session controls for convenience.
Migrations — Existing migration files are never edited; corrections land as new migrations.

12. SECURITY TESTING AND VULNERABILITY MANAGEMENT

Application code is scanned for known vulnerable dependencies prior to deployment. Findings from yarn audit are triaged during code review.
Security-relevant changes (authentication, authorization, MFA, payments, IAM, encryption) are flagged as Significant changes and require senior-developer approval before merging into a production branch.
Identified vulnerabilities that materially impact security are remediated and deployed within 90 days of discovery, in accordance with the Secure Development Policy.
No code is deployed to production without documented, successful test results and evidence that any identified security issues have been remediated.
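
The 90-day remediation window is simple date arithmetic. The helper names below are hypothetical, and the sketch assumes the window is counted in whole 24-hour days from the moment of discovery.

```typescript
const REMEDIATION_WINDOW_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Latest point in time by which the fix must be deployed.
function remediationDeadline(discoveredAt: Date): Date {
  return new Date(discoveredAt.getTime() + REMEDIATION_WINDOW_DAYS * MS_PER_DAY);
}

// Whole days left before the deadline (negative once it has passed).
function daysRemaining(discoveredAt: Date, now: Date): number {
  const msLeft = remediationDeadline(discoveredAt).getTime() - now.getTime();
  return Math.floor(msLeft / MS_PER_DAY);
}

function isOverdue(discoveredAt: Date, now: Date): boolean {
  return now.getTime() > remediationDeadline(discoveredAt).getTime();
}
```

For example, a finding discovered at 2024-01-01T00:00:00Z would have a deadline of 2024-03-31T00:00:00Z under these assumptions.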

13. THIRD-PARTY DEPENDENCIES

New third-party dependencies are evaluated for licensing, maintenance status, and known CVEs before being introduced.
The Police Narratives AI mobile and web clients pin dependency versions in yarn.lock to ensure reproducible builds.
Third-party AI service providers (Anthropic, OpenAI, Amazon Web Services AI) are governed by the Privacy Policy and the MILLIONHARI LLC Third-Party Management Policy.

14. DEVELOPER TRAINING

Software developers complete secure-development training appropriate to their role at least annually. Training covers, at minimum, the prevention of authorization bypass attacks, insecure session IDs, injection attacks, cross-site scripting, cross-site request forgery, and the use of vulnerable libraries.

15. MONITORING AND INCIDENT RESPONSE

Application errors are reported to Sentry. Significant error spikes following a deployment trigger investigation and, where required, rollback.
Audit logging captures security-relevant events (authentication, MFA, account changes) for review.
Suspected security incidents are reported to the Founder and handled in accordance with the MILLIONHARI LLC incident response procedures.

16. RELATED DOCUMENTS

Development Process Documentation — Code review, approval, testing, and release procedure.
Privacy Policy — Data collection, processing, and user rights.
MILLIONHARI LLC Secure Development Policy (internal).
MILLIONHARI LLC Operations Security Policy (internal).
MILLIONHARI LLC Third-Party Management Policy (internal).

17. CONTACT

Questions about this Engineering Documentation should be sent to support@policenarratives.ai.