Compare commits
10 commits
7369914492
...
fd494bd674
| Author | SHA1 | Date | |
|---|---|---|---|
| | fd494bd674 | | |
| | 80a8c520d2 | | |
| | 83e7391538 | | |
| | f710dc3214 | | |
| | f70b4b34e4 | | |
| | 3263db9f84 | | |
| | b60f352347 | | |
| | d2faaf718f | | |
| | 252efa3950 | | |
| | 0f01b0cb4e | | |
|
|
@ -18,4 +18,8 @@ STORAGE_REGION=us-east-1
|
||||||
STORAGE_FORCE_PATH_STYLE=true
|
STORAGE_FORCE_PATH_STYLE=true
|
||||||
|
|
||||||
# AI Config
|
# AI Config
|
||||||
GOOGLE_API_KEY=your_gemini_api_key_here
|
GOOGLE_API_KEY=your_gemini_api_key_here
|
||||||
|
|
||||||
|
# Redis Config
|
||||||
|
REDIS_HOST=localhost
|
||||||
|
REDIS_PORT=6379
|
||||||
1
.gitignore
vendored
|
|
@ -55,6 +55,5 @@ pids
|
||||||
# Diagnostic reports (https://nodejs.org/api/report.html)
|
# Diagnostic reports (https://nodejs.org/api/report.html)
|
||||||
report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json
|
report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json
|
||||||
|
|
||||||
/generated/prisma
|
|
||||||
src/schema.gql
|
src/schema.gql
|
||||||
/src/generated/prisma
|
/src/generated/prisma
|
||||||
|
|
|
||||||
128
README.md
|
|
@ -0,0 +1,128 @@
|
||||||
|
# Pandektes Case Law Challenge 🏛️
|
||||||
|
|
||||||
|
This is a NestJS-based legal document parsing application built for the Pandektes technical challenge. It extracts case law metadata from PDF and HTML documents using Gemini AI and stores it in a PostgreSQL database.
|
||||||
|
|
||||||
|
### Demo
|
||||||
|
|
||||||
|
<video width="100%" height="auto" controls>
|
||||||
|
<source src="./assets/demo.mp4" type="video/mp4">
|
||||||
|
Your browser does not support the video tag.
|
||||||
|
<a href="./assets/demo.mp4">Download the video here</a>.
|
||||||
|
</video>
|
||||||
|
|
||||||
|
## 🚀 Getting Started
|
||||||
|
|
||||||
|
### Prerequisites
|
||||||
|
- **Docker & Docker Compose**
|
||||||
|
- **Node.js (v20+)**
|
||||||
|
- **Gemini API Key** (Get one at [Google AI Studio](https://aistudio.google.com/))
|
||||||
|
NOTE: I attached billing to my Google account to avoid hitting the free-tier limits.
|
||||||
|
|
||||||
|
### Installation
|
||||||
|
1. **Clone the Repo**:
|
||||||
|
```bash
|
||||||
|
git clone [your-repo-url]
|
||||||
|
cd pandektes-challenge
|
||||||
|
```
|
||||||
|
2. **Environment Setup**:
|
||||||
|
Create a `.env` file in the root:
|
||||||
|
```env
|
||||||
|
# AI Config
|
||||||
|
GOOGLE_API_KEY=your_gemini_api_key_here
|
||||||
|
|
||||||
|
# Database (Standard Docker defaults)
|
||||||
|
DATABASE_URL="postgresql://postgres:postgres@localhost:5432/pandektes?schema=public"
|
||||||
|
REDIS_HOST="localhost"
|
||||||
|
REDIS_PORT=6379
|
||||||
|
|
||||||
|
# Storage (Local Minio)
|
||||||
|
STORAGE_ENDPOINT="http://localhost:9000"
|
||||||
|
STORAGE_BUCKET="cases"
|
||||||
|
STORAGE_REGION="us-east-1"
|
||||||
|
STORAGE_ACCESS_KEY="minioadmin"
|
||||||
|
STORAGE_SECRET_KEY="minioadmin"
|
||||||
|
STORAGE_FORCE_PATH_STYLE="true"
|
||||||
|
```
|
||||||
|
3. **Start Infrastructure**:
|
||||||
|
```bash
|
||||||
|
docker-compose up -d
|
||||||
|
```
|
||||||
|
4. **Install & Build**:
|
||||||
|
```bash
|
||||||
|
npm install
|
||||||
|
npx prisma migrate dev
|
||||||
|
npm run start:dev
|
||||||
|
```
|
||||||
|
|
||||||
|
### Usage
|
||||||
|
Once running, you can interact with the app in a few ways:
|
||||||
|
- **Web UI** — I created a basic interface so the app can be easily tested without additional setup. Visit [http://localhost:3000](http://localhost:3000) to upload files and search for cases.
|
||||||
|
- **GraphQL Playground** — Available at [http://localhost:3000/graphql](http://localhost:3000/graphql) for direct query/mutation testing.
|
||||||
|
- **Prisma Studio** — Run `npx prisma studio` to open a visual database browser and inspect the extracted case law entries directly.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🏗️ Architectural Decisions
|
||||||
|
|
||||||
|
### 1. Why a Background Queue (BullMQ)?
|
||||||
|
Large document processing is likely to be "spiky" and slow, particularly with the added LLM calls. If we did this directly in the HTTP request, the user's connection would likely time out, and the work could block the event loop on the main thread.
|
||||||
|
I used sandboxed workers (running in separate processes) to circumvent that. This also ensures that if a particularly heavy PDF causes a memory leak or CPU spike, it doesn't crash the main API that serves other users.
|
||||||
|
|
||||||
|
### 2. S3-Compatible Storage (Minio)
|
||||||
|
Instead of saving files to the local disk, I used an S3-compatible service. Storing files on a local disk makes the app hard to scale effectively. By using S3 patterns, the app is "cloud-ready": pointing it at AWS S3 only requires changing the ENV variables.
|
||||||
|
|
||||||
|
### 3. Full-Document Parsing
|
||||||
|
I chose to send the full extracted text to Gemini rather than truncating it. Gemini Flash has a 1M token context window, so even a 50-page legal document uses only a small fraction of it. I did consider truncating, or sending only the start and end of the document (which likely contain the most important information), but I don't have the domain knowledge of case law to make that call.
|
||||||
|
I instead set a generous character cap (500k) to act as a safety net against abuse.
|
||||||
|
|
||||||
|
### 4. Language Handling
|
||||||
|
It wasn't clear from the requirements whether the AI should extract metadata in the document's original language or normalise everything to English. Since the provided documents include both Danish and English, I haven't enforced any language rules on the AI and left it open for now. This would be trivial to change by adding a language instruction to the prompt.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🛠️ Production Readiness (Next Steps)
|
||||||
|
|
||||||
|
If I were taking this to production, here's what I'd focus on:
|
||||||
|
|
||||||
|
### De-duplication
|
||||||
|
Currently, a user can re-upload the same document multiple times, creating duplicates in the database. This could be mitigated by:
|
||||||
|
- File hashing: Calculate a hash of the uploaded file before processing. This is quick and prevents the exact same file from being processed twice.
|
||||||
|
- Post-AI check: Compare the extracted case number against existing records. Slower, but more logically robust since two different files could describe the same case.
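
The file-hashing option above could be sketched roughly as follows. This is a minimal illustration, not the project's code: `hashUpload`, `isDuplicateUpload`, and the in-memory `seenHashes` set are hypothetical names, and the set stands in for what would more realistically be a unique hash column in the database.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hex SHA-256 digest of the raw upload bytes.
function hashUpload(file: Buffer): string {
  return createHash("sha256").update(file).digest("hex");
}

// Hypothetical in-memory stand-in for a lookup on a unique hash column.
const seenHashes = new Set<string>();

// Returns true when byte-identical content has already been registered,
// otherwise records the hash and returns false.
function isDuplicateUpload(file: Buffer): boolean {
  const hash = hashUpload(file);
  if (seenHashes.has(hash)) {
    return true;
  }
  seenHashes.add(hash);
  return false;
}
```

Note that this only catches byte-identical files. Two different renderings of the same case would still need the post-AI case number check.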
|
||||||
|
|
||||||
|
### File Upload Scaling
|
||||||
|
If files get large, buffering them through the NestJS server becomes a bottleneck. I'd look at removing the upload from NestJS entirely:
|
||||||
|
- The API generates a presigned upload URL (direct to an S3 bucket) and returns it to the frontend.
|
||||||
|
- The frontend uploads the file directly to storage — the NestJS server never touches the binary data.
|
||||||
|
- This makes the backend considerably more scalable and cheaper to run, while cloud storage handles the heavy lifting.
|
||||||
|
|
||||||
|
### Input Validation
|
||||||
|
Currently, identifier formatting (UUID vs. Case Number) is handled via a helper function in the service layer. For production, I'd create a custom class validator so it fails at the entry point instead (i.e. "Fail Fast" principle).
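
As a rough illustration of the check such a validator might wrap, here is a plain predicate. The names `classifyIdentifier` and `UUID_PATTERN` are hypothetical, and because the real case number format is not specified here, the sketch assumes that any non-empty value that is not a UUID is a case number.

```typescript
type IdentifierKind = "uuid" | "caseNumber";

// Matches the canonical 8-4-4-4-12 hex layout of an RFC 4122 style UUID.
const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[1-8][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

// Hypothetical helper: rejects empty input up front, then classifies the rest.
// Assumption: anything that is not a UUID is treated as a case number.
function classifyIdentifier(raw: string): IdentifierKind {
  const value = raw.trim();
  if (value.length === 0) {
    throw new Error("identifier must not be empty");
  }
  return UUID_PATTERN.test(value) ? "uuid" : "caseNumber";
}
```

Running this at the entry point means a malformed identifier is rejected before it reaches the service layer.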
|
||||||
|
|
||||||
|
### Worker Isolation
|
||||||
|
My queue implementation is a good first step (passing heavy work to a child process instead of blocking the main thread), but in production I'd look at completely isolating the workers — perhaps into their own container. This keeps NestJS as a lightweight entry point, while being able to spin up many separate workers for processing multiple PDFs simultaneously. Other improvements:
|
||||||
|
- Exponential Backoff: If the Gemini API is down for a few minutes, workers will fail immediately. I'd configure the queue with exponential backoff (e.g., retry in 5s, then 20s, then 1min).
|
||||||
|
- Dead Letter Queues (DLQ): If a file is so corrupted it fails after multiple retries, BullMQ should move it to a "failed" queue for manual human review rather than retrying forever.
|
||||||
|
- Worker Timeout: A particularly large PDF could "hang" the worker process. I'd set an explicit `lockDuration` or timeout on jobs so they don't block the queue indefinitely.
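
To make the retry schedule above concrete, here is a small sketch of a capped exponential delay. `backoffDelayMs` and its defaults are hypothetical, chosen so the output reproduces the 5s, 20s, 1min example. In BullMQ the retry behaviour would be configured through the `attempts` and `backoff` job options, and this sketch does not assume the library's built-in strategy yields this exact schedule.

```typescript
// Hypothetical helper: delay in milliseconds before retry number `attempt` (1-based).
// The delay grows geometrically from `baseMs` by `factor` and is capped at `maxMs`.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 5000,
  factor: number = 4,
  maxMs: number = 60000,
): number {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError("attempt must be a positive integer");
  }
  return Math.min(maxMs, baseMs * Math.pow(factor, attempt - 1));
}

// Delays for the first three retries with the default settings.
const schedule = [1, 2, 3].map((attempt) => backoffDelayMs(attempt));
```

The cap matters as much as the growth rate: without it, a long outage would push later retries out to impractical delays.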
|
||||||
|
|
||||||
|
### Logging & Observability
|
||||||
|
- Audit logging: Track who is accessing what and when.
|
||||||
|
- Crash reporting: Integrate a service like Sentry for real-time error alerting.
|
||||||
|
- Health check pings: For container orchestration and uptime monitoring.
|
||||||
|
|
||||||
|
### Security
|
||||||
|
- Authentication & Authorisation: The `/graphql` endpoint is currently open. I'd implement Auth Guards (using Passport/JWT). Even if all users can upload, you might need to track who uploaded what for audit purposes.
|
||||||
|
- CORS: Currently defaults to open. I'd restrict CORS in `main.ts` to only allow trusted frontend domains.
|
||||||
|
- CSRF Protection: GraphQL is prone to CSRF when allowing standard `multipart/form-data`. I'd enable `csrfPrevention: true` in Apollo and require a custom header (like `x-apollo-operation-name`) on all requests.
|
||||||
|
- Rate Limiting: A malicious script could flood the queue with blank PDFs, costing money in AI tokens. I'd use `@nestjs/throttler` to limit uploads per IP per hour.
|
||||||
|
- File Scanning: Add an anti-virus layer (like ClamAV) before saving uploads to S3.
|
||||||
|
|
||||||
|
## 🧪 Testing
|
||||||
|
Run the suite with:
|
||||||
|
```bash
|
||||||
|
npm run test
|
||||||
|
```
|
||||||
|
I've focused the tests on the core parsing logic (`ParserService`) and utility functions, as these are the areas most likely to regress. In a production context, I'd expand coverage to include service-layer tests (e.g. verifying the queue receives the correct payload) and an E2E test for the full upload → process → query flow. I drew the line here to keep the scope reasonable for a challenge.
|
||||||
|
|
||||||
|
---
|
||||||
|
**Author**: George W.
|
||||||
|
**Challenge**: Pandektes Legal Tech Challenge
|
||||||
BIN
assets/demo.mp4
Normal file
Binary file not shown.
|
|
@ -5,7 +5,7 @@ services:
|
||||||
restart: always
|
restart: always
|
||||||
environment:
|
environment:
|
||||||
POSTGRES_USER: ${POSTGRES_USER:-postgres}
|
POSTGRES_USER: ${POSTGRES_USER:-postgres}
|
||||||
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-yourPassword}
|
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-postgres}
|
||||||
POSTGRES_DB: ${POSTGRES_DB:-pandektes}
|
POSTGRES_DB: ${POSTGRES_DB:-pandektes}
|
||||||
ports:
|
ports:
|
||||||
- "${POSTGRES_PORT:-5432}:5432"
|
- "${POSTGRES_PORT:-5432}:5432"
|
||||||
|
|
|
||||||
|
|
@ -3,10 +3,6 @@
|
||||||
"collection": "@nestjs/schematics",
|
"collection": "@nestjs/schematics",
|
||||||
"sourceRoot": "src",
|
"sourceRoot": "src",
|
||||||
"compilerOptions": {
|
"compilerOptions": {
|
||||||
"deleteOutDir": true,
|
"deleteOutDir": true
|
||||||
"assets": [
|
|
||||||
"generated/prisma/package.json"
|
|
||||||
],
|
|
||||||
"watchAssets": true
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
|
||||||
113
package-lock.json
generated
|
|
@ -1,12 +1,12 @@
|
||||||
{
|
{
|
||||||
"name": "pandektes-challenge",
|
"name": "pandektes-challenge",
|
||||||
"version": "0.0.1",
|
"version": "1.0.0",
|
||||||
"lockfileVersion": 3,
|
"lockfileVersion": 3,
|
||||||
"requires": true,
|
"requires": true,
|
||||||
"packages": {
|
"packages": {
|
||||||
"": {
|
"": {
|
||||||
"name": "pandektes-challenge",
|
"name": "pandektes-challenge",
|
||||||
"version": "0.0.1",
|
"version": "1.0.0",
|
||||||
"hasInstallScript": true,
|
"hasInstallScript": true,
|
||||||
"license": "UNLICENSED",
|
"license": "UNLICENSED",
|
||||||
"dependencies": {
|
"dependencies": {
|
||||||
|
|
@ -23,8 +23,10 @@
|
||||||
"@nestjs/core": "^11.1.14",
|
"@nestjs/core": "^11.1.14",
|
||||||
"@nestjs/graphql": "^13.2.4",
|
"@nestjs/graphql": "^13.2.4",
|
||||||
"@nestjs/platform-express": "^11.0.1",
|
"@nestjs/platform-express": "^11.0.1",
|
||||||
|
"@nestjs/serve-static": "^5.0.4",
|
||||||
"@prisma/adapter-pg": "^7.4.2",
|
"@prisma/adapter-pg": "^7.4.2",
|
||||||
"@prisma/client": "^7.4.2",
|
"@prisma/client": "^7.4.2",
|
||||||
|
"axios": "^1.13.6",
|
||||||
"bullmq": "^5.70.1",
|
"bullmq": "^5.70.1",
|
||||||
"cheerio": "^1.2.0",
|
"cheerio": "^1.2.0",
|
||||||
"class-transformer": "^0.5.1",
|
"class-transformer": "^0.5.1",
|
||||||
|
|
@ -49,6 +51,7 @@
|
||||||
"@types/node": "^22.10.7",
|
"@types/node": "^22.10.7",
|
||||||
"@types/pg": "^8.18.0",
|
"@types/pg": "^8.18.0",
|
||||||
"@types/supertest": "^6.0.2",
|
"@types/supertest": "^6.0.2",
|
||||||
|
"dotenv": "^17.3.1",
|
||||||
"eslint": "^9.18.0",
|
"eslint": "^9.18.0",
|
||||||
"eslint-config-prettier": "^10.0.1",
|
"eslint-config-prettier": "^10.0.1",
|
||||||
"eslint-plugin-prettier": "^5.2.2",
|
"eslint-plugin-prettier": "^5.2.2",
|
||||||
|
|
@ -64,6 +67,9 @@
|
||||||
"tsconfig-paths": "^4.2.0",
|
"tsconfig-paths": "^4.2.0",
|
||||||
"typescript": "^5.7.3",
|
"typescript": "^5.7.3",
|
||||||
"typescript-eslint": "^8.20.0"
|
"typescript-eslint": "^8.20.0"
|
||||||
|
},
|
||||||
|
"engines": {
|
||||||
|
"node": ">=20"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/@angular-devkit/core": {
|
"node_modules/@angular-devkit/core": {
|
||||||
|
|
@ -4366,6 +4372,33 @@
|
||||||
"tslib": "^2.1.0"
|
"tslib": "^2.1.0"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
|
"node_modules/@nestjs/serve-static": {
|
||||||
|
"version": "5.0.4",
|
||||||
|
"resolved": "https://registry.npmjs.org/@nestjs/serve-static/-/serve-static-5.0.4.tgz",
|
||||||
|
"integrity": "sha512-3kO1M9D3vsPyWPFardxIjUYeuolS58PnhCoBTkS7t3BrdZFZCKHnBZ15js+UOzOR2Q6HmD7ssGjLd0DVYVdvOw==",
|
||||||
|
"license": "MIT",
|
||||||
|
"dependencies": {
|
||||||
|
"path-to-regexp": "8.3.0"
|
||||||
|
},
|
||||||
|
"peerDependencies": {
|
||||||
|
"@fastify/static": "^8.0.4",
|
||||||
|
"@nestjs/common": "^11.0.2",
|
||||||
|
"@nestjs/core": "^11.0.2",
|
||||||
|
"express": "^5.0.1",
|
||||||
|
"fastify": "^5.2.1"
|
||||||
|
},
|
||||||
|
"peerDependenciesMeta": {
|
||||||
|
"@fastify/static": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"express": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"fastify": {
|
||||||
|
"optional": true
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
"node_modules/@nestjs/testing": {
|
"node_modules/@nestjs/testing": {
|
||||||
"version": "11.1.14",
|
"version": "11.1.14",
|
||||||
"resolved": "https://registry.npmjs.org/@nestjs/testing/-/testing-11.1.14.tgz",
|
"resolved": "https://registry.npmjs.org/@nestjs/testing/-/testing-11.1.14.tgz",
|
||||||
|
|
@ -6885,7 +6918,6 @@
|
||||||
"version": "0.4.0",
|
"version": "0.4.0",
|
||||||
"resolved": "https://registry.npmjs.org/asynckit/-/asynckit-0.4.0.tgz",
|
"resolved": "https://registry.npmjs.org/asynckit/-/asynckit-0.4.0.tgz",
|
||||||
"integrity": "sha512-Oei9OH4tRh0YqU3GxhX79dM/mwVgvbZJaSNaRk+bshkj0S5cfHcgYakreBjrHwatXKbz+IoIdYLxrKim2MjW0Q==",
|
"integrity": "sha512-Oei9OH4tRh0YqU3GxhX79dM/mwVgvbZJaSNaRk+bshkj0S5cfHcgYakreBjrHwatXKbz+IoIdYLxrKim2MjW0Q==",
|
||||||
"dev": true,
|
|
||||||
"license": "MIT"
|
"license": "MIT"
|
||||||
},
|
},
|
||||||
"node_modules/available-typed-arrays": {
|
"node_modules/available-typed-arrays": {
|
||||||
|
|
@ -6913,6 +6945,17 @@
|
||||||
"node": ">= 6.0.0"
|
"node": ">= 6.0.0"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
|
"node_modules/axios": {
|
||||||
|
"version": "1.13.6",
|
||||||
|
"resolved": "https://registry.npmjs.org/axios/-/axios-1.13.6.tgz",
|
||||||
|
"integrity": "sha512-ChTCHMouEe2kn713WHbQGcuYrr6fXTBiu460OTwWrWob16g1bXn4vtz07Ope7ewMozJAnEquLk5lWQWtBig9DQ==",
|
||||||
|
"license": "MIT",
|
||||||
|
"dependencies": {
|
||||||
|
"follow-redirects": "^1.15.11",
|
||||||
|
"form-data": "^4.0.5",
|
||||||
|
"proxy-from-env": "^1.1.0"
|
||||||
|
}
|
||||||
|
},
|
||||||
"node_modules/babel-jest": {
|
"node_modules/babel-jest": {
|
||||||
"version": "30.2.0",
|
"version": "30.2.0",
|
||||||
"resolved": "https://registry.npmjs.org/babel-jest/-/babel-jest-30.2.0.tgz",
|
"resolved": "https://registry.npmjs.org/babel-jest/-/babel-jest-30.2.0.tgz",
|
||||||
|
|
@ -7281,6 +7324,19 @@
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
|
"node_modules/c12/node_modules/dotenv": {
|
||||||
|
"version": "16.6.1",
|
||||||
|
"resolved": "https://registry.npmjs.org/dotenv/-/dotenv-16.6.1.tgz",
|
||||||
|
"integrity": "sha512-uBq4egWHTcTt33a72vpSG0z3HnPuIl6NqYcTrKEg2azoEyl2hpW0zqlxysq2pK9HlDIHyHyakeYaYnSAwd8bow==",
|
||||||
|
"devOptional": true,
|
||||||
|
"license": "BSD-2-Clause",
|
||||||
|
"engines": {
|
||||||
|
"node": ">=12"
|
||||||
|
},
|
||||||
|
"funding": {
|
||||||
|
"url": "https://dotenvx.com"
|
||||||
|
}
|
||||||
|
},
|
||||||
"node_modules/call-bind": {
|
"node_modules/call-bind": {
|
||||||
"version": "1.0.8",
|
"version": "1.0.8",
|
||||||
"resolved": "https://registry.npmjs.org/call-bind/-/call-bind-1.0.8.tgz",
|
"resolved": "https://registry.npmjs.org/call-bind/-/call-bind-1.0.8.tgz",
|
||||||
|
|
@ -7681,7 +7737,6 @@
|
||||||
"version": "1.0.8",
|
"version": "1.0.8",
|
||||||
"resolved": "https://registry.npmjs.org/combined-stream/-/combined-stream-1.0.8.tgz",
|
"resolved": "https://registry.npmjs.org/combined-stream/-/combined-stream-1.0.8.tgz",
|
||||||
"integrity": "sha512-FQN4MRfuJeHf7cBbBMJFXhKSDq+2kAArBlmRBvcvFE5BB1HZKXtSFASDhdlz9zOYwxh8lDdnvmMOe/+5cdoEdg==",
|
"integrity": "sha512-FQN4MRfuJeHf7cBbBMJFXhKSDq+2kAArBlmRBvcvFE5BB1HZKXtSFASDhdlz9zOYwxh8lDdnvmMOe/+5cdoEdg==",
|
||||||
"dev": true,
|
|
||||||
"license": "MIT",
|
"license": "MIT",
|
||||||
"dependencies": {
|
"dependencies": {
|
||||||
"delayed-stream": "~1.0.0"
|
"delayed-stream": "~1.0.0"
|
||||||
|
|
@ -8058,7 +8113,6 @@
|
||||||
"version": "1.0.0",
|
"version": "1.0.0",
|
||||||
"resolved": "https://registry.npmjs.org/delayed-stream/-/delayed-stream-1.0.0.tgz",
|
"resolved": "https://registry.npmjs.org/delayed-stream/-/delayed-stream-1.0.0.tgz",
|
||||||
"integrity": "sha512-ZySD7Nf91aLB0RxL4KGrKHBXl7Eds1DAmEdcoVawXnLD7SDhpNgtuII2aAkg7a7QS41jxPSZ17p4VdGnMHk3MQ==",
|
"integrity": "sha512-ZySD7Nf91aLB0RxL4KGrKHBXl7Eds1DAmEdcoVawXnLD7SDhpNgtuII2aAkg7a7QS41jxPSZ17p4VdGnMHk3MQ==",
|
||||||
"dev": true,
|
|
||||||
"license": "MIT",
|
"license": "MIT",
|
||||||
"engines": {
|
"engines": {
|
||||||
"node": ">=0.4.0"
|
"node": ">=0.4.0"
|
||||||
|
|
@ -8186,9 +8240,10 @@
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/dotenv": {
|
"node_modules/dotenv": {
|
||||||
"version": "16.6.1",
|
"version": "17.3.1",
|
||||||
"resolved": "https://registry.npmjs.org/dotenv/-/dotenv-16.6.1.tgz",
|
"resolved": "https://registry.npmjs.org/dotenv/-/dotenv-17.3.1.tgz",
|
||||||
"integrity": "sha512-uBq4egWHTcTt33a72vpSG0z3HnPuIl6NqYcTrKEg2azoEyl2hpW0zqlxysq2pK9HlDIHyHyakeYaYnSAwd8bow==",
|
"integrity": "sha512-IO8C/dzEb6O3F9/twg6ZLXz164a2fhTnEWb95H23Dm4OuN+92NmEAlTrupP9VW6Jm3sO26tQlqyvyi4CsnY9GA==",
|
||||||
|
"dev": true,
|
||||||
"license": "BSD-2-Clause",
|
"license": "BSD-2-Clause",
|
||||||
"engines": {
|
"engines": {
|
||||||
"node": ">=12"
|
"node": ">=12"
|
||||||
|
|
@ -8212,6 +8267,18 @@
|
||||||
"url": "https://dotenvx.com"
|
"url": "https://dotenvx.com"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
|
"node_modules/dotenv-expand/node_modules/dotenv": {
|
||||||
|
"version": "16.6.1",
|
||||||
|
"resolved": "https://registry.npmjs.org/dotenv/-/dotenv-16.6.1.tgz",
|
||||||
|
"integrity": "sha512-uBq4egWHTcTt33a72vpSG0z3HnPuIl6NqYcTrKEg2azoEyl2hpW0zqlxysq2pK9HlDIHyHyakeYaYnSAwd8bow==",
|
||||||
|
"license": "BSD-2-Clause",
|
||||||
|
"engines": {
|
||||||
|
"node": ">=12"
|
||||||
|
},
|
||||||
|
"funding": {
|
||||||
|
"url": "https://dotenvx.com"
|
||||||
|
}
|
||||||
|
},
|
||||||
"node_modules/dunder-proto": {
|
"node_modules/dunder-proto": {
|
||||||
"version": "1.0.1",
|
"version": "1.0.1",
|
||||||
"resolved": "https://registry.npmjs.org/dunder-proto/-/dunder-proto-1.0.1.tgz",
|
"resolved": "https://registry.npmjs.org/dunder-proto/-/dunder-proto-1.0.1.tgz",
|
||||||
|
|
@ -8398,7 +8465,6 @@
|
||||||
"version": "2.1.0",
|
"version": "2.1.0",
|
||||||
"resolved": "https://registry.npmjs.org/es-set-tostringtag/-/es-set-tostringtag-2.1.0.tgz",
|
"resolved": "https://registry.npmjs.org/es-set-tostringtag/-/es-set-tostringtag-2.1.0.tgz",
|
||||||
"integrity": "sha512-j6vWzfrGVfyXxge+O0x5sh6cvxAog0a/4Rdd2K36zCMV5eJ+/+tOAngRO8cODMNWbVRdVlmGZQL2YS3yR8bIUA==",
|
"integrity": "sha512-j6vWzfrGVfyXxge+O0x5sh6cvxAog0a/4Rdd2K36zCMV5eJ+/+tOAngRO8cODMNWbVRdVlmGZQL2YS3yR8bIUA==",
|
||||||
"dev": true,
|
|
||||||
"license": "MIT",
|
"license": "MIT",
|
||||||
"dependencies": {
|
"dependencies": {
|
||||||
"es-errors": "^1.3.0",
|
"es-errors": "^1.3.0",
|
||||||
|
|
@ -9064,6 +9130,26 @@
|
||||||
"dev": true,
|
"dev": true,
|
||||||
"license": "ISC"
|
"license": "ISC"
|
||||||
},
|
},
|
||||||
|
"node_modules/follow-redirects": {
|
||||||
|
"version": "1.15.11",
|
||||||
|
"resolved": "https://registry.npmjs.org/follow-redirects/-/follow-redirects-1.15.11.tgz",
|
||||||
|
"integrity": "sha512-deG2P0JfjrTxl50XGCDyfI97ZGVCxIpfKYmfyrQ54n5FO/0gfIES8C/Psl6kWVDolizcaaxZJnTS0QSMxvnsBQ==",
|
||||||
|
"funding": [
|
||||||
|
{
|
||||||
|
"type": "individual",
|
||||||
|
"url": "https://github.com/sponsors/RubenVerborgh"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"license": "MIT",
|
||||||
|
"engines": {
|
||||||
|
"node": ">=4.0"
|
||||||
|
},
|
||||||
|
"peerDependenciesMeta": {
|
||||||
|
"debug": {
|
||||||
|
"optional": true
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
"node_modules/for-each": {
|
"node_modules/for-each": {
|
||||||
"version": "0.3.5",
|
"version": "0.3.5",
|
||||||
"resolved": "https://registry.npmjs.org/for-each/-/for-each-0.3.5.tgz",
|
"resolved": "https://registry.npmjs.org/for-each/-/for-each-0.3.5.tgz",
|
||||||
|
|
@ -9128,7 +9214,6 @@
|
||||||
"version": "4.0.5",
|
"version": "4.0.5",
|
||||||
"resolved": "https://registry.npmjs.org/form-data/-/form-data-4.0.5.tgz",
|
"resolved": "https://registry.npmjs.org/form-data/-/form-data-4.0.5.tgz",
|
||||||
"integrity": "sha512-8RipRLol37bNs2bhoV67fiTEvdTrbMUYcFTiy3+wuuOnUog2QBHCZWXDRijWQfAkhBj2Uf5UnVaiWwA5vdd82w==",
|
"integrity": "sha512-8RipRLol37bNs2bhoV67fiTEvdTrbMUYcFTiy3+wuuOnUog2QBHCZWXDRijWQfAkhBj2Uf5UnVaiWwA5vdd82w==",
|
||||||
"dev": true,
|
|
||||||
"license": "MIT",
|
"license": "MIT",
|
||||||
"dependencies": {
|
"dependencies": {
|
||||||
"asynckit": "^0.4.0",
|
"asynckit": "^0.4.0",
|
||||||
|
|
@ -9145,7 +9230,6 @@
|
||||||
"version": "1.52.0",
|
"version": "1.52.0",
|
||||||
"resolved": "https://registry.npmjs.org/mime-db/-/mime-db-1.52.0.tgz",
|
"resolved": "https://registry.npmjs.org/mime-db/-/mime-db-1.52.0.tgz",
|
||||||
"integrity": "sha512-sPU4uV7dYlvtWJxwwxHD0PuihVNiE7TyAbQ5SWxDCB9mUYvOgroQOwYQQOKPJ8CIbE+1ETVlOoK1UC2nU3gYvg==",
|
"integrity": "sha512-sPU4uV7dYlvtWJxwwxHD0PuihVNiE7TyAbQ5SWxDCB9mUYvOgroQOwYQQOKPJ8CIbE+1ETVlOoK1UC2nU3gYvg==",
|
||||||
"dev": true,
|
|
||||||
"license": "MIT",
|
"license": "MIT",
|
||||||
"engines": {
|
"engines": {
|
||||||
"node": ">= 0.6"
|
"node": ">= 0.6"
|
||||||
|
|
@ -9155,7 +9239,6 @@
|
||||||
"version": "2.1.35",
|
"version": "2.1.35",
|
||||||
"resolved": "https://registry.npmjs.org/mime-types/-/mime-types-2.1.35.tgz",
|
"resolved": "https://registry.npmjs.org/mime-types/-/mime-types-2.1.35.tgz",
|
||||||
"integrity": "sha512-ZDY+bPm5zTTF+YpCrAU9nK0UgICYPT0QtT1NZWFv4s++TNkcgVaT0g6+4R2uI4MjQjzysHB1zxuWL50hzaeXiw==",
|
"integrity": "sha512-ZDY+bPm5zTTF+YpCrAU9nK0UgICYPT0QtT1NZWFv4s++TNkcgVaT0g6+4R2uI4MjQjzysHB1zxuWL50hzaeXiw==",
|
||||||
"dev": true,
|
|
||||||
"license": "MIT",
|
"license": "MIT",
|
||||||
"dependencies": {
|
"dependencies": {
|
||||||
"mime-db": "1.52.0"
|
"mime-db": "1.52.0"
|
||||||
|
|
@ -12592,6 +12675,12 @@
|
||||||
"node": ">= 0.10"
|
"node": ">= 0.10"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
|
"node_modules/proxy-from-env": {
|
||||||
|
"version": "1.1.0",
|
||||||
|
"resolved": "https://registry.npmjs.org/proxy-from-env/-/proxy-from-env-1.1.0.tgz",
|
||||||
|
"integrity": "sha512-D+zkORCbA9f1tdWRK0RaCR3GPv50cMxcrz4X8k5LTSUD1Dkw47mKJEZQNunItRTkWwgtaUSo1RVFRIG9ZXiFYg==",
|
||||||
|
"license": "MIT"
|
||||||
|
},
|
||||||
"node_modules/pure-rand": {
|
"node_modules/pure-rand": {
|
||||||
"version": "7.0.1",
|
"version": "7.0.1",
|
||||||
"resolved": "https://registry.npmjs.org/pure-rand/-/pure-rand-7.0.1.tgz",
|
"resolved": "https://registry.npmjs.org/pure-rand/-/pure-rand-7.0.1.tgz",
|
||||||
|
|
|
||||||
29
package.json
|
|
@ -1,28 +1,19 @@
|
||||||
{
|
{
|
||||||
"name": "pandektes-challenge",
|
"name": "pandektes-challenge",
|
||||||
"version": "0.0.1",
|
"version": "1.0.0",
|
||||||
"description": "",
|
"description": "Legal document parsing API — extracts case law metadata from PDF/HTML using Gemini AI",
|
||||||
"author": "",
|
"author": "George W.",
|
||||||
"private": true,
|
"private": true,
|
||||||
"license": "UNLICENSED",
|
"license": "UNLICENSED",
|
||||||
"scripts": {
|
"scripts": {
|
||||||
"build": "nest build",
|
"build": "nest build && cp src/generated/prisma/package.json dist/src/generated/prisma/package.json",
|
||||||
"format": "prettier --write \"src/**/*.ts\" \"test/**/*.ts\"",
|
|
||||||
"start": "nest start",
|
"start": "nest start",
|
||||||
"start:dev": "nest start --watch",
|
"start:dev": "nest start --watch",
|
||||||
"start:debug": "nest start --debug --watch",
|
"start:prod": "npm run build && nest start",
|
||||||
"start:prod": "node dist/main",
|
|
||||||
"lint": "eslint \"{src,apps,libs,test}/**/*.ts\" --fix",
|
|
||||||
"test": "jest",
|
|
||||||
"test:watch": "jest --watch",
|
|
||||||
"test:cov": "jest --coverage",
|
|
||||||
"test:debug": "node --inspect-brk -r tsconfig-paths/register -r ts-node/register node_modules/.bin/jest --runInBand",
|
|
||||||
"test:e2e": "jest --config ./test/jest-e2e.json",
|
|
||||||
"postinstall": "prisma generate"
|
"postinstall": "prisma generate"
|
||||||
},
|
},
|
||||||
"dependencies": {
|
"dependencies": {
|
||||||
"@apollo/server": "^5.4.0",
|
"@apollo/server": "^5.4.0",
|
||||||
"@as-integrations/express5": "^1.1.2",
|
|
||||||
"@aws-sdk/client-s3": "^3.1000.0",
|
"@aws-sdk/client-s3": "^3.1000.0",
|
||||||
"@aws-sdk/s3-request-presigner": "^3.1000.0",
|
"@aws-sdk/s3-request-presigner": "^3.1000.0",
|
||||||
"@langchain/core": "^1.1.29",
|
"@langchain/core": "^1.1.29",
|
||||||
|
|
@ -34,8 +25,10 @@
|
||||||
"@nestjs/core": "^11.1.14",
|
"@nestjs/core": "^11.1.14",
|
||||||
"@nestjs/graphql": "^13.2.4",
|
"@nestjs/graphql": "^13.2.4",
|
||||||
"@nestjs/platform-express": "^11.0.1",
|
"@nestjs/platform-express": "^11.0.1",
|
||||||
|
"@nestjs/serve-static": "^5.0.4",
|
||||||
"@prisma/adapter-pg": "^7.4.2",
|
"@prisma/adapter-pg": "^7.4.2",
|
||||||
"@prisma/client": "^7.4.2",
|
"@prisma/client": "^7.4.2",
|
||||||
|
"axios": "^1.13.6",
|
||||||
"bullmq": "^5.70.1",
|
"bullmq": "^5.70.1",
|
||||||
"cheerio": "^1.2.0",
|
"cheerio": "^1.2.0",
|
||||||
"class-transformer": "^0.5.1",
|
"class-transformer": "^0.5.1",
|
||||||
|
|
@ -59,7 +52,7 @@
|
||||||
"@types/jest": "^30.0.0",
|
"@types/jest": "^30.0.0",
|
||||||
"@types/node": "^22.10.7",
|
"@types/node": "^22.10.7",
|
||||||
"@types/pg": "^8.18.0",
|
"@types/pg": "^8.18.0",
|
||||||
"@types/supertest": "^6.0.2",
|
"dotenv": "^17.3.1",
|
||||||
"eslint": "^9.18.0",
|
"eslint": "^9.18.0",
|
||||||
"eslint-config-prettier": "^10.0.1",
|
"eslint-config-prettier": "^10.0.1",
|
||||||
"eslint-plugin-prettier": "^5.2.2",
|
"eslint-plugin-prettier": "^5.2.2",
|
||||||
|
|
@ -67,15 +60,15 @@
|
||||||
"jest": "^30.0.0",
|
"jest": "^30.0.0",
|
||||||
"prettier": "^3.4.2",
|
"prettier": "^3.4.2",
|
||||||
"prisma": "^7.4.2",
|
"prisma": "^7.4.2",
|
||||||
"source-map-support": "^0.5.21",
|
|
||||||
"supertest": "^7.0.0",
|
|
||||||
"ts-jest": "^29.2.5",
|
"ts-jest": "^29.2.5",
|
||||||
"ts-loader": "^9.5.2",
|
|
||||||
"ts-node": "^10.9.2",
|
"ts-node": "^10.9.2",
|
||||||
"tsconfig-paths": "^4.2.0",
|
"tsconfig-paths": "^4.2.0",
|
||||||
"typescript": "^5.7.3",
|
"typescript": "^5.7.3",
|
||||||
"typescript-eslint": "^8.20.0"
|
"typescript-eslint": "^8.20.0"
|
||||||
},
|
},
|
||||||
|
"engines": {
|
||||||
|
"node": ">=20"
|
||||||
|
},
|
||||||
"jest": {
|
"jest": {
|
||||||
"moduleFileExtensions": [
|
"moduleFileExtensions": [
|
||||||
"js",
|
"js",
|
||||||
|
|
|
||||||
|
|
@ -0,0 +1,3 @@
|
||||||
|
-- AlterTable
|
||||||
|
ALTER TABLE "CaseLaw" ADD COLUMN "metadata" JSONB,
|
||||||
|
ADD COLUMN "storageKey" TEXT;
|
||||||
|
|
@ -0,0 +1,7 @@
|
||||||
|
-- CreateEnum
|
||||||
|
CREATE TYPE "CaseStatus" AS ENUM ('PENDING', 'PROCESSING', 'COMPLETED', 'FAILED');
|
||||||
|
|
||||||
|
-- AlterTable
|
||||||
|
ALTER TABLE "CaseLaw" ADD COLUMN "logs" TEXT[],
|
||||||
|
ADD COLUMN "processingError" TEXT,
|
||||||
|
ADD COLUMN "status" "CaseStatus" NOT NULL DEFAULT 'PENDING';
|
||||||
|
|
@ -13,18 +13,28 @@ datasource db {
|
||||||
provider = "postgresql"
|
provider = "postgresql"
|
||||||
}
|
}
|
||||||
|
|
||||||
model CaseLaw {
|
enum CaseStatus {
|
||||||
id String @id @default(uuid())
|
PENDING
|
||||||
title String
|
PROCESSING
|
||||||
decisionType String?
|
COMPLETED
|
||||||
decisionDate DateTime?
|
FAILED
|
||||||
office String?
|
}
|
||||||
court String?
|
|
||||||
caseNumber String?
|
model CaseLaw {
|
||||||
summary String? @db.Text
|
id String @id @default(uuid())
|
||||||
storageKey String?
|
title String
|
||||||
fileType String
|
decisionType String?
|
||||||
metadata Json?
|
decisionDate DateTime?
|
||||||
createdAt DateTime @default(now())
|
office String?
|
||||||
updatedAt DateTime @updatedAt
|
court String?
|
||||||
|
caseNumber String?
|
||||||
|
summary String? @db.Text
|
||||||
|
storageKey String?
|
||||||
|
fileType String
|
||||||
|
status CaseStatus @default(PENDING)
|
||||||
|
processingError String?
|
||||||
|
logs String[]
|
||||||
|
metadata Json?
|
||||||
|
createdAt DateTime @default(now())
|
||||||
|
updatedAt DateTime @updatedAt
|
||||||
}
|
}
|
||||||
|
|
|
||||||
288
public/index.html
Normal file
|
|
@ -0,0 +1,288 @@
|
||||||
|
<!DOCTYPE html>
|
||||||
|
<html lang="en">
|
||||||
|
|
||||||
|
<head>
|
||||||
|
<meta charset="UTF-8">
|
||||||
|
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||||
|
<title>Pandektes Tech Challenge</title>
|
||||||
|
<script src="https://cdn.tailwindcss.com"></script>
|
||||||
|
<style>
|
||||||
|
body {
|
||||||
|
background-image: radial-gradient(circle at 2px 2px, rgba(255, 255, 255, 0.05) 1px, transparent 0);
|
||||||
|
background-size: 40px 40px;
|
||||||
|
}
|
||||||
|
|
||||||
|
.no-scrollbar::-webkit-scrollbar {
|
||||||
|
width: 8px;
|
||||||
|
}
|
||||||
|
|
||||||
|
.no-scrollbar::-webkit-scrollbar-thumb {
|
||||||
|
background: rgba(255, 255, 255, 0.1);
|
||||||
|
border-radius: 999px;
|
||||||
|
}
|
||||||
|
</style>
|
||||||
|
</head>
|
||||||
|
|
||||||
|
<body class="bg-slate-900 text-slate-50 min-h-screen flex items-center justify-center m-0">
|
||||||
|
<div
|
||||||
|
class="bg-slate-800/70 backdrop-blur-xl border border-white/10 rounded-3xl p-8 md:p-12 w-full max-w-xl text-center shadow-2xl">
|
||||||
|
<h1 class="text-3xl font-bold mb-2">Pandektes Tech Challenge</h1>
|
||||||
|
<p class="text-slate-400 mb-8">PDF/HTML metadata extraction</p>
|
||||||
|
|
||||||
|
<div class="mb-8">
|
||||||
|
<input type="file" id="file-input" accept=".pdf,.html"
|
||||||
|
class="block w-full text-sm text-slate-400 file:mr-4 file:py-3 file:px-6 file:rounded-xl file:border-0 file:text-sm file:font-semibold file:bg-indigo-500/10 file:text-indigo-400 hover:file:bg-indigo-500/20 cursor-pointer transition-colors" />
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<button id="upload-btn" onclick="handleUpload()"
|
||||||
|
class="bg-indigo-500 hover:bg-indigo-600 text-white font-semibold py-4 px-8 rounded-xl w-full transition-all disabled:bg-slate-700 disabled:cursor-not-allowed shadow-lg shadow-indigo-500/20">
|
||||||
|
Upload and Extract
|
||||||
|
</button>
|
||||||
|
|
||||||
|
<div class="mt-8 border-t border-white/5 pt-8">
|
||||||
|
<div class="text-[10px] uppercase tracking-widest text-slate-500 font-bold mb-3">Lookup Previous</div>
|
||||||
|
<div class="flex space-x-2">
|
||||||
|
<input id="search-input" placeholder="ID or Case Number"
|
||||||
|
class="bg-black/20 border border-white/10 rounded-xl px-4 py-2 text-sm grow focus:outline-none focus:border-indigo-500 transition-colors" />
|
||||||
|
<button onclick="handleSearch()"
|
||||||
|
class="bg-slate-700 hover:bg-slate-600 px-4 py-2 rounded-xl text-sm transition-colors grow-0 whitespace-nowrap">Find
|
||||||
|
Case</button>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<div id="result-box"
|
||||||
|
class="mt-8 hidden text-left bg-indigo-500/5 rounded-2xl border border-indigo-500/10 overflow-hidden shadow-inner">
|
||||||
|
<div class="flex items-center justify-between px-6 py-4 bg-white/5 border-b border-white/5">
|
||||||
|
<div id="result-status" class="text-sm font-bold flex items-center"></div>
|
||||||
|
<div id="result-actions"></div>
|
||||||
|
</div>
|
||||||
|
<pre id="result-data"
|
||||||
|
class="p-6 text-[11px] font-mono max-h-96 overflow-y-auto text-slate-400 no-scrollbar"></pre>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<div id="tray-view"
|
||||||
|
class="fixed bottom-6 right-6 flex flex-col-reverse space-y-reverse space-y-4 z-50 w-80 max-h-[80vh] overflow-y-auto no-scrollbar">
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<template id="job-template">
|
||||||
|
<div
|
||||||
|
class="bg-slate-800/95 backdrop-blur-xl border border-indigo-500/30 rounded-2xl shadow-2xl overflow-hidden ring-1 ring-white/5 animate-in fade-in slide-in-from-right-4">
|
||||||
|
<div class="bg-slate-700/50 px-4 py-2 flex items-center justify-between border-b border-white/5">
|
||||||
|
<div class="flex items-center space-x-2 truncate">
|
||||||
|
<div class="status-pulse w-2 h-2 rounded-full bg-green-500 animate-pulse shrink-0"></div>
|
||||||
|
<span
|
||||||
|
class="filename-display text-[10px] uppercase tracking-widest text-slate-300 font-bold truncate"></span>
|
||||||
|
</div>
|
||||||
|
<button class="remove-btn text-slate-500 hover:text-white shrink-0">
|
||||||
|
<svg class="w-3.5 h-3.5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
|
||||||
|
<path d="M6 18L18 6M6 6l12 12"></path>
|
||||||
|
</svg>
|
||||||
|
</button>
|
||||||
|
</div>
|
||||||
|
<div
|
||||||
|
class="terminal-view p-3 text-[10px] font-mono h-32 overflow-y-auto bg-black/40 text-indigo-300 no-scrollbar">
|
||||||
|
<div class="logs-container space-y-1">
|
||||||
|
<div class="text-green-500/70">✔ Registered...</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
<div class="footer-actions p-2 border-t border-white/5 bg-slate-900/50 hidden">
|
||||||
|
<button
|
||||||
|
class="view-result-btn w-full py-1.5 text-[10px] font-bold uppercase bg-indigo-600 hover:bg-indigo-500 text-white rounded-lg transition-colors">View
|
||||||
|
Data</button>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</template>
|
||||||
|
|
||||||
|
<script>
|
||||||
|
const activeJobs = {};
|
||||||
|
|
||||||
|
async function fetchGraphQL(query, variables = {}, isFileUpload = false) {
|
||||||
|
const options = { method: 'POST' };
|
||||||
|
|
||||||
|
if (isFileUpload) {
|
||||||
|
const formData = new FormData();
|
||||||
|
formData.append('operations', JSON.stringify({ query, variables: { file: null } }));
|
||||||
|
formData.append('map', JSON.stringify({ '0': ['variables.file'] }));
|
||||||
|
formData.append('0', variables.file);
|
||||||
|
options.body = formData;
|
||||||
|
} else {
|
||||||
|
options.headers = { 'Content-Type': 'application/json' };
|
||||||
|
options.body = JSON.stringify({ query, variables });
|
||||||
|
}
|
||||||
|
|
||||||
|
const response = await fetch('/graphql', options);
|
||||||
|
const result = await response.json();
|
||||||
|
|
||||||
|
if (result.errors) throw new Error(result.errors[0].message);
|
||||||
|
return result.data;
|
||||||
|
}
|
||||||
|
|
||||||
|
async function handleUpload() {
|
||||||
|
const fileInput = document.getElementById('file-input');
|
||||||
|
const uploadBtn = document.getElementById('upload-btn');
|
||||||
|
const file = fileInput.files[0];
|
||||||
|
|
||||||
|
if (!file) {
|
||||||
|
alert('Please select a file first.');
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
const originalText = uploadBtn.innerText;
|
||||||
|
uploadBtn.disabled = true;
|
||||||
|
uploadBtn.innerText = 'Queueing...';
|
||||||
|
|
||||||
|
const query = `mutation($file: Upload!) { uploadCase(file: $file) { id } }`;
|
||||||
|
|
||||||
|
try {
|
||||||
|
const data = await fetchGraphQL(query, { file }, true);
|
||||||
|
const jobId = data.uploadCase.id;
|
||||||
|
|
||||||
|
activeJobs[jobId] = { status: 'PENDING', logCount: 0, caseData: null, filename: file.name };
|
||||||
|
createJobTerminal(jobId, file.name);
|
||||||
|
|
||||||
|
fileInput.value = '';
|
||||||
|
uploadBtn.innerText = 'Upload Another';
|
||||||
|
} catch (error) {
|
||||||
|
alert('Error: ' + error.message);
|
||||||
|
uploadBtn.innerText = originalText;
|
||||||
|
} finally {
|
||||||
|
uploadBtn.disabled = false;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
function createJobTerminal(jobId, filename) {
|
||||||
|
const template = document.getElementById('job-template');
|
||||||
|
const clone = template.content.cloneNode(true);
|
||||||
|
const container = clone.querySelector('div');
|
||||||
|
|
||||||
|
container.id = `job-${jobId}`;
|
||||||
|
clone.querySelector('.filename-display').textContent = `Worker ${jobId}`;
|
||||||
|
clone.querySelector('.status-pulse').id = `pulse-${jobId}`;
|
||||||
|
clone.querySelector('.terminal-view').id = `terminal-${jobId}`;
|
||||||
|
clone.querySelector('.logs-container').id = `logs-${jobId}`;
|
||||||
|
clone.querySelector('.footer-actions').id = `footer-${jobId}`;
|
||||||
|
|
||||||
|
clone.querySelector('.remove-btn').onclick = () => document.getElementById(`job-${jobId}`).remove();
|
||||||
|
clone.querySelector('.view-result-btn').onclick = () => renderResultView(jobId);
|
||||||
|
|
||||||
|
document.getElementById('tray-view').appendChild(clone);
|
||||||
|
}
|
||||||
|
|
||||||
|
function renderResultView(jobId) {
|
||||||
|
const job = activeJobs[jobId];
|
||||||
|
if (!job || !job.caseData) return;
|
||||||
|
|
||||||
|
const caseData = job.caseData;
|
||||||
|
const resultBox = document.getElementById('result-box');
|
||||||
|
const resultStatus = document.getElementById('result-status');
|
||||||
|
const resultActions = document.getElementById('result-actions');
|
||||||
|
const resultData = document.getElementById('result-data');
|
||||||
|
|
||||||
|
resultBox.classList.remove('hidden');
|
||||||
|
|
||||||
|
if (caseData.status === 'COMPLETED') {
|
||||||
|
resultStatus.innerHTML = `<span class="text-green-400">✔ ${job.filename}</span>`;
|
||||||
|
resultActions.innerHTML = `<a href="${caseData.downloadUrl}" target="_blank" class="px-3 py-1.5 bg-indigo-500 hover:bg-indigo-600 text-white text-[10px] font-bold uppercase tracking-wider rounded-lg transition-all shadow-lg shadow-indigo-500/20">View File</a>`;
|
||||||
|
|
||||||
|
const displayData = JSON.stringify(caseData, (key, value) => ['logs', 'downloadUrl'].includes(key) ? undefined : value, 2);
|
||||||
|
resultData.textContent = displayData;
|
||||||
|
} else {
|
||||||
|
resultStatus.innerHTML = `<span class="text-red-400">❌ Error: ${job.filename}</span>`;
|
||||||
|
resultActions.innerHTML = '';
|
||||||
|
resultData.textContent = caseData.processingError || 'Unknown error occurred during processing.';
|
||||||
|
}
|
||||||
|
|
||||||
|
resultBox.scrollIntoView({ behavior: 'smooth' });
|
||||||
|
}
|
||||||
|
|
||||||
|
async function handleSearch() {
|
||||||
|
const searchInput = document.getElementById('search-input').value;
|
||||||
|
if (!searchInput) return;
|
||||||
|
|
||||||
|
const resultBox = document.getElementById('result-box');
|
||||||
|
const resultStatus = document.getElementById('result-status');
|
||||||
|
const resultActions = document.getElementById('result-actions');
|
||||||
|
const resultData = document.getElementById('result-data');
|
||||||
|
|
||||||
|
resultBox.classList.remove('hidden');
|
||||||
|
resultStatus.innerHTML = `<span class="text-slate-400 animate-pulse">🔍 Searching...</span>`;
|
||||||
|
resultActions.innerHTML = '';
|
||||||
|
resultData.textContent = '';
|
||||||
|
|
||||||
|
const query = `
|
||||||
|
query($searchTerm: String!) {
|
||||||
|
caseLaw(id: $searchTerm, caseNumber: $searchTerm) {
|
||||||
|
id status title caseNumber summary downloadUrl
|
||||||
|
}
|
||||||
|
}`;
|
||||||
|
|
||||||
|
try {
|
||||||
|
const data = await fetchGraphQL(query, { searchTerm: searchInput });
|
||||||
|
const caseLaw = data?.caseLaw;
|
||||||
|
|
||||||
|
if (caseLaw) {
|
||||||
|
resultStatus.innerHTML = `<span class="text-indigo-400">🔎 Match: ${caseLaw.caseNumber || caseLaw.id.slice(0, 8)}</span>`;
|
||||||
|
resultActions.innerHTML = `<a href="${caseLaw.downloadUrl}" target="_blank" class="px-3 py-1.5 bg-indigo-500 hover:bg-indigo-600 text-white text-[10px] font-bold uppercase tracking-wider rounded-lg transition-all">Open File</a>`;
|
||||||
|
resultData.textContent = JSON.stringify(caseLaw, (key, value) => key === 'downloadUrl' ? undefined : value, 2);
|
||||||
|
} else {
|
||||||
|
resultStatus.innerHTML = `<span class="text-slate-500">❌ Not Found</span>`;
|
||||||
|
resultData.textContent = 'No archive match.';
|
||||||
|
}
|
||||||
|
} catch (error) {
|
||||||
|
resultStatus.innerHTML = `<span class="text-red-400">❌ Error</span>`;
|
||||||
|
resultData.textContent = error.message;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
setInterval(async () => {
|
||||||
|
const pendingIds = Object.keys(activeJobs).filter(id => !['COMPLETED', 'FAILED'].includes(activeJobs[id].status));
|
||||||
|
|
||||||
|
for (const jobId of pendingIds) {
|
||||||
|
const query = `
|
||||||
|
query($id: String!) {
|
||||||
|
caseLaw(id: $id) {
|
||||||
|
status logs title decisionType decisionDate court caseNumber summary downloadUrl processingError
|
||||||
|
}
|
||||||
|
}`;
|
||||||
|
|
||||||
|
try {
|
||||||
|
const data = await fetchGraphQL(query, { id: jobId });
|
||||||
|
const caseLaw = data.caseLaw;
|
||||||
|
const job = activeJobs[jobId];
|
||||||
|
|
||||||
|
job.status = caseLaw.status;
|
||||||
|
job.caseData = caseLaw;
|
||||||
|
|
||||||
|
const logContainer = document.getElementById(`logs-${jobId}`);
|
||||||
|
const terminalView = document.getElementById(`terminal-${jobId}`);
|
||||||
|
|
||||||
|
if (caseLaw.logs && caseLaw.logs.length > job.logCount) {
|
||||||
|
const newLogs = caseLaw.logs.slice(job.logCount);
|
||||||
|
newLogs.forEach(logText => {
|
||||||
|
logContainer.insertAdjacentHTML('beforeend', `<div class="flex"><span class="text-indigo-500/50 mr-2 opacity-50">➜</span>${logText}</div>`);
|
||||||
|
});
|
||||||
|
job.logCount = caseLaw.logs.length;
|
||||||
|
terminalView.scrollTop = terminalView.scrollHeight;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (['COMPLETED', 'FAILED'].includes(caseLaw.status)) {
|
||||||
|
const statusMessage = caseLaw.status === 'COMPLETED' ? '✔ FINISHED' : '✘ FAILED';
|
||||||
|
logContainer.insertAdjacentHTML('beforeend', `<div class="mt-2 pt-2 border-t border-white/5 text-white font-bold">${statusMessage}</div>`);
|
||||||
|
|
||||||
|
document.getElementById(`footer-${jobId}`).classList.remove('hidden');
|
||||||
|
terminalView.scrollTop = terminalView.scrollHeight;
|
||||||
|
|
||||||
|
const pulseEl = document.getElementById(`pulse-${jobId}`);
|
||||||
|
pulseEl.classList.remove('bg-green-500', 'animate-pulse');
|
||||||
|
pulseEl.classList.add(caseLaw.status === 'COMPLETED' ? 'bg-indigo-500' : 'bg-red-500');
|
||||||
|
}
|
||||||
|
} catch (error) {
|
||||||
|
console.error(`Poll failed for job ${jobId}:`, error);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}, 1000);
|
||||||
|
</script>
|
||||||
|
</body>
|
||||||
|
|
||||||
|
</html>
|
||||||
|
|
@ -1,19 +1,33 @@
|
||||||
import { MiddlewareConsumer, Module, NestModule } from '@nestjs/common';
|
import { MiddlewareConsumer, Module, NestModule } from '@nestjs/common';
|
||||||
import { PrismaModule } from './common/prisma/prisma.module';
|
import { PrismaModule } from './common/prisma/prisma.module';
|
||||||
import { CasesModule } from './cases/cases.module';
|
import { CasesModule } from './cases/cases.module';
|
||||||
import { ConfigModule } from '@nestjs/config';
|
import { ConfigModule, ConfigService } from '@nestjs/config';
|
||||||
import { GraphQLModule } from '@nestjs/graphql';
|
import { GraphQLModule } from '@nestjs/graphql';
|
||||||
import { ApolloDriver, ApolloDriverConfig } from '@nestjs/apollo';
|
import { ApolloDriver, ApolloDriverConfig } from '@nestjs/apollo';
|
||||||
import { join } from 'path';
|
import { join } from 'path';
|
||||||
import { StorageModule } from './common/storage/storage.module';
|
import { StorageModule } from './common/storage/storage.module';
|
||||||
import { graphqlUploadExpress } from 'graphql-upload-ts';
|
import { graphqlUploadExpress } from 'graphql-upload-ts';
|
||||||
|
import { ServeStaticModule } from '@nestjs/serve-static';
|
||||||
|
import { BullModule } from '@nestjs/bullmq';
|
||||||
|
|
||||||
@Module({
|
@Module({
|
||||||
imports: [
|
imports: [
|
||||||
ConfigModule.forRoot({
|
ConfigModule.forRoot({
|
||||||
isGlobal: true,
|
isGlobal: true,
|
||||||
}),
|
}),
|
||||||
|
ServeStaticModule.forRoot({
|
||||||
|
rootPath: join(process.cwd(), 'public'),
|
||||||
|
exclude: ['/graphql'],
|
||||||
|
}),
|
||||||
|
BullModule.forRootAsync({
|
||||||
|
inject: [ConfigService],
|
||||||
|
useFactory: (configService: ConfigService) => ({
|
||||||
|
connection: {
|
||||||
|
host: configService.get('REDIS_HOST'),
|
||||||
|
port: configService.get('REDIS_PORT'),
|
||||||
|
},
|
||||||
|
}),
|
||||||
|
}),
|
||||||
GraphQLModule.forRoot<ApolloDriverConfig>({
|
GraphQLModule.forRoot<ApolloDriverConfig>({
|
||||||
driver: ApolloDriver,
|
driver: ApolloDriver,
|
||||||
autoSchemaFile: join(process.cwd(), 'src/schema.gql'),
|
autoSchemaFile: join(process.cwd(), 'src/schema.gql'),
|
||||||
|
|
@ -22,9 +36,11 @@ import { graphqlUploadExpress } from 'graphql-upload-ts';
|
||||||
// Needed for uploads in Apollo v4
|
// Needed for uploads in Apollo v4
|
||||||
csrfPrevention: false,
|
csrfPrevention: false,
|
||||||
}),
|
}),
|
||||||
|
|
||||||
|
PrismaModule,
|
||||||
PrismaModule, CasesModule, StorageModule,],
|
CasesModule,
|
||||||
|
StorageModule,
|
||||||
|
],
|
||||||
controllers: [],
|
controllers: [],
|
||||||
providers: [],
|
providers: [],
|
||||||
})
|
})
|
||||||
|
|
|
||||||
|
|
@ -1,8 +1,21 @@
|
||||||
import { Module } from '@nestjs/common';
|
import { Module } from '@nestjs/common';
|
||||||
|
import { BullModule } from '@nestjs/bullmq';
|
||||||
|
import { join } from 'path';
|
||||||
import { CasesService } from './cases.service';
|
import { CasesService } from './cases.service';
|
||||||
import { CasesResolver } from './cases.resolver';
|
import { CasesResolver } from './cases.resolver';
|
||||||
|
import { CaseQueueListener } from './processors/case-queue.listener';
|
||||||
|
|
||||||
@Module({
|
@Module({
|
||||||
providers: [CasesService, CasesResolver]
|
imports: [
|
||||||
|
BullModule.registerQueue({
|
||||||
|
name: 'case-processing',
|
||||||
|
processors: [
|
||||||
|
{
|
||||||
|
path: join(__dirname, 'processors', 'case.worker.js'),
|
||||||
|
},
|
||||||
|
],
|
||||||
|
}),
|
||||||
|
],
|
||||||
|
providers: [CasesService, CasesResolver, CaseQueueListener],
|
||||||
})
|
})
|
||||||
export class CasesModule {}
|
export class CasesModule {}
|
||||||
|
|
|
||||||
|
|
@ -1,10 +1,11 @@
|
||||||
import { Resolver, Query, Mutation, Args, ID, Int, ResolveField, Parent } from '@nestjs/graphql';
|
import { Resolver, Query, Mutation, Args, Int, ResolveField, Parent } from '@nestjs/graphql';
|
||||||
import { GraphQLUpload } from 'graphql-upload-ts';
|
import { GraphQLUpload } from 'graphql-upload-ts';
|
||||||
import type { FileUpload } from 'graphql-upload-ts';
|
import type { FileUpload } from 'graphql-upload-ts';
|
||||||
import { CasesService } from './cases.service';
|
import { CasesService } from './cases.service';
|
||||||
import { CaseLaw } from './entities/case-law.entity';
|
import { CaseLaw } from './entities/case-law.entity';
|
||||||
import { CaseFileValidationPipe } from 'src/common/pipes/file-validation.pipe';
|
import { StorageService } from '@app/common/storage/storage.service';
|
||||||
import { StorageService } from 'src/common/storage/storage.service';
|
import { CaseFileValidationPipe } from '@app/common/pipes/file-validation.pipe';
|
||||||
|
import { CaseStatus } from '../generated/prisma/client.js';
|
||||||
|
|
||||||
@Resolver(() => CaseLaw)
|
@Resolver(() => CaseLaw)
|
||||||
export class CasesResolver {
|
export class CasesResolver {
|
||||||
|
|
@ -13,7 +14,6 @@ export class CasesResolver {
|
||||||
private readonly storage: StorageService,
|
private readonly storage: StorageService,
|
||||||
) {}
|
) {}
|
||||||
|
|
||||||
|
|
||||||
@Query(() => CaseLaw, { name: 'caseLaw', nullable: true })
|
@Query(() => CaseLaw, { name: 'caseLaw', nullable: true })
|
||||||
async findOne(
|
async findOne(
|
||||||
@Args('id', { type: () => String, nullable: true }) id?: string,
|
@Args('id', { type: () => String, nullable: true }) id?: string,
|
||||||
|
|
@ -41,7 +41,18 @@ export class CasesResolver {
|
||||||
}
|
}
|
||||||
|
|
||||||
const buffer = Buffer.concat(chunks);
|
const buffer = Buffer.concat(chunks);
|
||||||
|
// Buffering the whole upload in memory would not scale to very large files (e.g. 1 GB),
|
||||||
|
// but with this project's 10 MB limit it is simpler than passing a stream through.
|
||||||
return this.casesService.processAndSave(buffer, mimetype, filename);
|
return this.casesService.processAndSave(buffer, mimetype, filename);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// An additional simple fetch all endpoint to demonstrate pagination
|
||||||
|
@Query(() => [CaseLaw], { name: 'caseLaws' })
|
||||||
|
async findAll(
|
||||||
|
@Args('status', { type: () => CaseStatus, nullable: true }) status?: CaseStatus,
|
||||||
|
@Args('take', { type: () => Int, nullable: true, defaultValue: 20 }) take?: number,
|
||||||
|
@Args('skip', { type: () => Int, nullable: true, defaultValue: 0 }) skip?: number,
|
||||||
|
) {
|
||||||
|
return this.casesService.findAll(status, take, skip);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
|
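Note on the buffering comment in the resolver: the trade-off it describes can be made explicit by enforcing the size cap while the chunks are collected, so an oversized upload is rejected before it is fully held in memory. The following is a minimal sketch, not code from this repository. The name `bufferWithLimit` and its parameters are hypothetical, and it assumes the upload is exposed as an async iterable of byte chunks (a Node.js `Readable` satisfies this).

```typescript
// Hypothetical helper (not part of this repository): collects chunks from an
// async iterable into one Uint8Array and aborts once maxBytes is exceeded.
async function bufferWithLimit(
  source: AsyncIterable<Uint8Array>,
  maxBytes: number,
): Promise<Uint8Array> {
  const chunks: Uint8Array[] = [];
  let total = 0;

  for await (const chunk of source) {
    total += chunk.length;
    // Fail as soon as the running total passes the cap.
    if (total > maxBytes) {
      throw new Error(`Upload exceeds ${maxBytes} bytes`);
    }
    chunks.push(chunk);
  }

  // Concatenate the collected chunks into a single contiguous array.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}
```

With a 10 MB limit this would be called as `bufferWithLimit(stream, 10 * 1024 * 1024)`. Whether the existing validation pipe already enforces the cap earlier is not visible in this diff.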
||||||
|
|
@ -1,6 +1,10 @@
|
||||||
import { Injectable, NotFoundException, BadRequestException, Logger, Inject } from '@nestjs/common';
|
import { Injectable, NotFoundException, BadRequestException, Logger, Inject } from '@nestjs/common';
|
||||||
import { PRISMA_CLIENT, type PrismaClientInstance } from '../common/prisma/prisma.service';
|
import { PRISMA_CLIENT, type PrismaClientInstance } from '@app/common/prisma/prisma.service';
|
||||||
import { isUuid } from 'src/common/utils/string.utils';
|
import { isUuid } from '@app/common/utils/string.utils';
|
||||||
|
import { Queue } from 'bullmq';
|
||||||
|
import { StorageService } from '@app/common/storage/storage.service';
|
||||||
|
import { InjectQueue } from '@nestjs/bullmq';
|
||||||
|
import { CaseStatus } from '../generated/prisma/client.js';
|
||||||
|
|
||||||
@Injectable()
|
@Injectable()
|
||||||
export class CasesService {
|
export class CasesService {
|
||||||
|
|
@ -8,23 +12,36 @@ export class CasesService {
|
||||||
|
|
||||||
constructor(
|
constructor(
|
||||||
@Inject(PRISMA_CLIENT) private prisma: PrismaClientInstance,
|
@Inject(PRISMA_CLIENT) private prisma: PrismaClientInstance,
|
||||||
|
private storage: StorageService,
|
||||||
|
@InjectQueue('case-processing') private caseQueue: Queue,
|
||||||
) {}
|
) {}
|
||||||
|
|
||||||
async processAndSave(buffer: Buffer, mimetype: string, filename: string) {
|
async processAndSave(buffer: Buffer, mimetype: string, filename: string) {
|
||||||
|
|
||||||
this.logger.log(`Upload received: ${filename} (${mimetype}, ${(buffer.length / 1024).toFixed(1)} KB)`);
|
this.logger.log(`Upload received: ${filename} (${mimetype}, ${(buffer.length / 1024).toFixed(1)} KB)`);
|
||||||
|
const storageKey = await this.storage.upload(buffer, filename, mimetype);
|
||||||
const fileType = mimetype === 'application/pdf' ? 'PDF' : 'HTML';
|
const fileType = mimetype === 'application/pdf' ? 'PDF' : 'HTML';
|
||||||
|
|
||||||
const caseLaw = await this.prisma.caseLaw.create({
|
const caseLaw = await this.prisma.caseLaw.create({
|
||||||
data: {
|
data: {
|
||||||
title: `Processing: ${filename}`,
|
title: `Processing: ${filename}`,
|
||||||
fileType,
|
fileType,
|
||||||
|
storageKey,
|
||||||
|
status: CaseStatus.PENDING,
|
||||||
},
|
},
|
||||||
});
|
});
|
||||||
|
|
||||||
|
const workerDownloadUrl = await this.storage.getPresignedUrl(storageKey);
|
||||||
|
|
||||||
|
await this.caseQueue.add('parse-case', {
|
||||||
|
caseId: caseLaw.id,
|
||||||
|
downloadUrl: workerDownloadUrl,
|
||||||
|
mimetype,
|
||||||
|
});
|
||||||
|
|
||||||
return caseLaw;
|
return caseLaw;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
async findOne(id?: string, caseNumber?: string) {
|
async findOne(id?: string, caseNumber?: string) {
|
||||||
if (!id && !caseNumber) throw new BadRequestException('Provide ID or Case Number');
|
if (!id && !caseNumber) throw new BadRequestException('Provide ID or Case Number');
|
||||||
|
|
||||||
|
|
@ -40,4 +57,13 @@ export class CasesService {
|
||||||
if (!caseLaw) throw new NotFoundException('Case not found');
|
if (!caseLaw) throw new NotFoundException('Case not found');
|
||||||
return caseLaw;
|
return caseLaw;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
async findAll(status?: CaseStatus, take = 20, skip = 0) {
|
||||||
|
return this.prisma.caseLaw.findMany({
|
||||||
|
where: status ? { status } : undefined,
|
||||||
|
orderBy: { createdAt: 'desc' },
|
||||||
|
take: Math.min(take, 100),
|
||||||
|
skip,
|
||||||
|
});
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
@ -1,6 +1,10 @@
|
||||||
import { ObjectType, Field, ID } from '@nestjs/graphql';
|
import { ObjectType, Field, ID, registerEnumType } from '@nestjs/graphql';
|
||||||
import { GraphQLJSON } from 'graphql-type-json';
|
import { GraphQLJSON } from 'graphql-type-json';
|
||||||
|
import { CaseStatus } from '../../generated/prisma/client.js';
|
||||||
|
|
||||||
|
registerEnumType(CaseStatus, {
|
||||||
|
name: 'CaseStatus',
|
||||||
|
});
|
||||||
|
|
||||||
@ObjectType()
|
@ObjectType()
|
||||||
export class CaseLaw {
|
export class CaseLaw {
|
||||||
|
|
@ -34,9 +38,18 @@ export class CaseLaw {
|
||||||
@Field()
|
@Field()
|
||||||
fileType: string;
|
fileType: string;
|
||||||
|
|
||||||
|
@Field(() => [String], { defaultValue: [] })
|
||||||
|
logs: string[];
|
||||||
|
|
||||||
@Field(() => GraphQLJSON, { nullable: true })
|
@Field(() => GraphQLJSON, { nullable: true })
|
||||||
metadata?: any;
|
metadata?: any;
|
||||||
|
|
||||||
|
@Field(() => CaseStatus)
|
||||||
|
status: CaseStatus;
|
||||||
|
|
||||||
|
@Field({ nullable: true })
|
||||||
|
processingError?: string;
|
||||||
|
|
||||||
@Field()
|
@Field()
|
||||||
createdAt: Date;
|
createdAt: Date;
|
||||||
|
|
||||||
|
|
|
||||||
60
src/cases/parser/parser.service.spec.ts
Normal file
|
|
@ -0,0 +1,60 @@
|
||||||
|
import { Test, TestingModule } from '@nestjs/testing';
|
||||||
|
import { ParserService } from './parser.service';
|
||||||
|
import { BadRequestException } from '@nestjs/common';
|
||||||
|
|
||||||
|
jest.mock('@langchain/google-genai', () => ({
|
||||||
|
ChatGoogleGenerativeAI: jest.fn().mockImplementation(() => ({
|
||||||
|
withStructuredOutput: jest.fn().mockReturnValue({
|
||||||
|
invoke: jest.fn().mockResolvedValue({
|
||||||
|
title: 'Mock Case',
|
||||||
|
decisionType: 'Judgment',
|
||||||
|
decisionDate: '2024-01-01',
|
||||||
|
office: 'Office X',
|
||||||
|
court: 'Mock Court',
|
||||||
|
caseNumber: '123/2024',
|
||||||
|
summary: 'This is a mock summary for testing purposes.',
|
||||||
|
}),
|
||||||
|
}),
|
||||||
|
})),
|
||||||
|
}));
|
||||||
|
|
||||||
|
describe('ParserService', () => {
|
||||||
|
let service: ParserService;
|
||||||
|
|
||||||
|
beforeEach(async () => {
|
||||||
|
const module: TestingModule = await Test.createTestingModule({
|
||||||
|
providers: [ParserService],
|
||||||
|
}).compile();
|
||||||
|
|
||||||
|
service = module.get<ParserService>(ParserService);
|
||||||
|
process.env.GOOGLE_API_KEY = 'test_key';
|
||||||
|
});
|
||||||
|
|
||||||
|
it('should be defined', () => {
|
||||||
|
expect(service).toBeDefined();
|
||||||
|
});
|
||||||
|
|
||||||
|
it('should throw BadRequestException for unsupported file types', async () => {
|
||||||
|
const buffer = Buffer.from('test');
|
||||||
|
await expect(service.process(buffer, 'image/png')).rejects.toThrow(
|
||||||
|
BadRequestException,
|
||||||
|
);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('should extract text from HTML using cheerio', async () => {
|
||||||
|
const html = '<html><body><h1>Case Title</h1><p>Case content</p></body></html>';
|
||||||
|
const buffer = Buffer.from(html);
|
||||||
|
|
||||||
|
const result = await service.process(buffer, 'text/html');
|
||||||
|
|
||||||
|
expect(result.title).toBe('Mock Case');
|
||||||
|
expect(result.metadata.rawLength).toBeGreaterThan(0);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('should fallback to buffer string if body is empty in HTML', async () => {
|
||||||
|
const text = 'Raw text content';
|
||||||
|
const buffer = Buffer.from(text);
|
||||||
|
const result = await service.process(buffer, 'text/html');
|
||||||
|
expect(result.metadata.rawLength).toBe(text.length);
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
@ -70,7 +70,7 @@ export class ParserService {
|
||||||
|
|
||||||
// I read that the most important part of the document for metadata extraction is the
|
// I read that the most important part of the document for metadata extraction is the
|
||||||
// start/end of the document. But I am not a lawyer and uncertain so decided not to
|
// start/end of the document. But I am not a lawyer and uncertain so decided not to
|
||||||
// risk it. In the end I just set a hard cap at 500k characters to avoid abuse.
|
// risk it. In the end I just set a hard cap at 500k characters to avoid abuse
|
||||||
const maxChars = 500_000;
|
const maxChars = 500_000;
|
||||||
const documentText = text.length > maxChars ? text.substring(0, maxChars) : text;
|
const documentText = text.length > maxChars ? text.substring(0, maxChars) : text;
|
||||||
|
|
||||||
|
|
@ -97,5 +97,3 @@ export class ParserService {
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
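Note on the truncation comment in `ParserService`: the diff keeps a plain hard cap that cuts from the front. The alternative the comment mentions (keeping the start and the end of the document) could look roughly like the sketch below. This is an illustration only, not what the repository does. The function name `truncateHeadTail` and the `headRatio` parameter are hypothetical.

```typescript
// Hypothetical alternative to text.substring(0, maxChars): keep the beginning
// and the end of the document and drop the middle. Not used in this repository.
function truncateHeadTail(text: string, maxChars: number, headRatio = 0.5): string {
  if (text.length <= maxChars) return text;

  // Split the character budget between the head and the tail.
  const headLen = Math.floor(maxChars * headRatio);
  const tailLen = maxChars - headLen;

  return text.slice(0, headLen) + text.slice(text.length - tailLen);
}

// Example: a budget of 4 characters keeps 2 from each end.
console.log(truncateHeadTail('abcdefghij', 4)); // prints "abij"
```

The result is never longer than `maxChars`, so it would fit under the same 500k limit. Whether the head and tail really carry the metadata is the open question the original comment raises.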
|
||||||
|
|
@ -1,15 +1,15 @@
|
||||||
import { QueueEventsListener, QueueEventsHost, OnQueueEvent, InjectQueue } from '@nestjs/bullmq';
|
import { QueueEventsListener, QueueEventsHost, OnQueueEvent, InjectQueue } from '@nestjs/bullmq';
|
||||||
import { Queue } from 'bullmq';
|
import { Queue } from 'bullmq';
|
||||||
import { PrismaService } from '../../common/prisma/prisma.service';
|
import { PRISMA_CLIENT, type PrismaClientInstance } from '@app/common/prisma/prisma.service';
|
||||||
import { CaseStatus } from '@prisma/client';
|
import { CaseStatus } from '../../generated/prisma/client.js';
|
||||||
import { Logger } from '@nestjs/common';
|
import { Logger, Inject } from '@nestjs/common';
|
||||||
|
|
||||||
@QueueEventsListener('case-processing')
|
@QueueEventsListener('case-processing')
|
||||||
export class CaseQueueListener extends QueueEventsHost {
|
export class CaseQueueListener extends QueueEventsHost {
|
||||||
private readonly logger = new Logger(CaseQueueListener.name);
|
private readonly logger = new Logger(CaseQueueListener.name);
|
||||||
|
|
||||||
constructor(
|
constructor(
|
||||||
private prisma: PrismaService,
|
@Inject(PRISMA_CLIENT) private prisma: PrismaClientInstance,
|
||||||
@InjectQueue('case-processing') private readonly queue: Queue,
|
@InjectQueue('case-processing') private readonly queue: Queue,
|
||||||
) {
|
) {
|
||||||
super();
|
super();
|
||||||
|
|
|
||||||
26
src/cases/processors/case.worker.ts
Normal file
|
|
@ -0,0 +1,26 @@
|
||||||
|
import 'dotenv/config';
|
||||||
|
import { Job } from 'bullmq';
|
||||||
|
import axios from 'axios';
|
||||||
|
import { ParserService } from '../parser/parser.service';
|
||||||
|
|
||||||
|
// I chose a sandboxed worker here (separate thread/process) because PDF parsing and
|
||||||
|
// AI calls can be surprisingly CPU heavy. If we did this on the main event loop,
|
||||||
|
// the whole API might lag while one person uploads a massive legal doc.
|
||||||
|
// This keeps the API snappy while the heavy lifting happens in the background.
|
||||||
|
export default async function (job: Job) {
|
||||||
|
const { downloadUrl, mimetype, caseId } = job.data;
|
||||||
|
try {
|
||||||
|
// Download
|
||||||
|
await job.updateProgress('📡 Downloading Case File from storage...');
|
||||||
|
const response = await axios.get(downloadUrl, { responseType: 'arraybuffer' });
|
||||||
|
const buffer = Buffer.from(response.data);
|
||||||
|
await job.updateProgress(`✅ File downloaded (${(buffer.length / 1024).toFixed(1)} KB)`);
|
||||||
|
|
||||||
|
// Parse
|
||||||
|
await job.updateProgress('🔍 Extracting text content...');
|
||||||
|
const result = await ParserService.parse(buffer, mimetype, job);
|
||||||
|
return result;
|
||||||
|
} catch (error) {
|
||||||
|
throw new Error(`Sandboxed parsing failed: ${error.message}`);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
@ -2,7 +2,7 @@ import { Global, Module } from '@nestjs/common';
|
||||||
import {
|
import {
|
||||||
PrismaService,
|
PrismaService,
|
||||||
PRISMA_CLIENT,
|
PRISMA_CLIENT,
|
||||||
} from './prisma.service.js';
|
} from './prisma.service';
|
||||||
|
|
||||||
@Global()
|
@Global()
|
||||||
@Module({
|
@Module({
|
||||||
|
|
|
||||||
14
src/common/utils/string.utils.spec.ts
Normal file
|
|
@ -0,0 +1,14 @@
|
||||||
|
import { isUuid } from './string.utils';
|
||||||
|
|
||||||
|
describe('String Utils', () => {
|
||||||
|
describe('isUuid', () => {
|
||||||
|
it('should return true for valid UUID v4', () => {
|
||||||
|
expect(isUuid('550e8400-e29b-41d4-a716-446655440000')).toBe(true);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('should return false for invalid strings', () => {
|
||||||
|
expect(isUuid('not-a-uuid')).toBe(false);
|
||||||
|
expect(isUuid('12345')).toBe(false);
|
||||||
|
});
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
@ -1,4 +1,3 @@
|
||||||
|
|
||||||
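// Checks that a string has the generic UUID shape (8-4-4-4-12 hex digits); the version digit is not checked. Would eventually move to class-validator to "fail early"
|
// Checks that a string has the generic UUID shape (8-4-4-4-12 hex digits); the version digit is not checked. Would eventually move to class-validator to "fail early"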
|
||||||
export const isUuid = (value: string): boolean => {
|
export const isUuid = (value: string): boolean => {
|
||||||
return /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i.test(value);
|
return /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i.test(value);
|
||||||
|
|
|
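Note on `isUuid`: the regular expression in this hunk accepts any string with the 8-4-4-4-12 hex layout, whatever its version. If only version 4 UUIDs should pass, the version and variant digits can be pinned as in the sketch below. The name `isUuidV4` is hypothetical and not part of this repository, and the sketch assumes the stored IDs are in fact version 4.

```typescript
// Hypothetical stricter variant of isUuid (not from this repository).
// The third group must start with "4" (version) and the fourth group
// with 8, 9, a or b (RFC variant), in addition to the 8-4-4-4-12 layout.
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

const isUuidV4 = (value: string): boolean => UUID_V4.test(value);

console.log(isUuidV4('550e8400-e29b-41d4-a716-446655440000')); // prints true
console.log(isUuidV4('550e8400-e29b-11d4-a716-446655440000')); // prints false (version 1)
```

The ID used in the spec file still passes under this stricter check.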
||||||
20
src/main.ts
|
|
@ -1,8 +1,26 @@
|
||||||
|
import 'reflect-metadata';
|
||||||
import { NestFactory } from '@nestjs/core';
|
import { NestFactory } from '@nestjs/core';
|
||||||
|
import { ValidationPipe, Logger } from '@nestjs/common';
|
||||||
|
import { ConfigService } from '@nestjs/config';
|
||||||
import { AppModule } from './app.module';
|
import { AppModule } from './app.module';
|
||||||
|
|
||||||
async function bootstrap() {
|
async function bootstrap() {
|
||||||
const app = await NestFactory.create(AppModule);
|
const app = await NestFactory.create(AppModule);
|
||||||
await app.listen(process.env.PORT ?? 3000);
|
const logger = new Logger('Bootstrap');
|
||||||
|
|
||||||
|
const configService = app.get(ConfigService);
|
||||||
|
|
||||||
|
app.enableCors();
|
||||||
|
|
||||||
|
app.useGlobalPipes(new ValidationPipe({
|
||||||
|
whitelist: true,
|
||||||
|
transform: true,
|
||||||
|
forbidNonWhitelisted: true,
|
||||||
|
}));
|
||||||
|
|
||||||
|
const port = configService.get<number>('PORT') || 3000;
|
||||||
|
await app.listen(port);
|
||||||
|
logger.log(`🚀 Application running on http://localhost:${port}`);
|
||||||
|
logger.log(`📊 GraphQL Playground: http://localhost:${port}/graphql`);
|
||||||
}
|
}
|
||||||
bootstrap();
|
bootstrap();
|
||||||
|
|
|
||||||
|
|
@ -14,7 +14,11 @@
|
||||||
"sourceMap": true,
|
"sourceMap": true,
|
||||||
"outDir": "./dist",
|
"outDir": "./dist",
|
||||||
"baseUrl": "./",
|
"baseUrl": "./",
|
||||||
"paths": {},
|
"paths": {
|
||||||
|
"@app/*": [
|
||||||
|
"./src/*"
|
||||||
|
]
|
||||||
|
},
|
||||||
"incremental": true,
|
"incremental": true,
|
||||||
"skipLibCheck": true,
|
"skipLibCheck": true,
|
||||||
"strictNullChecks": true,
|
"strictNullChecks": true,
|
||||||
|
|
|
||||||