Medical bills contain diagnosis codes. Diagnosis codes reveal conditions. We decided no patient should have to send that to a server just to check if they're being overcharged.
So we built a bill analyzer where everything runs in the browser: Tesseract OCR, code extraction, pricing lookups against Medicare fee schedules, and 3.3M CMS bundling rule checks. Zero network calls after initial load.
The hard problem was size. Raw CMS datasets run to tens of megabytes. We shard the data so the first load is 198KB (a 479x reduction), and detail shards load on demand. Zod validation with fail-closed defaults: if data fails schema checks, the feature turns off rather than showing bad numbers.
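Here is a rough sketch of the "shard on demand, fail closed" idea. It rests on several assumptions. The record shape (`FeeRecord`), the shard-by-first-two-characters scheme (`shardKeyFor`), and the injected `fetchShard` loader are all hypothetical. The post uses Zod, but to stay dependency-free this sketch swaps in a hand-written type guard. The key behavior is that one malformed record disables the whole shard, so lookups return `null` instead of a number that might be wrong.

```typescript
// Hypothetical shape of one pricing record inside a detail shard.
interface FeeRecord {
  code: string;   // assumed 5-character alphanumeric procedure code
  amount: number; // assumed fee-schedule amount in dollars
}

// Hand-written type guard standing in for a Zod schema (assumption).
function isFeeRecord(x: unknown): x is FeeRecord {
  if (typeof x !== "object" || x === null) return false;
  const { code, amount } = x as { code?: unknown; amount?: unknown };
  return (
    typeof code === "string" &&
    /^[0-9A-Z]{5}$/.test(code) &&
    typeof amount === "number" &&
    Number.isFinite(amount) &&
    amount >= 0
  );
}

// Fail closed: a single invalid record rejects the whole shard (null),
// so partially valid data never reaches the pricing display.
function parseShard(raw: unknown): Map<string, number> | null {
  if (!Array.isArray(raw)) return null;
  const index = new Map<string, number>();
  for (const item of raw) {
    if (!isFeeRecord(item)) return null;
    index.set(item.code, item.amount);
  }
  return index;
}

// Hypothetical sharding scheme: group codes by their first two characters,
// so only the shards covering codes on the bill are ever loaded.
function shardKeyFor(code: string): string {
  return code.slice(0, 2);
}

// Loaded shards; a null entry marks a shard that failed validation.
const shardCache = new Map<string, Map<string, number> | null>();

// Returns the fee for a code, or null when the feature must stay off
// (shard failed to load, failed validation, or lacks the code).
async function lookupFee(
  code: string,
  fetchShard: (key: string) => Promise<unknown>,
): Promise<number | null> {
  const key = shardKeyFor(code);
  if (!shardCache.has(key)) {
    let raw: unknown = undefined;
    try {
      raw = await fetchShard(key);
    } catch {
      raw = undefined; // a failed load is treated like invalid data
    }
    shardCache.set(key, parseShard(raw));
  }
  const shard = shardCache.get(key);
  return shard?.get(code) ?? null;
}
```

The loader is passed in so the same logic works whether a shard comes from a static file, a cache, or a test fixture.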
12 sprints to get OCR to 95.0% F1 across 19 real bills. The failure modes are specific to medical documents: thermal printer ink where $45 becomes $4,500, layouts where every code shifts one column right, ZIP codes in headers extracted as charge amounts. We built a 7-stage filter pipeline to catch these before they reach the pricing engine.
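Below is one way to structure a staged filter pipeline like the one described. It covers only two of the failure modes, and both heuristics are hypothetical, not the authors' actual stages. The first rejects a 5-digit ZIP-like value found in the header zone. The second rejects a charge read without a cents part (for example "4500"), since that can signal a decimal point lost to faded thermal ink. Every stage shares the same signature, and the pipeline keeps a reason for each rejection.

```typescript
// Hypothetical shape of a line item after OCR and code extraction.
interface LineItem {
  code: string;
  charge: number;    // parsed dollar amount
  lineIndex: number; // 0-based line position on the page
  rawText: string;   // OCR text the charge was parsed from
}

interface Rejection {
  item: LineItem;
  reason: string;
}

// Each stage takes the surviving items and splits them into kept and rejected.
type Stage = (items: LineItem[]) => { kept: LineItem[]; rejected: Rejection[] };

// Builds a stage from a predicate that marks an item as bad.
function stageFrom(reason: string, isBad: (item: LineItem) => boolean): Stage {
  return (items) => {
    const kept: LineItem[] = [];
    const rejected: Rejection[] = [];
    for (const item of items) {
      if (isBad(item)) rejected.push({ item, reason });
      else kept.push(item);
    }
    return { kept, rejected };
  };
}

// Illustrative stage: a ZIP-like number in the header zone is probably not a charge.
const headerZip = (headerLines: number): Stage =>
  stageFrom("header ZIP code", (it) =>
    it.lineIndex < headerLines && /^\d{5}(-\d{4})?$/.test(it.rawText.trim()));

// Illustrative stage: a charge with no ".NN" cents part may have lost its decimal point.
const missingCents: Stage = stageFrom("no cents; possible dropped decimal", (it) =>
  !/\.\d{2}$/.test(it.rawText.trim().replace(/,/g, "")));

// Runs stages in order; later stages only see items earlier stages kept.
function runPipeline(items: LineItem[], stages: Stage[]): { kept: LineItem[]; rejected: Rejection[] } {
  let kept = items;
  const rejected: Rejection[] = [];
  for (const stage of stages) {
    const out = stage(kept);
    kept = out.kept;
    rejected.push(...out.rejected);
  }
  return { kept, rejected };
}
```

Logging the rejected items with their reasons, and not silently dropping them, makes it easier to measure each stage's effect on F1 against a labeled set of bills.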
The bundling checks are exhaustive. If a hospital bills code A and code B separately, but CMS says B is included in A, that's an unbundling violation. Most audit tools run this server-side. We load all 3.3M pairs into the browser via sharded JSON and in-memory indexing.
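This sketch shows the in-memory check for the "code A and code B billed separately, but B is included in A" rule. It assumes each bundling pair has already been loaded as a `{ column1, column2 }` record, meaning `column2` is included in `column1`. Those field names are hypothetical. Real CMS edit tables also carry extra fields, such as modifier indicators and effective dates, which this ignores. The pairs are indexed once into a `Map` from the comprehensive code to the set of codes bundled into it. After that, checking a bill only compares the handful of codes on that bill.

```typescript
// Hypothetical pair record: `column2` is included in (bundled into) `column1`.
interface BundlingPair {
  column1: string;
  column2: string;
}

interface Violation {
  comprehensive: string; // code A, which already includes B
  component: string;     // code B, billed separately anyway
}

// Index: comprehensive code -> set of component codes bundled into it.
function buildIndex(pairs: BundlingPair[]): Map<string, Set<string>> {
  const index = new Map<string, Set<string>>();
  for (const { column1, column2 } of pairs) {
    let included = index.get(column1);
    if (!included) {
      included = new Set<string>();
      index.set(column1, included);
    }
    included.add(column2);
  }
  return index;
}

// Reports every (A, B) on the bill where the index says B is included in A.
function findUnbundling(billed: string[], index: Map<string, Set<string>>): Violation[] {
  const codes = Array.from(new Set(billed)); // de-duplicate repeated lines
  const violations: Violation[] = [];
  for (const a of codes) {
    const included = index.get(a);
    if (!included) continue;
    for (const b of codes) {
      if (b !== a && included.has(b)) violations.push({ comprehensive: a, component: b });
    }
  }
  return violations;
}
```

Index size grows with the number of pairs, but each bill check only costs about the square of the codes on that bill, which stays small.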
This is a great effort, and thanks for it. Is there a reason the NBC article was linked rather than the tool itself? I found the tool link inside the article, but I was curious why the submission links to the article, and whether it would violate any HN terms to paste a link to the tool in the comments.
The software mentioned in the article is Orbdoc. https://orbdoc.com/blog
The link is to an article. Would it be better to link to the software?
How many Americans out there resort to medical tourism as a viable alternative to beat hospital costs? Any numbers?