Axis Biotech accelerated clinical research with Embedix

"Clinical trial PDFs are the messiest documents on earth. Embedix parses them without hallucinating."
Dr. James Okafor
Director of Data Science, Axis Biotech
Company: Axis Biotech
Industry: Pharma
Based in: Cambridge, MA
Founded: 2016

Axis Biotech runs clinical trials across oncology and rare diseases. Its research team manages tens of thousands of regulatory documents: protocols, case report forms (CRFs), adverse event reports, and investigator brochures. Most are scanned PDFs with complex tables, handwritten notes, and multi-column layouts, and standard RAG tools hallucinated on them.

Document parsing was the bottleneck

Before Embedix, Axis had tried two commercial RAG products. Both could handle clean Markdown. Neither could handle a scanned Phase 3 trial protocol with embedded tables and handwritten annotations. The retrieved answers were confidently wrong, which in a regulated context is worse than no answer at all.

The data science team spent four months building their own parsing pipeline on top of Unstructured.io and custom vision models. It worked, but maintaining it was a full-time job.

"We could either build a parsing team or build drugs. Not both."

Dr. James Okafor, Director of Data Science

Adaptive parsing for real documents

Embedix routes each document through a quality classifier. Clean PDFs go through a fast path. Scanned documents go through OCR with layout preservation. Handwritten annotations get flagged for human review.
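The routing described above can be pictured with a small sketch. This is an illustration under stated assumptions, not Embedix's actual API: the names `DocumentSignals`, `Route`, `RoutingDecision`, and `route_document` are hypothetical, and the two signals stand in for whatever the real quality classifier measures.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    """Hypothetical labels for the two parsing paths."""
    FAST_PATH = "fast_path"              # clean PDFs
    OCR_LAYOUT = "ocr_with_layout"       # scanned documents


@dataclass
class DocumentSignals:
    """Assumed classifier output; the real signals are not documented here."""
    has_text_layer: bool       # True for a clean PDF with extractable text
    handwriting_regions: int   # regions a detector marked as handwriting


@dataclass
class RoutingDecision:
    route: Route
    needs_human_review: bool


def route_document(signals: DocumentSignals) -> RoutingDecision:
    """Pick a parsing path and flag handwriting for human review."""
    route = Route.FAST_PATH if signals.has_text_layer else Route.OCR_LAYOUT
    return RoutingDecision(
        route=route,
        needs_human_review=signals.handwriting_regions > 0,
    )


# Example: a scanned protocol with two handwritten annotations.
decision = route_document(DocumentSignals(has_text_layer=False, handwriting_regions=2))
```

In this sketch the review flag is independent of the parsing path, so a clean PDF carrying a handwritten note would still be sent to a reviewer.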

The retrieval quality on the Phase 3 protocols went from 'unusable' to 'the biostatisticians trust it.'

Ready for the EU AI Act

Axis submits trial data to the EMA and FDA. Meeting the EU AI Act's upcoming requirements for AI traceability and documentation would have been a scramble with the in-house system. With Embedix, every query run during trial analysis is logged with full provenance, exportable in formats the regulatory team already uses.
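As a rough picture of what a provenance record for one query might contain, here is a minimal sketch. Everything in it is an assumption made for illustration: the field names, the `SourceRef` and `QueryRecord` types, and the JSON Lines export are hypothetical, since the case study does not specify Embedix's log schema or export formats.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class SourceRef:
    """One document passage used to answer a query (hypothetical fields)."""
    document_id: str
    version: str
    section: str


@dataclass(frozen=True)
class QueryRecord:
    """A single logged query with the sources it relied on."""
    query: str
    logged_at: str      # ISO 8601 timestamp in UTC
    sources: tuple      # tuple of SourceRef


def make_record(query, sources, now=None):
    """Build a record; `now` can be injected to make the timestamp reproducible."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return QueryRecord(query=query, logged_at=timestamp, sources=tuple(sources))


def export_jsonl(records):
    """Serialize records as JSON Lines, one record per line (assumed format)."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


# Example with made-up identifiers.
record = make_record(
    "What is the primary endpoint?",
    [SourceRef(document_id="protocol-001", version="v3", section="6.2")],
)
print(export_jsonl([record]))
```

Storing the document version and section alongside each query is what would let a reviewer trace an answer back to the exact passage it came from.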

When the regulatory team asked 'can we prove which document was used in this analysis?' the answer was yes, down to the section and version.