I've been working in the legaltech space and can definitely echo the sentiments there. There are some major legaltech/legal AI companies, but after speaking to dozens of law firms, none of them are finding these tools very valuable. But they have signed contracts with many seats, they are busy people, and tech is not intrinsic to them, so they are not in the business of just changing tools and building things in-house (a handful of them are). And the problem is that despite massive amounts of internal data, all the solutions fail on relevance and precision. When I sit down with actual legal associates, I can see how immensely complex these workflows are, and to fully utilize this data moat you need: (1) multi-step agentic retrieval, (2) a set of rules/heuristics to ground and steer everything per transaction/case "type", (3) adaptation/fine-tuning towards the "house language/style", (4) integration with many different data sources and tools; and you need to wrap all this in real-world evals (where the LLM-as-a-judge technique often fails).
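To make point (2) concrete: the "rules/heuristics per transaction/case type" can be as simple as a lookup table that constrains retrieval before any model call. A minimal sketch - the transaction types, field names, and rules here are purely illustrative, not taken from any real playbook or vendor API:

```python
# Illustrative per-type heuristics: each transaction "type" pins down
# which markets to search, which topics to exclude, and what to verify.
# All keys and values below are hypothetical examples.
PLAYBOOK = {
    "m&a_nordics_buyer": {
        "allowed_markets": {"Nordics"},
        "excluded_topics": {"us_pricing_mechanics"},
        "required_checks": ["locked_box_vs_completion_accounts"],
    },
    "m&a_us_seller": {
        "allowed_markets": {"US"},
        "excluded_topics": set(),
        "required_checks": ["earn_out_caps"],
    },
}

def ground_query(transaction_type: str, query: str) -> dict:
    """Attach the per-type constraints to a retrieval request,
    so downstream retrieval/drafting steps are steered by the
    deal type rather than raw text similarity alone."""
    rules = PLAYBOOK[transaction_type]
    return {
        "query": query,
        "filter_markets": rules["allowed_markets"],
        "exclude": rules["excluded_topics"],
        "must_verify": rules["required_checks"],
    }
```

In practice the playbook would be authored and maintained by the firm's own lawyers per practice area, which is exactly the kind of grounding the off-the-shelf tools skip.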
Could you please expand on “none of them find the tools very useful”?
I would love to know how big your sample is, in what way the tools fail, what features are missing etc.
Sure! So to qualify - I've been working in contractual law, more specifically contract drafting. There are a tonne of other tools in areas like document management, research, regulatory, timekeeping, etc., so I can't speak for those.
Sample size: around 150 law firms across the UK, Nordics and DACH (and a smattering across the US). Some were actual month-long pilots, so with some firms there were deeper interactions, whilst with others it was "just conversations". Let's say in each law firm it's 3-4 associates and 1-2 partners, so it's >600 lawyers.
Typically the legal AI solutions in contract drafting involve the lawyer uploading "their database" aka drag-and-drop a folder or a zip file containing potentially 100s-1000s contracts from previous transactions.
What's missing:
- Relevance: For the transaction the lawyer is currently working on, the tools surface irrelevant recommendations. For example, for an M&A transaction in one market (e.g. the Nordics), they suggest pricing mechanics from a different market practice (e.g. the US) that are irrelevant or undesirable. The text semantics have the closest cosine (or whatever) distance, but the market characteristics are orthogonal.
- Representation: As a lawyer you are always representing a specific party (e.g. a "buyer" purchasing a company or an asset from a "seller"). You want your side to be best represented - however, the tools often fail to "understand" who/what you are representing, and tend to recommend the opposite of what you want for your client.
- Diversity: The same handful of documents keep being referenced all the time, even though there are other "better" documents that should be used to ground the responses and recommendations.
- Precision: Sometimes you want precise information, such as specific leverage ratios or very specific warranty clauses for a transaction of a particular size within a particular industry.
- Language/tonality: Lawyers talk to other lawyers and there is a specific tonality and language used - precision, eloquence, professionalism. Each law firm also has their "house style" in terms of how they put the words together. AI tools come across as "odd" in terms of how they respond (even when they are correct). It trips the lawyers up a bit and they lose the trust somewhat.
Etc.
(there are many others)
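The relevance and representation failures above are at heart a retrieval problem: pure embedding similarity ignores structured attributes like market and represented party. A minimal sketch of what grounding on those attributes could look like - hard metadata filters before semantic ranking. All class and field names here (`Clause`, `market`, `represents`) are hypothetical, not any vendor's schema:

```python
from dataclasses import dataclass

@dataclass
class Clause:
    text: str
    market: str          # e.g. "Nordics", "US"
    represents: str      # e.g. "buyer", "seller"
    embedding: list[float]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_emb, market, represents, corpus, k=3):
    # Hard-filter on transaction metadata first, *then* rank by
    # semantic similarity - so the closest-cosine match from the
    # wrong market, or from the counterparty's side, never surfaces.
    pool = [c for c in corpus
            if c.market == market and c.represents == represents]
    return sorted(pool, key=lambda c: cosine(query_emb, c.embedding),
                  reverse=True)[:k]
```

The point isn't the toy cosine function - it's that the filters come from the transaction context, which the lawyer knows but the drag-and-drop tools never ask for.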
How do you know whether you should fine-tune/pretrain or RL/reasoning-train, given some dataset?
I honestly don't think there's a simple yes/no answer there - the considerations mostly include things like "how costly is it to do so", "how often do you think you'll need it", and so on. Traces are not as "ephemeral" as fine-tuned models, since you can use them to guide agent behaviour when a newer model is released (but still, not as evergreen as other assets - traces generated using, say, GPT-4 would seem pale and outdated compared to ones created on the same dataset using Opus 4.5, I reckon).