
AI in Litigation Support

The Deposition That Won the Case

It was a patent infringement trial. The plaintiff claimed the defendant's product infringed claims related to a battery management circuit. The defendant's chief technology officer had given a deposition 18 months earlier and a second deposition six months later. The trial team had 600 pages of deposition testimony from this witness across the two sessions. They needed to know one thing: had the witness ever described the circuit design in terms that contradicted his current trial testimony that the design was independently developed?

A junior associate was assigned to read both depositions and flag contradictions. She came back with 14 flagged passages. The trial team ran the depositions through an NLP contradiction detection system as a second pass. It surfaced 31 passages where the witness's language in the first deposition was semantically inconsistent with his language in the second. Twelve of those 31 overlapped with the associate's 14 flags - but the system surfaced 19 passages she had missed, including two that were, in the trial team's judgment, highly material.

One of the 19 missed passages became the centerpiece of cross-examination. The witness had described the circuit design in the first deposition using technical terminology that implied awareness of the plaintiff's prior art. In the second deposition, he described the same design using different terminology that obscured that awareness. The contradiction did not involve the same words. No keyword search would have found it. The NLP system surfaced it because the semantic embeddings of the two descriptions were similar in context but divergent in the specific technical attributes being attributed to the design.

The witness was impeached. The jury found for the plaintiff.

This is not a story about AI replacing lawyers. The trial attorney who conducted the cross-examination was a 30-year patent litigator. The NLP system gave her the ammunition. She deployed it with judgment and skill that no machine can replicate. This is the right model for AI in litigation: augment the attorney's preparation, surface what the attorney would miss due to volume, and leave the judgment calls to the professional.


Why This Exists

Litigation preparation is an information management problem at scale. A complex commercial case involves thousands of documents, multiple depositions, expert reports, correspondence, and exhibit lists. The trial team needs to:

  • Know what every witness said, in every deposition, on every relevant topic
  • Identify inconsistencies between depositions, between deposition testimony and documentary evidence, and between different witnesses' accounts
  • Build a chronological timeline of events from thousands of documents
  • Prepare exhibits efficiently, tracking which exhibits have been used, by which witnesses, with what testimony
  • Monitor the opposing expert's opinions across multiple cases for inconsistencies

All of this is information work that does not require legal judgment. It requires exhaustive reading and pattern detection. These are exactly the tasks where AI systems outperform fatigued humans at scale.

The economic argument mirrors e-discovery: a paralegal billing $50/hour needs 12+ hours to read 600 pages of transcript, while an NLP pipeline processes the same transcripts in about 30 seconds. The question is whether the NLP output is reliable enough to trust, and what the attorney review process on top of it looks like.


Historical Context

Litigation support as a technology category predates modern AI. In the 1980s and 1990s, case management software (Summation, Concordance) was used to organize documents and transcripts for large cases. These were sophisticated databases, not AI systems.

The first AI applications to litigation support were in the e-discovery space (covered in Lesson 4). TAR for relevance classification was the first widely adopted ML application in litigation.

Deposition analysis tools appeared around 2015 with companies like Deponent AI and features in platforms like TextMap and LiveNote. These were primarily keyword-based: find every place a witness mentioned a particular term or phrase. Semantic analysis came later with transformer-based embeddings.

Timeline reconstruction from documents is a classic IE (information extraction) task - extracting events, dates, and actors from unstructured text. The pipeline was described in academic NLP papers in the 2000s, and legal applications emerged as commercial case management software added AI features.

Settlement prediction - ML models trained on case characteristics to predict case outcomes - is the most controversial application. Academic research (Katz et al., predicting US Supreme Court outcomes, 2014; Aletras et al., predicting European Court of Human Rights outcomes, 2016) showed that models could predict outcomes at 70-80% accuracy on held-out cases. Commercial applications are more cautious because of the ethical complexity of using predictive analytics to pressure settlements.


Core Concepts

Contradiction Detection in Depositions

A contradiction between two statements by the same witness exists when the two statements cannot both be true given the same facts. Detecting contradictions requires:

  1. Statement extraction: identify discrete factual claims from witness testimony
  2. Entity alignment: ensure both statements refer to the same entity, event, or time period
  3. Semantic comparison: determine whether the two claims are inconsistent

The NLP approach uses natural language inference (NLI) - a three-class classification task (entailment, contradiction, neutral) trained on datasets like MultiNLI or SNLI. For deposition analysis:

  • Pair each claim from deposition 1 with each claim from deposition 2 where the same entity and time period are involved
  • Run NLI to classify pairs as entailment (consistent), contradiction (inconsistent), or neutral
  • Surface pairs classified as contradiction for attorney review

The challenge: legal contradictions are often subtle. "I was not aware of the prior art" vs "I had reviewed the prior art literature in my field" is a contradiction, but the surface-form sentences share no common words. NLI models fine-tuned on general text often miss domain-specific contradictions. Fine-tuning on legal deposition examples improves performance significantly.

Timeline Reconstruction

Building a chronology of events from thousands of documents is a temporal information extraction task. The pipeline:

  1. Temporal expression extraction: identify dates, times, and relative temporal expressions ("three weeks later," "before the acquisition") using systems like HeidelTime or SUTime
  2. Event extraction: identify the event associated with each temporal expression
  3. Entity linking: identify which party/product/contract each event involves
  4. Temporal ordering: arrange events in chronological order, handling relative expressions and date ranges
  5. Conflict resolution: when two documents give conflicting dates for the same event, flag the conflict

The output is a structured timeline that serves as the factual backbone of the case - every event supported by one or more documentary citations.

Case Outcome Prediction

Settlement prediction and case outcome prediction use features derived from case characteristics to predict whether a defendant will win or lose, and the likely range of damages.

Input features for a binary win/loss classifier:

  • Jurisdiction and court
  • Judge assignment (judicial behavior varies significantly)
  • Case type (patent, contract, employment, securities)
  • Party characteristics (company size, legal representation tier)
  • Prior similar cases in the jurisdiction
  • Key legal issues and their track record
  • Filing timing relative to trial

These models are trained on historical case outcomes from PACER data. Published accuracy rates are in the 65-80% range on held-out cases for well-scoped case types (e.g., patent validity or securities class actions in specific circuits).
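
For concreteness, a minimal sketch of such a classifier, assuming scikit-learn and pandas; the feature schema, the synthetic training rows, and the win-rate values below are all invented for illustration, not drawn from any real dataset or product:

```python
"""
Minimal sketch of a case outcome classifier. The feature schema and
training rows are invented; real systems train on structured features
extracted from PACER dockets.
"""
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical feature frame: one row per historical case.
cases = pd.DataFrame({
    "jurisdiction": ["EDTX", "NDCA", "DDE", "EDTX", "NDCA", "DDE"] * 50,
    "judge_plaintiff_win_rate": [0.61, 0.42, 0.55, 0.58, 0.40, 0.50] * 50,
    "defendant_size": ["large", "small", "large", "small", "large", "small"] * 50,
    "num_asserted_claims": [12, 3, 8, 20, 5, 9] * 50,
    "plaintiff_won": [1, 0, 1, 1, 0, 0] * 50,  # historical outcome label
})

# One-hot encode the categorical features, keep numeric ones as-is.
X = pd.get_dummies(cases.drop(columns=["plaintiff_won"]))
y = cases["plaintiff_won"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_train)

# The only legitimate output is a probability estimate with caveats,
# never a verdict; the responsible-use guidance below applies in full.
probs = model.predict_proba(X_test)[:, 1]
print(f"Held-out AUC: {roc_auc_score(y_test, probs):.2f}")
```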

The ethical concerns are significant: if a model says a plaintiff has a 73% chance of winning, does that create pressure to settle at a lower value than is warranted? Does the availability of outcome predictions change litigation strategy in ways that disadvantage parties without access to the tool?

Responsible use: treat outcome prediction as one input into settlement analysis, not a deterministic recommendation. Always combine with attorney judgment about the specific facts and the specific judge.


Code Examples

Contradiction Detection in Deposition Transcripts

"""
Deposition contradiction detection system.
Uses NLI to identify inconsistent statements across deposition sessions.
"""

from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification
import torch
from typing import List, Dict, Tuple, Optional
from dataclasses import dataclass
import re
import spacy
import numpy as np

@dataclass
class DepositionStatement:
"""A discrete factual claim extracted from deposition testimony."""
text: str
page: int
line: int
witness: str
deposition_date: str
entities: List[str]
temporal_anchors: List[str] # dates or time expressions mentioned


@dataclass
class ContradictionResult:
"""A detected contradiction between two deposition statements."""
statement_1: DepositionStatement
statement_2: DepositionStatement
contradiction_score: float
explanation: str
severity: str # HIGH, MEDIUM, LOW


class DepositionParser:
"""Parse deposition transcripts into structured statements."""

def __init__(self):
try:
self.nlp = spacy.load("en_core_web_sm")
except OSError:
self.nlp = None

def parse_transcript(
self,
transcript_text: str,
witness_name: str,
deposition_date: str,
) -> List[DepositionStatement]:
"""
Parse a deposition transcript into discrete statements.
Extracts Q&A format, capturing witness answers.
"""
statements = []

# Match deposition format: "A: [answer text]" or "THE WITNESS: [text]"
answer_pattern = re.compile(
r"(?:A\.|THE WITNESS:)\s*(.+?)(?=\nQ\.|THE WITNESS:|THE COURT:|$)",
re.DOTALL | re.IGNORECASE,
)

# Also extract page/line numbers
page_line_pattern = re.compile(r"(\d+)\s+(\d+)\s+(.*)")

current_page = 1
current_line = 1

for match in answer_pattern.finditer(transcript_text):
answer_text = match.group(1).strip()

# Split into individual sentences for finer granularity
if self.nlp:
doc = self.nlp(answer_text[:1000])
sentences = [sent.text.strip() for sent in doc.sents if len(sent.text.strip()) > 20]
else:
sentences = [s.strip() for s in answer_text.split(". ") if len(s.strip()) > 20]

for sent in sentences:
entities = self._extract_entities(sent)
temporal_anchors = self._extract_temporal(sent)

if entities or temporal_anchors or len(sent) > 50:
statements.append(DepositionStatement(
text=sent,
page=current_page,
line=current_line,
witness=witness_name,
deposition_date=deposition_date,
entities=entities,
temporal_anchors=temporal_anchors,
))

current_line += 1
current_page += len(answer_text) // 2500 # Approximate

return statements

def _extract_entities(self, text: str) -> List[str]:
"""Extract named entities from text."""
if not self.nlp:
return []
doc = self.nlp(text)
return list(set(ent.text for ent in doc.ents if ent.label_ in ("PERSON", "ORG", "PRODUCT", "EVENT")))

def _extract_temporal(self, text: str) -> List[str]:
"""Extract temporal expressions from text."""
if not self.nlp:
return []
doc = self.nlp(text)
return list(set(ent.text for ent in doc.ents if ent.label_ in ("DATE", "TIME")))


class NLIContradictionDetector:
"""
Uses Natural Language Inference to detect contradictions
between deposition statements.
"""

CONTRADICTION_LABEL = "contradiction"
ENTAILMENT_LABEL = "entailment"
NEUTRAL_LABEL = "neutral"

def __init__(self, model_name: str = "cross-encoder/nli-deberta-v3-base"):
self.nli = pipeline(
"text-classification",
model=model_name,
device=0 if torch.cuda.is_available() else -1,
)

def compare_statements(
self,
statement_a: str,
statement_b: str,
) -> Tuple[str, float]:
"""
Classify the relationship between two statements.
Returns (label, confidence).
"""
result = self.nli(f"{statement_a} [SEP] {statement_b}", truncation=True)
label = result[0]["label"].lower()
score = result[0]["score"]

# Some NLI models return LABEL_0/1/2 - map to semantic labels
label_map = {"label_0": "entailment", "label_1": "neutral", "label_2": "contradiction"}
if label in label_map:
label = label_map[label]

return label, score

def find_contradictions(
self,
statements_session1: List[DepositionStatement],
statements_session2: List[DepositionStatement],
entity_filter: Optional[str] = None,
min_contradiction_score: float = 0.7,
) -> List[ContradictionResult]:
"""
Find contradictions between two deposition sessions.
Only compares statements that share entities (efficient pairing).
"""
results = []

# Build entity index for efficient matching
entity_index_s2: Dict[str, List[DepositionStatement]] = {}
for stmt in statements_session2:
for entity in stmt.entities:
if entity_filter and entity_filter.lower() not in entity.lower():
continue
entity_lower = entity.lower()
if entity_lower not in entity_index_s2:
entity_index_s2[entity_lower] = []
entity_index_s2[entity_lower].append(stmt)

# Compare statements that share entities
seen_pairs = set()
for stmt1 in statements_session1:
for entity in stmt1.entities:
if entity_filter and entity_filter.lower() not in entity.lower():
continue
matching_s2 = entity_index_s2.get(entity.lower(), [])

for stmt2 in matching_s2:
pair_key = (id(stmt1), id(stmt2))
if pair_key in seen_pairs:
continue
seen_pairs.add(pair_key)

label, score = self.compare_statements(stmt1.text, stmt2.text)

if label == self.CONTRADICTION_LABEL and score >= min_contradiction_score:
severity = (
"HIGH" if score >= 0.9
else "MEDIUM" if score >= 0.8
else "LOW"
)
results.append(ContradictionResult(
statement_1=stmt1,
statement_2=stmt2,
contradiction_score=score,
explanation=self._generate_explanation(stmt1.text, stmt2.text),
severity=severity,
))

return sorted(results, key=lambda r: r.contradiction_score, reverse=True)

def _generate_explanation(self, text1: str, text2: str) -> str:
"""Generate a plain-English explanation of the contradiction."""
# Simplified explanation - in production, use LLM for better explanations
return (
f"Statement 1 ('{text1[:80]}...') appears inconsistent with "
f"Statement 2 ('{text2[:80]}...')"
)


Timeline Reconstruction from Case Documents

```python
"""
Timeline reconstruction: extracts events and dates from legal documents
to build case chronologies.
"""

from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple
import re

import dateparser
import spacy


@dataclass
class TimelineEvent:
    """A dated event extracted from a document."""
    date: Optional[datetime]
    date_text: str  # Original date expression in document
    event_description: str
    actors: List[str]
    document_source: str
    confidence: float


class TimelineBuilder:
    """
    Extracts events and dates from legal documents to build case chronologies.
    """

    DATE_PATTERNS = [
        # ISO format
        r"\b(\d{4}-\d{2}-\d{2})\b",
        # US format
        r"\b(\d{1,2}/\d{1,2}/\d{4})\b",
        # Written format
        r"\b((?:January|February|March|April|May|June|July|August|September|October|November|December)\s+\d{1,2},?\s+\d{4})\b",
        # Abbreviated month
        r"\b((?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\.?\s+\d{1,2},?\s+\d{4})\b",
    ]

    def __init__(self):
        try:
            self.nlp = spacy.load("en_core_web_sm")
            self.use_spacy = True
        except OSError:
            self.use_spacy = False

    def extract_events(
        self,
        text: str,
        document_source: str,
    ) -> List[TimelineEvent]:
        """Extract dated events from document text."""
        events = []

        # Split into sentences
        if self.use_spacy:
            doc = self.nlp(text[:10000])
            sentences = [sent.text for sent in doc.sents]
        else:
            sentences = re.split(r"(?<=[.!?])\s+", text)

        for sentence in sentences:
            # Find dates in the sentence (dateparser returns None on failure)
            dates_found = []
            for pattern in self.DATE_PATTERNS:
                for match in re.finditer(pattern, sentence, re.IGNORECASE):
                    date_str = match.group(1)
                    parsed = dateparser.parse(date_str)
                    if parsed:
                        dates_found.append((date_str, parsed))

            if not dates_found:
                continue

            # Extract actors from the sentence
            actors = []
            if self.use_spacy:
                sent_doc = self.nlp(sentence)
                actors = [ent.text for ent in sent_doc.ents if ent.label_ in ("PERSON", "ORG")]

            # Create an event for each date found
            for date_str, parsed_date in dates_found:
                events.append(TimelineEvent(
                    date=parsed_date,
                    date_text=date_str,
                    event_description=sentence.strip()[:500],
                    actors=actors,
                    document_source=document_source,
                    confidence=0.8 if actors else 0.5,
                ))

        return events

    def build_chronology(
        self,
        documents: List[Tuple[str, str]],  # [(text, source_name), ...]
    ) -> List[TimelineEvent]:
        """Build a sorted chronological timeline from multiple documents."""
        all_events = []
        for text, source in documents:
            all_events.extend(self.extract_events(text, source))

        # Sort by date, placing undated events first
        all_events.sort(key=lambda e: e.date if e.date else datetime.min)

        return all_events

    def format_chronology(self, events: List[TimelineEvent]) -> str:
        """Format chronology for litigation team use."""
        lines = ["CASE CHRONOLOGY\n" + "=" * 60]
        current_year = None

        for event in events:
            if event.date:
                year = event.date.year
                if year != current_year:
                    lines.append(f"\n--- {year} ---")
                    current_year = year
                date_str = event.date.strftime("%B %d, %Y")
            else:
                date_str = f"[Undated] ({event.date_text})"

            actors_str = ", ".join(event.actors[:3]) if event.actors else "Unknown"
            lines.append(
                f"\n{date_str}\n"
                f"  Actors: {actors_str}\n"
                f"  Event: {event.event_description[:200]}\n"
                f"  Source: {event.document_source}"
            )

        return "\n".join(lines)
```
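
A hypothetical usage sketch tying the classes above together; the transcript files, witness name, and document sources are invented:

```python
# Hypothetical usage - the transcript and document files are invented.
parser = DepositionParser()
detector = NLIContradictionDetector()

session1 = parser.parse_transcript(
    open("cto_deposition_2022-03-14.txt").read(),
    witness_name="J. Doe", deposition_date="2022-03-14",
)
session2 = parser.parse_transcript(
    open("cto_deposition_2023-09-08.txt").read(),
    witness_name="J. Doe", deposition_date="2023-09-08",
)

# Top candidates go to the trial attorney for verification, never straight to cross.
for hit in detector.find_contradictions(session1, session2)[:10]:
    print(f"[{hit.severity} {hit.contradiction_score:.2f}] {hit.explanation}")

# Timeline from a small document collection (sources invented).
builder = TimelineBuilder()
chronology = builder.build_chronology([
    (open("board_minutes_2019.txt").read(), "Board Minutes, 2019"),
    (open("design_review_memo.txt").read(), "Design Review Memo"),
])
print(builder.format_chronology(chronology))
```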

Mermaid Diagrams

[Diagrams omitted from this text version: Litigation Support AI Ecosystem; Contradiction Detection Pipeline; Settlement Prediction Model Features]


Production Engineering Notes

Deposition Transcript Quality Varies Enormously

Court reporter transcripts have varying quality. High-quality real-time transcripts (produced by certified court reporters) are 98%+ accurate. Rough transcripts, produced during depositions for same-day delivery, may be 90-95% accurate. Transcripts produced by automated speech recognition (increasingly common for lower-stakes depositions) can be 85-92% accurate.

Errors in transcripts create errors in NLP downstream tasks. A witness's technical term transcribed incorrectly becomes an unknown word to the NLP model. Names of products, compounds, or technical concepts are especially prone to transcription error.

Preprocessing deposition transcripts before NLP analysis:

  1. Identify all proper nouns and technical terms using NER
  2. Cross-reference against a case-specific glossary (product names, chemical compounds, technical terms used in the case)
  3. Flag low-confidence technical terms for manual verification
  4. Normalize variations of the same term ("API-123," "API 123," "the API") - a minimal normalization sketch follows this list
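
A minimal sketch of step 4, assuming a hand-built case glossary that maps each canonical term to a regex covering its known variants; the glossary entries below are invented:

```python
import re
from typing import Dict

# Hypothetical case glossary: canonical term -> regex matching known variants.
CASE_GLOSSARY: Dict[str, str] = {
    "API-123": r"\bAPI[\s-]?123\b|\bthe API\b",
    "BMS-7 circuit": r"\bBMS[\s-]?7\b|\bthe battery management circuit\b",
}

def normalize_terms(text: str, glossary: Dict[str, str] = CASE_GLOSSARY) -> str:
    """Rewrite known term variants to their canonical form before NLP analysis."""
    for canonical, variant_pattern in glossary.items():
        text = re.sub(variant_pattern, canonical, text, flags=re.IGNORECASE)
    return text

print(normalize_terms("He tested API 123 against the battery management circuit."))
# -> "He tested API-123 against BMS-7 circuit."
```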

Expert Witness Analysis Across Cases

Expert witnesses are often retained in multiple cases, and their prior testimony can be a gold mine for impeachment. An expert who testified in 2019 that a particular methodology was unreliable cannot easily testify in 2023 that the same methodology is reliable.

Building an expert witness database:

  1. Collect all publicly available deposition transcripts and trial transcripts for the expert
  2. Extract the expert's opinions on key methodological and factual questions
  3. Store with temporal metadata (case, date, jurisdiction)
  4. When the expert is retained in a new matter, retrieve their prior opinions on relevant topics - a minimal data-model sketch follows this list
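
A minimal data-model sketch under simple assumptions - opinions stored as plain records, retrieval by keyword overlap rather than embedding search; the expert, cases, and opinion texts below are invented:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ExpertOpinion:
    """One opinion statement extracted from a report or transcript."""
    expert: str
    topic: str
    opinion_text: str
    case: str
    date: str          # ISO date of the testimony
    jurisdiction: str

@dataclass
class ExpertOpinionDatabase:
    records: List[ExpertOpinion] = field(default_factory=list)

    def add(self, opinion: ExpertOpinion) -> None:
        self.records.append(opinion)

    def prior_opinions(self, expert: str, topic_keywords: List[str]) -> List[ExpertOpinion]:
        """Retrieve prior opinions by topic keyword overlap, oldest first."""
        hits = [
            r for r in self.records
            if r.expert == expert
            and any(kw.lower() in (r.topic + " " + r.opinion_text).lower() for kw in topic_keywords)
        ]
        return sorted(hits, key=lambda r: r.date)

# Hypothetical records for illustration.
db = ExpertOpinionDatabase()
db.add(ExpertOpinion("Dr. Smith", "event study methodology",
                     "Event studies are unreliable for thinly traded stocks.",
                     "Acme v. Zenith", "2019-06-11", "S.D.N.Y."))
db.add(ExpertOpinion("Dr. Smith", "event study methodology",
                     "The event study reliably isolates the fraud-related price drop.",
                     "In re Foo Sec. Litig.", "2023-02-02", "N.D. Cal."))

for op in db.prior_opinions("Dr. Smith", ["event study"]):
    print(f"{op.date} ({op.case}): {op.opinion_text}")
```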

The challenge is opinion tracking over time: expert opinions evolve, and this evolution may be legitimate or may represent improper advocacy. The NLP system surfaces the evolution; the attorney judges whether it is impeachable.

Damages Calculation Models

In certain case types, damages follow mathematical models that can be implemented in code. Patent reasonable royalty calculations use the Georgia-Pacific factors. Lost profits calculations follow incremental profit analysis. Securities fraud damages use event study methodology.

For cases following well-defined damages models:

  1. Extract the factual inputs (royalty base, royalty rate, time period, market share) from case documents using NLP
  2. Implement the damages calculation as a financial model
  3. Sensitivity analysis - vary assumptions to show the range of defensible damages estimates
  4. Document all assumption choices and their documentary support

This is not a black box AI model - it is a transparent financial calculation implemented in code, with every input traceable to a specific document.
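
A minimal sketch of what that looks like for a reasonable-royalty calculation, with a sensitivity analysis over the royalty rate; the royalty base, unit price, and rate range below are invented inputs, each of which would carry a citation to its supporting document in practice:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class RoyaltyInputs:
    """Each field should be traceable to a cited document."""
    royalty_base_units: int   # accused units sold (from sales records)
    price_per_unit: float     # USD (from pricing documents)
    royalty_rate: float       # fraction of revenue (from license comparables)

def reasonable_royalty(inp: RoyaltyInputs) -> float:
    """Transparent calculation: base revenue x royalty rate."""
    return inp.royalty_base_units * inp.price_per_unit * inp.royalty_rate

def sensitivity(base: RoyaltyInputs, rates: List[float]) -> List[Tuple[float, float]]:
    """Vary the royalty rate to show the range of defensible estimates."""
    return [
        (r, reasonable_royalty(RoyaltyInputs(base.royalty_base_units, base.price_per_unit, r)))
        for r in rates
    ]

# Invented inputs for illustration.
inputs = RoyaltyInputs(royalty_base_units=2_000_000, price_per_unit=45.0, royalty_rate=0.03)
print(f"Point estimate: ${reasonable_royalty(inputs):,.0f}")  # $2,700,000
for rate, damages in sensitivity(inputs, [0.02, 0.03, 0.05]):
    print(f"  rate {rate:.0%}: ${damages:,.0f}")
```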


Common Mistakes

:::danger Treating settlement prediction as advice Settlement prediction models are statistical tools that output probability estimates. They are not legal advice. A model that says "73% win probability" does not account for the specific judge's recent rulings, the particular jury pool demographics, the client's risk tolerance, or the strategic value of the legal precedent being established. Using predictive analytics to pressure clients into settlements without the attorney's holistic judgment is an ethics violation. These tools are for informing attorney judgment, not replacing it. :::

:::danger Presenting NLP-detected contradictions without attorney verification An NLP system might flag a "contradiction" between two statements that is actually consistent from a legal perspective, or that represents a legitimate clarification. Impeaching a witness with a false contradiction in front of a jury is deeply damaging to your client's case. Every NLP-detected contradiction must be reviewed by the trial attorney before use. The NLP system surfaces candidates; the attorney confirms materiality. :::

:::warning Timeline conflicts without reconciliation When your timeline builder finds that Document A says the meeting happened on March 15 and Document B says it happened on March 22, surfacing the conflict without reconciliation context is insufficient. The system should also retrieve any other documents that corroborate one date or the other, and flag whether one document is contemporaneous (more reliable) and one is retrospective (less reliable). An uncontextualized date conflict sent to a trial team creates confusion, not clarity. :::

:::warning Ignoring document metadata for timeline ordering Documents have created dates, modified dates, sent dates, and content dates - and these can differ significantly. An email forwarded in 2023 describing events from 2017 has a sent date of 2023 but describes events from 2017. A timeline built on file creation dates will be wrong. Always use the temporal expressions extracted from the document content as the primary dating signal, with document metadata as a secondary verification. :::


Interview Q&A

Q: How would you design a contradiction detection system that handles subtle semantic contradictions rather than only surface-level keyword conflicts?

The system has three layers: (1) Statement extraction - segment deposition transcripts into discrete factual claims using sentence boundary detection and NLP parsing. Each statement should be a single factual claim about a specific entity, event, or attribute. (2) Semantic pairing - use a bi-encoder to find pairs of statements from different sessions that are semantically related (talking about the same topic, entity, and time period). This is necessary because directly comparing all pairs across a 600-page deposition would be O(n^2) and too expensive. Use entity overlap as a pre-filter. (3) NLI classification - for related pairs, run a cross-encoder NLI model to classify the relationship as entailment, neutral, or contradiction. NLI models trained on large text corpora have learned semantic relationships that go well beyond keyword matching. Fine-tune on legal deposition examples to improve domain accuracy.

Q: What is the difference between TAR for document review and the NLP tasks in litigation support?

TAR is a binary relevance classification task - is this document relevant to the litigation? Litigation support NLP involves a wider range of tasks: multi-class event extraction for timelines, pair-wise relation classification for contradiction detection, named entity recognition and tracking for witness analysis, and regression for damages modeling. TAR's labels are generated cheaply relative to the corpus size (an attorney reviews a seed set and the model generalizes via active learning), while litigation support NLP is more task-specific and often requires labeled training data developed by legal professionals. The models also differ: TAR uses binary classifiers with active learning; litigation support uses NLI models, sequence labelers, and information extraction pipelines.

Q: How do you handle the quality problem with deposition transcripts produced by automated speech recognition?

Three-stage approach: (1) Confidence scoring - most ASR systems output per-word or per-segment confidence scores. Flag segments below 0.85 confidence for human review. (2) Domain vocabulary correction - build a case-specific vocabulary of technical terms, proper nouns, and specialized language from case documents. Post-process ASR output by looking for near-matches to vocabulary terms (edit distance-based correction). "Methylation" transcribed as "methylay shun" is correctable if "methylation" is in the vocabulary. (3) Contextual correction - use a language model to flag grammatically implausible segments that likely contain transcription errors. The goal is not perfect correction but identification of segments that require human review before NLP analysis.
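
A minimal sketch of stage (2), the edit-distance correction, using Python's difflib; the case vocabulary below is invented:

```python
import difflib
from typing import List

# Hypothetical case-specific vocabulary, built from case documents.
CASE_VOCABULARY: List[str] = ["methylation", "anodization", "API-123"]

def correct_token(token: str, vocab: List[str] = CASE_VOCABULARY, cutoff: float = 0.8) -> str:
    """Replace a token with its closest vocabulary term when similarity clears the cutoff."""
    matches = difflib.get_close_matches(token.lower(), [v.lower() for v in vocab], n=1, cutoff=cutoff)
    return matches[0] if matches else token

# An ASR mishearing like "methylatian" is corrected; unrelated tokens pass through.
print(correct_token("methylatian"))  # -> "methylation"
print(correct_token("battery"))      # -> "battery" (no close vocabulary match)
```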

Q: Walk me through how you would use AI tools to prepare cross-examination of an expert witness.

Five-step process: (1) Expert opinion extraction - process the expert's report and all prior deposition transcripts to extract their specific opinions on all relevant topics. Create a structured opinion database: topic, opinion, case, date. (2) Prior testimony search - search the opinion database for the expert's prior opinions on the same topics in other cases. Identify any evolution or inconsistency in their opinions over time. (3) Methodology analysis - extract the methodological assumptions the expert relies on. Cross-reference against published literature for challenges to those methodologies. (4) Contradiction flagging - run NLI against paired statements from the current report and prior transcripts. Surface potential impeachment material. (5) Cross-exam outline generation - use an LLM to draft a cross-examination outline organized by topic, with the impeachment questions sequenced for maximum impact. Flag every question with the source document. The attorney reviews, modifies, and sequences the outline using her trial judgment.

Q: What are the ethical concerns about using case outcome prediction in litigation practice?

There are three primary concerns: (1) Client autonomy - if outcome predictions are presented as authoritative rather than probabilistic, clients may feel pressured into settlements that are not in their best interest. A 70% win probability means a 30% loss probability. The client's risk tolerance, financial situation, and the strategic value of establishing legal precedent all matter more than the predicted probability for many clients. (2) Access to justice - if large law firms with sophisticated AI tools have significantly better outcome prediction than plaintiffs represented by smaller firms or pro se litigants, the information asymmetry could systematically disadvantage less well-resourced parties. (3) Self-fulfilling prophecy - if all litigants use the same prediction model, and the model predicts a certain outcome, the predictive certainty could change litigant behavior in ways that influence the actual outcome. Responsible use means presenting predictions as one input among many, always combining with attorney judgment, and being transparent with clients about what the model can and cannot predict.

© 2026 EngineersOfAI. All rights reserved.