Intellectual Property and AI
The Patent That Should Not Have Been Granted
In 2004, a USPTO patent examiner had roughly 18 hours per application to review prior art, assess claims, and issue a determination. Examination time has not increased substantially since - if anything, the time pressure is greater. Patent examiners review complex technical inventions in pharmaceuticals, semiconductor manufacturing, cryptography, and machine learning with only a few hours per application and access to limited prior art databases.
The result: patents that should not have been granted. The software patent wars of the 2000s and 2010s were fought largely over patents claiming abstract ideas that had existed in the prior art for years before the application was filed. Companies spent billions defending against patents for which skilled searchers could have found invalidating prior art - had they been given the time and tools to do the search.
The problem is not examiner incompetence. It is information asymmetry at scale. There are approximately 3.5 million active US patents and tens of millions of published patent applications worldwide. The EPO adds 180,000+ applications per year. The prior art universe includes not just patents but academic papers, conference proceedings, technical standards, and product manuals. No human researcher can search this corpus comprehensively in 18 hours.
This is the core problem that AI-based patent search addresses. Not to replace the patent examiner, but to dramatically compress the time required to search a comprehensive prior art corpus. A dense retrieval system that embeds every patent claim and can find the 20 most semantically similar claims to a new application in milliseconds changes what comprehensive search means.
The economic stakes are enormous. A valid patent on a pharmaceutical compound can be worth billions. An invalid patent that should have been rejected creates 20 years of barriers to competition. An AI system that consistently surfaces prior art human search would miss is one of the highest-value applications of NLP in the legal domain.
Why This Exists
Patent search is fundamentally a semantic search problem. A patent claim for "a method of training a neural network using gradient descent with adaptive learning rate adjustment" might be anticipated by prior art that describes "a machine learning optimization process using back-propagation with momentum-based step size modification." The concepts are the same; the words are entirely different.
Keyword-based patent search - which is what most patent examiners and many patent attorneys have relied on for decades - fails on semantic mismatch. IPC (International Patent Classification) codes help but are coarse-grained and inconsistently applied. The same invention can receive different IPC classifications from different examiners.
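To make the mismatch concrete, here is a toy comparison (a sketch, not a production search): the two equivalent claim phrasings from above share almost no vocabulary, so a keyword overlap score treats them as unrelated.

```python
def jaccard(a: str, b: str) -> float:
    """Keyword overlap: |intersection| / |union| of lowercased word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

claim = ("a method of training a neural network using gradient descent "
         "with adaptive learning rate adjustment")
prior_art = ("a machine learning optimization process using back-propagation "
             "with momentum-based step size modification")

overlap = jaccard(claim, prior_art)
print(f"keyword overlap: {overlap:.2f}")  # low despite semantic equivalence
```

Only function words like "a", "using", and "with" overlap, so a keyword-ranked search would never surface this prior art for that claim.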
The freedom-to-operate analysis - determining whether a product infringes any existing patents before market launch - has the same problem. A company launches a new product and needs to know whether it infringes any of the 3.5 million active US patents. Keyword search misses semantically equivalent claims written with different vocabulary. AI-based semantic search changes the coverage possible within a reasonable time and cost envelope.
Trademark clearance has a parallel problem. A new brand name must be cleared for trademark conflicts with thousands of existing registered marks. Visual similarity (logo) and phonetic similarity (name) create conflicts even when the exact words differ. "Lyft" and "Lft" (hypothetical) would be confusingly similar even though they are different letter sequences.
Historical Context
Patent search has been automated in various forms since the 1970s. The USPTO's patent database became publicly searchable online in 1995. Google Patent Search (2006) and Lens.org (open access, 2010) expanded access. But these were all keyword-based.
The first semantic patent search systems emerged around 2015 with companies like Patsnap, Derwent Innovation, and Innography. These used TF-IDF-based similarity and early word embedding approaches (word2vec applied to patent text). They were significantly better than keyword search but had limitations in handling technical vocabulary and claim structure.
The transformer revolution fundamentally changed patent NLP. A series of papers from 2020-2022 showed that BERT-based models fine-tuned on patent data substantially outperformed prior approaches on patent benchmarks such as PatentMatch (claim-level retrieval) and BigPatent (summarization). Patent-specific pre-trained models like PatentBERT pushed performance further on claim-level tasks.
The AI inventorship question became legally significant in 2020-2022 when Dr. Stephen Thaler filed patent applications in 16+ countries listing his DABUS AI system as the inventor. All major patent offices rejected these applications, holding that an inventor must be a natural person. The UK Court of Appeal upheld this in 2021 (Thaler v. Comptroller-General of Patents). The US Federal Circuit upheld the USPTO's rejection in 2022 (Thaler v. Vidal). Whether legislatures will eventually extend inventorship to AI-generated inventions remains an open question.
Copyright in AI-generated content became highly contested in 2022-2023 with the surge of generative AI. The US Copyright Office issued guidance in 2023 stating that works generated entirely by AI without human creative control are not copyrightable, while works incorporating AI assistance with human creative contribution can be registered.
Core Concepts
Patent Claim Structure and Semantic Challenges
Understanding patent claims is essential for building effective patent NLP systems. A patent claim defines the scope of legal protection. It has a preamble (what the invention is), a transition ("comprising," "consisting of"), and elements (the specific components or steps).
Independent claim example:
```
Claim 1: A method comprising:
    receiving, by a computing device, a first data stream;
    applying a neural network to the first data stream to generate a predicted output;
    comparing the predicted output to a ground truth label; and
    updating parameters of the neural network based on the comparison.
```
Dependent claim: "Claim 3: The method of claim 1, wherein the neural network is a convolutional neural network."
For prior art searching, the goal is to find prior art that discloses each element of an independent claim. A claim is anticipated (invalidated) if a single prior art document discloses all elements. A claim is obvious if a combination of prior art references, taken together, discloses all elements.
The NLP challenge: each element is described in technical language that can vary enormously between patents in the same field. "A neural network," "a machine learning model," "a deep learning architecture," and "an artificial intelligence system" can all refer to the same concept. The prior art search must capture all of these variations.
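The anticipation/obviousness logic above can be sketched with simple set operations over hand-labeled claim elements (the patent numbers and element strings are hypothetical; a real system matches elements with NLP, not exact strings):

```python
from itertools import combinations

claim_elements = {"receive data stream", "apply neural network",
                  "compare to ground truth", "update parameters"}

# Hypothetical prior art references and the elements each discloses
prior_art = {
    "US-1": {"receive data stream", "apply neural network"},
    "US-2": {"compare to ground truth", "update parameters"},
    "US-3": {"receive data stream", "apply neural network",
             "compare to ground truth", "update parameters"},
}

# Anticipation: a single reference discloses every claim element
anticipating = [p for p, els in prior_art.items() if claim_elements <= els]

# Obviousness candidates: a pair of references jointly discloses every element
obvious_combos = [(p1, p2) for p1, p2 in combinations(prior_art, 2)
                  if claim_elements <= prior_art[p1] | prior_art[p2]]

print(anticipating)    # US-3 anticipates on its own
print(obvious_combos)  # includes the (US-1, US-2) combination
```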
Dense Retrieval for Patent Search
The architecture is standard bi-encoder retrieval:
- Query encoder: encodes the new patent application (or specific claims) into a query vector
- Document encoder: encodes each prior art patent document into a document vector
- Index: FAISS over all document vectors (3.5 million US patents + international patents = approximately 50-100 million chunks)
- Retrieval: top-k most similar document vectors
What makes patent retrieval different from general retrieval:
- Claim-level vs document-level: retrieve at the claim level, not the document level. A patent with 20 claims may have one claim that anticipates your new application. Document-level retrieval misses this.
- Technical vocabulary: patents use highly specialized technical vocabulary. Pre-training on patent text is essential.
- Claim dependencies: dependent claims add constraints to independent claims. Prior art for Claim 3 (which depends from Claim 1) must also disclose all elements of Claim 1.
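One way to handle dependencies - sketched here with a hypothetical `expand_claim` helper - is to expand each dependent claim with its parent's text before embedding, so the search query carries the full set of limitations:

```python
import re

def expand_claim(claims: dict) -> dict:
    """Prepend parent claim text to each dependent claim.
    Sketch only: assumes a single dependency of the form 'claim N'
    and that parents have lower claim numbers."""
    expanded = {}
    for num in sorted(claims):
        text = claims[num]
        m = re.search(r"claim\s+(\d+)", text, re.IGNORECASE)
        if m and int(m.group(1)) in expanded:
            text = expanded[int(m.group(1))] + " " + text
        expanded[num] = text
    return expanded

claims = {
    1: "A method comprising applying a neural network to a data stream.",
    3: "The method of claim 1, wherein the neural network is convolutional.",
}
full = expand_claim(claims)
# full[3] now contains the claim 1 limitations plus the claim 3 restriction
```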
Freedom-to-Operate Analysis
Freedom-to-operate (FTO) analysis asks: can we manufacture and sell this product in this market without infringing any active patents? The AI-assisted workflow:
- Generate a technical description of the product from engineering specifications
- Convert each functional component into a claim-like description
- Search for patent claims that might read on each component
- For each hit, build a claim chart: map the patent claim elements to the product
- Identify high-risk patents for attorney review
The claim charting step is particularly amenable to NLP: given a patent claim and a product description, extract the claim elements and find corresponding features in the product description.
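A toy version of that mapping, using content-word overlap as a stand-in for real semantic matching (the element and feature strings are hypothetical, and any output would still go to attorney review):

```python
STOPWORDS = {"a", "an", "the", "to", "of", "by"}

def content_words(text: str) -> set:
    return set(text.lower().split()) - STOPWORDS

def chart_claim(claim_elements, product_features):
    """For each claim element, pick the product feature with the highest
    content-word overlap (toy scoring; production systems use embeddings)."""
    chart = {}
    for element in claim_elements:
        e_words = content_words(element)
        best = max(product_features, key=lambda f: len(e_words & content_words(f)))
        score = len(e_words & content_words(best)) / len(e_words)
        chart[element] = (best, round(score, 2))
    return chart

elements = ["receiving a first data stream",
            "applying a neural network to the data stream"]
features = ["the device ingests a sensor data stream",
            "an onboard neural network processes the data stream frames"]
for element, (feature, score) in chart_claim(elements, features).items():
    print(f"{element!r} -> {feature!r} (overlap {score})")
```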
Trademark Similarity Detection
Trademark conflicts arise from:
- Phonetic similarity: "Lyfts" and "Lifts" sound similar
- Visual similarity: logos with similar design elements
- Semantic similarity: marks with similar meanings in different languages
- Trade dress: overall appearance and marketing presentation
NLP-based trademark clearance:
- Phonetic encoding: Soundex, Metaphone, or Double Metaphone for phonetic similarity
- Edit distance: Levenshtein distance for string similarity
- Semantic embedding: word embeddings for semantic similarity
- Combined scoring: weighted combination of phonetic + visual + semantic similarity
The combined score is compared against a threshold. Marks above the threshold are flagged for attorney review. The attorney applies the likelihood-of-confusion standard from trademark law, which involves additional factors beyond similarity (channels of trade, strength of the senior mark, sophistication of consumers).
Copyright in AI Training Data
The legal landscape here is genuinely unsettled. Several pending cases as of 2024-2025 address whether:
- scraping publicly available text for LLM training constitutes copyright infringement (NYT v. Microsoft/OpenAI)
- AI-generated outputs can infringe copyright (the image-generator cases)
- memorized training data reproduced in LLM outputs creates infringement liability
The engineering implications of uncertainty: build systems that can support attribution when required. Keep records of what training data was used. For commercial deployments, prefer training data with clear provenance (licensed content, public domain material, or web corpora like Common Crawl used within their terms). Implement content filtering to prevent verbatim reproduction of copyrighted content in outputs.
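A minimal sketch of such an output filter, using word n-gram shingles (the 8-gram window and the example corpus are illustrative choices, not a recommendation):

```python
def ngrams(text: str, n: int = 8):
    """Set of n-word shingles from text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def flag_verbatim(output: str, protected_corpus, n: int = 8) -> bool:
    """Flag model output sharing any n-word span with protected text.
    Sketch only: production systems use suffix automata or Bloom filters
    over far larger corpora, and tune n against false-positive rates."""
    out_grams = ngrams(output, n)
    return any(out_grams & ngrams(doc, n) for doc in protected_corpus)

protected = ["it was the best of times it was the worst of times it was the age of wisdom"]
print(flag_verbatim("he wrote it was the best of times it was the worst of times today",
                    protected))  # True: shares an 8-word span
```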
Code Examples
Patent Prior Art Search with Semantic Embeddings
"""
Patent prior art search system using dense retrieval.
Designed for searching USPTO and EPO patent databases.
"""
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
from typing import List, Dict, Tuple, Optional
from dataclasses import dataclass, field
import re
import json
@dataclass
class Patent:
"""A patent document with structured metadata."""
patent_number: str
title: str
abstract: str
claims: List[str]
description_excerpt: str
filing_date: str
grant_date: str
assignee: str
ipc_codes: List[str]
citations: List[str] = field(default_factory=list)
def get_independent_claims(self) -> List[str]:
"""Extract independent claims (those not referencing prior claims)."""
independent = []
for claim in self.claims:
# Independent claims don't reference other claim numbers
if not re.search(r"claim\s+\d+", claim, re.IGNORECASE):
independent.append(claim)
return independent
def get_searchable_text(self) -> str:
"""Combine claim and abstract text for embedding."""
independent_claims = self.get_independent_claims()
claim_text = " ".join(independent_claims[:5]) # Top 5 independent claims
return f"TITLE: {self.title}\nABSTRACT: {self.abstract}\nCLAIMS: {claim_text}"
class PatentEmbeddingIndex:
"""
FAISS-backed index for semantic patent search.
Embeds at claim level for fine-grained retrieval.
"""
def __init__(self, model_name: str = "sentence-transformers/all-mpnet-base-v2"):
# For production, use a patent-specific model:
# - PATBERT (fine-tuned BERT on patents)
# - AI2_SPECTER for scientific text
# - A custom model fine-tuned on (query, relevant patent) pairs
self.model = SentenceTransformer(model_name)
self.index: Optional[faiss.Index] = None
self.patent_map: List[Dict] = [] # Maps index position to patent metadata
self.dimension: int = 0
def _embed_patent(self, patent: Patent) -> np.ndarray:
"""
Embed a patent using its claims and abstract.
Uses mean pooling across claim-level embeddings.
"""
# Embed each claim separately then pool
claims_to_embed = patent.get_independent_claims()[:10]
if not claims_to_embed:
claims_to_embed = [patent.abstract]
embeddings = self.model.encode(
claims_to_embed,
normalize_embeddings=True,
show_progress_bar=False,
)
# Mean pool across claims
return np.mean(embeddings, axis=0)
def build_index(self, patents: List[Patent]) -> None:
"""Build FAISS index from patent corpus."""
print(f"Building index for {len(patents)} patents...")
embeddings = []
self.patent_map = []
for patent in patents:
# Index at claim level for finer retrieval
independent_claims = patent.get_independent_claims()
for i, claim in enumerate(independent_claims[:5]):
claim_embedding = self.model.encode(
[claim], normalize_embeddings=True
)[0]
embeddings.append(claim_embedding)
self.patent_map.append({
"patent_number": patent.patent_number,
"title": patent.title,
"claim_index": i,
"claim_text": claim[:500],
"assignee": patent.assignee,
"grant_date": patent.grant_date,
"ipc_codes": patent.ipc_codes,
})
embeddings_array = np.array(embeddings, dtype="float32")
self.dimension = embeddings_array.shape[1]
# Use IVFFlat for large patent databases
nlist = min(int(np.sqrt(len(embeddings))), 256)
quantizer = faiss.IndexFlatIP(self.dimension)
self.index = faiss.IndexIVFFlat(
quantizer, self.dimension, nlist, faiss.METRIC_INNER_PRODUCT
)
self.index.train(embeddings_array)
self.index.add(embeddings_array)
print(f"Index built: {self.index.ntotal} claim vectors")
def search(
self,
query: str,
k: int = 20,
ipc_filter: Optional[List[str]] = None,
date_before: Optional[str] = None,
) -> List[Dict]:
"""
Search for prior art patents similar to the query.
Returns ranked list of patent results.
"""
query_embedding = self.model.encode(
[query], normalize_embeddings=True
).astype("float32")
# Set nprobe for IVF index - higher = more accurate but slower
if hasattr(self.index, "nprobe"):
self.index.nprobe = 10
scores, indices = self.index.search(query_embedding, k * 3)
results = []
seen_patents = set()
for score, idx in zip(scores[0], indices[0]):
if idx == -1:
continue
patent_meta = self.patent_map[idx]
patent_number = patent_meta["patent_number"]
# Deduplicate - take best claim match per patent
if patent_number in seen_patents:
continue
# Apply filters
if ipc_filter and not any(
ipc in patent_meta.get("ipc_codes", []) for ipc in ipc_filter
):
continue
if date_before and patent_meta.get("grant_date", "9999") > date_before:
continue
seen_patents.add(patent_number)
results.append({
**patent_meta,
"relevance_score": float(score),
})
if len(results) >= k:
break
return results
class PriorArtSearchSystem:
"""
End-to-end prior art search for patent applications.
Generates structured prior art analysis reports.
"""
def __init__(self):
self.index = PatentEmbeddingIndex()
def search_prior_art(
self,
application: Patent,
k: int = 10,
) -> Dict:
"""
Search for prior art for a new patent application.
Returns structured report with claim-by-claim analysis.
"""
report = {
"application": application.patent_number,
"title": application.title,
"claim_analyses": [],
}
for i, claim in enumerate(application.get_independent_claims()):
# Search for each independent claim separately
results = self.index.search(
query=claim,
k=k,
# Filter to patents filed before the application
date_before=application.filing_date,
)
report["claim_analyses"].append({
"claim_number": i + 1,
"claim_text": claim[:300],
"prior_art_found": results[:5],
"highest_similarity": results[0]["relevance_score"] if results else 0.0,
"anticipation_risk": (
"HIGH" if results and results[0]["relevance_score"] > 0.85
else "MEDIUM" if results and results[0]["relevance_score"] > 0.70
else "LOW"
),
})
return report
```python
# --- Trademark Similarity Detection ---
import re
import unicodedata
from typing import Dict, List, Optional, Tuple

import numpy as np
from metaphone import doublemetaphone  # pip install metaphone
from sentence_transformers import SentenceTransformer


class TrademarkSimilarityChecker:
    """
    Multi-dimensional trademark similarity checking.
    Combines phonetic, orthographic, and semantic similarity.
    """

    def __init__(self):
        self.semantic_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
        self.registered_marks: List[Dict] = []
        self.mark_embeddings: Optional[np.ndarray] = None

    def _normalize_mark(self, mark: str) -> str:
        """Normalize trademark for comparison."""
        # Remove special characters, normalize unicode
        normalized = unicodedata.normalize("NFKD", mark)
        normalized = normalized.encode("ascii", "ignore").decode("ascii")
        return normalized.upper().strip()

    def _phonetic_code(self, mark: str) -> Tuple[str, str]:
        """Get double metaphone encoding for phonetic comparison."""
        cleaned = re.sub(r"[^a-zA-Z\s]", "", mark)
        words = cleaned.split()
        if not words:
            return ("", "")
        # Encode first word (most important phonetically)
        return doublemetaphone(words[0])

    def _orthographic_similarity(self, mark1: str, mark2: str) -> float:
        """Compute normalized edit distance."""
        s1, s2 = mark1.upper(), mark2.upper()
        m, n = len(s1), len(s2)
        if max(m, n) == 0:
            return 1.0
        # Levenshtein distance via dynamic programming
        dp = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            dp[i][0] = i
        for j in range(n + 1):
            dp[0][j] = j
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if s1[i - 1] == s2[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
        return 1 - dp[m][n] / max(m, n)

    def _phonetic_similarity(self, mark1: str, mark2: str) -> float:
        """Compare phonetic encodings."""
        code1 = self._phonetic_code(mark1)
        code2 = self._phonetic_code(mark2)
        # Check if primary or secondary codes match
        if code1[0] and code2[0] and code1[0] == code2[0]:
            return 1.0
        if code1[1] and code2[1] and code1[1] == code2[1]:
            return 0.8
        return 0.0

    def build_registry(self, registered_marks: List[Dict]) -> None:
        """Build searchable index of registered marks."""
        self.registered_marks = registered_marks
        mark_texts = [m["mark_name"] for m in registered_marks]
        self.mark_embeddings = self.semantic_model.encode(
            mark_texts, normalize_embeddings=True
        )

    def check_clearance(
        self,
        new_mark: str,
        goods_services: str,
        similarity_threshold: float = 0.75,
        top_k: int = 10,
    ) -> List[Dict]:
        """
        Check a new trademark for conflicts with registered marks.
        Returns list of potential conflicts sorted by similarity.
        """
        # Semantic search for candidate conflicts
        query_embedding = self.semantic_model.encode(
            [new_mark], normalize_embeddings=True
        )
        similarities = np.dot(self.mark_embeddings, query_embedding.T).flatten()
        top_indices = np.argsort(-similarities)[:top_k * 3]
        conflicts = []
        for idx in top_indices:
            registered = self.registered_marks[idx]
            semantic_sim = float(similarities[idx])
            phonetic_sim = self._phonetic_similarity(new_mark, registered["mark_name"])
            ortho_sim = self._orthographic_similarity(new_mark, registered["mark_name"])
            # Weighted combination of the three dimensions
            combined_score = (
                0.4 * semantic_sim
                + 0.4 * phonetic_sim
                + 0.2 * ortho_sim
            )
            if combined_score >= similarity_threshold:
                conflicts.append({
                    "registered_mark": registered["mark_name"],
                    "registration_number": registered.get("reg_number", ""),
                    "owner": registered.get("owner", ""),
                    "goods_services": registered.get("goods_services", ""),
                    "semantic_similarity": semantic_sim,
                    "phonetic_similarity": phonetic_sim,
                    "orthographic_similarity": ortho_sim,
                    "combined_score": combined_score,
                    "conflict_risk": (
                        "HIGH" if combined_score >= 0.85
                        else "MEDIUM" if combined_score >= 0.75
                        else "LOW"
                    ),
                })
        return sorted(conflicts, key=lambda x: x["combined_score"], reverse=True)[:top_k]
```
Mermaid Diagrams
Patent Prior Art Search Pipeline
Trademark Clearance System
AI and Copyright Legal Landscape
Production Engineering Notes
Keeping Patent Indices Current
The USPTO publishes approximately 6,000 new patents per week. EPO publishes comparable volume. Your patent index needs weekly updates to remain current for FTO analysis (you need to know about patents granted last week before you launch a product).
Update pipeline:
- Subscribe to USPTO PatentsView bulk data dumps (weekly) and EPO Open Patent Services API
- Process new patent grants: extract claims, abstract, metadata
- Embed new patents at the claim level
- Add to FAISS index (or rebuild weekly for clean state)
- Update metadata store
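The buffer-and-merge strategy above can be sketched without committing to a particular vector database (a brute-force NumPy stand-in here; production would use FAISS or a vector store with online inserts, and the class name is hypothetical):

```python
import numpy as np

class IncrementalIndex:
    """Main index plus a small weekly buffer that is merged periodically.
    Sketch of the update strategy only: exact dot-product search over
    both tiers, no quantization or persistence."""

    def __init__(self, dim: int):
        self.main = np.empty((0, dim), dtype="float32")
        self.recent = np.empty((0, dim), dtype="float32")

    def add_weekly_batch(self, vectors: np.ndarray) -> None:
        """Append this week's newly granted patent embeddings."""
        self.recent = np.vstack([self.recent, vectors.astype("float32")])

    def merge(self) -> None:
        """Fold the weekly buffer into the main index (e.g. monthly)."""
        self.main = np.vstack([self.main, self.recent])
        self.recent = self.recent[:0]

    def search(self, query: np.ndarray, k: int = 5) -> np.ndarray:
        corpus = np.vstack([self.main, self.recent])  # search both tiers
        scores = corpus @ query
        return np.argsort(-scores)[:k]
```

The point of the two tiers is that last week's grants are searchable immediately while the expensive full-index rebuild happens on a slower cadence.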
Watch for continuation patents - a patent application family can generate dozens of patent numbers over years. A continuation may have slightly different claims from the parent. Index each continuation separately and track family relationships.
Claim Scope Analysis
Claim scope - determining what a claim does and does not cover - is the hardest problem in patent NLP. A claim like "a machine learning model comprising at least one neural network layer" is extremely broad. A claim like "a convolutional neural network comprising exactly 7 layers of 3x3 filters" is narrow.
For FTO analysis, claim scope determines whether a product infringes. A product that does not have exactly 7 layers of 3x3 filters does not infringe the narrow claim. A product that uses any neural network layer potentially infringes the broad claim.
Automated claim scope analysis uses:
- Claim breadth scoring: count the constraints in a claim. More constraints = narrower scope.
- Functional language detection: phrases like "configured to," "adapted to," "operable to" indicate functional claiming, which has different infringement analysis than structural claiming.
- Means-plus-function detection: claims using "means for [function]" are interpreted differently under 35 USC 112(f) - they cover only the specific structures described in the specification.
These are legal determinations that require attorney confirmation. The NLP output is a risk score and analysis rationale, not a legal conclusion.
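A heuristic sketch of those detectors (the phrase lists and the constraint-counting proxy are illustrative, not a legal test):

```python
import re

FUNCTIONAL = re.compile(r"\b(configured to|adapted to|operable to)\b", re.I)
MEANS_PLUS = re.compile(r"\bmeans for\b", re.I)

def claim_flags(claim: str) -> dict:
    """Surface drafting signals for attorney review (heuristic only)."""
    # Rough breadth proxy: semicolon-separated elements and "wherein"
    # clauses each add a limitation, suggesting a narrower claim
    constraint_count = claim.count(";") + len(re.findall(r"\bwherein\b", claim, re.I))
    return {
        "functional_language": bool(FUNCTIONAL.search(claim)),
        "means_plus_function": bool(MEANS_PLUS.search(claim)),
        "constraint_count": constraint_count,
    }

flags = claim_flags(
    "A device comprising: a sensor; a processor configured to filter the "
    "sensor output; wherein the filter is adaptive."
)
print(flags)
```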
Patent Valuation for Portfolio Management
Patent portfolios are business assets. Companies make acquisition, licensing, and litigation decisions based on patent portfolio value. ML-based patent valuation uses:
- Citation count and citation quality (who cited the patent)
- Family size (number of continuations and international equivalents)
- Remaining life (time to expiration)
- Claim breadth scores
- Prosecution history (was the application heavily rejected before grant?)
- Licensing revenue history (if available)
- Technology area growth metrics (is the technology area growing in importance?)
Gradient boosted trees work well for valuation - the features are heterogeneous (text + numeric + categorical) and the relationship between features and value is non-linear but not deep.
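A minimal scikit-learn sketch of such a valuation model on synthetic data (every feature column and the "value" formula are invented for illustration; a real model would train on observed licensing or transaction values):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 500
# Hypothetical feature columns: forward citations, family size,
# remaining life (years), claim breadth score, office actions before grant
X = np.column_stack([
    rng.poisson(10, n),
    rng.poisson(3, n),
    rng.uniform(0, 20, n),
    rng.uniform(0, 1, n),
    rng.poisson(2, n),
])
# Synthetic target with a non-linear citation effect, for illustration only
y = np.log1p(X[:, 0]) * X[:, 2] + 5 * X[:, 3] + rng.normal(0, 1, n)

model = GradientBoostingRegressor(random_state=0).fit(X, y)
print(model.predict(X[:1]))  # estimated value for the first patent
```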
Common Mistakes
:::danger Treating semantic similarity as infringement analysis High semantic similarity between a patent claim and a product feature means the patent is worth investigating - not that the product infringes. Infringement analysis is a legal determination that involves claim construction (what does each claim term mean?), the doctrine of equivalents, prosecution history estoppel, and other legal concepts. An NLP system can flag candidates; only a patent attorney can render an infringement opinion. :::
:::danger Using the grant date as the prior art cutoff Prior art for a US patent application includes anything made public before the effective filing date - with the applicant's own disclosures within the one-year grace period excluded. For international patents under PCT, the priority date and national phase filing dates create a complex prior art landscape. Using the grant date (rather than the effective filing date) as your cutoff incorrectly includes documents published after the application was filed - those documents cannot be prior art to the application. :::
:::warning Ignoring claim dependencies in prior art searches A dependent claim adds limitations to the independent claim it depends from. If you are searching for prior art to invalidate Claim 3 (which depends from Claim 1), you need prior art that discloses all elements of Claim 1 AND the additional limitation in Claim 3. Searching only for the Claim 3 language without the Claim 1 context will find incomplete prior art. :::
:::warning Overconfidence in AI inventorship analysis The question of whether AI can be an inventor (or whether AI assistance in invention affects inventorship) is evolving law. Courts and patent offices have consistently held that AI cannot be a named inventor. But the question of when sufficient AI assistance to a human inventor creates joint inventorship issues - or requires disclosure to the patent office - remains legally uncertain. Do not treat current case law as settled for future AI-assisted invention scenarios. :::
Interview Q&A
Q: How does claim-level embedding improve patent prior art search compared to document-level embedding?
A patent document with 20 claims covers a range of subject matter. The abstract describes the broadest embodiment, but specific claims contain limitations that define what the patent actually protects. Document-level embedding averages across all 20 claims, producing a vector that represents the document broadly but may not accurately represent any specific claim. For anticipation analysis, you need to find prior art that discloses all elements of a specific independent claim - the document-level vector is too coarse. Claim-level embedding creates separate vectors for each independent claim. A search for prior art to Claim 1 retrieves the most semantically similar claim-level embeddings across the prior art corpus, which are much more precisely targeted to the actual claim scope.
Q: Walk me through how you would build an automated FTO analysis workflow for a new product feature.
Five stages: (1) Feature decomposition - work with the engineering team to decompose the product feature into discrete technical functions and structures. Each function becomes a search query. (2) Patent search - for each function, run semantic search against the active patent index (filtered to in-force patents in the relevant jurisdictions). Retrieve top-20 patents per function. (3) Claim mapping - for each retrieved patent, use NLP to map the patent's claim elements to the product feature description. Generate a claim chart showing which product features correspond to which claim elements. (4) Risk scoring - score each patent for infringement risk based on: how many claim elements are present in the product, claim scope (broader = higher risk), patent owner's litigation history, remaining patent term. (5) Attorney review - present the top-risk patents with claim charts to a patent attorney for infringement analysis. The attorney renders the FTO opinion.
Q: What makes trademark similarity a multi-dimensional NLP problem, and how do you combine the dimensions?
Trademark similarity has four independent dimensions: (1) Phonetic - do the marks sound similar when spoken? Two marks can be visually different but phonetically identical ("Color" and "Colour," "Smith" and "Smythe"). Phonetic encoding with Double Metaphone captures this. (2) Orthographic - are the marks visually similar when written? Edit distance and n-gram overlap measure this. (3) Semantic - do the marks have similar meanings? Word embeddings capture semantic similarity. "Apex" and "Summit" are semantically similar even though phonetically and orthographically different. (4) Visual (for logos) - are the logo designs similar? CNN-based image similarity for logo comparison. The combination is a weighted score, but the weights should be calibrated to the specific trademark category and goods/services class. For word marks, phonetic and semantic carry more weight. For design marks (logos), visual similarity dominates.
Q: What are the current legal positions on AI-generated inventions, and what does this mean for engineering teams building AI tools?
As of 2024, all major patent offices (USPTO, EPO, UKIPO, CNIPA) hold that an inventor must be a human natural person. DABUS-type applications have been rejected everywhere. The Federal Circuit affirmed this for the US in Thaler v. Vidal (2022). This means AI cannot be listed as an inventor. However, an AI tool that assists a human inventor does not necessarily affect inventorship - humans regularly use software tools without those tools becoming co-inventors. The disclosure obligation question is less clear: some argue that AI-assisted invention requires disclosure in the patent application or to the patent office, analogous to disclosing other enablement information. For engineering teams: document the human contribution to any AI-assisted inventive work clearly. Ensure that inventorship determinations are made by patent counsel with full knowledge of what AI tools were used and how they contributed.
Q: How do you handle the patent corpus update problem - specifically, keeping an FTO index current as new patents grant weekly?
Three-component approach: (1) Incremental indexing - rather than rebuilding the entire index weekly, maintain an online index that accepts new vectors. Vector databases like Weaviate, Qdrant, or Milvus support efficient online updates. Alternatively, maintain a small "recent patents" index that is merged with the main index monthly. (2) Alert monitoring - for active FTO matters, set up standing queries against a real-time patent publication feed. When a new patent publishes in a relevant technology class (IPC code filter), automatically flag it for the matter team. This catches patents that are relevant to your product before they make it into the next scheduled index rebuild. (3) Continuation tracking - monitor patent families associated with known risk patents. When a continuation or continuation-in-part publishes with new or different claims, alert the matter team immediately. Continuations can have broader claims than the parent and represent a significant risk.
