Building an effective surveillance lexicon policy
Bloomberg Professional Services
How can compliance teams build lexicon policies that keep pace with shifting communication risks and regulatory scrutiny? This article explains what lexicons are, why they matter, and how to design effective, explainable policies that strengthen surveillance frameworks.
This article was written by Eugene Semetsky and Polly Abreu, product managers at Bloomberg Vault.
Lexicons – structured sets of keywords and phrases used to detect potential misconduct, regulatory breaches, or violations of firm policies across communications – remain a cornerstone of surveillance.
As artificial intelligence (AI) reshapes how firms monitor communication data, one question looms large: Will lexicons still be relevant?
For now, the answer appears to be yes. The combination of AI models and lexicon-based policies seems to offer the best of both worlds. Yet in this rapidly evolving landscape, a key challenge remains: How can lexicons continue to evolve to stay both effective and relevant?
This first article in our four-part series on the future of communications surveillance explores how to design effective and explainable lexicon policies that can adapt to an ever-changing regulatory and risk environment.
Part two explores how to build smarter lexicon searches. Part three dives into the lexicon calibration process, while part four looks at how AI-driven methods can complement lexicon-based approaches to strengthen surveillance programs.
Key definitions
- Lexicon: Structured set of words, phrases, and rules used to flag messages for review.
- Recall: The share of messages relevant to the targeted risk that the surveillance system actually flagged. Higher recall means fewer relevant messages are missed.
- Precision: The share of flagged messages that are relevant to the targeted risk. Higher precision means fewer false positives (as defined below).
- False positive: A flagged message that is not relevant to the targeted risk.
- Explainability: The ability to describe what a rule does, why it exists, and how it has changed over time.
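As a rough illustration of how recall and precision relate to false positives, the sketch below computes both from two sets of message IDs. The function name, the message IDs, and the idea of comparing flagged messages against a reviewed set of truly relevant ones are assumptions made for this example, not a description of any particular product.

```python
def precision_recall(flagged: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for a set of flagged message IDs."""
    true_positives = len(flagged & relevant)
    # Precision: of everything flagged, how much was relevant?
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: of everything relevant, how much was flagged?
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical sample: five messages flagged, four messages truly relevant.
flagged = {"m1", "m2", "m3", "m4", "m5"}
relevant = {"m1", "m2", "m3", "m9"}

precision, recall = precision_recall(flagged, relevant)
false_positives = flagged - relevant  # flagged but not relevant
missed = relevant - flagged           # relevant but never flagged

print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this sample, three of the five flagged messages are relevant (precision 0.60) and three of the four relevant messages were flagged (recall 0.75).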
The role of lexicons in communications surveillance
Lexicons remain central to communications surveillance because they provide a clear, explainable, and auditable way to detect risk. They can be deliberately crafted to flag words and phrases associated with potential misconduct, giving firms and regulators confidence that key indicators are consistently monitored.
While AI is rapidly advancing, questions remain about relying solely on AI models for this purpose, particularly around explainability and control. An effective surveillance program still depends on a firm’s ability to demonstrate a clear understanding of its communication risks, explain how tools capture them, and refine both on a regular and ongoing basis.
Regulatory requirements
Regulations generally do not explicitly mandate the use of lexicons, but regulators expect firms to maintain surveillance controls that are effective, proportionate, and explainable.
Regulatory expectations tend to be principles-based rather than prescriptive. While some jurisdictions provide detailed guidance and others focus more on outcomes, the underlying intent is consistent: Firms must demonstrate that they actively identify and manage communication risks.
Regulators also expect clear documentation, methodologies, robust governance, and evidence that surveillance tools (including lexicons) are tested and improved over time.
Strengths and limitations of lexicon-based surveillance
Lexicon-based surveillance has long provided a structured and defensible way to monitor communications in line with compliance expectations. Its transparency and traceability make it foundational to surveillance programs, but it has limitations.
Strengths:
- High precision for known risks: Effective at detecting specific indicators, such as references to prohibited actions or sensitive information.
- Transparency: Each rule can be documented and explained.
Weaknesses:
Lexicon systems are notorious for generating high volumes of false positives, creating heavy workloads for compliance teams to review. Additionally, lexicons require continuous maintenance: Firms that do not regularly update their processes, systems, and controls may be subject to increased regulatory scrutiny.
- Static lists and language drift: Language evolves quickly. Without regular updates, lexicons lose relevance and fail to capture new terminology.
- High false positives: Literal matches on common words can generate noise and alert fatigue.
- Context limitations: Tone, sarcasm, and coded language are difficult to detect using simple logic.
- Channel nuances: Each communication channel has its own structure, style, and shorthand. A single lexicon rarely performs consistently across all communication formats.
- Cultural nuances or contextual intent: Word-for-word translations often fail to capture cultural nuance or contextual intent, which can lead to missed risks or inaccurate alerts.
These limitations highlight the need for thoughtful design, regular calibration, and complementary AI techniques that can interpret context and sentiment more effectively.
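To make the false-positive point concrete, here is a minimal sketch comparing naive substring matching with whole-word matching. The term and the sample messages are invented for illustration, and real lexicon engines may handle tokenization differently.

```python
import re

term = "tip"
messages = [
    "Got a tip on the earnings call, keep it quiet.",
    "We need multiple approvals for this ticket.",
]

# Naive substring matching also fires on "multiple", which contains "tip".
substring_hits = [m for m in messages if term in m.lower()]

# Word-boundary matching only fires on the standalone word.
word_pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
word_hits = [m for m in messages if word_pattern.search(m)]

print(len(substring_hits), len(word_hits))
```

Whole-word matching removes the noise from "multiple" here, but it still cannot tell whether "tip" refers to inside information or a restaurant bill, which is where context logic and complementary techniques come in.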
Designing a thoughtful lexicon framework
The foundation of an effective lexicon policy begins not with words, but with risk assessment. Firms should identify the key risks they aim to mitigate and understand who communicates, on what channels, and in which languages.
When crafting your policy:
- Identify and define risks: Conduct a risk assessment to determine the types of market abuse, conduct, and/or other behavioral risks surveillance should monitor.
- Identify populations: Define in-scope and out-of-scope populations. Determine which roles, desks, and supervisors require shared or tailored lexicons.
- Specify channels: Establish an inventory of communication channels to determine where communications occur and how monitoring should be applied.
- Map languages: Assess the languages used across the business and design surveillance controls, including lexicons, that adequately provide coverage.
From there, lexicons can be designed to capture relevant behaviors, not just words. When building lexicons, firms can:
- Engineer explainable rules: Each rule should trace directly to a defined risk.
- Tailor to your business: Vendor lexicons are starting points, not templates.
- Balance coverage and relevance: Add context and exclusion logic to manage noisy terms like “quid pro quo.”
- Engage experts: Use regional and subject matter input to reflect real communication patterns.
- Iterate and test: Start small, pilot, refine, and expand based on results. Back-testing against historical data helps manage future alert volumes.
These practices are increasingly considered minimum requirements, and they remain critical for operational effectiveness and regulatory defensibility.
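The design points above can be sketched in a few lines: a rule that records the risk it traces to, applies exclusion logic to a noisy term, and is back-tested against a small historical sample. The `LexiconRule` class, its field names, the rule ID, and the sample messages are all hypothetical and exist only for this illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class LexiconRule:
    rule_id: str
    risk: str        # the defined risk this rule traces to
    rationale: str   # why the rule exists
    pattern: str     # regex for the trigger phrase
    exclusions: list[str] = field(default_factory=list)  # regexes that suppress a match

    def matches(self, message: str) -> bool:
        """Flag the message if the trigger fires and no exclusion applies."""
        if not re.search(self.pattern, message, re.IGNORECASE):
            return False
        return not any(re.search(e, message, re.IGNORECASE) for e in self.exclusions)


rule = LexiconRule(
    rule_id="BRIBERY-001",
    risk="Bribery and corruption",
    rationale="Flags explicit references to exchanging favours.",
    pattern=r"\bquid pro quo\b",
    exclusions=[r"\b(training|policy|article|newsletter)\b"],
)

# Hypothetical historical sample used for a small back-test.
history = [
    "Reminder: annual training covers quid pro quo harassment.",
    "Let's call it a quid pro quo, you help me on this trade.",
    "Lunch at noon?",
]

alerts = [m for m in history if rule.matches(m)]
print(f"{rule.rule_id} ({rule.risk}): {len(alerts)} alert(s) on {len(history)} messages")
```

Keeping the risk and rationale alongside the pattern is one way to keep each rule explainable. Exclusions reduce noise, but each one can also hide a genuine hit, so any exclusion list would need the same testing and review as the trigger itself.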
How Bloomberg can help
Bloomberg provides an integrated archive and surveillance platform designed for explainable controls, ongoing calibration, and operational scale.
- Integrated compliance archive and surveillance in one system: Bloomberg Vault combines data capture, archiving, search, and communications surveillance with case‑management workflows, so teams can monitor, review, escalate, and report in a single environment.
- Advanced lexicon and proximity search with AI assistance: Built‑in keyword, proximity, and policy tooling, augmented by AI, helps reduce false positives and surface contextually relevant results, supporting explainable detection and faster reviews.
- Multi‑channel and multi‑language coverage (including voice): Vault covers multiple channels and languages and supports voice capture/transcription, so the same policies can be applied to voice transcripts alongside email and chat.
- Real‑time/near real‑time preventative controls: Firms can add preventative controls for Bloomberg instant messaging (Instant Bloomberg, “IB”) and Bloomberg message (“MSG”) in real time (and third‑party channels in near real time), stopping risky behavior before it takes place.
- Calibration, reporting, and governance: Permissions, on‑demand/scheduled reporting, and aggregate analytics support policy tuning, model governance, and regulatory attestations (including letters of undertaking where applicable).
Conclusion
Lexicons remain the backbone of communication surveillance, offering explainable and auditable controls. Their effectiveness depends on how thoughtfully they are designed, tested, and maintained.
By grounding policies in risk assessment, building contextual and explainable rules, and combining them with AI’s interpretive strengths, firms can create surveillance frameworks that are both resilient and adaptive.
Now that we’ve explored how to design and maintain effective lexicon policies, part two will explore how to build smarter lexicon searches. Part three will focus on lexicon calibration: how to test, tune, and measure effectiveness over time. Part four will examine how AI-driven methods can complement lexicon-based approaches, enhancing precision, context-awareness, and overall program performance.
FAQ
What is a communications surveillance lexicon?
A structured set of phrases and rules used to flag communications that may indicate risk, ranging from market abuse to non-financial misconduct.
How often should a lexicon policy be reviewed?
Lexicon policies should be reviewed on a regular basis to ensure they remain aligned with the firm’s risk profile, communication channels, and evolving language patterns.
What are common challenges when managing lexicons?
Common issues include high false-positive rates, overlapping rules, inconsistent regional coverage, and outdated or static keyword lists.
What metrics indicate lexicon effectiveness?
Useful indicators include alert volumes, false-positive ratios, true-positive findings, review time per alert, and the proportion of alerts leading to escalations or investigations. Monitoring these metrics helps calibrate and improve performance over time.
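As a minimal sketch of how these indicators might be derived from reviewed alerts, the example below summarizes a small, invented batch. The `Alert` record, its fields, and the rule IDs are assumptions for illustration; real review data will be structured differently.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    rule_id: str
    true_positive: bool    # reviewer judged the alert relevant to the risk
    escalated: bool        # alert led to an escalation or investigation
    review_minutes: float  # time spent reviewing the alert


def summarize(alerts: list[Alert]) -> dict[str, float]:
    """Compute simple effectiveness indicators for a batch of reviewed alerts."""
    total = len(alerts)
    if total == 0:
        return {"alert_volume": 0, "false_positive_ratio": 0.0,
                "escalation_rate": 0.0, "avg_review_minutes": 0.0}
    false_positives = sum(not a.true_positive for a in alerts)
    escalations = sum(a.escalated for a in alerts)
    return {
        "alert_volume": total,
        "false_positive_ratio": false_positives / total,
        "escalation_rate": escalations / total,
        "avg_review_minutes": sum(a.review_minutes for a in alerts) / total,
    }


# Hypothetical reviewed alerts from two rules.
reviewed = [
    Alert("BRIBERY-001", False, False, 2.0),
    Alert("BRIBERY-001", True, True, 10.0),
    Alert("MNPI-004", False, False, 3.0),
    Alert("MNPI-004", True, False, 5.0),
]

metrics = summarize(reviewed)
print(metrics)
```

Grouping the same summary by `rule_id` would show which individual rules generate the most noise, which is often the more useful view when deciding what to recalibrate.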
The information included in these materials is for illustrative purposes only and does not constitute legal, financial, or professional advice. Readers should not rely on this content as a substitute for advice from qualified legal or compliance professionals. Always consult your own legal and compliance teams before making decisions or taking action based on the information contained herein. The BLOOMBERG TERMINAL service and Bloomberg data products (the “Services”) are owned and distributed by Bloomberg Finance L.P. (“BFLP”) except (i) in Argentina, Australia and certain jurisdictions in the Pacific islands, Bermuda, China, India, Japan, Korea and New Zealand, where Bloomberg L.P. and its subsidiaries (“BLP”) distribute these products, and (ii) in Singapore and the jurisdictions serviced by Bloomberg’s Singapore office, where a subsidiary of BFLP distributes these products. BLP or one of its subsidiaries provides BFLP and its subsidiaries with global marketing and operational support and service. Certain features, functions, products and services are available only to sophisticated investors and only where permitted. BFLP, BLP and their affiliates do not guarantee the accuracy of prices or other information in the Services. Nothing in the Services shall constitute or be construed as an offering of financial instruments by BFLP, BLP or their affiliates, or as investment advice or recommendations by BFLP, BLP or their affiliates of an investment strategy or whether or not to “buy”, “sell” or “hold” an investment. Information available via the Services should not be considered as information sufficient upon which to base an investment decision. The following are trademarks and service marks of BFLP, a Delaware limited partnership, or its subsidiaries: BLOOMBERG, BLOOMBERG ANYWHERE, BLOOMBERG MARKETS, BLOOMBERG NEWS, BLOOMBERG PROFESSIONAL, BLOOMBERG TERMINAL and BLOOMBERG.COM. 
Absence of any trademark or service mark from this list does not waive Bloomberg’s intellectual property rights in that name, mark or logo. All rights reserved. ©Bloomberg.