Lexicon-based detection in electronic communications can be a bit like using a sledgehammer to crack a nut. Sure, it more or less gets the job done, but it creates huge inefficiencies and potential risks. Lexicon-based detection is built around alerts that are triggered when certain keywords are used, and one of its biggest problems is the enormous number of false positives it typically returns. This makes the resulting alerts virtually as ineffective as having no alerts at all.

Alert systems that are designed to pick up every instance of certain words will do so regardless of context. For example, an alert for “wash” (as in “wash trades”) could also be triggered by references to Washington or by someone talking about dishwashing. An alert for “fix” or “fixing” could also pick up conversations about DIY or fixing dinner. And “churn” could just as easily trigger an alert when a trader is talking about clients leaving.
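To make the problem concrete, here is a minimal sketch of a context-blind keyword alert. The watch list, function names, and sample messages are hypothetical, chosen only to mirror the examples above; real lexicon systems are more elaborate, but the failure mode is the same.

```python
import re

# Hypothetical watch list, taken from the examples in the text.
KEYWORDS = ["wash", "fix", "churn"]

def substring_alerts(message: str) -> list[str]:
    """Flag every keyword that appears anywhere in the message."""
    lowered = message.lower()
    return [kw for kw in KEYWORDS if kw in lowered]

def whole_word_alerts(message: str) -> list[str]:
    """Flag keywords only when they appear as whole words (still context-blind)."""
    lowered = message.lower()
    return [kw for kw in KEYWORDS if re.search(rf"\b{re.escape(kw)}\b", lowered)]

# Hypothetical sample messages: three innocent, one worth a closer look.
messages = [
    "Flying to Washington on Monday",
    "I'm fixing dinner tonight",
    "Client churn was high this quarter",
    "Let's wash the trade before the close",
]

for msg in messages:
    print(f"{msg!r}: substring={substring_alerts(msg)} whole_word={whole_word_alerts(msg)}")
```

Substring matching flags all four messages. Whole-word matching drops “Washington” and “fixing”, but it still flags the innocent “churn” message, because no amount of pattern tightening tells the system what the word means in context.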

Massive numbers of false positives

These false positives can cause firms to scale down the input parameters or review only a percentage of the total communications, leaving gaps and, therefore, unseen risks.

With a common lexicon-based system, out of 75,000 communications ingested and analyzed daily, approximately 3,000 alerts may be generated. This volume is a massive burden for compliance surveillance teams, especially with the current market “norm” of a 95-99.9% false-positive rate. Not only is this an ineffective way to automate the fight against market abuse and misconduct, but it also adds workload for already overworked analysts who have to comb through these alerts to ensure regulatory compliance. Because of the sheer volume, many of these alerts end up being ignored.
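The figures above imply how few of those alerts are likely to matter. The short calculation below is a back-of-the-envelope sketch using only the numbers quoted in this section; the function name is hypothetical, and the result is an illustration, not a measured outcome.

```python
# Figures quoted in the text.
DAILY_COMMUNICATIONS = 75_000
DAILY_ALERTS = 3_000
FALSE_POSITIVE_RATES = (0.95, 0.999)  # the quoted market "norm" range

def genuine_alerts(alerts: int, false_positive_rate: float) -> float:
    """Alerts expected to be true positives at a given false-positive rate."""
    return alerts * (1 - false_positive_rate)

# Share of all communications that raise an alert.
alert_share = DAILY_ALERTS / DAILY_COMMUNICATIONS
print(f"Share of communications alerted: {alert_share:.1%}")

for rate in FALSE_POSITIVE_RATES:
    real = genuine_alerts(DAILY_ALERTS, rate)
    print(f"At a {rate:.1%} false-positive rate: about {real:.0f} genuine alerts per day")
```

In other words, analysts review 3,000 alerts a day to find somewhere between roughly 3 and 150 that are genuine.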

Over time, lexicon-based and manual systems have only become less suited to the job. Communication changes, whether through language that is reinvented or through the growing number of communication channels that organizations are obliged to monitor. In a world with SMS, video calls, WhatsApp, Skype, and far more, monitoring these channels with outdated tools only adds to the number of false positives. Failing to monitor them at all creates major compliance risk, which already costs offenders millions of dollars in fines.

Spending on compliance is ramping up

Today, financial institutions are spending more on compliance than ever. According to a Risk Management Association survey, 50% of responding firms said they spent between 6% and 10% of revenue on compliance-related costs. Big firms frequently spend the equivalent of $10,000 per employee, and such firms can employ many thousands of people. The size of potential fines for non-compliance makes this money well spent. For example, the largest regulatory fine of 2020 was an enormous $3 billion levied on Wells Fargo. While this was the biggest sum, it was not a major outlier when it comes to damages.

For this amount of money, financial institutions (FIs) must at least make sure that they are using the right tools for the job and not making their surveillance efforts more challenging than they need to be.

Shield and its focus

Shield’s approach focuses on the three C’s: Content, Context, and Characteristics. Content analysis goes beyond looking for keywords. It employs a variety of text analysis tools, including tokenization, stemming, fuzzy matching, AI algorithms, and Expert Driven Rules, in addition to Shield’s proprietary multilingual lexicons, which can be tailored to clients’ specific pain points. Context analysis, meanwhile, uses machine learning and advanced Natural Language Processing (NLP) tools to determine the context of specific comments with a high level of precision. Finally, Characteristics means using data enrichment tools to build up a more complete picture of comms, including accurate profiles of the communicating parties.
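For readers unfamiliar with the Content techniques named above, the sketch below shows what tokenization, stemming, and fuzzy matching can look like in their simplest form. It is not Shield’s implementation, which is proprietary: the lexicon, the crude suffix-stripping stemmer, the minimum token length, and the 0.8 similarity cutoff are all assumptions made for illustration, using only Python’s standard library.

```python
import re
from difflib import get_close_matches

# Hypothetical lexicon; real lexicons are far larger and multilingual.
LEXICON = ["wash", "spoof", "churn"]

def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(token: str) -> str:
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def fuzzy_hits(text: str, cutoff: float = 0.8) -> set[str]:
    """Return lexicon terms that approximately match a stemmed token."""
    hits: set[str] = set()
    for token in tokenize(text):
        if len(token) < 4:
            continue  # assumed filter: skip short tokens to limit spurious matches
        match = get_close_matches(crude_stem(token), LEXICON, n=1, cutoff=cutoff)
        hits.update(match)
    return hits

print(fuzzy_hits("They kept washing those trades"))  # inflected form
print(fuzzy_hits("He was spoofng the bid again"))    # misspelling
print(fuzzy_hits("Meeting moved to Thursday"))       # no lexicon term
```

Stemming lets the lexicon catch inflected forms such as “washing”, and fuzzy matching catches misspellings such as “spoofng”. Neither step judges meaning, which is why the approach described here pairs Content with Context and Characteristics.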

This all happens in a Hybrid Surveillance system that can link data from a variety of surveillance systems to accurately and comprehensively detect market abuse scenarios, information handling issues, and more. It achieves this high level of data completeness while also reducing false positives by more than 80% compared with lexicon-driven alerts.

The results are a game-changer when it comes to surveillance.