Bank Compliance Automation with Lantern and Ecliptor
Demo
Nanki Grewal
Oct 21, 2024
In the financial industry, businesses must ensure that every customer interaction complies with strict regulations. Traditionally, this means people manually reviewing conversations, which is slow and error-prone — and as regulations change, manual review processes are hard to keep up to date.
Rule-based systems can automate some of this review, but most customer interaction data is unstructured, such as call transcripts. Furthermore, formats like PDFs and images are difficult to extract information from. This makes it hard to build simple rules over the data we have.
In this article, we’ll use Lantern and Ecliptor to build an application that efficiently searches for relevant compliance context for customer interactions, and uses the context and LLMs to automate compliance checks.
We’ll use Ecliptor to parse and process unstructured documents into structured formats. Ecliptor helps financial institutions process messy data so that they can use the data to build applications. We’ll store this data in Lantern Cloud — Lantern enables vector search and text search in Postgres.
Step 1: Ingest compliance policy documents
Financial services organizations must adhere to compliance policies spread across a multitude of documents. To make use of these documents, we’ll transform the PDFs to Markdown using Ecliptor's Document Ingest API. This endpoint preserves table formatting and document structure.
Dataset
You can download sample documents detailing compliance acts and regulations from here: Banking Compliance Regulations and Acts.
Make a call to Ecliptor's ingest endpoint
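A minimal sketch of the ingest call. Ecliptor is in private beta, so the endpoint URL, request fields, and response shape below are assumptions for illustration — check Ecliptor's API documentation for the real values.

```python
import json
import urllib.request

# Hypothetical endpoint and field names -- Ecliptor is in private beta,
# so verify these against the actual API documentation.
ECLIPTOR_INGEST_URL = "https://api.ecliptor.ai/v1/ingest"

def build_ingest_request(pdf_url: str, api_key: str) -> urllib.request.Request:
    """Build a request asking Ecliptor to convert a PDF to Markdown."""
    payload = json.dumps(
        {"document_url": pdf_url, "output_format": "markdown"}
    ).encode()
    return urllib.request.Request(
        ECLIPTOR_INGEST_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

if __name__ == "__main__":
    req = build_ingest_request(
        "https://example.com/equal-credit-opportunity-act.pdf",
        "ECLIPTOR_API_KEY",
    )
    with urllib.request.urlopen(req) as resp:
        markdown = json.loads(resp.read())["markdown"]
    with open("ecoa.md", "w") as f:
        f.write(markdown)
```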
The resulting markdown file contains the information in the PDF — we can now process this text into chunks for vector search.
In this article, we’ll use the "EQUAL CREDIT OPPORTUNITY ACT", accessible here. Download a sample of the generated markdown from GitHub.
Step 2: Create chunks for analysis
Simply converting the documents into text isn’t enough for effective searching and comparison. These documents are often lengthy and cover multiple topics, making it difficult to extract the relevant subset of information.
To address this, we break the text into smaller, meaningful sections — also referred to as chunks. One naive way to do this is to simply split text based on character count or sentence boundaries. However, this can leave out relevant context.
Ecliptor’s Smart Chunking API generates semantically meaningful chunks by analyzing the structure of the document, and injecting additional relevant information from elsewhere in the text if necessary. This approach allows us to get the most relevant and sufficient information to answer questions.
Make a call to Ecliptor's chunking endpoint
Pass the generated markdown file to Ecliptor’s chunking endpoint to receive a list of chunks to embed.
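A sketch of the chunking call, under the same caveat as before: the endpoint URL, payload fields, and response key are assumptions, not Ecliptor's documented API.

```python
import json
import urllib.request

# Hypothetical endpoint and payload shape -- confirm against Ecliptor's docs.
ECLIPTOR_CHUNK_URL = "https://api.ecliptor.ai/v1/chunk"

def build_chunk_payload(markdown_text: str, max_tokens: int = 512) -> bytes:
    """Serialize the Markdown document into a chunking request body."""
    return json.dumps(
        {"document": markdown_text, "max_chunk_tokens": max_tokens}
    ).encode()

def chunk_document(markdown_text: str, api_key: str) -> list[str]:
    """Send the document to the chunking endpoint; return a list of chunks."""
    req = urllib.request.Request(
        ECLIPTOR_CHUNK_URL,
        data=build_chunk_payload(markdown_text),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["chunks"]

if __name__ == "__main__":
    chunks = chunk_document(open("ecoa.md").read(), "ECLIPTOR_API_KEY")
    print(f"received {len(chunks)} chunks")
```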
Once the API call is completed, you will have a list of roughly uniformly sized chunks which can be embedded using any embedding model.
Step 3: Store the chunks and generate embeddings
Next, we’ll use Lantern to store the chunks and index them for fast retrieval. You can sign up for a free database at Lantern Cloud.
Connect to the database
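A minimal sketch using psycopg2. The connection string comes from your Lantern Cloud dashboard; the table name compliance_documents matches the rest of this post, but the column names and the real[] embedding type are our own choices.

```python
# Table schema: one row per chunk; the embedding column is filled in
# later by Lantern's embedding generation job.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS compliance_documents (
    id        SERIAL PRIMARY KEY,
    doc_name  TEXT,
    chunk     TEXT,
    embedding REAL[]
);
"""

def insert_chunks(cur, doc_name: str, chunks: list[str]) -> None:
    """Insert each chunk as one row; embeddings are generated later."""
    for chunk in chunks:
        cur.execute(
            "INSERT INTO compliance_documents (doc_name, chunk) VALUES (%s, %s)",
            (doc_name, chunk),
        )

if __name__ == "__main__":
    import psycopg2  # pip install psycopg2-binary

    chunks = [
        "A creditor shall not discriminate against an applicant...",
        "Each applicant has the right to a written statement of reasons...",
    ]
    # Connection string from the Lantern Cloud dashboard (placeholder below).
    conn = psycopg2.connect(
        "postgresql://user:password@your-db.lantern.dev:5432/postgres"
    )
    with conn, conn.cursor() as cur:
        cur.execute(CREATE_TABLE_SQL)
        insert_chunks(cur, "equal-credit-opportunity-act", chunks)
```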
Generate embeddings using OpenAI's embeddings model
Lantern can automatically generate embeddings of our data. To do this, you can simply enable an embedding generation job.
This can be done in the Lantern Cloud dashboard, or with SQL inside your database. Below, we use Python to set the OpenAI token and add an embedding generation job.
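A hedged sketch of setting up the job from Python. The GUC name lantern_extras.openai_token and the add_embedding_job signature below are assumptions drawn from Lantern's embedding-job workflow — verify both against the Lantern Cloud docs before use.

```python
# Assumed SQL interface for Lantern's embedding job service -- the GUC
# name and function signature may differ by Lantern version.
SET_TOKEN_SQL = "ALTER DATABASE postgres SET lantern_extras.openai_token TO %s;"

ADD_JOB_SQL = """
SELECT add_embedding_job(
    'compliance_documents',          -- table to watch
    'chunk',                         -- source text column
    'embedding',                     -- destination vector column
    'openai/text-embedding-3-small'  -- embedding model
);
"""

if __name__ == "__main__":
    import psycopg2  # pip install psycopg2-binary

    conn = psycopg2.connect(
        "postgresql://user:password@your-db.lantern.dev:5432/postgres"
    )
    with conn, conn.cursor() as cur:
        cur.execute(SET_TOKEN_SQL, ("sk-...",))
        cur.execute(ADD_JOB_SQL)
```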
More information about the embedding job service can be found here.
To see what embeddings were generated on your data, you can run the SQL query below.
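For example, a quick preview query — assuming a chunk text column and an embedding real[] column in the compliance_documents table:

```python
# Preview a few rows; embedding_dim should match the embedding model's
# output dimension once the job has run.
PREVIEW_SQL = """
SELECT id,
       LEFT(chunk, 60)            AS chunk_preview,
       array_length(embedding, 1) AS embedding_dim
FROM compliance_documents
LIMIT 5;
"""

if __name__ == "__main__":
    import psycopg2  # pip install psycopg2-binary

    conn = psycopg2.connect(
        "postgresql://user:password@your-db.lantern.dev:5432/postgres"
    )
    with conn.cursor() as cur:
        cur.execute(PREVIEW_SQL)
        for row in cur.fetchall():
            print(row)
```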
Create Indexes for efficient search
We now have the chunks of our compliance documents and the corresponding generated embeddings stored in the compliance_documents table. The next step is to create indexes over the data we want to search, to enable fast search over a large number of documents.
We’ll create an HNSW index over our vectors with the L2 distance function.
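A sketch of the index creation. The lantern_hnsw access method, the dist_l2sq_ops operator class (squared L2 distance), and the tuning parameters are assumptions based on Lantern's HNSW indexing — the exact names may vary by Lantern version, so check the docs.

```python
# Assumed Lantern HNSW syntax; M and ef_construction are typical
# starting values, not tuned recommendations.
CREATE_INDEX_SQL = """
CREATE INDEX ON compliance_documents
USING lantern_hnsw (embedding dist_l2sq_ops)
WITH (M = 16, ef_construction = 64);
"""

if __name__ == "__main__":
    import psycopg2  # pip install psycopg2-binary

    conn = psycopg2.connect(
        "postgresql://user:password@your-db.lantern.dev:5432/postgres"
    )
    with conn, conn.cursor() as cur:
        cur.execute(CREATE_INDEX_SQL)
```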
We are now ready to implement our compliance check application.
Step 4: Build an application to check customer interactions for compliance risks
Finally, we’ll build an application to check customer chat logs for compliance with regulations.
We’ll follow these steps:
Embedding: We will generate vectors for each customer support chat message.
Search: We will use Lantern’s vector search to find the most relevant compliance chunks for each chat message.
LLM: We will input the chat message and the relevant compliance text into an LLM to determine compliance, flagging potential violations.
Chat interactions data set
We’ll use a synthetically generated dataset of customer support chats.
In these chats, clients are asking the customer support agents questions about the bank’s credit assessment process.
The downloadable CSV can be found here: Bank Customer Support.
The dataset has the columns id (int), speaker_role (string), text (string), and compliant (bool). We will use entries in the text column as input queries, and use the compliant column as ground truth for evaluating our model.
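Loading the CSV takes only the standard library. The filename below is a placeholder for wherever you saved the download:

```python
import csv
import io

def load_chats(csv_text: str) -> list[dict]:
    """Parse the support-chat CSV into row dicts with typed fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "id": int(row["id"]),
            "speaker_role": row["speaker_role"],
            "text": row["text"],
            "compliant": row["compliant"].strip().lower() == "true",
        })
    return rows

if __name__ == "__main__":
    with open("bank_customer_support.csv") as f:  # placeholder filename
        chats = load_chats(f.read())
    # The agent-side messages are the ones we check for compliance.
    queries = [r["text"] for r in chats if r["speaker_role"] == "agent"]
```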
Run a query against the index for each of the texts
First, generate embeddings for all of the queries. We will use the same embedding model used to embed the entire corpus.
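A sketch of the embed-and-search step. The OpenAI client calls are real API, but the model name must match whatever model generated the corpus embeddings, and the <-> operator in the query is an assumption about Lantern's distance operator for real[] columns — it must agree with the operator class used to build the index.

```python
TOP_K = 3

# The distance operator must match the index's operator class
# (assumed squared-L2 here).
SEARCH_SQL = """
SELECT chunk
FROM compliance_documents
ORDER BY embedding <-> %s::real[]
LIMIT %s;
"""

def embed_texts(texts: list[str]) -> list[list[float]]:
    """Embed a batch of chat messages with the same model as the corpus."""
    from openai import OpenAI  # pip install openai
    client = OpenAI()
    resp = client.embeddings.create(
        model="text-embedding-3-small", input=texts
    )
    return [d.embedding for d in resp.data]

def find_relevant_chunks(cur, query_embedding: list[float], k: int = TOP_K) -> list[str]:
    """Return the k compliance chunks nearest to the query embedding."""
    cur.execute(SEARCH_SQL, (query_embedding, k))
    return [row[0] for row in cur.fetchall()]
```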
Flag non-compliant responses
Once we have retrieved the most similar chunks, we have the information we need to judge whether each customer support message adheres to compliance requirements. We will use an LLM as a judge to flag possible non-compliance and return the flagged responses.
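One way to sketch the judge. The prompt wording and the model name are our own choices, not a prescribed setup; in production you would also want structured output and human review of flagged messages.

```python
# LLM-as-judge sketch: prompt and model choice are illustrative.
JUDGE_PROMPT = """You are a bank compliance reviewer.

Relevant policy excerpts:
{context}

Customer support message:
{message}

Answer with exactly COMPLIANT or NON-COMPLIANT, then a one-sentence reason."""

def build_judge_prompt(message: str, chunks: list[str]) -> str:
    """Combine the retrieved policy chunks and the message into one prompt."""
    return JUDGE_PROMPT.format(context="\n---\n".join(chunks), message=message)

def is_non_compliant(message: str, chunks: list[str]) -> bool:
    """Ask the LLM for a verdict; True means the message should be flagged."""
    from openai import OpenAI  # pip install openai
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": build_judge_prompt(message, chunks)}],
    )
    verdict = resp.choices[0].message.content.strip()
    return verdict.upper().startswith("NON-COMPLIANT")

if __name__ == "__main__":
    chunks = ["Creditors may not discriminate on the basis of marital status."]
    if is_non_compliant("We only approve applications from married clients.", chunks):
        print("flagged for review")
```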
Summary
In this post, we demonstrated a system for ensuring compliance in banking customer support using a custom dataset. We leveraged document understanding APIs from Ecliptor, data storage and search in Postgres with Lantern Cloud, and LLMs to automatically reason about compliance.
Interested in learning more?
Lantern is building Postgres for AI applications. Learn more about how Lantern supports vector search at scale, or sign up for a free database at Lantern Cloud.
Ecliptor is currently in private beta for financial services companies. If you have complex documents and want to extract valuable insights for downstream applications like the one in this post, reach out to us at founders@ecliptor.ai or visit ecliptor.ai.