CASE STUDY

Full-Stack AI Financial Document Analyzer for Leasing Decisions

ESKA

Industry: Finance / Leasing

Region: Europe

Timeline: Full-cycle engagement

Team: Trembit dedicated engineering team

PyTorch, TensorFlow

Models: Custom Document Processing (OCR, NER)

Language: Python

The Problem

A European leasing company was drowning in bank statements. Every leasing application required reviewing three to twelve months of bank statements from dozens of different European banks, each with its own format, layout, language, and level of detail. Analysts spent hours per application manually extracting income figures, recurring expenses, debt obligations, and cash flow patterns, then compiling everything into a standardized report for underwriters. The process was slow, error-prone, and could not scale: during peak periods the backlog grew to days, and every day of delay meant lost deals. Generic OCR tools and template-based extractors failed because European bank statements vary too much in language, layout, and formatting. The company needed an AI system that could understand financial documents the way a human analyst does: recognizing what a document is, finding the relevant numbers regardless of format, and producing a structured financial profile the underwriting team could trust.

Why Building an AI Financial Document Analyzer for European Leasing Is Hard

Automated financial document analysis sounds like a solved problem until you confront the diversity of real-world European bank statements and the accuracy requirements of leasing decisions:

Extreme format diversity across European banks — a Commerzbank statement looks nothing like a BNP Paribas statement, which looks nothing like an ING statement; layouts differ in columns, categorization, and balance presentation, before accounting for hundreds of smaller regional banks
Multi-language financial terminology — "Habenzinsen" (German), "virement" (French), and "storno" (Dutch/German) all describe financial concepts the system must understand; a general translator misses the financial semantics and a keyword dictionary cannot cover the variations
Scanned documents and degraded image quality — many applicants submit phone photos or scanned paper statements with skew, low resolution, stamps over text, and faded print, where standard OCR produces garbled output
Numerical accuracy at decision-grade precision — a single misread digit in a salary figure can change an approval to a rejection; the system must achieve near-perfect numerical extraction so underwriters can trust the output without re-checking every number
Transaction categorization without standardized labels — a salary payment might appear as "GEHALT," "SALAIRE," "Loonbetaling," or just a company name; the AI must infer categories from context, amounts, timing, and partial descriptions
Structured report generation from unstructured inputs — the output is a financial profile with calculated metrics (average income, recurring obligations, debt-to-income ratio, cash flow volatility) requiring temporal understanding across multiple statement pages

What We Did

Document Ingestion & Intelligent OCR

Built the multi-format document ingestion pipeline in Python — accepting native and scanned PDFs, spreadsheets, images, and mixed-format uploads with automatic format detection and routing
Developed custom OCR models using PyTorch trained on European financial documents — handling comma-as-decimal separators, varied date formats, and currency conventions that generic OCR struggles with
Implemented document deskewing and quality enhancement plus page classification — straightening skewed scans and distinguishing statement pages from cover letters and loan schedules
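
To make the comma-as-decimal problem concrete, here is a minimal Python sketch of separator-aware amount parsing. It is not the project's OCR post-processing. The function name `parse_amount` and the "one or two trailing digits means decimals" heuristic are assumptions chosen for illustration.

```python
import re
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Parse a money string written with European or English separators.

    Heuristic (an assumption, not a rule from the case study): the last
    '.' or ',' is the decimal separator only if one or two digits follow it;
    every other separator is treated as digit grouping.
    """
    s = text.replace("\u2212", "-")    # normalise the Unicode minus sign
    s = re.sub(r"[^\d,.\-]", "", s)    # drop currency codes, spaces, apostrophes
    negative = s.startswith("-") or s.endswith("-")  # some statements print "45,50-"
    s = s.strip("-")

    match = re.search(r"[.,](\d{1,2})$", s)
    if match:
        # Everything before the decimal separator is the whole part.
        whole = re.sub(r"[.,]", "", s[:match.start()]) or "0"
        digits = f"{whole}.{match.group(1)}"
    else:
        # No decimal part: all separators are grouping marks.
        digits = re.sub(r"[.,]", "", s)

    if not digits.replace(".", "", 1).isdigit():
        raise ValueError(f"not a recognisable amount: {text!r}")
    value = Decimal(digits)
    return -value if negative else value
```

With this heuristic, `parse_amount("1.234,56 EUR")` and `parse_amount("1,234.56")` both return `Decimal("1234.56")`. Using `Decimal` rather than `float` avoids rounding drift when amounts are later summed.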

Financial Entity Extraction & NLP

Developed the financial entity extraction engine using TensorFlow — custom NER models identifying income entries, expenses, loan payments, balances, dates, and references regardless of formatting or language
Built multi-language financial NLP covering German, French, Dutch, Polish, Italian, Spanish, and English, with domain-specific models that understand banking terminology and abbreviations
Implemented transaction categorization (rule-based + ML) and cross-statement reconciliation — linking recurring income, detecting missed payments and new debt, and flagging inconsistencies that might indicate altered documents
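
The rule-based plus ML categorization described above could be organized roughly as follows. This is a sketch under assumptions. The keyword table is a tiny illustrative sample, `categorize` and `Categorized` are hypothetical names, and the `model` argument stands in for whatever trained classifier handles descriptions the rules cannot resolve.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Optional

# Illustrative keyword rules: category -> (expected sign, keywords).
# A production table would be far larger; this sample is an assumption.
RULES = {
    "salary": (+1, ["gehalt", "lohn", "salaire", "loonbetaling", "salary"]),
    "rent": (-1, ["miete", "loyer", "huur", "rent"]),
    "loan_payment": (-1, ["darlehen", "lening", "loan"]),
}


@dataclass(frozen=True)
class Categorized:
    category: str
    source: str  # "rule", "model", or "none"


def categorize(
    description: str,
    amount: Decimal,
    model: Optional[Callable[[str, Decimal], str]] = None,
) -> Categorized:
    """Try keyword rules first, then fall back to an optional classifier."""
    text = description.casefold()
    sign = 1 if amount > 0 else -1
    for category, (expected_sign, keywords) in RULES.items():
        if sign != expected_sign:
            continue  # e.g. a debit can never be a salary
        # Whole-word match so that "rent" does not fire on "current".
        if any(re.search(rf"\b{re.escape(k)}\b", text) for k in keywords):
            return Categorized(category, "rule")
    if model is not None:
        return Categorized(model(description, amount), "model")
    return Categorized("uncategorized", "none")
```

Recording whether a label came from a rule or from the model keeps the result auditable. A payment that shows only a company name matches no rule and falls through to the model path.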

Financial Profile & Report Generation

Built the automated financial profile generator — calculating average monthly income, recurring obligations, discretionary cash flow, debt-to-income ratio, income stability, and expense volatility from raw transactions
Implemented trend analysis over the statement period and a standardized report with source citations linking every figure back to the specific transaction and page
Built confidence scoring at every extraction and calculation step, flagging below-threshold figures for human review rather than presenting them as definitive
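
As a rough illustration of how such metrics can be derived from categorized transactions, the sketch below computes average monthly income, debt-to-income ratio, and income volatility, and lists low-confidence inputs for review. The `Txn` structure, the 0.90 threshold, and the use of the coefficient of variation as the volatility measure are all assumptions. The case study does not state the actual formulas.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Txn:
    month: str         # e.g. "2024-03"
    category: str      # e.g. "salary", "loan_payment"
    amount: float      # positive = credit, negative = debit
    confidence: float  # extraction confidence in [0, 1]


REVIEW_THRESHOLD = 0.90  # hypothetical cut-off for human review


def build_profile(txns: list) -> dict:
    """Aggregate per-month income and debt, then derive summary metrics."""
    if not txns:
        raise ValueError("no transactions to profile")
    income = defaultdict(float)
    debt = defaultdict(float)
    for t in txns:
        if t.category == "salary":
            income[t.month] += t.amount
        elif t.category == "loan_payment":
            debt[t.month] += -t.amount  # store debt service as a positive number

    months = sorted({t.month for t in txns})
    monthly_income = [income[m] for m in months]  # months without income count as 0
    monthly_debt = [debt[m] for m in months]
    avg_income = mean(monthly_income)
    avg_debt = mean(monthly_debt)
    return {
        "avg_monthly_income": avg_income,
        "avg_monthly_debt": avg_debt,
        "debt_to_income": avg_debt / avg_income if avg_income else None,
        # Coefficient of variation: standard deviation relative to the mean.
        "income_volatility": pstdev(monthly_income) / avg_income if avg_income else None,
        # Inputs below the threshold are surfaced, not silently trusted.
        "needs_review": [t for t in txns if t.confidence < REVIEW_THRESHOLD],
    }
```

Returning the low-confidence transactions alongside the metrics mirrors the idea of flagging uncertain figures instead of presenting them as definitive.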

Integration, Validation & Deployment

Developed the analyst review interface — the AI-generated profile alongside the original documents, with highlighted source regions and the ability to correct extraction errors that feed back into training
Built automated validation rules cross-checking internal consistency (balances vs. transaction flows, plausible income totals) and batch processing across GPU workers for high-volume periods
Implemented a model retraining pipeline that captures analyst corrections as training data to continuously improve extraction accuracy, and load-tested the system under peak application volume
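
One of the internal-consistency rules mentioned above, balances versus transaction flows, can be sketched in a few lines of Python. The function name `check_balance_consistency` and the one-cent tolerance are assumptions made for the example.

```python
from decimal import Decimal


def check_balance_consistency(opening, closing, amounts, tolerance=Decimal("0.01")):
    """Check that opening balance plus all transactions equals the closing balance.

    Returns (is_consistent, discrepancy), where discrepancy is the printed
    closing balance minus the balance implied by the extracted transactions.
    """
    expected = opening + sum(amounts, Decimal("0"))
    discrepancy = closing - expected
    return abs(discrepancy) <= tolerance, discrepancy
```

For instance, if a 2,500.00 credit is extracted as 2,000.00, the check fails with a discrepancy of exactly 500.00. That size hints at a single misread digit in the hundreds position rather than a missing transaction.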


Key Results

Minutes per app: down from hours of manual analyst review

Multi-format: PDFs, scans, spreadsheets, and images from dozens of European banks

7+ languages: German, French, Dutch, Polish, Italian, Spanish, English

Decision-grade: confidence-scored output with flagged uncertainty for human review

Standardized reports: financial profiles with source citations and calculated risk metrics

In Their Words

Trembit built us an AI system that reads bank statements the way our best analysts do — except it handles every European bank format, works in seven languages, and finishes in minutes instead of hours. Our underwriting team trusts the output because every number links back to the source document.

Leasing company operations director

Their proactive team gets things done as if it were their own project.

Trembit client

What We Learned

The solution to bank statement diversity is not more templates — it is teaching the model to read like a human

Early approaches built format-specific extractors (detect Deutsche Bank, apply its template). This worked for the top twenty banks but broke every time a layout changed or a new regional bank appeared. We trained the PyTorch models to understand semantic structure — a bold number at the bottom of a column is probably a balance; the largest regular monthly credit is probably salary. This generalized across banks we had never seen, because the underlying structure is universal even when formatting is not.
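
The cue "the largest regular monthly credit is probably salary" can be written down as an explicit heuristic, which helps show why it carries across bank formats. The sketch below is a hand-written rule for illustration only, not the trained model. The function name `likely_salary_source` and the `min_months` parameter are assumptions.

```python
from collections import defaultdict
from statistics import median


def likely_salary_source(credits, min_months=3):
    """Return the counterparty with the largest regular monthly credit.

    credits: list of (month, counterparty, amount) tuples.
    A counterparty counts as "regular" if it pays in at least
    `min_months` distinct months. Returns None if nobody qualifies.
    """
    per_payer = defaultdict(lambda: defaultdict(float))
    for month, payer, amount in credits:
        if amount > 0:
            per_payer[payer][month] += amount

    candidates = {
        payer: median(by_month.values())
        for payer, by_month in per_payer.items()
        if len(by_month) >= min_months
    }
    if not candidates:
        return None
    # Among regular payers, pick the one with the largest typical monthly amount.
    return max(candidates, key=candidates.get)
```

Nothing in this rule depends on column layout, language, or the label printed next to the payment, so a large one-off refund is ignored while a monthly employer payment is picked up.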

Numerical accuracy requires redundant verification, not just better OCR

OCR can misread a 3 as an 8 or drop a digit, and either error can change a decision. We built a multi-pass verification system that cross-checks extractions against mathematical relationships (do transactions sum to the balance difference?), temporal patterns (is this month's salary consistent with previous months?), and formatting cues. When checks disagree, the system flags the specific number for review. This catches over 95% of OCR numerical errors before they reach the report.
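
The temporal check, whether this month's salary is consistent with the others, might look like the following sketch. It covers only that one pass. The function name `flag_inconsistent_amounts` and the 15% tolerance are assumptions, since the real thresholds are not given in the case study.

```python
from statistics import median


def flag_inconsistent_amounts(amounts, rel_tolerance=0.15):
    """Return indices of amounts that deviate from the median by more than
    `rel_tolerance` (as a fraction of the median).

    The median is used rather than the mean so that a single misread value
    does not drag the reference point toward itself.
    """
    if not amounts:
        return []
    typical = median(amounts)
    if typical == 0:
        return []
    return [
        i for i, amount in enumerate(amounts)
        if abs(amount - typical) / abs(typical) > rel_tolerance
    ]
```

Given monthly salaries of 2350, 2350, 2850, and 2350, the function flags index 2, which is the pattern a 3 misread as an 8 would produce. A flag is only a prompt for review, because a genuine bonus month looks the same to this check.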

The financial profile is more valuable than the raw extraction — but only if every number is traceable

Underwriters want calculated metrics (debt-to-income, income stability), but they cannot trust a number they cannot verify. We built the report so every metric drills down: the ratio links to income and debt figures, which link to specific transactions, which link to the highlighted region of the original page. This traceability turned the AI from a black box analysts were skeptical of into a tool they actively preferred.
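
The drill-down chain from metric to figure to transaction to page region maps naturally onto nested records. The dataclasses below are a hypothetical sketch of that shape. All class and field names are assumptions, not the report's actual schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SourceRegion:
    page: int
    bbox: Tuple[float, float, float, float]  # x0, y0, x1, y1 on the page


@dataclass(frozen=True)
class Transaction:
    description: str
    amount: float
    source: SourceRegion  # where this row appears in the original document


@dataclass(frozen=True)
class Figure:
    name: str             # e.g. "monthly_income"
    value: float
    transactions: Tuple[Transaction, ...]


@dataclass(frozen=True)
class Metric:
    name: str             # e.g. "debt_to_income"
    value: float
    inputs: Tuple[Figure, ...]


def trace(metric: Metric) -> List[tuple]:
    """Flatten a metric into (figure, transaction, page, bbox) rows for display."""
    return [
        (fig.name, txn.description, txn.source.page, txn.source.bbox)
        for fig in metric.inputs
        for txn in fig.transactions
    ]
```

Because every `Transaction` carries its own `SourceRegion`, a review interface can highlight the exact area of the page behind any number without a separate lookup.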
