project Mar 3, 2026

How I Taught Bob to Download My Bank Statements

automation reverse-engineering web banking

TLDR: I was tired of manually logging into three bank websites every month to download statements. So I figured out how to make Bob (my AI partner) do it for me by talking to the same behind-the-scenes system the bank’s own website uses. It worked, and the approach applies to basically any website that guards its data behind a login.

The ten minutes I couldn’t stand

Every month, same routine. Log into bank number one. Navigate to statements. Click download. Save the PDF. Repeat for bank number two. Bank number three. Ten minutes, tops.

But here’s the thing: I’d already built a system that reads those PDFs and automatically sorts my expenses into categories. The whole pipeline was hands-free, except for the very first step, where a human (me) has to click buttons in a browser like it’s 2005.

So I decided to cut myself out of the loop.

The secret door every bank leaves open

No Canadian bank offers a way for your own software to download your statements. There’s no official connection point. But here’s what I realized: every modern bank website is really just an app running inside your browser. When you click “download statement,” that app sends a request to a behind-the-scenes system, which sends back your data.

The bank built that system for their own website to use. But Bob can send the exact same requests. I’m not tricking anything or breaking in. I’m just knocking on the same door the bank’s own website knocks on, using the same key.

Think of it like a restaurant with no public phone number, but the waiters call the kitchen on a house phone to place orders. If you figure out the extension, you can call the kitchen yourself.

Why it’s harder than it sounds

Banks, understandably, don’t want robots logging in. They’ve built layers of protection.

First, there’s a bouncer at the door. Before you even see the login page, the website checks whether you’re a real person using a real browser. My first attempts just hung forever. The bank’s system saw a non-human visitor and simply refused to respond. Not an error, not a rejection. Just silence.

Then there’s the login itself. Card number on one page, password on another, then a verification code sent to my phone. Three separate steps, each depending on the last.

And finally, once you’re logged in, the website gets a digital pass that proves “this session is authenticated.” That pass is what you attach to every request. But the bank hides it in an unusual place. It’s not stored anywhere I could easily grab it. It lives only in the browser’s short-term memory, attached to requests as they fly out the door.

How I actually pulled it off

The approach has two parts.

First, I read the bank’s own website code. Every website ships its instruction manual right to your browser. It’s compressed and ugly, but it’s all there: every request the site makes, every address it talks to, every piece of data it expects. It’s like the restaurant printing its internal phone directory on the back of the menu in tiny font.

Second, I used a real browser (the only thing that gets past the bouncer) just long enough to log in. About thirty seconds. During that window, I watched the outgoing requests and grabbed the digital pass as it went by. Then I closed the browser and switched to a much faster, lighter approach for the actual downloads.

What surprised me

Every lesson came from something going wrong.

I locked myself out of my own bank account. Each failed login attempt counts against you. I was doing what any tinkerer would do: try something, see it fail, tweak, try again. Except each attempt required a new verification code sent to my phone, and after several abandoned attempts, the bank decided I was a fraud risk. Account locked. Lesson learned the hard way: you get very few shots at this. Build it right on paper first. Test once.

Sometimes there’s no login page. If a previous session left behind some data, the bank skips the login entirely and drops you straight onto the dashboard. My code expected the login page every time, so it just sat there, confused, staring at a dashboard it didn’t recognize. I had to build something smarter: look at where you actually landed, then figure out what to do next.

A tiny misunderstanding cost me way too long. One of my tools used a pattern-matching system that looked like one thing but worked like another. I wrote a pattern using rules I assumed applied, but the tool used different rules entirely. Everything just silently hung. No error, no clue.

What breaks and what doesn’t

Not everything is equally fragile. The behind-the-scenes system the bank uses? That almost never changes. Changing it would break their own mobile app. But the visual stuff, button labels, page layouts, the look and feel? That changes all the time.

So I built it so the fragile visual layer is thin and isolated. When the bank redesigns their login page, I update a few descriptions of what to click. The rest of the system doesn’t care.

What’s next

The first bank is fully automated. One command, and statements for the past seven years download themselves into the right folder, where my expense system picks them up automatically.

Two more banks to go. Same approach, different buttons to click.

The real discovery isn’t the script. It’s that this pattern works for almost any website that hides behind a login: healthcare portals, government services, school platforms. They all use the same architecture. Once you’ve figured out one, the next one is mostly just learning where the new buttons are.

Ten minutes a month isn’t much to save. But it’s one more thing I never have to think about again. And honestly? That’s the whole point.

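TLDR: Manually logging into three banks every month to download statements finally wore me out. So I dug into the system behind the bank’s website, found how it talks to its own frontend, and had Bob (my AI assistant) ask for the data the same way. It worked. And the pattern applies to almost any website hidden behind a login wall.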

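The ten minutes I couldn’t stand

Every month, the same routine. Log into the first bank, find the statements page, click download, save the PDF. Do it again for the second bank. And the third. All together, ten minutes at most.

But here’s the problem: I had already built a system that reads those PDFs and automatically sorts my expenses into categories. The whole pipeline is fully automated, except for the first step, where a person (me) has to open a browser and click around like it’s 2005.

In other words, the weakest link in the whole automation chain was me.

So I decided to remove myself from the process.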

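Every bank leaves a door open, just not for you

No bank in Canada offers a way for your own software to download statements directly. No public API, nothing.

But I realized something: every modern bank website is essentially an app running in your browser. When you click “download statement,” that app sends a request to a backend system, and the backend sends the data back.

The bank built that backend for its own frontend. But Bob can send the exact same request. I’m not cracking anything, and I’m not breaking in. I’m just knocking on the same door, with the same key.

Picture a restaurant with no public reservation line, where the waiters call the kitchen on an in-house extension to place orders. If you know the extension, you can call it directly too.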

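Why it’s harder than it sounds

Banks don’t want bots logging in, which is entirely reasonable. So they stack up several layers of protection.

First, there’s a guard at the door. Before you even see the login page, the website is already checking whether you’re a real person using a real browser. My first few attempts simply hung. The bank’s system detected a non-human visitor and chose the least friendly response available: silence. Not an error, not a rejection, just no answer.

Then there’s the login itself. Card number on the first page, password on the second, then a verification code sent to my phone. Three steps, each depending on the result of the one before.

Finally, after a successful login, the website receives a digital pass proving that “this session has been authenticated.” Every request after that has to carry the pass. But the bank hides it somewhere awkward: not in any place I could easily read, but only in the browser’s short-term memory, attached at the instant a request goes out.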

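How I did it

The method has two steps.

Step one: I read the bank’s own website code. Every website ships its source code straight to your browser. Compressed, obfuscated, and painfully ugly, but it’s all there: the target address of every request, every expected data format, every API endpoint. It’s like a restaurant printing its internal phone directory on the back of the menu in type so small you need a magnifying glass. But printed is printed.

Step two: I used a real browser (the only thing that gets past the guard) for about thirty seconds, just long enough to finish logging in. During that window I listened to the outgoing requests and intercepted the pass the moment it flew out. Then I closed the browser and switched to a faster, lighter method for the actual downloads.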

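What surprised me

Every lesson came from a screwup.

I locked my own bank account. Every failed login gets counted. I did what any tinkering engineer does: try, fail, adjust, try again. The problem is that each attempt needs a fresh SMS verification code, and after several abandoned attempts in a row the bank decided I was a fraud risk. Account locked. (Yes, my own automation experiment locked me out of my own account.) The lesson is blunt: you get very few attempts. Think the flow through on paper first, because you only get one chance to test.

Sometimes there’s no login page at all. If data from a previous session is still in the browser, the bank skips the login and drops you on the main page. My program expected a login page every time, so it just sat there, staring blankly at a screen it didn’t recognize. I had to redesign the logic: first work out which page you actually landed on, then decide what to do next.

A small misunderstanding wasted a lot of time. One of the tools I used had a pattern-matching system that looked like one syntax but actually followed completely different rules. I wrote a pattern using the rules I assumed applied, but the tool used another logic. The result was a silent hang. No error, no hint, it just didn’t move. (Bugs without error messages are always the most expensive kind.)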

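What breaks and what doesn’t

Not everything is equally fragile. The API behind the bank? It almost never changes, because changing it would break their own mobile app too. But the visual layer, such as button text, page layout, and overall look, gets redesigned all the time.

That observation shaped my architecture: make the fragile visual layer extremely thin and fully isolated. When the bank redesigns its login page, I only need to update a few lines describing “what to click where.” The rest of the system is unaffected.

In other words, the stable part carries the core logic and the volatile part only handles the point of contact. It’s the same idea as dependency inversion in software design: depend on abstractions, not on details.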

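What’s next

The first bank is fully automated. One command, and the past seven years of statements download into the right folder, where the bookkeeping system takes over automatically.

Two more banks to go. Same method, different buttons.

Saving ten minutes a month doesn’t sound like much. But the payoff isn’t the time saved; it’s breaking a manual process down into a repeatable pattern. Healthcare portals, government services, school platforms: their underlying architectures are all similar. Once you understand how one website works, the next system is just a matter of applying the same framework. That’s where automation really compounds.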

The problem, specifically

I have a downstream pipeline that ingests bank statement PDFs and categorizes expenses automatically. The entire thing is hands-free except for the dumbest possible bottleneck: a human manually logging into three separate bank websites each month, navigating to statements, and clicking download.

No Canadian bank offers a public API for statement retrieval. No OAuth flows, no developer portal, nothing. But their SPAs are just HTTP clients with nice CSS. The browser dev tools show every endpoint, every header, every payload. The “API” exists — it’s just undocumented and intended for first-party use only.

The goal: automate statement downloads by replaying the same network calls the bank’s own frontend makes, across all three institutions, triggered by a single command.

Architecture and design decisions

The system has two distinct phases with very different tooling requirements.

Phase 1: Auth (browser-based)

Banks run bot detection before the login page even renders. My initial attempts with raw HTTP requests got silent hangs — no error response, no timeout, just nothing. The detection layer saw a non-browser client and dropped the connection.

So phase 1 uses a real headless browser. There’s no way around it. The bot detection checks for browser fingerprint signals that are impractical to fake reliably with HTTP clients alone. I accepted this constraint early rather than sinking time into a cat-and-mouse game with their detection stack.

The auth flow itself is multi-step: card number on page one, password on page two, then an SMS verification code. Three sequential form submissions, each gated on the previous response. The browser session handles cookie propagation and any client-side state transitions the SPA manages internally.

The token extraction problem. Once authenticated, the SPA attaches a bearer token to every API call. But the token isn’t stored in cookies or localStorage. It lives only in JS runtime memory and gets attached to outgoing requests via an interceptor, so I couldn’t just read it from storage after login. Instead, I intercept outgoing network requests during the browser session and extract the auth header as it flies by. Once I have the token, the browser’s job is done.
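A minimal sketch of the extraction step, assuming the token travels in a standard `Authorization: Bearer ...` header (the post doesn’t name the header). The function names and sample values are hypothetical; in practice something like `extract_bearer_token` would be called from whatever request listener the browser automation tool provides.

```python
from __future__ import annotations

from typing import Mapping, Optional


def extract_bearer_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the bearer token from one request's headers, or None.

    Header names are compared case-insensitively because HTTP header
    names are not case-sensitive.
    """
    for name, value in headers.items():
        if name.lower() != "authorization":
            continue
        scheme, _, credential = value.strip().partition(" ")
        if scheme.lower() == "bearer" and credential.strip():
            return credential.strip()
    return None


def first_token(captured: list[Mapping[str, str]]) -> Optional[str]:
    """Scan captured request headers in order and stop at the first token."""
    for headers in captured:
        token = extract_bearer_token(headers)
        if token is not None:
            return token
    return None
```

Stopping at the first token keeps the browser window short: as soon as one authenticated request has gone out, there is nothing left for the browser to do.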

Phase 2: Downloads (raw HTTP)

With the bearer token in hand, I close the browser entirely and switch to direct HTTP calls against the bank’s internal API. This is faster, lighter, and far more reliable than driving a browser through UI interactions for each statement download.

The API surface I needed was small: list available statements for an account, then download each as a PDF. Reverse-engineering these endpoints meant reading the SPA’s bundled JavaScript — minified and ugly, but structurally intact. Request paths, query parameters, expected headers — it’s all there if you’re willing to read it.
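To make the two-call surface concrete, here is a sketch that builds (but does not send) the list and download requests. The base URL, paths, and the `accountId` parameter are placeholders I made up; the real ones come from reading the bank’s bundle.

```python
import urllib.parse
import urllib.request

# Hypothetical base URL; the real host and paths are bank-specific.
BASE_URL = "https://bank.example/api"


def authed_request(path: str, token: str, accept: str, **query: str) -> urllib.request.Request:
    """Build a GET request that carries the bearer token. Nothing is sent here."""
    url = f"{BASE_URL}{path}"
    if query:
        url += "?" + urllib.parse.urlencode(query)
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": accept},
        method="GET",
    )


def list_statements_request(token: str, account_id: str) -> urllib.request.Request:
    """Request for the list of available statements on one account."""
    return authed_request("/statements", token, "application/json", accountId=account_id)


def download_statement_request(token: str, statement_id: str) -> urllib.request.Request:
    """Request for a single statement as a PDF."""
    safe_id = urllib.parse.quote(statement_id, safe="")  # keep odd IDs from altering the path
    return authed_request(f"/statements/{safe_id}/pdf", token, "application/pdf")
```

Sending is then a call to `urllib.request.urlopen(request, timeout=...)`. Keeping request construction separate from sending means the shapes can be checked offline, without spending a login.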

What I rejected

Fully browser-driven approach. I could have automated the entire flow — login through download — using the headless browser. This is the obvious path and what most people reach for. It’s the wrong approach because it couples everything to the DOM. Every button label, every CSS selector, every page transition becomes a dependency. Banks redesign their UIs constantly. The API endpoints behind those UIs almost never change (breaking their own mobile app would be a disaster). By minimizing the browser surface to just auth, I minimized the maintenance surface.

Credential replay without a browser. I tried to replicate the login flow with pure HTTP, constructing the same POST requests the browser sends. The bot detection layer made this a non-starter. You could probably defeat it with enough effort, but the ROI is terrible — you’d be maintaining a bot-detection evasion layer on top of everything else.

Tradeoffs explicitly called out

SMS OTP means this isn’t fully unattended. The auth flow requires a verification code sent to my phone. I have to be present for the ~30-second login window. This is the biggest limitation. For a monthly task, it’s acceptable. For daily automation, it would be a dealbreaker. There’s no clean way around this without compromising account security, and I’m not willing to do that.

Token lifetime is unknown and untested. I extract the token and use it immediately. I haven’t mapped token expiry behavior. If I ever need to batch downloads across a long time window, this could bite me. For now, the entire download phase completes in seconds, so it hasn’t mattered.
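If the token happens to be a JWT (an assumption; the post doesn’t say what format the bank uses), its expiry can be read locally without any extra requests. A sketch, with a hypothetical function name; for an opaque token it simply returns `None`.

```python
import base64
import json
from typing import Optional


def jwt_expiry(token: str) -> Optional[int]:
    """Return the `exp` claim (Unix seconds) if the token parses as a JWT, else None.

    This only decodes the payload. It does not verify the signature.
    """
    parts = token.split(".")
    if len(parts) != 3:
        return None
    # JWTs strip base64 padding; restore it before decoding.
    payload_b64 = parts[1] + "=" * (-len(parts[1]) % 4)
    try:
        payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        return None
    exp = payload.get("exp") if isinstance(payload, dict) else None
    return exp if isinstance(exp, int) else None
```

Comparing the result with `time.time()` before a long batch would show whether the token is likely to survive it.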

Fragile auth layer, stable API layer. This is a deliberate architectural bet. The browser-driven auth phase will break when the bank redesigns their login page. But the fix is updating a few selectors and maybe adjusting a navigation flow. The HTTP-based download phase should survive UI redesigns untouched. I’m trading occasional small maintenance for structural stability.
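One way to keep the fragile layer thin is to put every design-dependent string in a single object and have the login logic refer only to that. A sketch; the selector values and the step format are invented for illustration, not taken from any real bank page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LoginSelectors:
    """Everything that depends on the bank's current page design (hypothetical values)."""
    card_number: str = "input[name='cardNumber']"
    password: str = "input[type='password']"
    otp_code: str = "input[name='otp']"
    submit: str = "button[type='submit']"


def login_steps(sel: LoginSelectors, card: str, password: str, otp: str) -> list[tuple[str, str, str]]:
    """Describe the three-page login as (action, selector, value) steps."""
    return [
        ("fill", sel.card_number, card),
        ("click", sel.submit, ""),
        ("fill", sel.password, password),
        ("click", sel.submit, ""),
        ("fill", sel.otp_code, otp),
        ("click", sel.submit, ""),
    ]
```

After a redesign, the fix is a new `LoginSelectors(...)` instance; `login_steps` and everything downstream stay as they are.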

Legal gray area. I’m accessing my own data using my own credentials through endpoints the bank’s own software exposes to my browser. I’m not circumventing access controls — I’m using the same ones. But there’s no explicit permission for this use, and terms of service are vague. My read is that this is fine for personal use, but I wouldn’t build a product on it.

Implementation details that matter

Bot detection: the silent failure mode

The most disorienting debugging experience was the bot detection. No HTTP error code. No rejection message. The connection just hangs indefinitely. If you’re not expecting this, you’ll waste hours checking your network config, your TLS settings, your headers — everything except the actual problem, which is that the server decided you’re not a browser and chose to ghost you.

The fix was straightforward: use a real browser. But recognizing the problem took longer than solving it.
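A small probe with an explicit timeout can make this failure mode recognizable much sooner, because it separates “the server said no” from “the server said nothing.” This is a sketch using Python’s standard library; the category names are my own, and `open_fn` is a hypothetical hook so the classifier can be exercised without a network.

```python
import socket
import urllib.error
import urllib.request
from typing import Callable


def probe(url: str, timeout_s: float = 10.0,
          open_fn: Callable = urllib.request.urlopen) -> str:
    """Classify a server's reaction as responded, rejected, silent, or unreachable."""
    try:
        with open_fn(url, timeout=timeout_s) as resp:
            return f"responded:{resp.status}"
    except urllib.error.HTTPError as exc:
        # HTTPError is a subclass of URLError, so it has to be caught first.
        return f"rejected:{exc.code}"
    except (TimeoutError, socket.timeout):
        # The connection opened but no response arrived before the deadline.
        return "silent"
    except urllib.error.URLError as exc:
        # Timeouts during connect arrive wrapped in URLError.
        if isinstance(exc.reason, (TimeoutError, socket.timeout)):
            return "silent"
        return "unreachable"
```

A `silent` result from a plain HTTP client, next to a normal page load in a real browser, points at client filtering rather than at network config or TLS settings.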

Session state non-determinism

If a previous session left cookies or local storage artifacts, the bank skips the login flow entirely and drops you on the dashboard. My initial code assumed it would always land on the login page. When it didn’t, the automation stalled — waiting for a login form that wasn’t there.

The fix: check the actual page state after navigation rather than assuming a fixed flow. If we’re already authenticated, skip to token extraction. If we’re on login, proceed with auth. This is obvious in retrospect but easy to miss when you’re building against the happy path.
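The check-then-branch logic can be sketched as a tiny classifier plus a dispatch function. The URL markers below are hypothetical; a real version would use whatever routes or page elements reliably distinguish the bank’s login and dashboard views.

```python
from enum import Enum
from urllib.parse import urlsplit


class PageState(Enum):
    LOGIN = "login"
    DASHBOARD = "dashboard"
    UNKNOWN = "unknown"


# Hypothetical route fragments; the real ones depend on the bank's SPA.
LOGIN_MARKERS = ("/login", "/signin")
DASHBOARD_MARKERS = ("/accounts", "/dashboard")


def classify(url: str) -> PageState:
    """Decide where the browser landed from the path and hash route, ignoring the query string."""
    parts = urlsplit(url)
    route = f"{parts.path}#{parts.fragment}".lower()
    if any(marker in route for marker in LOGIN_MARKERS):
        return PageState.LOGIN
    if any(marker in route for marker in DASHBOARD_MARKERS):
        return PageState.DASHBOARD
    return PageState.UNKNOWN


def next_action(state: PageState) -> str:
    """Map the observed state to the next step of the automation."""
    if state is PageState.LOGIN:
        return "run_login_flow"
    if state is PageState.DASHBOARD:
        return "extract_token"
    # Unknown pages abort rather than guess, since a wrong guess can burn a login attempt.
    return "abort"
```

The query string is ignored on purpose: a login URL that carries a redirect target such as `?next=/accounts` should still count as the login page.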

Account lockout: the expensive lesson

Each failed login attempt counts toward a lockout threshold. During development, I was doing what felt natural: try, fail, tweak, retry. But each attempt also triggers an SMS code, and after several abandoned attempts (where I received the code but the automation failed before submitting it), the bank flagged the account as compromised and locked it.

This is the kind of mistake you only make once. The lesson: you get very few shots at the auth flow. Map it completely from network logs before writing any automation code. Test against the real system only when you’re confident it’ll work end-to-end. Treat each login attempt as expensive.
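“Treat each login attempt as expensive” can be enforced in code with a persisted attempt budget that is checked before the browser even opens. A sketch; the class name is hypothetical and the default limits (two attempts per 24 hours) are arbitrary placeholders, since the bank’s real lockout threshold isn’t known.

```python
from __future__ import annotations

import json
import time
from pathlib import Path
from typing import Optional


class LoginBudget:
    """Persist login attempt timestamps and refuse to exceed a small budget."""

    def __init__(self, path: Path, max_attempts: int = 2, window_s: float = 24 * 3600):
        self.path = path
        self.max_attempts = max_attempts  # placeholder, not the bank's actual threshold
        self.window_s = window_s

    def _recent(self, now: float) -> list[float]:
        """Timestamps of attempts that still fall inside the window."""
        if not self.path.exists():
            return []
        stamps = json.loads(self.path.read_text())
        return [t for t in stamps if now - t < self.window_s]

    def remaining(self, now: Optional[float] = None) -> int:
        now = time.time() if now is None else now
        return max(0, self.max_attempts - len(self._recent(now)))

    def spend(self, now: Optional[float] = None) -> None:
        """Record an attempt before touching the bank; raise if the budget is used up."""
        now = time.time() if now is None else now
        recent = self._recent(now)
        if len(recent) >= self.max_attempts:
            raise RuntimeError("login budget exhausted; wait before retrying")
        recent.append(now)
        self.path.write_text(json.dumps(recent))
```

Calling `spend()` as the first line of the auth phase means a crash halfway through still counts, which mirrors how the bank counts abandoned attempts.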

Pattern matching gotcha

One of the tools I used for request interception had a pattern-matching API that looked like it accepted standard glob syntax but actually used a different matching grammar entirely. My patterns matched nothing, and the tool didn’t warn — it just silently intercepted zero requests. Hours of debugging a “hang” that was actually a filter returning no results.

No broader lesson here except the usual: read the actual docs for the actual matching semantics, don’t assume from the API shape.
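The post doesn’t name the tool, so the following is only an illustration using Python’s standard library: the same pattern string means different things under different grammars, and a filter that matches nothing can be made to fail loudly instead of silently. The URL and the `require_matches` helper are made up for the example.

```python
from __future__ import annotations

import fnmatch
import re

URL = "https://bank.example/api/v2/statements?year=2026"
PATTERN = "*/statements*"

# Read as a glob: '*' means "any run of characters", so the URL matches.
glob_hit = fnmatch.fnmatchcase(URL, PATTERN)

# Read as a regular expression: a leading '*' has nothing to repeat, which is an error.
try:
    re.compile(PATTERN)
    regex_error = None
except re.error as exc:
    regex_error = str(exc)


def require_matches(pattern: str, urls: list[str]) -> list[str]:
    """Return the URLs the glob selects, and fail loudly if it selects none."""
    hits = [u for u in urls if fnmatch.fnmatchcase(u, pattern)]
    if not hits:
        raise LookupError(f"pattern {pattern!r} matched 0 of {len(urls)} URLs")
    return hits
```

Running a candidate pattern through a check like `require_matches` against a handful of URLs copied from the network log, before wiring it into the interception step, turns an hours-long “hang” into an immediate error. Note that `fnmatch` matches the whole string, so a path-only pattern like `/api/*` selects nothing from full URLs.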

Results with numbers

Bank one: fully automated. A single command downloads all available statements (roughly seven years of history) into the directory where the expense pipeline picks them up. The browser phase takes about 30 seconds (mostly waiting for the SMS code). The download phase completes in a few seconds for a typical monthly run.
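One plausible way to keep repeat runs short is to skip statements that are already on disk and to write new files atomically, so the expense pipeline never sees a half-written PDF. This is a sketch under my own assumptions (one `<id>.pdf` file per statement, hypothetical function names), not a description of the actual script.

```python
from __future__ import annotations

from pathlib import Path


def missing_statements(statement_ids: list[str], dest: Path) -> list[str]:
    """Return the statements not yet saved as <id>.pdf in dest, in the given order."""
    return [sid for sid in statement_ids if not (dest / f"{sid}.pdf").exists()]


def save_pdf(dest: Path, statement_id: str, data: bytes) -> Path:
    """Write to a temporary .part file, then rename it into place."""
    target = dest / f"{statement_id}.pdf"
    tmp = dest / f"{statement_id}.pdf.part"
    tmp.write_bytes(data)
    tmp.replace(target)  # rename within one directory, so no partial file is visible
    return target
```

With this shape, the first run fetches the full history and later monthly runs fetch only what is new.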

Maintenance so far: zero. The bank hasn’t redesigned their login page since I built this. When they do, I expect a 15-30 minute fix — updating selectors in the auth phase. The API-based download phase should require no changes.

What didn’t work: scaling my time. I built this for one bank. Two more remain. The architecture is identical, but the specifics differ — different login flows, different API shapes, different token storage mechanisms. Each bank is essentially a new reverse-engineering exercise. The framework helps, but the upfront investigation is the real time cost.

What’s next

Two more banks. Same pattern: reverse-engineer the SPA, identify the auth flow, find the token, map the statement API. The approach is proven; the work is just institution-specific.

The general pattern. This technique should work for almost any SPA behind authentication: healthcare portals, government services, school platforms. They share the same basic architecture: a JS frontend talking to a REST-ish backend. The specifics change, but the methodology doesn’t: browser for auth, HTTP for data, minimize the surface that touches the DOM.

Open problem: OTP automation. The SMS verification code is the remaining manual step. Solutions exist (SIM-based SMS forwarding, TOTP where supported) but each introduces its own security and reliability tradeoffs. For a monthly task, the 30-second human-in-the-loop is the pragmatic choice. For now.
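For the TOTP option, the code itself is small: a time-based one-time password is an HMAC over the current 30-second counter, as specified in RFC 6238. The sketch below assumes the default HMAC-SHA1 variant and a raw-bytes secret; whether a given bank supports TOTP at all, and how its secret is provisioned, is outside what the post covers.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for the given Unix time."""
    counter = int((time.time() if at is None else at) // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, as defined in RFC 4226
    chunk = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(chunk % 10 ** digits).zfill(digits)
```

The tradeoff the paragraph above mentions is visible right in the signature: automating this means the script holds `secret`, which turns the second factor into something stored next to the first.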

Have you ever watched a waiter in a restaurant walk through a door into the kitchen? You can’t see what happens back there. But you know they’re telling the chef what you ordered, and a few minutes later, your food shows up.

[Illustration: A waiter walks through swinging doors from a colorful dining room into a busy kitchen, with a curious child watching from a table.]

Websites work the same way.

The boring chore nobody likes

Imagine you had to do the same homework three times every month. Not different homework. The exact same thing. Log into a website. Click a few buttons. Save a file. Then do it again on a different website. And again on a third one.

That’s what one person had to do with their bank websites. Every single month. It only took about ten minutes, but here’s the thing: their computer already knew what to do with those files once they were downloaded. The only part that needed a human was the clicking.

So they thought: what if my computer could do the clicking too?

The kitchen behind the website

Here’s something wild. When you use a website, you see the pretty part. Buttons, pictures, colors. But behind all of that, your browser is having a secret conversation with another computer far away.

[Illustration: A child at a computer sends invisible whispered messages through the air to a distant server on a hilltop, which whispers back.]

Every time you click a button, your browser whispers a message to that faraway computer. Something like: “Hey, can I see the October statement?” And the faraway computer whispers back: “Sure, here it is.”

You never see these whispers. But they’re happening constantly.

And here’s the cool part: you can learn to send those same whispers yourself.

It’s like figuring out the kitchen phone number at that restaurant. The waiters use it to place your order. But if you knew the number, you could call the kitchen directly.

The guards at the door

Banks really, really don’t want a robot pretending to be you. So they set up guards.

The first guard checks if you’re a real person using a real browser. If you’re not? It doesn’t yell at you or say “go away.” It just… ignores you. Like knocking on a door and nobody answering. Ever.

[Illustration: Three expressive guards stand before a vault door, each holding a different security item: a magnifying glass, a ring of keys, and a glowing wristband.]

The second guard is the login itself. You type your card number on one page, your password on another, and then you get a secret code sent to your phone. Three locks on one door.

And the third guard? Once you’re in, the website gives your browser a special invisible pass. Like a wristband at a fair that says “this person already paid.” Every time your browser asks for something, it flashes that wristband. Without it, you get nothing.

Figuring it out (and messing up)

Here’s where it gets fun. Every website actually sends its own instruction manual to your browser. It’s squished and messy, but it’s all there. Every secret message, every address, every whisper. It’s like a restaurant accidentally printing its kitchen phone number on the back of the menu in really, really tiny letters.

So this person read the instructions. They figured out the whispers. They used a real browser just long enough to get past the guards and grab that invisible wristband. Then they switched to something much faster to do the actual downloading.

But things went wrong. A lot.

They tried logging in too many times while testing, and the bank thought someone was trying to break in. Account locked. Oops. Imagine getting locked out of your own locker because you jiggled the handle too many times.

Another time, the website skipped the login page entirely because it remembered them from before. Their program just sat there, confused, staring at a page it didn’t expect. Like walking into school ready for math class and finding out it’s actually a pizza party. Great for you, confusing for a computer.

The part that almost never breaks

Here’s something interesting. The pretty part of a website, the buttons and colors, changes all the time. Banks love redesigning things. But the secret whispers behind the scenes? Those almost never change. If the bank changed those, their own phone app would stop working too.

So the clever move was to keep the part that deals with buttons small and easy to swap out. When the website gets a new look, you just update which buttons to click. Everything else stays the same.

What this really means

One bank is fully working now. One command, and years of files download themselves. Two more banks to go.

But the real discovery is bigger than bank statements. Almost every website that makes you log in works this way. Health websites. School portals. Government pages. They’re all restaurants with kitchens, waiters, and secret phone numbers.

Once you learn how one kitchen works, the next one is just figuring out where they put the phone.

So here’s something to think about: the next time you click a button on a website, your browser is whispering a secret message you can’t see. What do you think it’s saying?