Keeping track of finances from many places

Multi-point access to one plain text ledger via hledger{,-web} and git.
Published on Fri August 13, 2021 with tags: git, wireguard, finances, linux.
Updated on Fri August 13, 2021.

It’s almost time to start attending University, which means I have to move. Consequently, my economic responsibilities are growing, and so is the complexity of my personal finances and banking. I need a solution to keep things organized.

I had used hledger in the past, a Plain Text Accounting tool written in Haskell, to track my personal finances, though that came to a stop due to the pandemic. My solution was also quite underdeveloped and inconvenient, so that needed to change.

Disclaimer

I am not an accountant, and I do not have much experience in finances. I’m just a guy with many devices, who works mostly with cash, and who is often on the move.

Demo

Video recorded at 1600x900. There is no audio. The demo shows adding a transaction once via git and once via the web interface, and marking a transaction as complete via git.

NOTE: The demo alternates between slashes and dashes often. This is due to the version of hledger used for some of the entries in the ledger being out of date. In newer versions dashes are preferred.

Outline of the issue

The problem is simple: synchronising a single plaintext ledger between several devices:

an Android phone,
two GNU/Linux PCs, and
a GNU/Linux laptop.

I also want to

keep track of finances in git, for a more in-depth history,
eliminate any non-linear branches in Git history, and
keep this data on a private server.

The goal is pretty simple, but it runs into two large issues:

GHC and programs compiled with it do not run well on Android, and
Entering transactions in the slightly rigid format ledger uses is hard on an on-screen keyboard.

When you add git into the mix, the problem complicates further. I want to integrate a graphical entry method into my preferred VCS + text approach.

Meet hledger-web: a nice, responsive web UI that works across platforms. Unfortunately, it does not integrate with git at all. Adding this network-accessible component also raises the security requirements.

Solution

I started by creating a blank ledger in a git repository:

~$ mkdir .ledger
~$ cd .ledger
~/.ledger$ git init
~/.ledger$ ln -s 2021.txt ledger
~/.ledger$ >2021.txt echo "# vim: ft=ledger sw=4 et :
# 2021 ledger"

To complement this, I added export LEDGER_FILE="$HOME/.ledger/ledger" to my ~/.profile.

I attached a README and came up with a “commit discipline”: commit after each transaction, and ensure that only one transaction is edited in any single commit (for instance, if editing an old transaction to mark it as complete). This is enough for the first commit.
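The discipline can also be checked mechanically. The following is a minimal sketch, not something from the original setup: it assumes every transaction starts with a date at the beginning of a line, and `split_transactions` and `changed_transactions` are hypothetical helper names.

```python
import re
from collections import Counter

# Assumption: a transaction starts on a line that begins with a date such as
# 2021-08-13 or 2021/08/13 and runs until the next such line.
TXN_START = re.compile(r"^\d{4}[-/.]\d{2}[-/.]\d{2}\b", re.M)


def split_transactions(text):
    """Return the ledger's transactions as a list of stripped text blocks."""
    starts = [m.start() for m in TXN_START.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]


def changed_transactions(old, new):
    """Count transactions added, removed or edited between two versions."""
    before = Counter(split_transactions(old))
    after = Counter(split_transactions(new))
    added = sum((after - before).values())
    removed = sum((before - after).values())
    # An edit shows up as one removed block plus one added block.
    return max(added, removed)
```

A result of 1 means the change fits in a single commit under this discipline; anything larger should be split up.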

Git repository setup

The Git Book has a chapter on setting up git for use over SSH, so I won’t be covering that here.

My idea was for each possible contributor, including hledger-web, to have its own clone of the repository, same as a normal Git project. For this to work, I’d have to notify the hledger-web tree when origin updates. Luckily, these are on the same machine, so a very simple git hook should do the job:

~/ledger.git$ cat hooks/post-update
#!/bin/sh
# SPDX-License-Identifier: BSD-3-Clause
# This hook is to be placed into the ledger bare upstream repository. It
# notifies the downstream hledger-web repository upon a push happening, so that
# it can pull appropriately.
unset GIT_DIR  # set to "." in a bare repo like this one
git -C /var/lib/ledger/ledger pull --ff-only

Under high loads a setup like this would easily break but, thankfully, I’m not a high load on my ledger.
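The unset GIT_DIR line is the part that is easy to miss: git exports GIT_DIR to its hooks, and a child git would keep using that value instead of the repository given with -C. The same logic as a Python sketch, with function names made up for illustration:

```python
import os
import subprocess


def downstream_env(environ):
    """Copy of the environment without the GIT_DIR that git sets for hooks."""
    env = dict(environ)
    env.pop("GIT_DIR", None)
    return env


def pull_downstream(worktree, environ=os.environ):
    """Fast-forward the clone at `worktree`; git fails instead of merging."""
    subprocess.check_call(["git", "-C", worktree, "pull", "--ff-only"],
                          env=downstream_env(environ))
```

Using --ff-only here is what keeps the hledger-web clone from ever creating a merge commit, in line with the linear-history goal.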

Now that we have a separate copy (crucially, with a working tree) on the server, we can integrate hledger-web into the setup.

Automating hledger-web

Hooks are a good way to integrate automation capabilities into existing software, if done right. Sadly, though, hledger-web does not have any hooking support yet.

This means we need some other way to decide when to check the ledger for updates. We could do this periodically: a cron job checks whether our copy of the ledger is a superset of the upstream and, if it is, commits. The drawback is a possibly large window in which new transactions pushed elsewhere would cause conflicts. Realistically, this is not an issue, since transactions are relatively infrequent.
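For comparison, the check such a cron job would need is small. A sketch, assuming "superset" means the web copy is the upstream ledger plus lines appended at the end; `only_appended` is a hypothetical name:

```python
def only_appended(upstream, local):
    """True if `local` is `upstream` plus at least one extra trailing line."""
    up = upstream.splitlines()
    lo = local.splitlines()
    return len(lo) > len(up) and lo[:len(up)] == up
```

If this returns False while the two files differ, the copies have diverged and an automatic commit would not be safe.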

I didn’t take this approach, even though it would probably be adequate: it theoretically isn’t as reliable, and it simply isn’t as fun as the alternative, inotify.

This approach consists of a watcher parent process, which forks to run and manage a hledger-web child. The obvious choice of language for this task is Python, since it has relatively advanced process manipulation tools, and performance shouldn’t be an issue either. Shell isn’t flexible enough for this, and C is too flexible, which would make this more difficult than it needs to be. This is the resulting code:

#!/usr/bin/env python3
# SPDX-License-Identifier: BSD-3-Clause
import attr
import os.path as path
import re
import shutil
import signal
from subprocess import Popen, check_call, check_output

import pyinotify

DESCRIPTION_SEARCH = re.compile(r"""
^\+        # Lines starting with +
\d{4}[-/.] # Year (4 digits and a dash, slash or dot)
\d{2}[-/.] # Month (2 digits and a dash, slash or dot)
\d{2}\s+   # Day of month (2 digits, then whitespace)
[*!]?      # Status of the transaction
(.+)$      # Description
""", re.M | re.X)


def commit_change():
    realledger = path.realpath("ledger")
    shutil.copy("ledger.tmp", realledger)

    gs = check_output(["git", "status", "--porcelain", "--", realledger],
                      text=True)
    if not gs.startswith(" M "):
        # A git pull also triggers this event, but naturally, after a pull we
        # have nothing to commit. To prevent this error from screwing us up,
        # just abort here.
        return

    msg = "Commit modifications over hledger-web"
    diff = check_output(["git", "diff", "--", realledger], text=True)
    matched = DESCRIPTION_SEARCH.search(diff)

    # TODO: maybe figure out what to do with other matches, if somehow there
    # are any.
    if matched:
        msg = f"Add transaction: {matched.group(1).strip()}"

    check_call(["git", "commit", "-m", msg, realledger])
    check_call(["git", "push"])


@attr.s
class InotifyHandler(pyinotify.ProcessEvent):
    childproc = attr.ib()

    def process_IN_CLOSE_WRITE(self, ev):
        self.childproc.send_signal(signal.SIGSTOP)
        try:
            commit_change()
        finally:
            self.childproc.send_signal(signal.SIGCONT)


def main():
    wm = pyinotify.WatchManager()
    shutil.copy2("ledger", "ledger.tmp")
    with Popen(["hledger-web", "--serve", "-f", "ledger.tmp",
                "--base-url=http://ledger.arsen.local",
                "--host=127.0.0.1", "--port=6714"]) as web:
        try:
            notifier = pyinotify.Notifier(wm, InotifyHandler(web))
            wm.add_watch("ledger.tmp", pyinotify.IN_CLOSE_WRITE)
            notifier.loop()
        finally:
            web.send_signal(signal.SIGCONT)  # just in case
            web.terminate()


if __name__ == "__main__":
    main()

The script is somewhat lengthy but ultimately very simple. It operates on the copy of the current ledger, since it predates the inotify idea, and I’m not sure how hledger-web operates on symlinks.
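To see how the commit message is derived, here is the same pattern applied to a made-up diff. The transaction and the `commit_message` wrapper are invented for illustration; the regular expression and the two messages are the ones from the script.

```python
import re

# Same pattern as in the script above.
DESCRIPTION_SEARCH = re.compile(r"""
^\+        # Lines starting with +
\d{4}[-/.] # Year
\d{2}[-/.] # Month
\d{2}\s+   # Day of month
[*!]?      # Status of the transaction
(.+)$      # Description
""", re.M | re.X)

# Hypothetical diff, as `git diff` might print it after one new entry.
SAMPLE_DIFF = """\
diff --git a/2021.txt b/2021.txt
--- a/2021.txt
+++ b/2021.txt
@@ -3,3 +3,7 @@
     assets:cash
+
+2021-08-13 ! Groceries
+    expenses:food  12 EUR
+    assets:cash
"""


def commit_message(diff):
    """Build the commit message the same way commit_change() does."""
    matched = DESCRIPTION_SEARCH.search(diff)
    if matched:
        return f"Add transaction: {matched.group(1).strip()}"
    return "Commit modifications over hledger-web"


print(commit_message(SAMPLE_DIFF))
```

The `+++ b/2021.txt` header is not mistaken for a transaction, because the pattern requires digits right after the first `+`. Note that an edit to an existing transaction's date line would also match, so the message would say "Add transaction" for it as well.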

The script watches a copy of the ledger for changes and uses those events to update the upstream git repository. While it does so, it pauses the hledger-web process, to give the user feedback and prevent further API requests until the commit completes.
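The pause-and-resume step in isolation, as a sketch: `paused` is a hypothetical helper, while the script itself inlines the same try/finally. It relies on SIGSTOP and SIGCONT, so it is POSIX-only.

```python
import signal
from contextlib import contextmanager


@contextmanager
def paused(proc):
    """Stop `proc` (a subprocess.Popen) for the block, then always resume it."""
    proc.send_signal(signal.SIGSTOP)
    try:
        yield proc
    finally:
        # Runs even if the block raises, so the child is never left frozen.
        proc.send_signal(signal.SIGCONT)
```

With this, the handler body would read `with paused(self.childproc): commit_change()`. While stopped, the server still accepts connections at the kernel level but does not answer them, which is why the browser appears to hang until the commit is done.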

However, there’s an issue: what if someone else updates the upstream? This repository pulls, but the copy of the ledger isn’t updated, and a conflict happens. To handle this, we use a second hook:

$ cat .git/hooks/post-merge
#!/bin/sh
# SPDX-License-Identifier: BSD-3-Clause
# This hook is to be placed into the hledger-web repository. It runs after a
# merge to replace the ledger that hledger-web reads and writes.
cp --dereference ledger ledger.tmp

Since hledger-web re-reads the ledger properly after change, this works well.

NOTE: if you intend to use this outside GNU/Linux, look into watchdog.

Supervising it

To finish this off, I added meaningful user data to the ledger repository:

.../ledger$ git config user.name 'hledger-web automation'
.../ledger$ git config user.email automation@aarsen.me

Then I let the git user linger, so that it can run services at boot:

$ sudo loginctl enable-linger git

And wrote a systemd --user unit for the service:

# ~/.config/systemd/user/hledger.service
[Unit]
Description=hledger-web

[Service]
ExecStart=.../ledger/startweb
WorkingDirectory=.../ledger/

[Install]
WantedBy=default.target

Now we have a web service serving on localhost that suits our original requirements.

Accessing it, securely

There are two ways I considered for accessing this service: HTTP authentication over TLS, which should be sufficiently safe, or WireGuard. I am not a big fan of password authentication, especially when I intend to use the interface directly, so I opted for WireGuard instead. It provides me an additional layer of security and key authentication¹. I use nginx as a reverse proxy in order to not run anything else as a privileged process.

WireGuard setup

I decided to create a VPN in an IPv6 ULA, and gave my server the first usable address on that network, my phone the second one, and so on for the other devices. This network is set up as a star, and some peers send periodic keepalives because they are mobile.

WireGuard configs are pretty simple and symmetrical, and setup is very easy, but I’ll go over it regardless:

# /etc/wireguard/wg-ledger.conf
[Interface]
PrivateKey = wF9tWW3k5+zbd8BnvpJuzzjAhcGPngObrpoyirXTEGc=
Address = fd98:16d7:04c5::1/64
ListenPort = 29918
# allow IP forwarding on this vpn
# this lets this node act as a router for packets received on it
PostUp = sysctl -w net.ipv4.conf.%i.forwarding=1

[Peer]
PublicKey = KGRXTksw1F6M4vvuVBGQ6LN8u9pPUhBIHyBJ5WYCsxg=
AllowedIPs = fd98:16d7:04c5::2/128
PresharedKey = L60URv00ypz1ZSeaIFIuarEiqWJGEK63T5gLBrcyGzk=

This config is supposed to be on the server. It sets up a single peer, but it can be expanded further by just adding more Peer blocks.

You can conveniently generate keypairs with wg genkey | tee >(echo "pub: $(wg pubkey)") (this needs a shell with process substitution, such as bash), and preshared keys with wg genpsk. A preshared key is shared between one pair of peers; in this case, each device only gets one PSK, since it only ever connects to the server. You can enable this config via systemctl enable --now wg-quick@wg-ledger on systemd-based systems.
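A preshared key is just 32 random bytes in base64, so one can also be produced where the wg tool is not at hand. A minimal sketch (the function name is made up; wg genpsk remains the canonical way):

```python
import base64
import secrets


def generate_psk():
    """Return 32 random bytes in base64, the format `wg genpsk` prints."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


print(generate_psk())
```

Generate one key per device and put the same value into the matching Peer block on both sides.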

Then on each client², add a config that looks like this:

# /etc/wireguard/wg-ledger.conf
[Interface]
PrivateKey = CDhsQyEVqlcze2gvcsDVUT+AUc3UcS0CvaAb2jgNDXE=
Address = fd98:16d7:04c5::2/64

[Peer]
PublicKey = K5yDLzl78oReElWQO7CcDntswy79aVMCWJQh6RGP+XA=
Endpoint = yourserver.you:29918
AllowedIPs = fd98:16d7:04c5::1/64
PresharedKey = L60URv00ypz1ZSeaIFIuarEiqWJGEK63T5gLBrcyGzk=

Of course, bump IPs as you go. These files don’t need to specify all peers, just the one server, whose AllowedIPs field specifies the whole network.
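Bumping the addresses by hand is error-prone once there are a few devices. A sketch of deriving them with Python's ipaddress module, using the network from the configs above; `peer_addresses` is a hypothetical helper:

```python
import ipaddress

NETWORK = ipaddress.ip_network("fd98:16d7:04c5::/64")


def peer_addresses(index):
    """Addresses for the device with the given index (1 is the server).

    Returns a pair: the Address line for that device's own [Interface]
    section, and the AllowedIPs line for its [Peer] block on the server.
    """
    host = NETWORK[index]
    return f"{host}/{NETWORK.prefixlen}", f"{host}/128"


for i in range(2, 5):
    address, allowed = peer_addresses(i)
    print(f"device {i}: Address = {address}, AllowedIPs = {allowed}")
```

Python prints the group 04c5 as 4c5; both spellings denote the same address.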

nginx setup

The role of nginx in this setup is to act as a reverse proxy, so that we don’t have to give any capabilities or special privileges to the hledger-web process. We want nginx to serve only over WireGuard too. We can achieve this pretty easily, with a few lines of configuration:

server {
    listen [fd98:16d7:04c5::1]:80;
    server_name ledger.arsen.local;

    location / {
        # port defined in the python script
        proxy_pass http://127.0.0.1:6714/;
    }
}

Client setup

We’re almost done! I just need to add a hosts entry on all clients, since I don’t have a resolver for my .local zone:

fd98:16d7:04c5::1    ledger.arsen.local

On GNU/Linux, this is just /etc/hosts; on Android, I did it via an AdAway redirect rule, which is really just a hosts entry.

You made it!

We now have a convenient way to enter transactions into our ledger on the go, using FOSS tools and private infrastructure! Enjoy! Don’t overspend, it’s all on the record now. :P

¹ To be able to trust IPs, you need to enable rp_filter in the kernel; otherwise, people on the same L2 network could spoof the IPs and bypass authentication. See ip-sysctl.txt. See also this issue, which deals with the same problem in a different tool.
² WireGuard has no concept of clients, but in this case each peer of the original server can be considered a client.