Adventures in Self-hosting: The Perfect Storm of AdGuard Home, Cloudron, Let's Encrypt, and Android

💡
Update 5/18/22: After I posted on Cloudron's forums, I got a very quick response from girish, one of Cloudron's founders, who came up with a solution in less than a week: dynamically replacing the bad cert with an empty string if it exists.  It worked immediately!  Even though it was not Cloudron's problem to solve, they solved it anyway! :)

A bit of background on this project

A while ago, I decided to switch from using a Pi-Hole to using AdGuard Home as a DNS sinkhole.  I might have a post all about DNS sinkholes coming up soon, but for now, check out Wikipedia's article.

For me, it is more convenient and effective to have a remote DNS sinkhole for blocking malware and tracker domains, since I can use the same instance for blocking outgoing traffic from my home network, as well as my mobile devices while on the go.  Previously, with a Pi-Hole inside my home network, I did not have the protection I wanted for my mobile devices.

I, being me, wanted to make this solution self-hosted, so I would know for sure that the server was secure.  I also wanted it to be easy to set up and maintain.  For these reasons, I chose to install AdGuard Home running in an instance of Cloudron.

I could write an entire article on Cloudron, but suffice it to say that it is a platform that supports running multiple self-hosted services and apps easily and securely with relatively little fuss.  Each application is fully sandboxed in its own read-only virtual container using Docker (another topic that could be its own article).  This keeps them separate, secure, and much less likely to allow a potential breach to leak out to other applications or the host server itself.

Android and Let's Encrypt certificate issues

I installed my instance of AdGuard Home on a VPS running Cloudron, added my favorite blocklists (Steven Black hosts, Peter Lowe's List, and Fanboy's Anti-Facebook, to name but a few), and added my own custom filtering rules that I have gathered over the years of running a Pi-Hole.

Since I already have a few TLS (transport layer security) certificates issued to me by LetsEncrypt.org, I decided to use one of those to give me the added security of DNS over TLS (DoT) for my Android devices and DNS over HTTPS (DoH) for my desktop browsers.

Everything went well for my plain DNS (Apple) devices and DoH devices, but somehow I kept running into an issue with my Android devices...  Since version 9, Android has supported DoT to a server of your choice, but with one caveat - the server has to be given as a hostname, not an IP address.  This is great if you want to use Cloudflare, Google DNS, OpenDNS, or another public provider (if you're into that sort of thing), but requires you to have a domain if you are self-hosting.

When I would point the Private DNS setting to my AdGuard Home server, I would get a "Couldn't Connect" error, followed by "no internet access." notifications for both WiFi and Mobile data connections.

A temporary solution

After a bit of hunting around, I found this article on icarus.sg, which is a great read.  It outlines the cause of the problem - in short, it is down to an expired root certificate in certs issued by Let's Encrypt.  While most devices and services still recognize Let's Encrypt certificates as valid, giving the cert a green light at the first valid root certificate encountered, Android is an exception, requiring ALL certs in the chain to be valid.
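To see which certificate in a chain has expired, it helps to look at each one on its own.  The following is a minimal sketch, assuming the chain is a PEM bundle and that 'openssl' is installed; the function names 'count_certs', 'nth_cert', and 'describe_chain' are made up for illustration and are not part of any tool mentioned here.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a PEM certificate bundle.

# Count the certificates in the bundle given as $1.
count_certs() {
    grep -c -e '-----BEGIN CERTIFICATE-----' "$1"
}

# Print only the Nth certificate (1-based): $1 = N, $2 = bundle path.
nth_cert() {
    awk -v want="$1" '
        /-----BEGIN CERTIFICATE-----/ { n++; inside = 1 }
        inside && n == want           { print }
        /-----END CERTIFICATE-----/   { inside = 0 }
    ' "$2"
}

# Show subject, issuer, and expiry date for every certificate in $1.
describe_chain() {
    total=$(count_certs "$1")
    i=1
    while [ "$i" -le "$total" ]; do
        echo "== certificate $i of $total =="
        nth_cert "$i" "$1" | openssl x509 -noout -subject -issuer -enddate
        i=$((i + 1))
    done
}
```

Calling 'describe_chain' on a bundle prints a 'notAfter=' line for each certificate, which makes an expired one easy to spot.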

I decided not to immediately follow the ultimate solution that Will suggested (renewing the certificate using only the valid root certificate), as I have Cloudron managing the certificate renewal, and I don't want to mess with that right now...  So what I did do was try a workaround by simply removing the expired cert from the chain, leaving the other 2 valid certificates (the one issued to me, and Let's Encrypt's current "ISRG Root X1" root certificate).

Everything went well when I pasted the modified certificate text into AdGuard Home's encryption settings tab, and I was able to connect my Android devices.

Hooray!

Everything went fine until....

This worked for a grand total of 2 days.

At one point, I had to remote in and reboot the server that AdGuard Home lives on due to an issue with another application, and suddenly my Android devices could no longer connect.  I had to disable Private DNS on my phone in order to troubleshoot the issue remotely.

It turns out that AdGuard Home did not save my modified certificates.  I am not sure whether this is because AdGuard Home runs on a read-only filesystem in its Docker container.  It does save my other settings, so there is clearly some preservation of settings from the application layer to the disk image between reboots, but for some reason it was not saving my certificates.

Digging into the configuration

When I got home, I logged into my server and started poking around.

First off, I had to grab the path to the AdGuard Home instance from the Cloudron GUI.  Each app in Cloudron is given a universally unique identifier (UUID) that serves as an App ID - something like 3fe6cf4a-d109-56d9-af6c-2f5976db6bdd.  Every app then has a 'data' folder within that app's home path, which contains the application's own configuration and data.

Within the /home/yellowtent/appsdata/{appid}/data/ folder, I was able to track down the configuration file, 'AdGuardHome.yaml'.  Yellowtent was the working title for Cloudron, by the way - and it was left behind as part of the file structure - Neat!

I could see where I had pasted the certificates into the encryption settings page while it was running.  I thought that perhaps this file gets rebuilt from scratch on every reboot, although, again, it does seem to preserve all of my other settings, including general settings, blocklists, upstream DNS server configuration, and even some of the other encryption settings...  For some reason, though, manually pasted certs are not preserved between reboots: the certificate configuration is always replaced with a path to the default certificate file in the filesystem, /etc/certs/tls_cert.pem.  That could be a bug in AdGuard Home, but it is just as likely that this info is purposefully stripped out between reboots for security reasons.  I can understand that, and it makes sense for the private key, but why do it for the public certificates?
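A quick way to see which form the certificate setting currently takes is to print just the 'tls' section of the config file.  This is a rough sketch that assumes 'tls:' is a top-level key in AdGuardHome.yaml and that the section ends at the next unindented line; 'show_tls_section' is a name made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the top-level "tls:" section of the YAML file
# given as $1. Assumes the section runs until the next line that starts in
# column one (comments in column one are treated as part of the section).
show_tls_section() {
    awk '
        /^tls:/              { inside = 1; print; next }
        inside && /^[^ \t#]/ { exit }
        inside               { print }
    ' "$1"
}

# Example, with {appid} as a placeholder for the app's ID:
# show_tls_section /home/yellowtent/appsdata/{appid}/data/AdGuardHome.yaml
```

If the output shows a certificate path of /etc/certs/tls_cert.pem where the pasted certificate text used to be, the manual change has been replaced again.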

Finding the template (or not)

My next thought was that there might be a template of the configuration file that was used to create the config file initially on install, and then to rebuild parts of that file on reboots.  Could I modify this template?  First I had to find it...

sudo find / -name 'AdGuardHome.yaml*'

A bit here about the way Docker containers work - again, this deserves its own article, but for the sake of brevity - when a Docker container is started, the overlay filesystem driver stacks a writable working layer on top of the read-only image layers, and that working layer is where the app's writable filesystem lives while the app is running.  These layers are given unique IDs, resulting in the long folder names in the search results.

All of the instances of 'AdGuardHome.yaml.template' I could find were in the upper container layers, and not in the actual image itself, therefore any changes to any of these would likely not be preserved on reboot either.

The YAML file stored in the 5th path listed might have been what I was looking for, even though it didn't appear to be a template file - I probably could have dug a little deeper to find it, but I was beginning to realize that I did not want to add the entire contents of the certificates to the template anyway, for 2 reasons:

  1. Any future update to the AdGuard Home application might overwrite the template and break the change
  2. I wanted to fix this as close to the root of the problem as I could get (short of renewing the certs manually, as mentioned above).

Replacing the cert file

The next idea was to just replace the 'tls_cert.pem' file used by the app with the modified cert.  So began the next quest, by searching for that filename...

find -name "tls_cert.pem*"

Wow...  OK.  So that happened.  After a cursory look at a few of these, I noticed that they are all test or dummy files that are part of Haraka, which is an SMTP (Simple Mail Transfer Protocol) server included with a number of apps on the system.

Dead end.  Searching for 'tls_cert.pem' turned out to be a red herring anyway.  You'll see why later on.

Next, I wanted to see if I could just directly modify the tls_cert.pem file in the AdGuard Home image.  You are able to start a terminal session on a running app instance using the Cloudron GUI, but since the filesystem is read-only, any changes here won't be allowed.  I also knew that you can start an app in recovery mode, so I thought I'd try that, on the off chance that the filesystem might be mounted as read-write in recovery mode.  Unfortunately, the filesystem was still read-only, even to the root user.

Another idea was to see if I could modify the way the app configures the container, or at least gain some understanding into how it is configured.  After a bit more research, I happened to spot what appeared to be a reference to the 'tls_cert.pem' file, and that turned out to be in the script that Cloudron uses to configure the container for AdGuard Home - namely docker.js.

sudo grep -R "tls_cert.pem"

That script turned out to contain the following code:

        mounts.push({
            Target: '/etc/certs/tls_cert.pem',
            Source: bundle.certFilePath,
            Type: 'bind',
            ReadOnly: true
        });

        mounts.push({
            Target: '/etc/certs/tls_key.pem',
            Source: bundle.keyFilePath,
            Type: 'bind',
            ReadOnly: true
        });

So clearly, this script bind-mounts the cert and key files from the paths held in "certFilePath" and "keyFilePath", presumably in the host operating system, to fixed paths in the container's filesystem.  But where does "certFilePath" point?  The answer wasn't in docker.js, so it had to lie in another file that is linked in or executed first.  Not being familiar with this code, I used grep to look for where it might be assigned in some other script or code file:

grep -rE "certFilePath=|certFilePath ="

This shows that the nginx (web server and reverse proxy) cert path is used, meaning this is likely pointing directly to the main certs for my site all along, in /etc/nginx/cert/.  It was simply linking back to those with a different name!  All along I thought that Cloudron had created separate copies of the certs to use within each application, but that is not the case.  This is why the filename of 'tls_cert.pem' was a red herring when I was searching within the host filesystem...

Mystery solved!

All I had to do now was back up my existing cert file and remove the 3rd certificate block, the one for the expired cert, after which I restarted the AdGuard Home app.
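That backup-and-trim step can also be scripted.  Below is a minimal sketch, assuming the cert file is a plain PEM bundle with the certificates to keep listed first; 'trim_chain' is a hypothetical helper, and the path you pass it is whatever cert file the app's mount points at.

```shell
#!/bin/sh
# Hypothetical helper: keep only the first N certificates of a PEM bundle.
# $1 = path to the bundle, $2 = number of certificates to keep.
# The original is saved as "$1.bak" first; an existing backup is never
# overwritten, so running this twice cannot destroy the untouched copy.
trim_chain() {
    if [ -e "$1.bak" ]; then
        echo "backup $1.bak already exists, refusing to continue" >&2
        return 1
    fi
    cp -p "$1" "$1.bak" || return 1
    awk -v keep="$2" '
        /-----BEGIN CERTIFICATE-----/ { n++ }
        n > keep                      { exit }
                                      { print }
    ' "$1.bak" > "$1"
}

# Example, with the path as a placeholder:
# trim_chain /path/to/cert.pem 2
```

Because the file is rewritten in place rather than replaced, its ownership and permissions stay the same.  For files under /etc/nginx/cert/ this will most likely need to run as root.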

Finding that it still did not work with Android, I validated that the modified cert file was indeed coming through into the container as 'tls_cert.pem' and that it had only the first 2 certificates, then restarted the container again from the Cloudron GUI.
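A check like that can be done from the host in one go.  Here is a sketch, assuming the Docker CLI is available on the host and you know the app's container name or ID; 'check_cert_copy' is a made-up helper that reads the container's copy of the file on standard input.

```shell
#!/bin/sh
# Hypothetical helper: compare a certificate bundle arriving on stdin with
# the host's file ($1) and confirm it holds the expected number of certs ($2).
# Returns 0 only if both checks pass.
check_cert_copy() {
    copy=$(mktemp) || return 1
    cat > "$copy"
    status=0
    if ! cmp -s "$copy" "$1"; then
        echo "container copy differs from $1"
        status=1
    fi
    found=$(grep -c -e '-----BEGIN CERTIFICATE-----' "$copy")
    if [ "$found" -ne "$2" ]; then
        echo "expected $2 certificates, found $found"
        status=1
    fi
    rm -f "$copy"
    return "$status"
}

# Example, with <container> and the host path as placeholders:
# docker exec <container> cat /etc/certs/tls_cert.pem | check_cert_copy /path/on/host/cert.pem 2
```

A silent, successful return means the container sees the same trimmed file as the host.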

Finally, it worked!

My Android devices could now connect successfully with Private DNS, and the configuration survives a reboot.

This fix is still temporary, and will likely be wiped out once the certificates are renewed, but now I have time to test out renewal with only the valid root cert.

Even after this little adventure, I am still very pleased with my instance of AdGuard Home, and with Cloudron, it's still less work than rolling my own install from scratch.

I hope this experience can be useful to anyone else who might encounter this little issue...

Until next time, happy self-hosting!

lxwagno


lxwagno is a data analyst and programmer, and is a privacy and cybersecurity enthusiast with more nerdy hobbies than one should ever be burdened with.
USA