The challenge
When I set out to create this blog, I wanted something quick to throw together, and easy to deploy. I didn't want to spend too much time getting it up and running, for fear that the whim would pass, and I would never do it.
I should explain here too that I firmly believe that websites today gather too much information about their visitors. Thus my second criterion was the ability to do just the opposite. No email subscriptions or comments, no cookies, no trackers, and no IP logging. So naturally, I didn't need any of that functionality to begin with. Leaving all of that out would make for better privacy, with the added benefit of leaner code and faster load times.
That brings me to my third criterion - it was to be simple and lightweight. One of my biggest pet peeves is code libraries included just for the sake of flexibility that won't get used. I feel that code should be designed for the purpose it is intended for, and not for future functionality that probably will never be needed. Although today's beefy CPUs and speedy internet connections make the overhead of transmission of bulky code trivial, it can still add up, especially for a beginner with a server in the $5/month tier.
I knew, given my first criterion (quick and easy), that I was going to have to sacrifice some leanness for convenience by leveraging an existing content management system (CMS) platform. But the second criterion, my guests' privacy, was non-negotiable.
After looking over a few options, I narrowed my choices down to two platforms: WordPress and Ghost. Both are open-source and pretty well trusted by professionals. I decided against WordPress, because it includes a lot of features I don't need. Ghost seems to be a little more streamlined, or at least offer the opportunity to turn off much of the fluff.
Having chosen my CMS, I settled in to choose a starting theme. Again, looking for one without a bunch of junk I didn't need. I chose a theme that appeared to meet my criteria (Alto). Alto is simple and clean in appearance, but includes all of the basic blogging features as well as looking decent on any device. Most of the other themes I looked at had sidebars and footers begging for email subscriptions, links to Facebook, Twitter, and Instagram, or other junk I didn't want or need. I didn't see as much of this in Alto, with the only exception of socials in the footer, but this should be easy enough to remove...
The bliss of naïveté
So after spinning up a virtual private server (VPS), hardening it for security, and setting up DNS records with my registrar, I installed all the necessary prerequisites, and then installed Ghost and downloaded Alto.
I began publishing a few test posts to try out the functionality of Ghost and work out a format. I was pretty happy with what I got, and since it was self-hosted and I didn't add any junk to needlessly collect information from my guests, I was pretty confident in the privacy I would be able to afford as well, and started on my privacy policy. Things were looking great! Well... Not quite.
Here is where I admit that I didn't really look very closely at Alto's code before settling on it... Luckily I did actually dig through it before the site went live. Here is what I found.
Sneaky fonts
Right out of the gate, when I looked at the default page in Alto, I spotted Google Fonts included by default, and linked rather than hosted locally. By default. There appears to be no way to switch this off in the blog settings (not that I was able to find, anyway). I started to realize this might not be as easy as I had hoped.
Here's the problem... Google Fonts makes it easy for developers to get great fonts for free, which is awesome. What isn't awesome, and what many developers don't consider or in many cases don't even know, is that by linking this in, they are automatically sending their users to Google for assets. Google logs those requests, including the IP address of the visitor. What is really bad about this is twofold:
- This happens without the visitor's consent, and in fact, by nature of the "preconnect" relationship in the link, the browser reaches out to Google before the page has even finished loading. This means that, even if the first page viewed is a site's privacy policy and the user intends to opt out, their IP address (and with it their approximate location) has already been sent to Google before they can even read the page. Yikes.
- At least as of this writing, I can find no mention of this in Ghost's documentation for the Alto theme, nor in the theme's readme file. This means that most people who want to stand up a new blog quickly, and choose Ghost, might be giving away their users' private information without even knowing it. There is a brief mention of Google Fonts and how to host them locally within the documentation on iveel.co, but since Iveel was acquired by Ghost, most people are not going to know this or think to look there for additional documentation...
What this means is that if a new blogger using Alto wants to remove these links, they have to at least be able to read HTML, and also know how to create backups of the theme in case they break something. All of this seems a bit antithetical to the point of choosing an easy-to-use CMS like Ghost, namely getting up and running with as little fuss as possible, but I digress.
I contacted Ghost about this, and I will post an update once I receive a response.
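For anyone who wants to check their own site for this kind of thing, here is a minimal sketch of one way to do it, not the exact process I followed. It parses a page's HTML and lists every outside host that the browser would contact on load. The function name external_hosts, the host blog.example, and the sample markup (including the font family "Example") are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExternalHostFinder(HTMLParser):
    """Collect hostnames in href/src attributes that are not our own host."""

    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        # Only look at tags that make the browser connect or fetch on page load.
        # Plain <a> links are skipped: they do nothing until clicked.
        if tag not in ("link", "script", "img", "iframe"):
            return
        for name, value in attrs:
            if name in ("href", "src") and value:
                host = urlparse(value).hostname  # None for relative URLs
                if host and host != self.own_host:
                    self.hosts.add(host)


def external_hosts(html, own_host):
    """Return a sorted list of third-party hosts referenced by the page."""
    finder = ExternalHostFinder(own_host)
    finder.feed(html)
    return sorted(finder.hosts)


# Hypothetical markup, loosely modeled on a theme that links Google Fonts.
sample = """
<link rel="preconnect" href="https://fonts.gstatic.com">
<link rel="stylesheet" href="https://fonts.googleapis.com/css2?family=Example">
<link rel="stylesheet" href="/assets/built/screen.css">
<a href="https://example.org/elsewhere">a plain link, fetched only on click</a>
"""
print(external_hosts(sample, "blog.example"))
# prints ['fonts.googleapis.com', 'fonts.gstatic.com']
```

Feeding it the "view source" output of a live page is probably more telling than feeding it the raw theme templates, since the rendered page also includes whatever the CMS itself injects.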
The Social Club
Finding this and realizing its implications left a sour taste in my mouth, so I resolved to scour the code for any additional bits that I did not want to be a part of my site. I already knew that there were social media links, so I decided to start there.
Including links to social media sites is not by default bad for privacy, especially if you don't actually utilize them (and your visitors don't have to click on them, either), but I had no plan to use them. There is a toggle in the blog settings to turn them off, but I wanted them off permanently.
I could have just commented these links out and called it a day, but I decided to go a little further. In my desire to not pass code that I didn't need across to people's browsers, I resolved to delete any mention of social media entirely from the HTML files. It turned out to be a tad more involved than that.
I actually found code related to social media and sharing all over the place in Alto. Again, most of this is benign if not actually executed, but by now you probably get my aim. Here is the complete list of files where I found and removed code related to social media sites, including Facebook, Twitter, Instagram, Reddit, Pinterest, LinkedIn, Tumblr, Telegram, and VK:
..alto/assets/built/screen.css
..alto/assets/built/screen.css.map
..alto/assets/css/blog/share.css
..alto/assets/css/vendor/mdi.css
..alto/assets/fonts/Alto.svg
..alto/assets/fonts/selection.json
..alto/partials/author.hbs
..alto/partials/header.hbs
..alto/partials/footer.hbs
..alto/partials/share.hbs
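A list like the one above can be built with a recursive text search. Here is a rough sketch, assuming a theme unpacked into a local folder. The keyword tuple is only an illustrative subset, the helper name files_mentioning is made up, and the demo runs against a throwaway directory with two invented files rather than the real Alto theme.

```python
import tempfile
from pathlib import Path

# Illustrative subset; extend with whichever services you want to hunt for.
KEYWORDS = ("facebook", "twitter", "instagram", "pinterest")


def files_mentioning(root, keywords=KEYWORDS):
    """Return relative paths of files under root that contain any keyword."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Case-insensitive match; undecodable bytes are ignored.
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        if any(word in text for word in keywords):
            hits.append(path.relative_to(root).as_posix())
    return hits


# Demo on a throwaway directory (these file contents are invented).
with tempfile.TemporaryDirectory() as tmp:
    theme = Path(tmp)
    (theme / "partials").mkdir()
    (theme / "partials" / "footer.hbs").write_text(
        '<a href="https://facebook.com/">Facebook</a>'
    )
    (theme / "partials" / "post.hbs").write_text("<article>{{content}}</article>")
    demo_hits = files_mentioning(theme)
    print(demo_hits)  # prints ['partials/footer.hbs']
```

A search like this only tells you where to look. Minified bundles and icon fonts tend to match on a single very long line, so those files still need to be opened and edited by hand.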
You're gonna want to embed some videos, right?
Well, no, in fact, but thanks. Can we get rid of that, too?
Again, once an embedded asset loads from its source, the content provider (YouTube, Vimeo, or, oddly, Kickstarter in this theme's instance) logs the user's IP address. I had no plans to embed videos, so I might as well remove that code too. As usual, I didn't want to just comment out the code designed to interact with those sites. I wanted to remove it entirely, along with any supporting libraries I didn't need. This included FitVids, a plugin designed to handle the sizing and aspect ratio of embedded videos. Again, not inherently malicious, but I didn't need it.
..alto/assets/js/lib/jquery.fitvids.js
There was also code in the main theme files referencing that library, from which I removed all references and function calls:
assets/built/main.min.js
assets/built/main.min.js.map
assets/js/main.js
Getting updates on my content
Since I wasn't going to support email subscriptions, I at least wanted visitors to be able to subscribe using RSS. This, however, entailed removing the default code that makes users subscribe through Feedly.com (again, there is no easy way to do that in settings). The link just shows up as "RSS", misleading users who don't happen to check the link before they click on it.
Most people don't think of Feedly as being unfriendly to privacy. However, you cannot even read their Privacy Policy without creating an account, since it is embedded under their Account section (feedly.com/i/account/privacy). In order to even view their main site to sign up, you have to enable scripting and cookies. Below is a list of the domains that run scripts when Feedly.com is loaded:
- calendly.com
- feedly.com
- google-analytics.com
- googlesyndication.com
- googletagmanager.com
- sentry.io
- stripe.com
- twitter.com
Sooooo... Not the best, huh? I replaced this with a static page letting you know you can subscribe with a reader of your choice, and providing the link you need to copy/paste into your reader.
Do you really need to put me in your pocket?
Then I moved on to another thing I had noticed... Each blog post listed on the home page showed two links underneath it: > Read Now and > Read Later.
The Read Now link takes the visitor to the post. Great! As it should! So far, so good. The Read Later link, however, is another story. It takes the visitor to getpocket.com <sigh...>
At least Pocket is owned by Mozilla, and their privacy policy is certainly one of the least radioactive I have seen from social bookmarking services. Yet again, though - I don't need this, and I'd rather not offer tracking, no matter how benign in intent, to my guests.
jsDelivr (added 7/1/24)
Recently, after an update from Ghost 4 to Ghost 5, I noticed, when I visited any page on my site, that there were scripts being loaded from cdn.jsdelivr.net. I don't really associate jsDelivr with anything nefarious, but still, I did need to perform some due diligence to ensure that 1) my readers don't have to worry about their data getting hoovered up, and 2) I don't need to update my privacy policy to include jsDelivr.
After reviewing jsDelivr's Privacy Policy, I began to grow uncomfortable when I read that they do collect your browser and device information. This is not OK with me. Then I read section 9 - Sharing your data with others - Specifically the following sentence:
We use third-party Service Providers to serve all of our traffic under the domain cdn.jsdelivr.net. This means all of these providers have access to your IP address and other information sent by your web browser.
Yup. Dealbreaker. I need to get rid of this code. I viewed the source of my home page and found this inside script tags:
My site does not have search, so I am definitely not in need of that feature. Time to do another grep (a terminal command for searching text on Linux and UNIX) through Ghost's source files to find any references to "sodo-search.min.js". I found the offending line under my site's root, in current/core/shared/config/defaults.json.
Since JSON has no comment syntax, I couldn't simply comment the lines out the way I would in a normal file (it's a long story), so I made a backup of the file and then deleted these lines from the original:
After deleting that and restarting Ghost, voilà! The call to cdn.jsdelivr.net was gone!
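The backup-then-delete step can also be scripted. What follows is only a sketch of the idea under loud assumptions. The key name "exampleFeature" and the cdn.example URL are invented and are not the lines I actually removed. The helper remove_key_with_backup is hypothetical and only handles a top-level key, and the demo runs on a temporary file, not on Ghost's real defaults.json. The first part also shows why commenting out was not an option.

```python
import json
import shutil
import tempfile
from pathlib import Path


def remove_key_with_backup(path, key):
    """Copy the file to <name>.bak, then rewrite it without one top-level key."""
    path = Path(path)
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # keep an untouched copy first
    data = json.loads(path.read_text(encoding="utf-8"))
    removed = data.pop(key, None)  # None if the key was not present
    path.write_text(json.dumps(data, indent=4) + "\n", encoding="utf-8")
    return backup, removed


# A // comment is not valid JSON, so "commenting out" a setting breaks the file.
try:
    json.loads('{\n  // "exampleFeature": {"url": "https://cdn.example/x.js"}\n}')
    comment_accepted = True
except json.JSONDecodeError:
    comment_accepted = False
print(comment_accepted)  # prints False

# Demo on a temporary file with a made-up key (not Ghost's real config).
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "defaults.json"
    cfg.write_text(
        json.dumps({"exampleFeature": {"url": "https://cdn.example/x.js"}, "keep": 1})
    )
    backup, removed = remove_key_with_backup(cfg, "exampleFeature")
    remaining = json.loads(cfg.read_text())
    backup_data = json.loads(backup.read_text())
    print(remaining)  # prints {'keep': 1}
```

Rewriting the file through json.dumps changes its formatting, so a diff against the backup will show more than the one removal. Editing by hand, as I did, avoids that.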
All in the name of protecting you, my dear reader, from having your data harvested by someone you don't know. 😃
<End added 7/1/24>
And finally...
Last but not least, I chose to self-host the jQuery library from the OpenJS Foundation (a project of the Linux Foundation). Why? Again, there is no need to send users out on a quest for code from another domain that may or may not track them, if it can be served up in-house. I'll just have to keep it updated with the latest version myself, which is no skin off my back.
What does it all mean?
Well, honestly, it makes me feel both sympathetic and somewhat disheartened.
I feel sympathy towards start-up and small-time website owners who don't have time and resources to have custom code written exactly to their specs. They likely have to put up with a lot of junk and risk that they don't even know they are exposing themselves and their customers to.
I am disheartened in that it is necessary today to put your trust in an ever-lengthening chain of third-party building-block code - code that you don't own and aren't invested in - in order to produce content online.