Talking Privacy of Journaling Apps

“Will my journal entries stay private?” is one of the questions Journalistic diarists ask me most frequently, so I want to address it in detail in this post.

In a perfect world, the answer would be “Yes, of course!”, but in reality, if we are honest, it’s not that simple. First of all, what do you mean by “private”? We will see that the answer is less a matter of technical challenges and monetization strategies than of UX tradeoffs, convenience, and legislation.

Grab a coffee and get comfy, this is a long one.

The White Rhino

It’s always a good idea to lay out the ideal scenario and then try to minimize the concessions that inevitably will have to be made.

The perfect, privacy-focused journaling app would IMO satisfy – among others – the following requirements:

Only users can access their entries without exception
Entries are synced instantly across all of the user's devices
If a device (or all devices) is damaged, lost, or stolen, the entries can be fully recovered
Recovery should also be possible if a user forgets their password or is not able to provide the required means of authentication (fingerprint, etc.)

The White Rhino is a metaphor for rare or elusive things.

I’ll try to make the case that achieving all of the above simultaneously is (virtually?) impossible and that we have to make some compromises to optimize user happiness.

As a side note: while journaling apps have extremely high privacy requirements, similar challenges apply to almost any web-based app that deals with private user data.

Privacy-enhancing strategies

First, let’s look at some methods that are commonly used to enhance the data privacy of consumer apps, and then analyze whether it makes sense to use them for our journaling app.

Encryption

Using cryptography to keep data private is an ancient concept, dating back to antiquity (the Caesar cipher is a famous early example). But advances in mathematics and the processing power available in every device now make it feasible to seamlessly integrate encryption into real-time applications. When appropriate algorithms and key lengths are used, it is currently computationally infeasible to decipher encrypted data without the key. Journal entries stored on cloud servers could therefore be reliably protected from unwanted access if they were encrypted.

The Achilles’ heel of encryption is the encryption key, which has to be taken care of by the user. If the key is lost, the data is gone for good. More on that later.
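To make the key problem concrete, here is a minimal Python sketch, assuming the key is derived from a user password with PBKDF2 from the standard library. The cipher itself is a toy construction (HMAC-SHA256 in counter mode as a keystream, plus an HMAC authentication tag), chosen only because it needs no third-party packages; a real app should use a vetted AEAD cipher such as AES-GCM. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Toy encrypt-then-MAC scheme for illustration only.
# A real app should use a vetted AEAD cipher such as AES-GCM.

def derive_key(password: str, salt: bytes) -> bytes:
    """Stretch a password into a 32-byte key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def _subkeys(key: bytes) -> tuple[bytes, bytes]:
    """Derive separate keys for encryption and authentication."""
    enc = hmac.new(key, b"enc", hashlib.sha256).digest()
    mac = hmac.new(key, b"mac", hashlib.sha256).digest()
    return enc, mac

def _keystream(enc_key: bytes, nonce: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode, used as a pseudorandom keystream."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(enc_key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(16)  # fresh for every message
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag

def decrypt(key: bytes, blob: bytes) -> bytes:
    enc_key, mac_key = _subkeys(key)
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered data")
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))

salt = secrets.token_bytes(16)
blob = encrypt(derive_key("correct horse battery staple", salt), b"Dear diary, ...")
print(decrypt(derive_key("correct horse battery staple", salt), blob))  # b'Dear diary, ...'
try:
    decrypt(derive_key("a forgotten password", salt), blob)
except ValueError as err:
    print(err)  # wrong key or tampered data
```

Nothing in this sketch can bring the entry back once the password, and with it the key, is gone.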

Pseudonymity

Pseudonymity refers to the separation of data from identity. For instance, a user could create a brand-new email address that doesn’t hint at their name and use it to sign up for the journaling service. If they do not tell anyone about their new virtual identity, their data might still be accessed, but it cannot be linked to them.
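A minimal Python sketch of the idea, assuming a hypothetical storage layout in which journal entries are keyed by a random pseudonym and the link to the real identity lives in a separate table (or, as in the email example, is never stored at all). All names are illustrative.

```python
import secrets
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    pseudonym: str  # random ID, not derived from the user's name or email
    text: str

# Hypothetical layout: identities and entries are stored separately.
identities: dict[str, str] = {}  # pseudonym -> email (could live elsewhere, or nowhere)
entries: list[Entry] = []

def sign_up(email: str) -> str:
    pseudonym = secrets.token_hex(8)  # 16 random hex characters
    identities[pseudonym] = email
    return pseudonym

def write_entry(pseudonym: str, text: str) -> None:
    entries.append(Entry(pseudonym, text))

def leaked_view() -> list[tuple[str, str]]:
    """What an attacker sees if only the entries table leaks."""
    return [(e.pseudonym, e.text) for e in entries]

me = sign_up("jane.doe@example.com")
write_entry(me, "Long walk today, feeling better.")
leaked = leaked_view()  # rows hold only the random pseudonym and the text
```

The leaked rows contain no email address, but an entry that mentions names or places can still give its author away.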


I guess this is a good opportunity to introduce the two types of private data we have to worry about:

Personal data
People usually do not want strangers to know what they think and what they do at any given time unless they choose to share it. This is more or less our common definition of privacy and also exactly the kind of information someone would store in a journal.
Secret data
Secret data either holds value in itself (e.g. intellectual property or insider knowledge) or can endanger the safety or property of individuals or groups if exposed. Think of credit card information, for example.

Pseudonymity is only useful (to a certain degree) for protecting personal data. Most people would probably not care if their daily logs got leaked, as long as they could be 100% sure that nobody would ever find out they wrote them. In practice, this guarantee can never be given, and the more personal the data is (think names, places, and so on), the easier it becomes to link it to a person.

To protect secret data, pseudonymity is useless.

A great example of pseudonymity is the Bitcoin protocol.

While the protocol relies on strong cryptography (hash functions and digital signatures) to secure the network itself, wallet addresses and transactions are pseudonymous and visible to everyone on the public ledger. Since the addresses are just strings of seemingly random letters and numbers, privacy is protected until a wallet is at some point linked to an identity.

Local data storage and peer-to-peer sync

The most stringent strategy to enhance privacy is to store data exclusively on devices that the user physically controls and that nobody else has access to, e.g. a mobile phone. This is usually referred to as local data storage.

To synchronize data that is stored only locally, devices have to communicate directly with each other (peer to peer), without a middleman. Thanks to encryption, this is possible even over public networks.

Analyzing the requirements

Let’s go through the requirements one by one, clearly define why they are important, how they can be implemented, and also where the challenges are.

Only users can access their entries without exception

Why is it important?

As mentioned above, users want their entries to be private, and if third parties can access a user’s journal entries, the user has to trust each of those parties to keep the data private and secure. There is no way around it.

In general, the more parties have access to your data, the more likely it is that one of those parties is not acting in good faith, accidentally leaks the data, or becomes the target of a hack.

Depending on the jurisdiction in which the third parties are located, it is also possible that a court can subpoena private data, or that a government agency actively spies on citizens (see for instance the Snowden leaks ¹).

How can it be done?

The two ways to make sure that no one can access a user’s private data, except for the user, are:

Encryption
Local data storage only

What are the problems?

First of all, both techniques only work reliably if the software (the app as well as the encryption mechanisms and libraries) is flawless. Bugs can easily lead to data leaks (e.g. the Heartbleed bug ²). The hardware can also have design flaws; see, for instance, the Meltdown vulnerability ³.

But let’s assume that software, hardware, and app are immaculate. Then it all comes down to the individual user. The elephant in the room here is data loss: if only the user has access to their data, they also bear all the responsibility for storing and accessing it.

In the case of encryption, data can be backed up the same way as unencrypted data, but if the encryption key is lost, your journal entries stay encrypted for good. Potentially years’ worth of reflections – gone. Bummer!

Violation of requirement #4.

Here's a story where losing cryptographic keys went wrong big time: Half a Billion in Bitcoin, Lost in the Dump

For local storage-only, the problem lies more on the backing-things-up side (discussed below). A user might also own only one device (most likely a phone) that can be lost or stolen, or multiple devices that all get destroyed simultaneously, for example in a fire.

Violation of requirement #3.

Entries are synced instantly across all of the user's devices

Why is it important?

It’s simply a matter of convenience and great user experience. Depending on the situation, you might want to write entries on your MacBook, where typing is more convenient, or access them on your phone while on the go, when you do not have your laptop at hand. Always being with you is one of the great advantages of an app over a paper journal. In the case of local storage-only, instant sync is also essential for backups.

How can it be done?

Data syncing has its own challenges, especially if devices do not have a stable internet connection or are disconnected from the web entirely for long periods. But that’s a topic for another day.

Data can be synced either peer-to-peer (see above) or traditionally via a central web server. Almost all web services you know (Gmail, Dropbox, Evernote, you name it) use the latter.

What are the problems?

In the peer-to-peer case, where devices communicate directly to exchange data, continuous synchronization is difficult: all devices would need to be switched on and connected to the network at all times to provide their latest updates whenever another device performs a sync.

Since no single device in a peer-to-peer network serves as a master node, inconsistencies can occur when devices resolve conflicts on their own and do not immediately broadcast the results to all other devices.
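One common way to keep peers consistent without a master node is a deterministic merge rule that every device applies identically. The Python sketch below uses last-writer-wins with the device ID as a tie-breaker, assuming each (timestamp, device_id) pair is unique and timestamps are comparable across devices. It is one simple strategy among several (CRDTs are a more sophisticated family), not a description of how any particular app syncs.

```python
from dataclasses import dataclass
from functools import reduce
from itertools import permutations

@dataclass(frozen=True)
class Version:
    text: str
    timestamp: int  # time of the edit (assumed comparable across devices)
    device_id: str  # tie-breaker for equal timestamps

def merge(a: Version, b: Version) -> Version:
    """Last-writer-wins: keep the version with the larger (timestamp, device_id)."""
    return a if (a.timestamp, a.device_id) >= (b.timestamp, b.device_id) else b

edits = [
    Version("Walked to the beach", 10, "phone"),
    Version("Walked to the beach at sunset", 12, "laptop"),
    Version("Surfed at Uluwatu", 12, "phone"),
]

# Each device may receive the edits in a different order...
outcomes = {reduce(merge, order) for order in permutations(edits)}
# ...but because merge is commutative and associative, they all converge.
assert len(outcomes) == 1
print(outcomes.pop().text)  # Surfed at Uluwatu
```

The catch: the laptop’s concurrent edit is silently discarded. Avoiding that without a central authority is exactly what makes peer-to-peer sync hard.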

Using a central server circumvents these problems: the server is always available and serves as the single source of truth, always holding the most up-to-date state.
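With a central server, conflicts can be detected in one place. A common technique is optimistic concurrency control: every entry carries a version number, and the server rejects writes that are based on an outdated version. The class and method names in this Python sketch are hypothetical.

```python
class ConflictError(Exception):
    """Raised when a client tries to overwrite a newer version."""

class SyncServer:
    """Hypothetical central server acting as the single source of truth."""

    def __init__(self) -> None:
        self._entries: dict[str, tuple[int, str]] = {}  # entry_id -> (version, text)

    def read(self, entry_id: str) -> tuple[int, str]:
        return self._entries.get(entry_id, (0, ""))

    def write(self, entry_id: str, base_version: int, text: str) -> int:
        current_version, _ = self.read(entry_id)
        if base_version != current_version:
            # The client edited a stale copy and must fetch the latest state first.
            raise ConflictError(f"expected version {current_version}, got {base_version}")
        self._entries[entry_id] = (current_version + 1, text)
        return current_version + 1

server = SyncServer()
version, _ = server.read("2022-05-01")  # laptop and phone both start from version 0
server.write("2022-05-01", version, "Written on the laptop")  # accepted, now version 1
try:
    server.write("2022-05-01", version, "Written on the phone")  # based on stale version 0
except ConflictError as err:
    print(err)  # expected version 1, got 0
```

The phone then re-reads version 1 and either merges or asks the user, so there is never any doubt about which state is current.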

The downside is that the service provider and/or third parties (such as the hosting company) have physical access to the hardware on which the data is stored.

If a device (or all devices) is damaged, lost, or stolen, the entries can be fully recovered

Why is it important?

In almost all cases complete data loss is probably the worst-case scenario.

How can it be done?

Data should be backed up multiple times (at least three copies, ideally more) in as many different physical locations as possible. Physical distance matters in the case of black swan events such as burglary or flooding, where all devices in one location can be lost or destroyed.
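The rule of thumb above can be expressed as a simple check. This Python sketch assumes hypothetical location labels and treats a copy as usable only if its SHA-256 checksum matches the original; copies in the same location count once, since a single fire or flood can take them all out.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class BackupCopy:
    location: str  # physical site, e.g. "home" or "cloud-eu" (hypothetical labels)
    data: bytes

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def backups_ok(original: bytes, copies: list[BackupCopy], min_locations: int = 3) -> bool:
    """True if intact copies exist in at least `min_locations` distinct places."""
    expected = sha256(original)
    intact_sites = {c.location for c in copies if sha256(c.data) == expected}
    return len(intact_sites) >= min_locations

journal = b"Years' worth of reflections"
copies = [
    BackupCopy("home", journal),  # laptop
    BackupCopy("home", journal),  # external drive in the same flat
    BackupCopy("cloud-eu", journal),
]
print(backups_ok(journal, copies))  # False (only two distinct sites)
copies.append(BackupCopy("office", journal))
print(backups_ok(journal, copies))  # True
```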

What are the problems?

With a local storage-only system, it might be inconvenient or even impossible for a user to establish 3+ backups on different devices and separate them spatially at all times.

In the conventional case, this responsibility is shifted to the cloud service provider. Their advantage is that they are professionals and safely storing data is their business.

Recovery should also be possible if a user forgets their password or is not able to provide the required means of authentication

Why is it important?

Again, complete data loss might be the worst-case scenario.

How can it be done?

The only way to achieve this is to entrust the service provider or a third party with a master key to your data. This applies both to encryption and authentication.

Violation of requirement #1.

What are the problems?

Being able to recover your entries after losing your password or encryption key can only be achieved by giving up a certain degree of privacy, at least if you define privacy in the strictest sense. Password reset always requires trust.

Keep in mind that if a web service claims your data is encrypted, but you can reset your password and still access your entries, the service must hold a key as well; otherwise it would not be able to help you out. And then, what’s the point?
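A minimal sketch of why this is so, assuming a common envelope-encryption layout: entries are encrypted with a random data key, and that data key is stored twice, once wrapped with a key derived from the user’s password and once wrapped with a recovery key held by the provider. The XOR-based wrapping below is a toy for illustration (real systems use a standard key-wrap or AEAD scheme), and all names are hypothetical.

```python
import hashlib
import hmac
import secrets

def wrap(data_key: bytes, wrapping_key: bytes) -> tuple[bytes, bytes]:
    """Toy key wrap: XOR the 32-byte data key with a pad derived from a fresh salt."""
    salt = secrets.token_bytes(16)
    pad = hmac.new(wrapping_key, salt, hashlib.sha256).digest()
    return salt, bytes(k ^ p for k, p in zip(data_key, pad))

def unwrap(wrapped: tuple[bytes, bytes], wrapping_key: bytes) -> bytes:
    salt, masked = wrapped
    pad = hmac.new(wrapping_key, salt, hashlib.sha256).digest()
    return bytes(m ^ p for m, p in zip(masked, pad))

data_key = secrets.token_bytes(32)  # encrypts the actual journal entries
user_key = hashlib.pbkdf2_hmac("sha256", b"user password", b"per-user salt", 100_000)
recovery_key = secrets.token_bytes(32)  # held by the service provider

stored = {
    "for_user": wrap(data_key, user_key),
    "for_provider": wrap(data_key, recovery_key),  # what makes password reset possible
}

# The user forgot their password, but the provider can still unwrap the data key...
print(unwrap(stored["for_provider"], recovery_key) == data_key)  # True
# ...which means the provider could decrypt the entries at any time.
```

Delete the provider’s copy and a forgotten password means lost entries; keep it and the provider holds a key.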

Concessions, trust, and risks

As we’ve seen, it is not possible to fulfill all four requirements simultaneously, so we have to prioritize, weigh the costs and benefits of each privacy strategy, and accommodate their incompatibilities. This is also where we leave the realm of objectivity and enter the world of the opinionated design principles and ideologies behind Journalistic specifically (see About page).

I’ll try to explain the reasoning behind all privacy-related design decisions and will attempt to answer the question that led us here:

“Will my journal entries stay private?”

Avoiding data loss at all costs

I already mentioned that IMO complete data loss is the worst-case scenario. How many times have you reset a password in the past? And what if you couldn't do that?

Therefore, unfortunately, we have to compromise on both encryption and local-only storage, at least by default. In the future, it might be possible to opt in to either of them if you prioritize privacy over data recovery. Have a look at the Voynich Project if you want to know more.

Does that mean my data is not private?

No!

It means that you have to trust us and our meticulously selected third-party service providers to keep your data safe and private.

Who you need to trust

Journalistic

You have to trust us. Obviously.

In any case, unless you are an expert in software and security, you will always have to trust at least the service provider (the organization behind the app or website you are using). If a service you use wanted to spy on you, it could simply add a backdoor to its encryption or transmit your data without your knowledge.

But you also need to trust that the service provider’s highest priority is user satisfaction, not profit, and that they take privacy seriously.

Being a tiny organization is an advantage here compared to a big corporation with stakeholders who are only in it for the bucks.

We strongly believe that transparency builds trust. In the end, this is why I’m sitting here writing this lengthy post.

Digital Ocean

We use Digital Ocean as our cloud service provider. DO is a US company, but the servers we rent from them are located in the European Union. European data protection rules (GDPR) therefore apply.

As explained in this thread, it is not in their interest to go out of their way to access data we store on one of their servers, even though they theoretically could.

Relying on a professional cloud service provider is, especially for small organizations, more secure than running, protecting, and maintaining an in-house infrastructure.

Third-party services

Specifically for privacy reasons, we keep our reliance on third-party services to a minimum and currently only use Sentry for error monitoring and Plausible Analytics to gather usage data.

Encryption in transit

While end-to-end encryption is risky (because of the key-loss problem), encrypting data while it moves over the wire is standard as of 2022 via TLS, the successor of SSL, which is what HTTPS uses. Your data will always be encrypted when it moves through infrastructure that is not controlled by you, by us, or by Digital Ocean.
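On the client side, encryption in transit mostly comes down to never disabling certificate checks. A minimal Python sketch using only the standard library: `ssl.create_default_context()` verifies the server certificate and hostname, and the minimum protocol version is pinned to TLS 1.2 (recent Python versions already default to this). The `fetch` helper and the example URL are hypothetical.

```python
import ssl
import urllib.request

def make_tls_context() -> ssl.SSLContext:
    # Verifies the server certificate against the system CA store and checks the hostname.
    ctx = ssl.create_default_context()
    # Refuse legacy protocols (SSLv3, TLS 1.0 and 1.1).
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx

def fetch(url: str) -> bytes:
    """Download a resource, refusing plain HTTP so data never travels unencrypted."""
    if not url.startswith("https://"):
        raise ValueError("refusing to use an unencrypted connection")
    with urllib.request.urlopen(url, context=make_tls_context(), timeout=10) as resp:
        return resp.read()

# Usage (needs network access): fetch("https://example.com/")
```

If a server presents an invalid or mismatched certificate, the request fails (typically as a `urllib.error.URLError` wrapping `ssl.SSLCertVerificationError`) instead of silently sending data.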

Risks and word of caution

While we do everything in our power to keep your data safe, there is always the risk of a hack. I therefore strongly encourage you not to add content to your journaling app that could harm you or anyone else if exposed. This warning applies to Journalistic as well as to any other journaling app, whether it is encrypted or not!

What about pseudonymity?

It is completely up to you to activate “stealth mode” and add an extra layer of protection to your privacy. Here is how it works: just create a new email address that doesn’t have your name in it and use it exclusively to sign up for Journalistic. You can get free email accounts from Gmail, Yahoo! Mail, etc. In case of a data leak, your journal will be only one among thousands.

We might integrate pseudonymity directly into our service in the future via a service like Evervault.

Verdict, and looking ahead

Beware of the black swan!

The ... theory of black swan events is a metaphor that describes an event that comes as a surprise, has a major effect, and is often inappropriately rationalized after the fact with the benefit of hindsight (Wikipedia).

I would claim that any web-based service asserting that your data is 100% safe is either ignorant, rounding up, or straight-out lying. Being cautious and transparent is surely more responsible than being overconfident. There is always the risk of a system failure or an unforeseen security vulnerability.

That being said, Journalistic is designed to be secure and private, and if our software works as intended, your journal entries will stay private. We will never sell or intentionally give away your data to third parties without your explicit consent!

To answer another frequently asked question: this is also why we chose a business model with a paid PRO version for advanced users (not yet available) to fund the development and operation of Journalistic, instead of selling data for dollars.

Regarding the future… We are committed to continuously refining our software design to make systemic failure as unlikely as possible. Furthermore, we’ll keep experimenting and testing opt-in features like encryption and local-only data storage and give you the option to use them.

Feedback and suggestions

If you have any suggestions regarding our approach to data privacy, our Privacy Policy, or general feedback, please send them to feedback@journalisticapp.com.

Thanks for reading and have a marvelous day.

This post has been composed while exploring the Balinese culture and surfing the amazing waves of Uluwatu and Canggu.

¹ In 2013, Edward Snowden exposed a secret, large-scale surveillance program targeting private citizens, run by the US National Security Agency (NSA): NSA Files Decoded
² Heartbleed was a security bug in the OpenSSL cryptography library, which is a widely used implementation of the Transport Layer Security protocol (Wikipedia).
³ Meltdown is a hardware vulnerability affecting Intel x86 microprocessors, IBM POWER processors, and some ARM-based microprocessors. It allows a rogue process to read all memory, even when it is not authorized to do so. Meltdown affects a wide range of systems (Wikipedia).