As I write this piece, I’m preparing for the graduation of some of the first Bachelor of Cybersecurity students from James Cook University Singapore, which means it’s approaching two years since I joined and started teaching Cybersecurity. So now seems like a good time to reflect on some recurring tools and techniques that I’ve found useful so far. This post will focus on one: GNU Privacy Guard (GPG).

Unsurprisingly, cryptography comes up a lot in a course about cybersecurity, so no matter what subjects I’m teaching, there’s often a reason to introduce, or reiterate, some essential concepts. The big three that come up are symmetric/asymmetric encryption, signatures and public key infrastructure (PKI).

I subscribe to the philosophy of “show, don’t tell”, much like good playwrights, show-runners and game authors do. On top of that, I like to show students things that they can then do for themselves to find out more. So let me explain how I use GPG to glue together an array of cryptography concepts, while at the same time encouraging analytical thinking from my students.

Outline

I usually spend 20-30 minutes in a tutorial-style setting going over this problem, when it’s relevant either to the work the students are doing or to recent lecture content they’ve seen. The rough outline is this:

  1. Introduce what GPG is and what it’s for
  2. Create a problem that GPG can help solve
  3. Together, establish our expectations about how GPG will do it
  4. Try it out, to see if our expectations are met
  5. Analyse and discuss any discrepancies
  6. Pick apart the process to see what really happens
  7. Discuss why it happens

It’s a mini-experiment, of sorts, that takes the base knowledge the students have, creates a hypothesis and tests it. Let me walk through each of the steps.

What is GPG?

The GNU Privacy Guard is an open source tool similar to Pretty Good Privacy (PGP). The idea is simple: by publishing, sharing and trusting public keys, individuals can send each other encrypted messages and data, and provide authenticity guarantees for those messages.

GPG can be used in various ways, but the two examples I use are:

  • It can be integrated with e-mail clients to enhance e-mail privacy, independent of the mail provider’s assurances.
  • Some companies publish a GPG key for bug bounty or vulnerability disclosure schemes, so that details of a bug or vulnerability can be shared safely with them.

The specific use doesn’t really matter, so long as the problem we choose is relatable and can be solved with GPG. Speaking of which…

The problem

I usually keep the problem simple: I have a PNG image file that I want to encrypt and send (presumably via e-mail) to two recipients. Knowing that GPG seems to be designed to solve this kind of problem, and that it works with public keys, that should be enough to set the expectation of what it will do.

The expectation

Intuitively, GPG will apply asymmetric encryption to the file because that’s where public keys would be used. Students might expect two encrypted files to be produced (one for each recipient), or a single file containing both encrypted versions of the data.

Here I try to emphasise that for asymmetric cryptography to work, there must be two encrypted outputs, because the file will need to be encrypted by two different public keys, and that each recipient will use their own unique private key to decrypt the file later.

```mermaid
flowchart LR
    Pub_A>A's public key]
    F[(Plaintext)]
    Pub_B>B's public key]
    subgraph Asymmetric encryption
        E_F_a[[Encrypt file<br/>for A]]
        E_F_b[[Encrypt file<br/>for B]]
    end
    E_F[(Ciphertext)]
    Pub_A --> E_F_a
    F --> E_F_a
    Pub_B --> E_F_b
    F --> E_F_b
    E_F_a --> E_F
    E_F_b --> E_F
```

The experiment

Now we run gpg and encrypt the file. Out comes a single file with a .gpg extension. At this point I ask my students if there are any clues we can look for to see if we can prove how GPG works. Hopefully, the question of file size comes up, but I can guide things in that direction if needed.

Assuming our experimental session gives us possession of all the keys, we can also prove that both recipients can decrypt the file (and check whether the sender can decrypt it as well).

Subversion of expectations

Here is where things get interesting, because together we observe that the size of the .gpg file is not very different from that of the .png file. In fact, sometimes it’s a little smaller. So we’re left with a big question:

If the file has to be encrypted twice, how come the file size isn’t doubled?

At this point, hopefully, the students have some ideas, which usually include:

  • The image file was compressed during encryption
  • The public keys are used together, somehow, to produce a single encrypted file that both recipients can read
  • Some other form of encryption is going on

Some students zero in on exactly what’s going on at this point.

To counter the compression argument, I usually produce a second encrypted file with only a single recipient. If compression were what kept the size down, then based on what we’ve seen, surely the new encrypted file would be about half the size of the two-recipient one? Spoiler: it isn’t. For those not familiar, I explain that PNG is already a compressed image format, so any additional compression isn’t likely to make a huge difference. So something else must be going on…
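Here’s a small Python sketch of the compression argument. It uses the standard library’s zlib module, which implements DEFLATE, the same compression family that PNG uses internally. The input is hypothetical: pseudo-random "word salad" standing in for an image. It compresses well the first time. Compressing that output a second time gains essentially nothing, because well-compressed data already looks close to random.

```python
import random
import zlib

# Hypothetical stand-in for image data: pseudo-random words from a tiny
# vocabulary, which DEFLATE can compress well on a first pass.
rng = random.Random(42)
words = ["private", "public", "key", "session", "cipher", "packet", "signature", "recipient"]
original = " ".join(rng.choice(words) for _ in range(50_000)).encode()

once = zlib.compress(original, 9)   # first pass: a large saving
twice = zlib.compress(once, 9)      # second pass: input already looks random

for label, data in [("original", original), ("compressed once", once), ("compressed twice", twice)]:
    print(f"{label:>16}: {len(data):>7} bytes")
```

The exact byte counts depend on the data. The point is the pattern: one big saving, then no meaningful further saving.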

The revelation

The big reveal comes through the use of gpg --list-packets --verbose ..., wherein we can see a bit more information about what GPG does when it tries to decrypt a file, and what component parts are inside that file. Here, I see what the students can spot in the output, but we’re looking for:

  • Packets identifying the public keys (by key ID) of the intended recipients
  • Compressed content
  • Encrypted content
  • Signature content
  • File information
  • Mention of unexpected encryption algorithms
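That listing reflects OpenPGP’s packet framing, which is simple enough to walk by hand. Below is a minimal Python sketch that reads only the packet headers described in RFC 4880 §4.2 and reports each top-level packet’s tag and body length. It assumes a binary `.gpg` file rather than an ASCII-armored one, and the function names are hypothetical. It doesn’t decrypt anything, so the compressed data, literal data and signature packets stay hidden, because they sit inside the encrypted packet. As far as I know, `gpg --list-packets` only shows those inner packets when it can decrypt the message.

```python
import sys
from typing import Iterator, Tuple

# Packet tag numbers from RFC 4880, section 4.3 (only the ones relevant here).
PACKET_NAMES = {
    1: "Public-Key Encrypted Session Key",
    2: "Signature",
    3: "Symmetric-Key Encrypted Session Key",
    4: "One-Pass Signature",
    8: "Compressed Data",
    9: "Symmetrically Encrypted Data",
    11: "Literal Data",
    18: "Sym. Encrypted and Integrity Protected Data",
    19: "Modification Detection Code",
}

def _new_format_length(data: bytes, pos: int) -> Tuple[int, int, bool]:
    """Decode a new-format length: return (length, octets_used, is_partial)."""
    first = data[pos]
    if first < 192:
        return first, 1, False
    if first < 224:
        return ((first - 192) << 8) + data[pos + 1] + 192, 2, False
    if first == 255:
        return int.from_bytes(data[pos + 1:pos + 5], "big"), 5, False
    return 1 << (first & 0x1F), 1, True  # 224..254: partial body length

def walk_packets(data: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (tag, body_length) for each top-level packet; bodies are skipped."""
    pos = 0
    while pos < len(data):
        ctb = data[pos]  # "cipher type byte": bit 7 must always be set
        if not ctb & 0x80:
            raise ValueError(f"not an OpenPGP packet header at offset {pos}")
        pos += 1
        if ctb & 0x40:  # new format: tag in bits 5-0
            tag = ctb & 0x3F
            total = 0
            while True:  # partial body lengths chain until a final, normal length
                length, used, partial = _new_format_length(data, pos)
                pos += used + length
                total += length
                if not partial:
                    break
        else:  # old format: tag in bits 5-2, length type in bits 1-0
            tag = (ctb >> 2) & 0x0F
            length_type = ctb & 0x03
            if length_type == 3:  # indeterminate length: runs to end of data
                total = len(data) - pos
            else:
                size = 1 << length_type  # 1, 2 or 4 length octets
                total = int.from_bytes(data[pos:pos + size], "big")
                pos += size
            pos += total
        yield tag, total

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python walk_packets.py message.gpg   (binary, not ASCII-armored)
    with open(sys.argv[1], "rb") as f:
        for tag, length in walk_packets(f.read()):
            print(f"tag {tag:2}: {PACKET_NAMES.get(tag, 'other')} ({length} bytes)")
```

For a message encrypted to two recipients, I’d expect this to show two tag 1 packets followed by a single encrypted data packet. That is the "one file, two keys" shape in miniature.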

The gotcha moment comes when somebody spots three magic letters: AES. On a good day, students will recall that this is symmetric encryption, not asymmetric. But we can still see public keys mentioned in the packet listing, so now we must arrive at a new understanding of what’s going on.

I take our earlier diagram of how we thought it might work and modify it into something closer to the truth: GPG creates a random session key, which is used to encrypt the file symmetrically. This session key is then itself encrypted asymmetrically with each recipient’s public key. We end up with something more like this:

```mermaid
flowchart LR
    SK_gen[[Generate<br/>session key]]
    Pub_A>A's public key]
    SK>Session key]
    Pub_B>B's public key]
    F[(Plaintext)]
    subgraph Asymmetric encryption
        E_F_a[[Encrypt key<br/>for A]]
        E_F_b[[Encrypt key<br/>for B]]
    end
    subgraph Symmetric encryption
        E_F_s[[Encrypt file]]
    end
    E_F[(Ciphertext,<br/>encrypted<br/>session key)]
    SK_gen --> SK
    SK -->|Use as: data| E_F_a
    SK -->|Use as: data| E_F_b
    Pub_A -->|Use as: key| E_F_a
    Pub_B -->|Use as: key| E_F_b
    F -->|Use as: data| E_F_s
    SK -->|Use as: key| E_F_s
    E_F_a -->|A-encrypted<br/>session key| E_F
    E_F_b -->|B-encrypted<br/>session key| E_F
    E_F_s -->|Session-encrypted<br/>file| E_F
```

This is a simplified interpretation, and it omits some components that are present in the output .gpg file, such as information on the public keys used.
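Here’s a toy Python sketch of the revised model, purely to make the data flow tangible. It is deliberately insecure and is not how GPG or OpenPGP actually encrypt:

- Textbook RSA with small, well-known Mersenne primes stands in for real public keys. Real OpenPGP uses padded RSA or ECDH.
- A SHA-256 counter-mode keystream stands in for AES.
- All names (`ToyRSAKey`, `hybrid_encrypt` and so on) are hypothetical.

```python
import hashlib
import secrets
from dataclasses import dataclass
from typing import Dict

# TOY ONLY: unpadded RSA and a homemade stream cipher, for illustration.

@dataclass
class ToyRSAKey:
    n: int
    e: int
    d: int  # private exponent; a real published public key would omit this

def make_toy_key(p: int, q: int, e: int = 65537) -> ToyRSAKey:
    phi = (p - 1) * (q - 1)
    return ToyRSAKey(n=p * q, e=e, d=pow(e, -1, phi))

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Stand-in for AES: XOR data with SHA-256(key || block counter)."""
    out = bytearray()
    for counter, start in enumerate(range(0, len(data), 32)):
        pad = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[start:start + 32], pad))
    return bytes(out)

def hybrid_encrypt(plaintext: bytes, recipients: Dict[str, ToyRSAKey]) -> dict:
    session_key = secrets.token_bytes(16)            # fresh random key per message
    m = int.from_bytes(session_key, "big")
    wrapped = {name: pow(m, k.e, k.n) for name, k in recipients.items()}  # one per recipient
    return {"wrapped_keys": wrapped,                  # small, per-recipient
            "ciphertext": keystream_xor(session_key, plaintext)}  # bulk, encrypted once

def hybrid_decrypt(message: dict, name: str, key: ToyRSAKey) -> bytes:
    m = pow(message["wrapped_keys"][name], key.d, key.n)  # unwrap with the private key
    return keystream_xor(m.to_bytes(16, "big"), message["ciphertext"])

alice = make_toy_key(2**61 - 1, 2**89 - 1)
bob = make_toy_key(2**107 - 1, 2**127 - 1)
image = secrets.token_bytes(50_000)  # hypothetical stand-in for the PNG
msg = hybrid_encrypt(image, {"A": alice, "B": bob})
print(len(image), len(msg["ciphertext"]))
```

Both recipients recover the same plaintext from a single ciphertext. The only per-recipient cost is one wrapped session key, a few tens of bytes here, which is why adding a second recipient barely changes the file size.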

Review

Having seen what really happens, we come to appreciate a few things:

  • The file size is more or less the same because the bulk of it is indeed just one symmetrically encrypted copy of the file
  • Asymmetric encryption is used on the session key to share it with recipients
  • Compression does play a part, but we have a better understanding of how

I use this as an opportunity to discuss the relative performance of symmetric and asymmetric encryption: aside from ballooning file sizes, encrypting a whole file separately with each recipient’s public key would probably be slow. So now we have two reasons why it’s done this way.
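To put rough numbers on both reasons, here’s a back-of-the-envelope Python calculation. It assumes RSA with PKCS#1 v1.5 padding, where each operation can carry at most k - 11 bytes of data and always outputs k bytes (k being the modulus length in bytes). The function name is hypothetical, and nobody would actually encrypt a file this way. That is rather the point.

```python
import math
from typing import Tuple

def naive_rsa_cost(file_size: int, modulus_bits: int, recipients: int) -> Tuple[int, int]:
    """(RSA operations, output bytes) if each recipient got a whole-file RSA encryption.

    Assumes PKCS#1 v1.5: at most k - 11 data bytes per operation, k bytes out.
    """
    k = modulus_bits // 8
    blocks = math.ceil(file_size / (k - 11))
    return blocks * recipients, blocks * k * recipients

ops, size = naive_rsa_cost(5_000_000, 2048, recipients=2)
print(f"{ops} RSA operations, {size} bytes of output")
```

For a 5,000,000-byte file, a 2048-bit key and two recipients, that comes to 40,818 RSA operations and 10,449,408 bytes of output. The hybrid approach needs two RSA operations, one per wrapped session key.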

I can also use this as an opportunity to talk about signatures, key distribution, etc., but that depends on its relevance to the particular subject, as well as on how the students respond to the activity.

Resources

If you want to explore this subject further, perhaps for your own teaching or learning purposes, here are some additional resources I’ve prepared or found useful.

Try for yourself

I’ve prepared a GitLab project called GPGlue that allows you to walk through this process for yourself. The README file gives some explanation as well. The project and this blog post can be considered complementary. I decided to use ECC in the project, but I point out the critical differences between it and RSA where necessary. An RSA variant would be easy to include too (merge requests welcome).

The RFCs

I seem to be making a habit of referring to RFCs, but in my line of work they’re eternally useful documents. So here is RFC 4880: OpenPGP Message Format. The section most pertinent to this blog post is §5.1 - Public-Key Encrypted Session Key Packets, leading to the TL;DR for this whole post:

The message is encrypted with the session key, and the session key is itself encrypted and stored in the Encrypted Session Key packet(s). The Symmetrically Encrypted Data Packet is preceded by one Public-Key Encrypted Session Key packet for each OpenPGP key to which the message is encrypted. The recipient of the message finds a session key that is encrypted to their public key, decrypts the session key, and then uses the session key to decrypt the message.

The best thing about a good magic trick is that it’s still cool even when you learn how it’s done.

For Elliptic Curve Cryptography (ECC) in OpenPGP, there is RFC 6637. There are plenty of additional RFCs, other IETF documents and relevant NIST publications mentioned in these two RFCs as well.
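To connect the RFC text to the packet listing, here’s a Python sketch of one detail from RFC 4880 §5.1 that students can check by hand. For RSA, before encryption, the session key is prefixed with a one-octet symmetric algorithm ID. It is then followed by a two-octet checksum: the sum of the key octets modulo 65536. The sketch also builds the fixed fields at the start of a version 3 packet. Those fields show that each recipient is identified by an 8-octet key ID rather than by the full public key.

The PKCS#1 v1.5 padding and the RSA step itself are left out. So is the different ECDH wrapping described in RFC 6637. The function names are hypothetical.

```python
AES256 = 9  # RFC 4880 section 9.2 symmetric-key algorithm ID
RSA = 1     # RFC 4880 section 9.1 public-key algorithm ID

def session_key_block(sym_algo: int, session_key: bytes) -> bytes:
    """Algorithm ID + session key + two-octet checksum (before padding/encryption)."""
    checksum = sum(session_key) % 65536  # algorithm ID is not included in the sum
    return bytes([sym_algo]) + session_key + checksum.to_bytes(2, "big")

def parse_session_key_block(block: bytes) -> tuple:
    """Inverse of session_key_block: verify the checksum, return (algo, key)."""
    sym_algo, key, checksum = block[0], block[1:-2], block[-2:]
    if sum(key) % 65536 != int.from_bytes(checksum, "big"):
        raise ValueError("session key checksum mismatch")
    return sym_algo, key

def pkesk_header(key_id: bytes, pubkey_algo: int) -> bytes:
    """Leading fields of a version 3 PKESK packet body: version, key ID, algorithm."""
    if len(key_id) != 8:
        raise ValueError("key ID must be 8 octets")
    return bytes([3]) + key_id + bytes([pubkey_algo])

block = session_key_block(AES256, bytes(range(32)))  # hypothetical 256-bit key
print(block.hex())
```

The checksum gives the recipient a cheap sanity check: if the unwrapped block doesn’t add up, the wrong private key (or corrupted data) is the likely culprit.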

Additional reading

GnuPG.org also has a presentation by Neal H. Walfield titled An Advanced Introduction to GnuPG, which dissects the packets too. It explains a few areas I have glossed over either here or in the aforelinked code repository.

The full story

For completeness (at least on the encryption side), here’s a full diagram of how the packets are produced and assembled for a GPG message to two recipients. The message identifies the public keys that were used. It contains the session key, encrypted with each of those public keys. It also contains the file, signed with the sender’s private key, compressed and then encrypted with the session key. Mermaid struggles with the complexity of the graph, so GPG experts, please forgive any ordering inconsistencies, along with the rearrangement to top-down.

```mermaid
flowchart TD
    SK_gen[[Generate<br/>session key]]
    Pub_A>A's public key]
    SK>Session key]
    Pub_B>B's public key]
    Priv_X>X's private key]
    F[(Plaintext)]
    subgraph Asymmetric encryption
        E_F_a[[Encrypt key<br/>for A]]
        E_F_b[[Encrypt key<br/>for B]]
    end
    subgraph Symmetric encryption
        E_F_s[[Encrypt file]]
    end
    subgraph Signing
        S_F[[Sign]]
        OPS>One-pass info]
        Sig>Signature]
    end
    Z[[Compress]]
    E_F[(Ciphertext of<br/>compressed signed<br/>plaintext,<br/>encrypted<br/>session key)]
    SK_gen --> SK
    SK -->|Use as: data| E_F_b
    Pub_A --->|Use as: key| E_F_a
    SK -->|Use as: data| E_F_a
    Pub_B --->|Use as: key| E_F_b
    SK -->|Use as: key| E_F_s
    Z -->|Use as: data| E_F_s
    E_F_a -->|A-encrypted<br/>session key| E_F
    E_F_b -->|B-encrypted<br/>session key| E_F
    E_F_s -->|Session-encrypted<br/>file| E_F
    Priv_X --> S_F
    F --> S_F
    S_F -->|Key ID,<br/>sig types| OPS
    S_F --> Sig
    OPS --> Z
    F --> Z
    Sig --> Z
```

One item to note in the diagram is the one-pass information. It is placed before the plaintext so that GPG has enough information to start checking the signature of the file before the full file and signature data have been read. In other words, it can start applying the appropriate hash to the data as soon as it begins reading.
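Here’s a minimal Python sketch of why that ordering helps. The hash algorithm is known before the data arrives, so the reader can feed each chunk into the hash as it goes and only needs the signature at the very end. This is a simplification. The "signature" here is just the expected digest, whereas a real OpenPGP signature also hashes some signature-packet fields and is verified with the signer’s public key. The function name and chunk size are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO

def verify_streaming(hash_name: str, stream: BinaryIO, expected_digest: bytes,
                     chunk_size: int = 4096) -> bool:
    """Hash data as it streams in, then compare once the trailing 'signature' arrives.

    hash_name plays the role of the one-pass packet's hash algorithm field;
    expected_digest stands in for what a real signature would cover.
    """
    h = hashlib.new(hash_name)        # chosen up front, before any data is read
    while chunk := stream.read(chunk_size):
        h.update(chunk)               # no need to buffer the whole file
    return h.digest() == expected_digest

data = b"\x89PNG hypothetical image bytes " * 1000
digest = hashlib.sha256(data).digest()
print(verify_streaming("sha256", io.BytesIO(data), digest))
```

Without the one-pass information up front, a reader would need to see the signature first (or buffer the whole file) before knowing which hash to compute.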

Conclusions

In this post I’ve done two things: I’ve explained how GPG works, and I’ve explained how I explain it in a classroom environment. Hopefully you can see why I like to use this example, but if you can’t, I’d be interested to know why.

One question you might be asking is “why not look at TLS?”, and it’s a good one. However, I find that diving into Public Key Infrastructure is best left for a separate activity, just as I’ve ignored the parts of GPG that involve publishing, sharing and establishing trust in public keys in this example. Further, Michael Driscoll already has an excellent collection of projects that illustrate how TLS works, which, in combination with Wireshark and OpenSSL, make for a great exploration of certificates and secure sessions.

Whether you wanted to find out more about teaching cryptography, or if you were searching for information about GPG and some algorithm brought you here, I hope you learned something.