I recently revisited deploying IPv6 on my home network, having previously been frustrated by my ISP only giving me a single /64 prefix. That remains the case, but I was determined to find some creative ways of dealing with the situation. The solution turned out to involve an Amazon Web Services VPC and a pull request to the OPNsense project.

TL;DR

I allocated myself a /56 subnet (in this case from AWS), used it on my internal networks to overcome issues with IPv4 being preferred over IPv6 ULAs, and at the network edge applied SNAT of the /56 onto the /64 delegated to me by my ISP. I’d argue it’s better than some “solutions”, but is still worse than the preferred solution, which is ISPs providing more generous prefixes to their customers.

If you’re already familiar with the background behind these problems with IPv6, then you can skip to my solution.

How IPv6 is supposed to work

It’s quite simple:

  • Your router is given an address, quite likely a /128
  • You are also given a network prefix, which RIPE 690 suggests should be a /48 or /56
  • In the case of residential customers, these are probably assigned via DHCPv6 and Prefix Delegation (PD).

And that’s it!

How it actually works

They found a way to take something awesome, and make it terrible:

  • ISPs often ignore the RIPE recommendations and give you a /64 PD.

This is certainly the case in Singapore right now.

What’s the big deal?

There are three main issues with the disparity between recommendation and reality:

  1. There is an abundance of addresses in IPv6 so there’s no reason to be stingy with prefixes
  2. Limiting customers to a single /64 means they cannot have more than one v6 network behind their router
  3. It encourages people to break more rules and expectations, such as using NAT on IPv6 or using prefixes longer than /64.

Before getting into my solution (or rather, diabolical workaround) to this issue, let’s examine why the above three points matter.

Address abundance

IPv6 addresses are four times longer than IPv4 addresses. That means we go from an address range of 0–4,294,967,295 (2^32 addresses, or around 4.3 billion), to 0–340,282,366,920,938,463,463,374,607,431,768,211,455 (2^128 addresses, or around 340 undecillion). We ran out of IPv4 addresses some time ago, but that problem is solved by IPv6’s enormous address space. To put it into perspective, there are around 2^165 water molecules on earth, and 2^63 grains of sand. So while we can’t address each water molecule individually with IPv6, we could address each grain of sand in a single /64 subnet.

Registry authorities give ISPs prefixes depending on their needs, but we’re talking in the range of /29 to /32. That is to say, each ISP is given the equivalent of the entire IPv4 address space or larger, to subnet out to their customers. But let’s think for a second about their constraints if following RIPE 690, where in the “worst” case, they are giving /48 prefixes to customers. If they have a /32 prefix of their own to work with, then they have 16 bits to play with. They can allocate 2^16 subnets - that’s 65,536 customers. Take the other extreme - a /29 for the ISP and /56 for customers: that’s 2^27, or 134,217,728 customers. Got more customers? Get more prefixes. A second prefix of the same size will double the number of customer subnets.
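The subnet arithmetic above is easy to verify in a couple of lines of shell (the prefix lengths are the ones discussed; nothing here is provider-specific):

```shell
# Number of customer delegations that fit in an ISP allocation:
# 2^(customer prefix length - ISP prefix length)
echo $(( 1 << (48 - 32) ))   # /48s in a /32: 65536 customers
echo $(( 1 << (56 - 29) ))   # /56s in a /29: 134217728 customers
```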

But won’t the registries run out of addresses? They are given /12 prefixes as a starting point. Ignoring national and local intermediary registries, if the ISPs are getting /29s, then the RIR can provide 2^17 prefixes, which is 131,072 allocations for ISPs. And we’ve still got eleven bits left…

Of course, I’ve skipped over some details here, focusing on base-2 orders of magnitude. This is more than adequate to get the point across, which is: there are plenty of address bits to go around.

/64 is very limiting

While it’s true that we could address all of the sand grains on our beaches in a single /64 subnet, there are some non-numerical reasons that giving end-customers a /64 is restrictive. The IPv6 standard is built around the principle that the 128 bits of the address are used like this:

Field                  Bits
Routing prefix         48 (or more)
Subnet ID              16 (or fewer)
Interface identifier   64

In other words, the last 64 bits are for hosts, not networks; you cannot subnet a /64 any further. Taking this as given, if you have more than one subnetwork in your home or small office, then your prefix delegation can only apply to one of them. You can’t deploy ISP-assigned addresses to all of your networks. This is what infuriates home-lab folks, but the concern isn’t limited to them. A number of home routers allow separate networks to be used for IoT devices or guest access, and it’s a very good idea to keep those separate from your main network for security reasons. So at best, it’s a shame that those networks don’t get to join the v6 party.

Rule breaking

There are lots of ways we can “fix” these problems. But they each have drawbacks.

Smaller subnets

Of course, there are rules, then there are Rules, and there are also expectations. You can subnet a /64. There are lots of good reasons to do so. You might have a bunch of virtual machines on a host, or Docker services, or Kubernetes pods, and want to use a /80 or similar to allocate addresses to them. AWS caters to exactly this, allowing customers to assign /80 PDs onto EC2 elastic network interfaces.

A counterexample such as this is not a justification to do it on a wider scale, though. Containerised and virtualised environments can be considered something of a special case, with additional software that will handle addressing, such as a DHCPv6 server. This is where things get interesting, for two reasons:

  1. Not all devices support DHCPv6
  2. The alternative will not work on prefixes longer than /64.

More on these follows.

SLAAC - the alternative to DHCPv6

I am referring to SLAAC, or Stateless Address Autoconfiguration. It works something like this:

  • A new device sends a Router Solicitation (part of the Neighbour Discovery Protocol, NDP), and a router on the link responds with a Router Advertisement revealing the network’s prefix.
  • The device then computes an IPv6 address for itself by generating the lower 64 bits (host bits) of the address and combining them with the 64 network bits.
  • Collisions/duplicates are detected through neighbour solicitation of the chosen address - if a neighbour already has the address, then the device must choose a new one.

In a 64-bit address space, duplicates are highly unlikely if address selection is random, but of course, a mechanism is still required to guarantee uniqueness.

Here we see why /64 is our hard boundary for subnets: SLAAC requires all 64 host bits. It will not work on longer prefixes.
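To make the mechanism concrete, here is a rough sketch of SLAAC-style address formation in shell, assuming random host bits in the spirit of RFC 4941 privacy addresses (real implementations live in the kernel, and also perform the duplicate address detection described above):

```shell
# 64 network bits come from the Router Advertisement;
# the host generates its own 64 bits and appends them.
prefix="2001:db8:cccc:0"                        # from the RA (example prefix)
iid=$(od -An -N8 -tx2 /dev/urandom | tr -d ' \n' |
      sed 's/..../&:/g; s/:$//')                # 8 random bytes -> xxxx:xxxx:xxxx:xxxx
addr="${prefix}:${iid}"
echo "$addr"
```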

DHCPv6 - the alternative to SLAAC

The obvious “solution” to this problem is to use DHCPv6, right? But remember:

  1. Not all devices support DHCPv6

Cloud infrastructure, servers, containers and virtualised environments are all likely to have DHCPv6 clients. But IoT devices, some PCs and smartphones may not. Android is the prime example of this, because Google doesn’t see a reason to add DHCPv6 support to Android and Chromebook devices when, if everybody’s following the rules, SLAAC is already an option.

Android (and ChromeOS) works fine on dual-stack and IPv6-only networks, provided they’re SLAAC-enabled (which is a single bit in the RA) and provide RDNSS information.

Just use private addresses

In IPv6, private addresses are known as “Unique Local Addresses” (ULAs). Just like 10/8, 192.168/16, and 172.16/12 were carved out for private use in IPv4, in IPv6 we have fc00::/7, with fd00::/8 being the only half of this space currently in use. So, if we give each network something from this space, we can use /64 or bigger for each subnetwork, SLAAC will work, and everything is good, right? Well…

  1. Private addresses aren’t internet-routable
  2. Operating systems may prefer IPv4 over an IPv6 interface with a ULA

Both of these are solvable. To fix the first one, we need NAT. More on that later. To fix the second issue, we need to update the OS configuration. For example, in Windows, the protocol preference is determined by a prefixpolicies table:

>netsh interface ipv6 show prefixpolicies
Querying active state...

Precedence  Label  Prefix
----------  -----  --------------------------------
        50      0  ::1/128
        40      1  ::/0
        35      4  ::ffff:0:0/96
        30      2  2002::/16
         5      5  2001::/32
         3     13  fc00::/7
         1     11  fec0::/10
         1     12  3ffe::/16
         1      3  ::/96

The line with ::ffff:0:0/96 matches IPv4-mapped addresses, while our ULAs are further down the table. This follows RFC 6724, which covers address selection preferences in IPv6, and has been identified as a bit of a nuisance, but here we are. You can change this policy table, but you’d need to do that on every device on the network for them to reliably prefer a ULA over an IPv4 address, assuming a dual-stack environment. Using internet-routed addresses, or Global Unicast Addresses (GUAs), would overcome this issue, as they are preferred over IPv4 by default.
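For what it’s worth, Linux keeps the same table in glibc’s /etc/gai.conf. A sketch of bumping ULAs above IPv4-mapped addresses might look like this (note: as soon as any precedence line is present, glibc discards its entire built-in table, so the defaults must be restated):

```
# /etc/gai.conf - restated defaults, with fd00::/8 promoted above IPv4
precedence ::1/128        50
precedence ::/0           40
precedence fd00::/8       35   # our ULAs, now preferred over IPv4
precedence ::ffff:0:0/96  30   # IPv4-mapped addresses
precedence 2002::/16      20
```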

Nobody likes NAT

Network Address Translation solves a problem in IPv4 that doesn’t need to exist in IPv6. If your ISP gives you one public IP, but you need to route traffic from multiple devices on a private network, you need NAT to translate between unroutable private addresses, and the public address of your router. A translation table is maintained by the router, mapping original source/destination IPs/ports, to translated IPs/ports. So long as you don’t have too many devices and active connections, the router will have enough space in its translation table to be able to track your connection and handle the translations.

Given the limited space in IPv4, this makes sense. It’s messy, but it works. It works so well, that carrier-grade NAT (CG-NAT) is implemented in many cellular provider’s networks to service the huge number of mobile devices that want an Internet connection. With IPv6, none of this is necessary.

If you seek advice online about how to do NAT with IPv6, you’ll often be met with comments along the lines of “get your ISP to give you a proper prefix delegation instead”. I agree with the principle, but unfortunately, the precedent has been set against this principle at a scale that I don’t think we can easily walk back from.

How I fixed it

Firstly, well done on making it this far. Really. If you skipped to this bit, I still congratulate you on being interested enough to get here.

To overcome the issues discussed in the previous section, I set out to do two things:

  1. Get a better IPv6 prefix from somewhere.
  2. Map that onto the /64 my ISP gives me.

Read that second one again. Yes, my solution is NAT… again…

The prefix

I have an AWS account. In AWS, you can allocate IPv6 networks to your Virtual Private Clouds. I’m sure you can do similar with other Cloud Service Providers, and indeed there are other types of providers from whom you can get IPv6 allocations. So, what if I allocate one of these prefixes, then delegate it down to my home network?

My initial idea was this:

flowchart BT
    L[LAN]
    G[Guest]
    O[Other...]
    HR((Home Router))
    I[ISP]
    C[Cloud provider]
    CR((Cloud Router))
    U((Upstream))
    L ---|2001:db8:cccc:0::/64| HR
    G ---|2001:db8:cccc:1::/64| HR
    O ---|2001:db8:cccc:2::/64| HR
    HR ---|2001:db8:1111::x/128| I
    I --- C
    C --- CR
    HR -.-|VPN| CR
    CR ---|2001:db8:cccc::/56| U

Suffice it to say I spent a lot of time exploring this, and eventually realised that while you can route traffic however you like within your VPC, externally, AWS’s gateways will not route traffic into your VPC subnets unless the target IPs are allocated to interfaces within that VPC. /80 prefixes can be delegated onto interfaces, as I mentioned earlier, but this is insufficient for SLAAC to work. Amazon want you to use their VPN services, or else Bring Your Own IP (BYOIP) addresses to the party if you want to enjoy proper ingress routing.

Then I realised, I didn’t need to route the traffic over AWS at all. I just needed to use the addresses locally at home, and translate them on the way out. I am in effect “borrowing” the IPv6 prefix from AWS, using it in my local network, but translating it externally so nothing ever tries to route to it. The subnet is allocated to me by AWS. Nobody else will ever use it, unless I release it. I could choose to start using it in my VPC. I’d only be inconveniencing myself if I created any conflicts.

NAT

There are various forms of NAT, and I’m not going to get into all of them here. The kind referred to up to this point is symmetric NAT. Translations are established by traffic initiated within the local network, and are tracked between source and destination IP and port. In a home environment, there is usually only one external IP that can be used as the translated address.

Another approach worth mentioning is prefix translation, also known as NPTv6 or sometimes NAT66. With NPTv6, IPs are translated simply by replacing the source prefix with a target prefix when they traverse the router, or vice-versa in the other direction. There is a 1:1 mapping between source and translated IPs. This means it can be done statelessly, because the translation is simply applying a mask to addresses and replacing the mask with a static value. Obviously, this only works if the source and translated prefixes are the same length. So, with a single public /64, NPTv6 can only be applied to one subnetwork if features like SLAAC are still desired.
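For illustration, in pf an NPTv6-style mapping of one internal /64 onto the ISP /64 can be expressed with binat-to, which gives the 1:1 bidirectional prefix swap (the ULA prefix here is made up, and pf still creates state entries even though the mapping itself is algorithmic):

```
# pf.conf sketch: 1:1 prefix translation of a single internal /64
pass out on eth0 inet6 from fd12:3456:789a:1::/64 \
    binat-to 2001:db8:1111::/64
```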

The in-between solution

What I set out to do is translate source IPs onto target IPs within my ISP-allocated /64. My source IPs happen to be GUAs themselves, but never exposed outside of my network thanks to the NAT being applied here. The source addresses reside in a larger network, as big as /56, and somehow must be translated into the /64. Prefix translation won’t work here, but there are a few options:

  1. Round-robin
  2. Random
  3. Hash-based

I chose hash-based, meaning a given source address would be consistently mapped onto the same translated outbound address. This would be very useful for tracking traffic and identifying issues or unusual behaviours.
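The idea of a hash-based mapping can be illustrated in a few lines of shell. This is not the kernel’s jhash2, just a stand-in using POSIX cksum, which is deterministic, so a given source always yields the same translated address:

```shell
# Map a source address to a stable host suffix inside the ISP /64.
map_source() {
    h=$(printf '%s' "$1" | cksum | cut -d' ' -f1)   # deterministic 32-bit value
    printf '2001:db8:1111::%x\n' "$h"               # embed it as the host part
}
map_source 2001:db8:cccc:0::abcd   # same input, same output, every time
```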

The following examples use addresses from 2001:db8::/32, which are reserved for use in examples:

  • 2001:db8:cccc::/56: A subnet of GUAs allocated to me by somebody (say, AWS VPC)
  • 2001:db8:1111::/64: The PD given to me by my ISP
  • eth0: The WAN interface

Local network interfaces can each be given a /64 from the borrowed /56, and do router/RDNSS advertisements to enable SLAAC.
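With radvd, for example, advertising one of the borrowed /64s along with RDNSS looks something like this (the interface name and DNS server address are illustrative):

```
# /etc/radvd.conf sketch: enable SLAAC on one internal network
interface lan0 {
    AdvSendAdvert on;
    prefix 2001:db8:cccc:0::/64 {
        AdvOnLink on;
        AdvAutonomous on;    # the RA bit that enables SLAAC
    };
    RDNSS 2001:db8:cccc::1 { };
};
```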

My attempt at depicting it in a mermaid chart looks like this:

flowchart BT
    L[LAN]
    G[Guest]
    O[Other...]
    N{{NAT pool}}
    HR((Home Router))
    I[ISP]
    L ---|2001:db8:cccc:0::/64| HR
    G ---|2001:db8:cccc:1::/64| HR
    O ---|2001:db8:cccc:2::/64| HR
    HR ---|2001:db8:1111::x/128 WAN IP| I
    HR --- N
    N ---|NAT 2001:db8:cccc::/56 -> 2001:db8:1111:y::/64| HR
    N -.-|2001:db8:1111:y::/64 PD| I

Linux

I first played with this on a Linux box in AWS EC2, and determined the ip6tables rule needed to achieve what I wanted. I’ve annotated the command:

# -s: the prefix used internally
# -o: the WAN interface
# --to-source: the /64 expressed as a start-end range of IPs
# --persistent: a given source gets the same translation each time
ip6tables -t nat -A POSTROUTING -s 2001:db8:cccc::/56 -o eth0 \
    -j SNAT --to-source 2001:db8:1111::-2001:db8:1111::ffff:ffff:ffff:ffff \
    --persistent

In nftables, the ipv6 NAT table will look something like this:

table ip6 nat {
        chain PREROUTING {
                type nat hook prerouting priority dstnat; policy accept;
        }

        chain INPUT {
                type nat hook input priority srcnat; policy accept;
        }

        chain OUTPUT {
                type nat hook output priority dstnat; policy accept;
        }

        chain POSTROUTING {
                type nat hook postrouting priority srcnat; policy accept;
                ip6 saddr 2001:db8:cccc::/56 oifname "eth0" snat to 2001:db8:1111::-2001:db8:1111::ffff:ffff:ffff:ffff persistent
        }
}

In the SNAT subsection of the iptables-extensions Target Extensions documentation we can see how this rule is arrived at. By omitting --random, source ports are not randomly mapped, and should stay the same during translation (unlike traditional symmetric NAT with a single outbound IP). The jhash2 “Jenkins hash” is used to do the address translation.

You will note that the target address range is fixed. So, if our ISP renews our WAN IP and PD over DHCP, this will become invalid. One would need to implement a hook for their DHCP client to update this rule on address change. This isn’t something I’ve done yet, but the documentation for dhclient’s script hooks would be a good starting point.
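Such a hook might look something like the sketch below. The hook location and the new_ip6_prefix/old_ip6_prefix variable names are assumptions to check against your DHCP client’s documentation; only the range construction is mechanical:

```shell
#!/bin/sh
# Sketch: refresh the SNAT rule when the delegated prefix changes.

# Turn a delegated /64 into the start-end range --to-source expects.
range_for() {
    base="${1%/64}"   # 2001:db8:1111::/64 -> 2001:db8:1111::
    printf '%s-%sffff:ffff:ffff:ffff\n' "$base" "$base"
}

# new_ip6_prefix/old_ip6_prefix: assumed to be set by the DHCP client.
if [ -n "$new_ip6_prefix" ] && [ "$new_ip6_prefix" != "$old_ip6_prefix" ]; then
    ip6tables -t nat -F POSTROUTING
    ip6tables -t nat -A POSTROUTING -s 2001:db8:cccc::/56 -o eth0 \
        -j SNAT --to-source "$(range_for "$new_ip6_prefix")" --persistent
fi
```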

FreeBSD

I use an OPNsense router/firewall at home, so I also figured out how to do this in FreeBSD (the base used to create OPNsense). FreeBSD’s packet filter, pf, is quite different to iptables and nftables. Writing NAT rules is documented in the pf.conf manpage, but there are a couple of details to be aware of. Let’s see the rule first, then discuss it.

nat on eth0 inet6 from 2001:db8:cccc::/56 to any \
    -> (eth9:network:0) source-hash 0x123456789abcdef00000000000000000 static-port

As it’s a long rule, let’s look at it piece by piece:

Phrase                    Description
nat on eth0               Perform NAT on traffic leaving eth0 (WAN)
inet6                     Rule applies only to IPv6 traffic
from 2001:db8:cccc::/56   Source address is our internal prefix
to any                    Any destination address
-> (eth9:network:0)       Translate to addresses on eth9’s network, excluding aliases (i.e. the interface’s main subnet)
source-hash 0x...         A static key to salt the hashing algorithm with
static-port               Source port and translated port are the same

In Packet Filtering -> Parameters in the pf.conf docs, we can see how to specify a network interface’s subnet by appending a modifier to the interface name:

:network Translates to the network(s) attached to the interface.

This gives something like eth9:network. To use this, I created eth9 as a separate Ethernet device (in my case on an unused VLAN - I think it could be a virtual device also), and set it up to obtain my ISP’s PD. In this situation, eth9 would resolve to its IP, for example 2001:db8:1111::1234/64, while eth9:network would resolve to 2001:db8:1111::/64, i.e. the whole subnet.

The next thing we have to consider is the pool options that are used to map source addresses onto a target address pool. There are several options, but some are only valid in certain conditions. Of particular importance is this:

When more than one redirection address is specified, round-robin is the only permitted pool type.

A redirection (or translation) address has to be a singular entry, but that entry can be a subnet. Back to the interface modifiers:

:0 Do not include interface aliases.

Combining this with the previous modifier, we get eth9:network:0. In effect, eth9:network might be considered the list [ "2001:db8:1111::/64" ] and so only round-robin pools are valid translation targets, while eth9:network:0 is the single entry "2001:db8:1111::/64" and so more options are available, including source-hash.

These addresses are all determined when the rules are loaded. So if the interface’s addresses change, the rules are not automatically updated. Back to the modifier documentation for the last time:

Surrounding the interface name (and optional modifiers) in parentheses changes this behaviour. When the interface name is surrounded by parentheses, the rule is automatically updated when ever the interface changes its address. The ruleset does not need to be reloaded. This is especially useful with nat.

And so we arrive at (eth9:network:0). A dynamically tracked delegated prefix assigned to an otherwise unused network interface, for the purposes of SNAT.

By providing a key to source-hash, translations are stable, like with --persistent in iptables. Without it, the salt would be different on every rule reload, making tracking addresses in logs a lot harder. Using static-port is akin to omitting --random in iptables.

OPNsense

I don’t actually use FreeBSD (any more - I spent years as a sysadmin of FreeBSD servers, though). I use OPNsense. Under the hood, OPNsense uses FreeBSD and pf. Unfortunately, some layers of indirection and interface restrictions meant that I could not apply the above rules easily. That was until I modified the OPNsense source code. These changes allow OPNsense’s source NAT configuration to use an interface’s first network, (iface:network:0), where before, only the interface’s first IP was selectable.

You may have noticed that in the previous section I didn’t mention how I actually assigned the PD to my extra interface. That’s because I was using OPNsense, and simply set that interface to track the WAN interface which will then handle the delegation automatically. OPNsense also makes it easy to enable SLAAC on a network by configuring the router advertisement options. I was using unmanaged mode in my testing.

Windows, OpenBSD, OS X, Cisco iOS, VxWorks…

Don’t you think I’ve done enough?

Comparison

Here’s how my solution, which I’ve termed “borrowed GUA + NAT”, fares against the “conventional” approach with ULAs, as well as how things would work if ISPs played nice:

                                              ISP /64 only   ULA + NAT   Borrowed GUA + NAT   Properly sized ISP PDs
Preferred over IPv4 by default                     ❌             ❌              ✅                      ✅
Unique GUA mapping per ULA                         ❌             ❌              ✅                      ✅
Inbound connections without port forwarding        ✅             ❌              ❌                      ✅
Multiple local subnetworks with SLAAC              ❌             ✅              ✅                      ✅
Makes v6 purists happy                             ❌             ❌              ❌                      ✅

It’s not perfect, but it’s better. My traffic is no longer crammed behind a single IP; each host has its own GUA. On a dynamically assigned WAN address and PD, inbound connections without port forwarding are a moot point anyway, even if the PD is a /56. Most importantly for me, devices will prefer IPv6 over IPv4, thanks to the use of GUAs within the local networks, rather than ULAs.

Concerns

In my opinion, there are two main concerns to consider: complexity and privacy.

Complexity

Once set up, it’s not that complex conceptually. The many-to-many SNAT isn’t as easy to follow as NPTv6, but it’s easier than single-address symmetric NAT with rewritten ports. My solution is still stateful, meaning the firewall’s state and translation tables have plenty of work to do. But they did anyway, because my network is dual-stack (for now), and even where the translation could be stateless, the firewall would still be doing some stateful work. I don’t foresee my router struggling at any point, based on what I’ve observed of its CPU usage, state table occupancy and data throughput.

Privacy

Statically mapped addresses mean that sources are trackable at the individual device level. However, they are not identifiable, because features like temporary addresses and stable privacy addresses can still be used on the local network, and are each also uniquely mapped. In any case, my ISP knows the traffic is coming from my network. Nothing changes there with regards to responsibility, accountability or ethics, and anonymising methods at higher levels of the network stack should still be applied where necessary.

Conclusions

In this post I’ve described some of the issues around IPv6 deployment, and yet another way of dealing with them. None of what I’ve done here is recommended, but hopefully it’s informative. Perhaps you have opinions of your own, in which case please share them with me on LinkedIn.