The United States Naval Research Laboratory developed The Onion Routing Protocol (Tor) to protect U.S. intelligence communications online. Ironically, Tor has seen widespread use by everyone, even those organisations which the U.S. Navy fights against.
You may know Tor as the home of online illegal activities, a place where you can buy any drug you want, a place for all things illegal. Tor is much larger than what the media makes it out to be. According to King's College much of Tor is legal.
This article doesn't talk about what's on Tor, or how to access Tor. This article gives a technical rundown of how the technology works, without speculation and without exaggeration of what Tor is.
The core principle of Tor is onion routing which is a technique for anonymous & secure communication over a public network. In onion routing messages are encapsulated in several layers of encryption.
Onions have multiple layers to them, and so does a message going through Tor. Each layer in Tor is a layer of encryption: you are adding several layers of encryption to a Tor message, as opposed to just adding 1 layer of encryption.
This is why it's called The Onion Routing Protocol: the message is wrapped in one layer for each stage of the route.
The resulting onion (fully encapsulated message) is then transmitted through a series of computers in a network (called onion routers) with each computer peeling away a layer of the "onion". Each layer contains the next destination: the next router the packet has to go to. When the final layer is decrypted you get the plaintext (non-encrypted message).
The original author remains anonymous because each node in the network is only aware of the preceding and following nodes in the path (except the first node, which does know who the sender is, but doesn't know the final destination).
This has led to attacks where large organisations with expansive resources run servers to attempt to be the first and last nodes in the network. If the organisation's server is the first node, it knows who sent the message. If the organisation's server is the last node, it knows the final destination and what the message says.
Now we have a basic overview of Tor, let's start exploring how each part of Tor works.
Overview
Onion Routing is a distributed overlay network designed to anonymise TCP-based applications like web browsing, secure shell and instant messaging.
Clients choose a path through the network and build a circuit where each onion router in the path knows the predecessor and the successor, but no other nodes in the circuit.
The original author (the question mark on the far left) remains anonymous, unless you're the first node in the path, as you know who sent you the packet.
No one knows what data is being sent until it reaches the last node in the path, which knows the data but doesn't know who sent it. The second to last node in the path doesn't know what the data is; only the last node in the path does.
This has led to attacks whereby large organisations with expansive resources create Tor servers which aim to be the first and last onion routers in a path. If the organisation can do this, they get to know who sent the data and what data was sent, effectively breaking Tor.
Oh no! Now the large organisation knows you watch Netflix
It's incredibly hard to do this without being physically close to the location of the organisation's servers; we'll explore this more later.
Throughout this article I'll be using Netflix as a normal service (Bob) and Amazon Prime Video as the adversary (Eve). In the real world, this is incredibly unlikely to be the case. I'm not here to speculate on what organisations might want to attack Tor, so I've used 2 unlikely examples to avoid the political side of it.
Each packet flows down the network in fixed-size cells. These cells have to be the same size so none of the data going through the Tor network looks suspiciously big.
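To make the fixed-size idea concrete, here is a minimal sketch of chopping data into equal-sized cells. The 512-byte size is the one given in the Tor design paper; the 2-byte length prefix and the function names are my own assumptions for illustration, not Tor's real cell format.

```python
CELL_SIZE = 512      # bytes per cell, as described in the Tor design paper
LENGTH_FIELD = 2     # hypothetical length prefix, only for this sketch


def to_cells(data: bytes) -> list[bytes]:
    """Split data into cells of exactly CELL_SIZE bytes, zero-padding the last."""
    chunk = CELL_SIZE - LENGTH_FIELD
    cells = []
    # max(..., 1) makes sure even empty data produces one (all-padding) cell
    for start in range(0, max(len(data), 1), chunk):
        piece = data[start:start + chunk]
        header = len(piece).to_bytes(LENGTH_FIELD, "big")
        cells.append(header + piece.ljust(chunk, b"\x00"))
    return cells


def from_cells(cells: list[bytes]) -> bytes:
    """Reverse to_cells by reading each length prefix and dropping the padding."""
    out = b""
    for cell in cells:
        length = int.from_bytes(cell[:LENGTH_FIELD], "big")
        out += cell[LENGTH_FIELD:LENGTH_FIELD + length]
    return out


small = to_cells(b"hi")
large = to_cells(b"x" * 1200)
# Every cell on the wire has the same size, whatever the payload was.
print([len(cell) for cell in small], [len(cell) for cell in large])
```

A 2-byte message and a 1200-byte message both travel as cells of the same size, so an observer can count cells but cannot spot one unusually large packet.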
These cells are unwrapped by a symmetric key at each router and then the cell is relayed further down the path. Let's go into Tor itself.
Tor itself
There is strength in numbers
Tor needs a lot of users to create anonymity. If Tor was hard to use, new users wouldn't adopt it so quickly, and because new users wouldn't adopt it, Tor would become less anonymous. By this reasoning it is easy to see that usability isn't just a design choice of Tor but a security requirement.
If Tor isn't usable or designed nicely, it won't be used by many people. If it's not used by many people, it's less anonymous.
Tor has had to make some design choices that may not improve security but improve usability with the hopes that an improvement in usability is an improvement in security.
What Tor Isn't
Tor is not a completely decentralised peer-to-peer system like many people believe it to be. If it was completely peer to peer it wouldn't be very usable. Tor requires a set of directory servers that manage and keep the state of the network at any given time.
Tor is not secure against end to end attacks. An end to end attack is where an entity has control of both the first and last node in a path, as talked about earlier. This is a problem that cyber security experts have yet to solve, so Tor does not have a solution to this problem.
Tor does not hide the fact that the sender is using Tor.
In 2013 during the Final Exams period at Harvard a student tried to delay the exam by sending in a fake bomb threat. The student used Tor and Guerrilla Mail (a service which allows people to make disposable email addresses) to send the bomb threat to school officials.
The student was caught, even though he took precautions to make sure he wasn't caught.
Guerrilla Mail sends an originating IP address header along with the email that's sent so the receiver knows where the original email came from. With Tor, the student expected the IP address to be scrambled, but the authorities knew it came from a Tor exit node (Tor keeps a list of all nodes in the directory service), so the authorities simply looked for people who were accessing Tor (within the university) at the time the email was sent.
Tor isn't an anonymising service, but it is a service that can encrypt all traffic from A to B (so long as an end-to-end attack isn't performed). Tor is also incredibly slow, so using it for Netflix isn't a good use case.
Now that we have a good handle on what Tor is, let's explore onion routing.
Onion Routing
Given the network above, we are going to simulate what Tor does. Your computer is the one on the far left, and you're sending a request to watch Stranger Things on Netflix (because what else is Tor used for). This path of nodes is called a circuit. Later on, we're going to look into how circuits are made and how the encryption works. But for now we're trying to generalise how Tor works.
We start off with the message (we haven't sent it yet). We need to encrypt the message N times (where N is how many nodes are in the path). We encrypt it using AES, a symmetric key crypto-system. The key is agreed using Diffie-Hellman. Don't worry, we'll discuss all of this later. There are 4 nodes in the path (minus your computer and Netflix) so we encrypt the message 4 times.
Our packet (onion) has 4 layers. Blue, purple, orange, and teal. Each colour represents one layer of encryption.
We send the onion to the first node in our path. That node then removes the first layer of encryption.
Each node in the path knows the key to decrypt its layer (via Diffie-Hellman). Node 1 removes the blue layer with its symmetric key (that you both agreed on).
Node 1 knows you sent the message, but the message is still encrypted by 3 layers of encryption, so it has no idea what the message is.
As it travels down the path, more and more layers are stripped away. The next node does not know who sent the packet. All it knows is that Node 1 sent it the packet, and it's to be delivered to Node 3.
Now Node 3 strips away a layer.
The final node knows what the message is and where it's going, but it doesn't know who sent it. All it knows is that Node 3 sent it the message, but it doesn't know about anyone else in the path. One of the key properties here is that once a node decrypts a layer, it cannot tell how many more layers there are to decrypt. It could be as small as 1 or 2 or as large as 200 layers of encryption.
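The wrapping and peeling described above can be sketched in a few lines of Python. This is a toy under loud assumptions: the standard library has no AES, so I've swapped in a SHA-256 based XOR keystream as a stand-in cipher, the four keys are hard-coded placeholders rather than Diffie-Hellman results, and the layers carry no next-hop address. It only shows the layering itself.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key + counter (a stand-in, NOT real AES)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer; XOR with the same keystream undoes itself."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


# Hypothetical per-node keys; in Tor each would be agreed via Diffie-Hellman.
node_keys = [b"node-1-key", b"node-2-key", b"node-3-key", b"node-4-key"]


def wrap(message: bytes, keys: list[bytes]) -> bytes:
    """Apply one layer per node, last node first, so Node 1's layer goes on last."""
    onion = message
    for key in reversed(keys):
        onion = xor_layer(key, onion)
    return onion


message = b"GET stranger-things"
onion = wrap(message, node_keys)

# Each node peels exactly one layer with its own key, in path order.
for number, key in enumerate(node_keys, start=1):
    onion = xor_layer(key, onion)
    print(f"after node {number}: {onion!r}")
```

Only after the fourth node has removed its layer does the output match the original message; every earlier node just sees scrambled bytes of the same length.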
Now there's no way Amazon can find out you watch Netflix! Netflix sends back a part of Stranger Things.
Let's see how it works in reverse.
Node 4 adds its layer of encryption now. It doesn't know who originally made the request; all it knows is that Node 3 sent the request to it, so it sends the response message back to Node 3.
And so on for the next few nodes.
Now the response packet is fully encrypted.
Now the packet is fully encrypted, the only one who still knows what the message contains is Node 4. The only one who knows who made the message is Node 1. Now that we have the fully encrypted response back, we can use all the symmetric keys to decrypt it.
You might be thinking "I've seen snails faster than this" and you would be right. This protocol isn't designed for speed, but at the same time it has to care about speed.
The algorithm could be much slower, but much more secure (using entirely public key cryptography instead of symmetric key cryptography), but the usability of the system matters. So yes, it's slow. No, it's not as slow as it could be. It's all a balancing act here.
The encryption used is normally AES with the key being shared via Diffie-Hellman. I've written another article about Diffie-Hellman here.
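As a rough illustration of how two parties can end up with the same symmetric key without ever sending it, here is a toy Diffie-Hellman exchange. The prime, the generator and the way the key is hashed afterwards are all assumptions picked to keep the example short; they are not the parameters Tor uses and are not safe for real use.

```python
import hashlib
import secrets

# Toy public parameters (assumptions for illustration only).
P = 2**127 - 1   # a Mersenne prime, far too small for real-world security
G = 3


def make_keypair() -> tuple[int, int]:
    """Pick a random private exponent and compute the matching public value."""
    private = secrets.randbelow(P - 3) + 2   # somewhere in 2 .. P-2
    public = pow(G, private, P)
    return private, public


# You and one onion router each make a key pair and swap the public halves.
client_private, client_public = make_keypair()
node_private, node_public = make_keypair()

# Each side combines its own private value with the other's public value.
client_shared = pow(node_public, client_private, P)
node_shared = pow(client_public, node_private, P)

# Hash the shared number down to 32 bytes to use as a symmetric key.
client_key = hashlib.sha256(client_shared.to_bytes(16, "big")).digest()
node_key = hashlib.sha256(node_shared.to_bytes(16, "big")).digest()

print(client_key == node_key)
```

Both sides arrive at the same key because (G^a)^b and (G^b)^a are the same number modulo P, while an eavesdropper only ever sees the two public values.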
The paths Tor creates are called circuits. Let's explore how Tor chooses what nodes to use in a circuit.
How is a circuit created?
Each machine, when it wants to create a circuit, chooses the exit node first, followed by the other nodes in the circuit. By default, Tor circuits are 3 nodes long (the 4-node path earlier was just for illustration). Increasing the length of the circuit does not create better anonymity. If an attacker owns the first and last nodes in the network, you can have 1500 nodes in the circuit and it still wouldn't make you more secure.
When Tor selects the exit node, it selects it following these principles:
- Does the client's torrc (the configuration file of Tor) have settings about which exit nodes not to choose?
- Tor only chooses an exit relay which allows you to exit the Tor network to your destination. Some exit nodes only allow web traffic (HTTP on port 80, HTTPS on port 443) which is not useful when someone wants to send email (SMTP port 25).
- The exit node has to have the available capacity to support you. Tor tries to choose an exit node which has enough resources available.
All paths in the circuit obey these rules:
- We do not choose the same router twice for the same path.
If you choose the same node twice in a 3-node circuit, it's guaranteed that the node will be the guard node (the node you enter at), the exit node, or both, all dangerous positions. Of the 3 ways to place a repeated node (guard and middle, middle and exit, guard and exit), 1 makes it both the guard and the exit node, which is even more dangerous. We want to avoid the entry / exit attacks.
This isn't okay. Node colour changes to show it's the same.
- We do not choose any router in the same family as another in the same path. (Two routers are in the same family if each one lists the other in the "family" entries of its descriptor.)
Operators who run more than 1 Tor node can choose to signify their nodes as "family". This means that the nodes all have the same parent (the operator of their network). This is again a countermeasure against the entry / exit attacks, although operators do not have to declare family if they don't wish to. If they want to become a guard node (discussed soon) it is recommended to declare family, although not required.
- We do not choose more than one router in a given /16 subnet.
Subnets define networks. IPv4 addresses are made up of 4 octets (groups of 8 bits). As an example, Google's IP address in binary is:
01000000.11101001.10101001.01101010
The first 16 bits (the /16 subnet) are 01000000.11101001, which means that Tor does not choose any other node whose address starts with the same 16 bits as this IP address. Again, a counter-measure to the entry / exit attacks.
If subnets sound confusing, I've written this Python code to help explain them:
# have a play around with these
# ip addresses are in binary, not the usual base 10
# subnets are usually powers of 2, this is 2^4.
IP = "01000000.11101001.10101001.01101010"
subnet = 16

# this will store the subnet address once we find it
subnet_ip = []

IP_list = list(IP)
counter = 0

# for every number in the ip address
for i in IP_list:
    # we want to end the loop when we reach the subnet number
    if counter >= subnet:
        break
    # the ip address segments each octet of bits with full stops
    # we don't want to count a fullstop as a number
    # but we want to include it in the final subnet
    if i == ".":
        subnet_ip.append(".")
        continue
    else:
        # else it is a number so we append and increment counter
        subnet_ip.append(i)
        counter = counter + 1
print("Subnet is " + ''.join(subnet_ip))
- We don't choose any non-running or non-valid router unless we have been configured to do so. By default, we are configured to allow non-valid routers in "middle" and "rendezvous" positions.
Non-running means the node currently isn't online. You don't want to pick things that aren't online. Non-valid means that some configuration in the node's torrc is wrong. You don't want to accept strange configurations in case they are trying to hack or break something.
- The first node must be a Guard node.
A guard node is a privileged node because it sees the real IP of the user. It's "expensive" to become a guard node (maintain a high uptime for weeks and have good bandwidth).
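Some of the path rules above (no repeated router, no shared family, no shared /16 subnet, guard first) can be sketched as a simple checker. The `Relay` class, its fields and the example addresses are all hypothetical and made up for this sketch; real Tor reads this information from relay descriptors and also checks things like running/valid flags, which I've left out.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Relay:
    """Hypothetical, simplified view of a relay for this sketch."""
    name: str
    ip: str
    is_guard: bool = False
    family: frozenset = frozenset()   # names of relays this one lists as family


def same_slash16(a: str, b: str) -> bool:
    """True if both IPv4 addresses share their first 16 bits."""
    network = ipaddress.ip_network(f"{a}/16", strict=False)
    return ipaddress.ip_address(b) in network


def same_family(a: Relay, b: Relay) -> bool:
    """Family only counts when each relay lists the other."""
    return b.name in a.family and a.name in b.family


def path_is_allowed(path: list[Relay]) -> bool:
    if not path[0].is_guard:              # the first node must be a guard
        return False
    for i, a in enumerate(path):
        for b in path[i + 1:]:
            if a.name == b.name:          # same router twice
                return False
            if same_family(a, b):         # same declared family
                return False
            if same_slash16(a.ip, b.ip):  # same /16 subnet
                return False
    return True


guard = Relay("A", "64.233.169.106", is_guard=True)
middle = Relay("B", "198.51.100.7")
exit_node = Relay("C", "203.0.113.9")
neighbour = Relay("D", "64.233.12.1")     # shares 64.233.x.x with the guard

print(path_is_allowed([guard, middle, exit_node]))   # passes every rule
print(path_is_allowed([guard, middle, neighbour]))   # fails the /16 rule
```

The second path is rejected because relay D's address starts with the same 16 bits as the guard's, the same comparison the binary example above does by hand.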
This is possible for large companies who have 99.9% uptime and high bandwidth (such as Netflix). Tor has no way to stop a powerful adversary from registering a load of guard nodes. Right now, Tor is configured to stick with a single guard node for 12 weeks at a time, so you choose 4 new guard nodes a year.
This means that if you use Tor once to watch Amazon Prime Video, it is relatively unlikely for Netflix to be your guard node. Of course, the more guard nodes Netflix creates the more likely it is. Although, if Netflix knows you are connecting to the Tor network to watch Amazon Prime Video then they will have to wait up to 12 weeks (until you next rotate guard nodes) for their suspicions to be confirmed, unless they attack the guard node and take it over.
Becoming a guard node is relatively easy for a large organisation. Becoming the exit node is slightly harder, but still possible. We have to assume that the large organisation has infinite computational power to be able to do this. The solution is to make the attack highly expensive with a low rate of success.
The more regular users of Tor, the harder it is for a large organisation to attack it. If Netflix controls 50/100 nodes in the network:
The chance of you choosing a guard node from Netflix is 50%.
If suddenly 50 more normal user nodes join then that's 50/150 (about 33%), reducing the probability of Netflix owning a guard node (and thus, a potential attack) and making it even more expensive.
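The arithmetic above is small enough to check directly. This sketch assumes every node is equally likely to be picked as your guard, which is a simplification I'm making for the example (it ignores things like bandwidth); the function name is my own.

```python
from fractions import Fraction


def guard_probability(adversary_nodes: int, total_nodes: int) -> Fraction:
    """Chance of picking an adversary's node, assuming a uniform random choice."""
    return Fraction(adversary_nodes, total_nodes)


before = guard_probability(50, 100)   # Netflix runs 50 of 100 nodes
after = guard_probability(50, 150)    # 50 ordinary nodes join the network

print(f"{float(before):.0%} -> {float(after):.1%}")   # 50% -> 33.3%
```

The adversary's 50 nodes did not change, yet its odds dropped from 1 in 2 to 1 in 3 purely because more ordinary nodes joined.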
There is strength in numbers within the Tor service.
Tor Hidden Services
Ever heard those rumours: "there are websites on the dark-web, on Tor, that when you visit them you'll see people doing nasty things, selling illegal things or worse: watching The Hangover Part 3"?
When people talk about these websites they are talking about Tor Hidden Services.
These are a wild concept and honestly deserve an entire blogpost on their own. Hidden services are servers, like any normal computer server.
Except in a Tor Hidden Service it is possible to communicate without the user and server knowing who each other are.
The device (the question mark) knows that it wants to access Netflix, but it doesn't know anything about the server and the server doesn't know anything about the device that's asked to access it. This is quite confusing, but don't worry, I'm going to explain it all with cool diagrams.
When a server is set up on Tor to act as a hidden service, the server sends a message to some selected Onion Routers asking if they want to be an introduction point to the server. It is entirely up to the server as to who gets chosen as an introduction point, although usually they ask 3 routers to be their introduction points.
The introduction points know that they are going to be introducing people to the server.
The server will then create something called a hidden service descriptor which has a public key and the IP address of each introduction point. It will then send this hidden service descriptor to a distributed hash table which means that every onion router (not just the introduction points) will hold some part of the information of the hidden service.
If you try to look up a hidden service, the onion router responsible for it in the hash table will give you the full hidden service descriptor, including the addresses of the hidden service's introduction points.
The key for this hash table is the onion address and the onion address is derived from the public key of the server.
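For the older 16-character .onion names, the derivation is short enough to sketch: hash the service's public key with SHA-1, keep the first 80 bits, and base32-encode them. The function name and the placeholder key bytes below are my own assumptions; a real service would hash its encoded RSA public key, and newer onion services use a longer address format that this sketch does not cover.

```python
import base64
import hashlib


def onion_address_v2(public_key_bytes: bytes) -> str:
    """Derive a 16-character .onion name from (encoded) public key bytes."""
    digest = hashlib.sha1(public_key_bytes).digest()
    first_80_bits = digest[:10]                      # 10 bytes = 80 bits
    name = base64.b32encode(first_80_bits).decode("ascii").lower()
    return name + ".onion"


# Placeholder bytes standing in for a real encoded public key.
placeholder_key = b"not a real RSA key, just placeholder bytes"
address = onion_address_v2(placeholder_key)
print(address, len(address.split(".")[0]))
```

Because the name is a hash of the key, anyone who fetches the descriptor can check that the public key inside it really belongs to the address they asked for.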
The idea is that the onion address isn't publicised over the whole Tor network but instead you find it another way, like from a friend telling you or on the internet (addresses ending in .onion).
The way that the distributed hash table is programmed means that the vast majority of the nodes won't know what the descriptor is for a given key.
So almost every single onion router will have minimal knowledge about the hidden service unless they explicitly want to find it.
Let's say someone gave you the onion address. You request the descriptor off the hash table and you get back the service's introduction points.
If you want to access an onion address you would first request the descriptor from the hash table and the descriptor has, let's say, 4 or 5 IP addresses of introduction points. You pick one at random, let's say the top one.
You're going to ask the introduction point to introduce you to the server, and instead of making a connection directly to the server you pick a rendezvous point at random in the network from a given set of Onion Routers.
You then make a circuit to that rendezvous point and you send a message to the rendezvous point asking if it can introduce you to the server using the introduction point you just used. You then send the rendezvous point a one time password (in this example, let's use "Labrador").
The rendezvous point makes a circuit to the introduction point and sends it the word "Labrador" and its IP address.
The introduction point sends the message to the server and the server can choose to accept it or do nothing.
If the server accepts the message it will then create a circuit to the rendezvous point.
The server sends the rendezvous point a message. The rendezvous point looks at both messages from your computer and the server. It says "well, I've received a message from this computer saying it wants to connect with this service and I've also received a message from the service asking if it can connect to a computer, therefore they must want to talk to each other".
The rendezvous point will then act as another hop on the circuit and connect them.
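The matching job the rendezvous point does can be sketched as a small lookup table keyed by the one-time password. Everything here is a simplification I've made up for illustration: the class, its method names and the string "circuit" labels are hypothetical, and there is no encryption or networking involved.

```python
import secrets


class RendezvousPoint:
    """Toy rendezvous point: pairs a client and a service by one-time secret."""

    def __init__(self):
        self.waiting = {}   # one-time secret -> waiting client circuit

    def client_waits(self, secret: bytes, client_circuit: str) -> None:
        """The client registers its secret and then waits."""
        self.waiting[secret] = client_circuit

    def service_arrives(self, secret: bytes, service_circuit: str):
        """Join the two circuits if the service presents a known secret."""
        client_circuit = self.waiting.pop(secret, None)
        if client_circuit is None:
            return None     # unknown or already-used secret: do nothing
        return (client_circuit, service_circuit)


rendezvous = RendezvousPoint()
one_time_secret = secrets.token_bytes(20)   # plays the role of "Labrador"

rendezvous.client_waits(one_time_secret, "circuit-from-client")
joined = rendezvous.service_arrives(one_time_secret, "circuit-from-service")
print(joined)

# The secret only works once, so a replay gets nothing back.
print(rendezvous.service_arrives(one_time_secret, "circuit-from-service"))
```

The rendezvous point never needs to know who is at the far end of either circuit; holding the same secret is all that links the two.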
In short, a hidden service works like this, taken from here:
- A hidden service calculates its key pair (private and public key, asymmetric encryption).
- Then the hidden service picks some relays as its introduction points.
- It tells its public key to those introduction points over Tor circuits.
- After that the hidden-service creates a hidden service descriptor, containing its public key and what its introduction points are.
- The hidden service signs the hidden service descriptor with its private key.
- It then uploads the hidden service descriptor to a distributed hash table (DHT).
- Clients learn the .onion address from a hidden service out-of-band. (e.g. public website) (A $hash.onion is a 16 character name derived from the service's public key.)
- After retrieving the .onion address the client connects to the DHT and asks for that $hash.
- If it exists the client learns about the hidden service's public key and its introduction points.
- The client picks a relay at random to build a circuit to it, to tell it a one-time secret. The picked relay acts as rendezvous point.
- The client creates an introduce message, containing the address of the rendezvous point and the one-time secret, before encrypting the message with the hidden service's public key.
- The client sends its message over a Tor circuit to one of the introduction points, demanding it to be forwarded to the hidden service.
- The hidden service decrypts the introduce message with its private key to learn about the rendezvous point and the one-time secret.
- The hidden service creates a rendezvous message, containing the one-time secret and sends it over a circuit to the rendezvous point.
- The rendezvous point tells the client that a connection was established.
- Client and hidden service talk to each other over this rendezvous point. All traffic is end-to-end encrypted and the rendezvous point just relays it back and forth. Note that each of them, client and hidden service, build a circuit to the rendezvous point; at three hops per circuit this makes six hops in total.
Tor is a fascinating protocol full of algorithms that have been refined over the years. I've come to appreciate Tor, and I hope you have too. Unfortunately, this article is far too long to summarise shortly. If you want to learn more, check out the paper on Tor titled "Tor: The Second-Generation Onion Router".
Originally published at skerritt.blog on March 1, 2019.
How does Tor actually work? was originally published in Hacker Noon on Medium, where people are continuing the conversation by highlighting and responding to this story.