As a student of programming and, to an extent, computer science, I frequently come across historical achievements in the field that fascinate me. This time it was machine learning and, more specifically, Google’s PageRank algorithm. It started with an interest in how to increase my internet visibility, turned into the question of how Google decides what gets shown and what stays hidden, and finally became “how do I implement this?”. Asking myself questions like these leads to greater and more frequent discoveries.
Well, how does Google decide what gets shown and what stays hidden? In the simplest of terms, it is based on how many other web pages link to your webpage and on the quality of the pages those links come from. In essence, the more pages that link to yours, and the more highly ranked those pages are themselves, the more likely Google is to promote your webpage.
How does Google determine this? Picture someone browsing by clicking links at random: on any given page, the chances of clicking each of its links must add up to 100%. Granted, more advanced implementations of this algorithm also account for the chance that someone types an address directly into the URL bar instead of following a link. For now, at least, we will ignore this possibility for simplicity’s sake. To determine which website sends you where, and how often, the algorithm forms a matrix using linear algebra. The matrix is built by counting the links from each webpage to every page it points at and dividing each count by that webpage's total number of links, so that each page's outgoing chances add up to 100%. For example, take a look at the illustration below.
Image credit goes to Imperial College London.
In the illustration,
A links to B, C, and D.
B links to A and C.
C links to A, D, and F.
D links to C.
E links to D and B.
F links to D, G, and C.
G links to G.
This provides a matrix that looks like the one pictured below.
Image from https://matrix.reshish.com/
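The link list above can be turned into such a matrix along the following lines. This is only a minimal sketch: the function name build_link_matrix and the orientation of the matrix (one column per source page, one row per destination page) are my assumptions, since the pictured matrix is not reproduced here.

```python
# Outbound links as listed in the article: page -> pages it links to.
links = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "D", "F"],
    "D": ["C"],
    "E": ["D", "B"],
    "F": ["D", "G", "C"],
    "G": ["G"],
}


def build_link_matrix(links):
    """Return (pages, matrix), where matrix[i][j] is the chance of moving
    from page j to page i by clicking a random link on page j.

    Assumes every page has at least one outbound link.
    """
    pages = sorted(links)
    index = {page: i for i, page in enumerate(pages)}
    n = len(pages)
    matrix = [[0.0] * n for _ in range(n)]
    for source, targets in links.items():
        for target in targets:
            # Each link on a page is equally likely to be clicked.
            matrix[index[target]][index[source]] += 1.0 / len(targets)
    return pages, matrix


pages, matrix = build_link_matrix(links)
for page, row in zip(pages, matrix):
    print(page, ["%.2f" % value for value in row])
```

With this orientation every column adds up to 1, which is the "must add up to 100%" condition from earlier.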
To implement the PageRank algorithm, you take a vector; let's call it ‘r’. Vector r represents how the people currently browsing a particular topic are spread across the web pages. Now, repeatedly multiply the matrix by r, feeding each result back in as the new r. Each multiplication moves everyone one click along, in proportion to the probabilities in the matrix. After enough repetitions, the numbers converge, meaning they stop changing. The resulting vector gives the share of people you should expect to find on each site.
For example, with the matrix above and the vector r containing 100 people, each randomly selecting one of the websites above, the result will converge at:
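The repeated multiplication described here might look like the sketch below. The helper names mat_vec and power_iterate, the tolerance, and the three-page web (X, Y, Z) are all made up for illustration and are not taken from the article's figures.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def power_iterate(matrix, r, tol=1e-10, max_iter=10_000):
    """Apply r <- matrix * r until no entry changes by more than tol."""
    for iteration in range(1, max_iter + 1):
        nxt = mat_vec(matrix, r)
        if max(abs(a - b) for a, b in zip(nxt, r)) < tol:
            return nxt, iteration
        r = nxt
    raise RuntimeError("did not converge within max_iter iterations")


# Hypothetical three-page web: X links to Y and Z, Y links to Z, Z links to X.
# Column j holds the click probabilities for someone currently on page j.
matrix = [
    [0.0, 0.0, 1.0],  # chance of arriving at X from X, Y, Z
    [0.5, 0.0, 0.0],  # chance of arriving at Y from X, Y, Z
    [0.5, 1.0, 0.0],  # chance of arriving at Z from X, Y, Z
]

# Start with 100 people spread evenly over the three pages.
r, iterations = power_iterate(matrix, [100 / 3] * 3)
print([round(x, 4) for x in r], "after", iterations, "iterations")
```

For this small example the vector should settle near 40, 20, and 40 people on X, Y, and Z. Convergence is not guaranteed for every possible web of links, which is why the sketch gives up after max_iter rounds instead of looping forever.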
r1 = 0
r2 = 0
r3 = 0
r4 = 0
r5 = 0
r6 = 0
r7 = 100
Why does r7 = 100?
The longer these 100 people are on the internet, the more likely they are to get to webpage G. Webpage G links only to itself, so they cannot leave once they arrive, and every other page has some chain of links that leads to G (through F). When convergence occurs, the only possible solution is that everyone is on webpage G.
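To watch this happen, here is a rough simulation that moves the 100 people one click at a time using the link list directly instead of a matrix. The function name step and the choice of 500 rounds are my own assumptions.

```python
# Outbound links as listed in the article: page -> pages it links to.
links = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "D", "F"],
    "D": ["C"],
    "E": ["D", "B"],
    "F": ["D", "G", "C"],
    "G": ["G"],
}


def step(people, links):
    """Move everyone one click: each page splits its visitors evenly
    among the pages it links to."""
    nxt = {page: 0.0 for page in links}
    for page, targets in links.items():
        share = people[page] / len(targets)
        for target in targets:
            nxt[target] += share
    return nxt


# Start with the 100 people spread evenly over the seven pages.
people = {page: 100.0 / len(links) for page in links}
on_g = []
for _ in range(500):
    people = step(people, links)
    on_g.append(people["G"])

print("on G after 1 click:    %.2f" % on_g[0])
print("on G after 10 clicks:  %.2f" % on_g[9])
print("on G after 500 clicks: %.2f" % on_g[-1])
```

The number of people on G can only grow from one round to the next, because G hands all of its visitors back to itself and keeps collecting a third of whoever is on F.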
What if we choose a different set of web pages?
Image from https://matrix.reshish.com/
Removing webpage G gives the matrix shown above. Everything else remains the same, except that web page G and web page F’s link to it no longer exist, so F now links only to D and C.
When we apply the above process to the new matrix, the result is as follows:
r1 = 16
r2 = 16/3
r3 = 40
r4 = 76/3
r5 = 0
r6 = 40/3
16 + 16/3 + 40 + 76/3 + 0 + 40/3 = 100
Therefore, when we reach convergence, the largest share of people will be on webpage C (r3), followed by webpage D (r4). Nobody ends up on webpage E, because no other page links to it.
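One way to double-check these numbers is to confirm that one more click leaves them unchanged. The sketch below does this with exact fractions; the variable names reported and inflow are my own, and the link list assumes F keeps its links to D and C once G is removed.

```python
from fractions import Fraction

# Links from the article with page G removed (F now links only to D and C).
links = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "D", "F"],
    "D": ["C"],
    "E": ["D", "B"],
    "F": ["D", "C"],
}

# The converged values r1..r6 reported above, as exact fractions.
reported = {
    "A": Fraction(16),
    "B": Fraction(16, 3),
    "C": Fraction(40),
    "D": Fraction(76, 3),
    "E": Fraction(0),
    "F": Fraction(40, 3),
}

# People arriving at each page after one more click: every source page
# sends an equal share of its visitors along each of its links.
inflow = {
    page: sum(
        reported[source] / len(targets)
        for source, targets in links.items()
        if page in targets
    )
    for page in links
}

ranking = sorted(reported, key=reported.get, reverse=True)

print("unchanged by one more click:", inflow == reported)
print("total people:", sum(reported.values()))
print("ranking:", ranking)
```

If the reported vector really is the converged one, the first line prints True, since the people arriving at each page exactly replace the people leaving it.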
That is a basic implementation of Google’s PageRank algorithm. If you have any questions, comments, or concerns, please leave them as a comment down below.
Thank you for reading!
Implementing Google’s Pagerank Algorithm was originally published in Hacker Noon on Medium.