
PageRank Explained: Part 1

Not long ago, there was just one well-known PageRank Explained paper, to which most interested people referred when trying to understand the way that PageRank works. In fact, I used it myself. But when I was writing the PageRank Calculator, I realized that the original paper was misleading in the way that the calculations were done. It uses its own form of PageRank, which the author calls "mini-rank". Mini-rank changes Google's PageRank equation for no apparent reason, making the results of the calculations very misleading.

Even though the author abandoned mini-rank as a result of this and another paper, the original, unchanged paper is still available on the web. So if you come across a PageRank Explained paper that uses "mini-rank", it has been superseded and is best ignored.

What is PageRank?
PageRank is a numeric value that represents how important a page is on the web. Google figures that when one page links to another page, it is effectively casting a vote for the other page. The more votes that are cast for a page, the more important the page must be. Also, the importance of the page that is casting the vote determines how important the vote itself is. Google calculates a page's importance from the votes cast for it, taking the importance of each vote into account.

PageRank is Google's way of deciding a page's importance. It matters because it is one of the factors that determines a page's ranking in the search results. It isn't the only factor that Google uses to rank pages, but it is an important one. From here on in, we'll occasionally refer to PageRank as "PR".

Note: Not all links are counted by Google. For instance, they filter out links from known link farms. Some links can cause a site to be penalized by Google. They rightly figure that webmasters cannot control which sites link to their sites, but they can control which sites they link out to. For this reason, links into a site cannot harm the site, but links from a site can be harmful if they link to penalized sites. So be careful which sites you link to. If a site has PR0, it is usually a penalty, and it would be unwise to link to it.

How is PageRank Calculated?
To calculate the PageRank for a page, all of its inbound links are taken into account. These are links from within the site and links from outside the site:

PR(A) = (1-d) + d(PR(t1)/C(t1) + ... + PR(tn)/C(tn))

That's the equation that calculates a page's PageRank. It's the original one that was published when PageRank was being developed, and it is probable that Google uses a variation of it today – but they aren't telling us what it is. It doesn't matter though, as this equation is good enough.

In the equation, 't1' to 'tn' are the pages linking to page A, 'C(t)' is the number of outbound links that page t has, and 'd' is a damping factor, usually set to 0.85.

We can think of it in a simpler way: A page's PageRank = 0.15 + 0.85 * (a "share" of the PageRank of every page that links to it) where "share" = the linking page's PageRank divided by the number of outbound links on the page.

A page "votes" an amount of PageRank onto each page that it links to. The amount of PageRank that it has to vote with is a little less than its own PageRank value (its own value * 0.85). This value is shared equally between all the pages that it links to.
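As a minimal sketch of the equation for a single page, the short Python function below applies it once. The function name, the input format and the example figures are assumptions made for illustration; they are not from the published paper.

```python
def page_rank(inbound, d=0.85):
    """Apply the published equation once for a single page.

    inbound: list of (pagerank, outbound_link_count) pairs,
             one pair for each page that links to this page.
    d:       damping factor, usually 0.85.
    """
    # Each linking page contributes its PageRank divided by its outbound links.
    shares = sum(pr / links for pr, links in inbound)
    return (1 - d) + d * shares


# Hypothetical page A with two inbound links:
# one from a page with PageRank 1.0 and 2 outbound links,
# one from a page with PageRank 1.0 and 4 outbound links.
pr_a = page_rank([(1.0, 2), (1.0, 4)])
print(round(pr_a, 4))  # 0.15 + 0.85 * (0.5 + 0.25) = 0.7875
```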

From this, we could conclude that a link from a page with PR4 and 5 outbound links is worth more than a link from a page with PR8 and 100 outbound links. The PageRank of a page that links to yours is important but the number of links on that page is also important. The more links there are on a page, the less PageRank value your page will receive from it.
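The arithmetic behind that comparison can be sketched as follows, assuming for the moment that PR4 and PR8 can be used directly as the values in the equation. The function name is made up for this example.

```python
def vote_value(pagerank, outbound_links, d=0.85):
    """PageRank that one link passes on: d * PR / number of outbound links."""
    return d * pagerank / outbound_links


from_pr4 = vote_value(4, 5)    # 0.85 * 4 / 5, about 0.68
from_pr8 = vote_value(8, 100)  # 0.85 * 8 / 100, about 0.068

print(round(from_pr4, 3), round(from_pr8, 3))
```

On these figures the link from the PR4 page passes on about ten times as much as the link from the PR8 page.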

If the PageRank value differences between PR1, PR2, ... PR10 were equal then that conclusion would hold up, but many people believe that the values between PR1 and PR10 (the maximum) are set on a logarithmic scale, and there is very good reason for believing it. Nobody outside Google knows for sure one way or the other, but the chances are high that the scale is logarithmic, or similar. If so, it means that it takes a lot more additional PageRank for a page to move up to the next PageRank level than it did to move up from the previous PageRank level. The result is that it reverses the previous conclusion, so that a link from a PR8 page that has lots of outbound links is worth more than a link from a PR4 page that has only a few outbound links.
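To see how a logarithmic scale could reverse the conclusion, here is a rough sketch. The base of 5 is purely a guess for illustration, since nobody outside Google knows the real scale, and the function names are invented for this example.

```python
def underlying_value(level, base=5):
    """Hypothetical raw value behind a displayed PR level.

    The base is an assumption for illustration only; the real
    scale is known only to Google.
    """
    return base ** level


def vote_value_on_log_scale(level, outbound_links, d=0.85, base=5):
    """Value passed on by one link, using the hypothetical raw value."""
    return d * underlying_value(level, base) / outbound_links


log_from_pr4 = vote_value_on_log_scale(4, 5)    # 0.85 * 625 / 5
log_from_pr8 = vote_value_on_log_scale(8, 100)  # 0.85 * 390625 / 100

print(log_from_pr4 < log_from_pr8)  # True under this assumed base
```

With this assumed base, the PR8 page with 100 outbound links passes on far more than the PR4 page with 5. A different base would change the figures, and a small enough base would not reverse the conclusion at all.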

Whichever scale Google uses, we can be sure of one thing. A link from another site increases our site's PageRank. Just remember to avoid links from link farms.

Note that when a page votes its PageRank value to other pages, its own PageRank is not reduced by the value that it is voting. The page doing the voting doesn't give away its PageRank and end up with nothing. It isn't a transfer of PageRank. It is simply a vote according to the page's PageRank value. It's like a shareholders meeting where each shareholder votes according to the number of shares held, but the shares themselves aren't given away. Even so, pages do lose some PageRank indirectly, as we'll see later.

Ok so far? Good. Now we'll look at how the calculations are actually done:

For a page's calculation, its existing PageRank (if it has any) is abandoned completely and a fresh calculation is done where the page relies solely on the PageRank "voted" for it by its current inbound links, which may have changed since the last time the page's PageRank was calculated.

The equation shows clearly how a page's PageRank is arrived at. But what isn't immediately obvious is that it can't work if the calculation is done just once. Suppose we have 2 pages, A and B, which link to each other, and neither has any other links of any kind. This is what happens:

Step 1: Calculate page A's PageRank from the value of its inbound links. Page A now has a new PageRank value. The calculation used the value of the inbound link from page B. But page B has an inbound link (from page A) and its new PageRank value hasn't been worked out yet, so page A's new PageRank value is based on inaccurate data and can't be accurate.

Step 2: Calculate page B's PageRank from the value of its inbound links. Page B now has a new PageRank value, but it can't be accurate because the calculation used the new PageRank value of the inbound link from page A, which is inaccurate.

It's a Catch-22 situation: We can't work out A's PageRank until we know B's PageRank, and we can't work out B's PageRank until we know A's PageRank.

Now that both pages have newly calculated PageRank values, can't we just run the calculations again to arrive at accurate values? No. We can run the calculations again using the new values and the results will be more accurate, but we will always be using inaccurate values for the calculations, so the results will always be inaccurate.

The problem is overcome by repeating the calculations many times. Each time produces slightly more accurate values. In fact, total accuracy can never be achieved because the calculations are always based on inaccurate values. 40 to 50 iterations are sufficient to reach a point where any further iterations wouldn't produce enough of a change to the values to matter.

This is precisely what Google does at each update, and the reason the updates take so long.
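The repeated calculation for the two-page example can be sketched in a few lines of Python. This is only an illustration of the published equation, not Google's actual code. The starting value of 0 and the function name are assumptions, and each page is recalculated in turn using the latest values, as in Steps 1 and 2 above.

```python
def iterate_pagerank(links, d=0.85, iterations=50, start=0.0):
    """Repeat the published equation for every page, many times over.

    links: dict mapping each page to the list of pages it links to.
    start: assumed starting value for every page (a guess).
    """
    pr = {page: start for page in links}
    for _ in range(iterations):
        for page in links:
            # Sum the shares from every page that links to this one.
            # The values used are whatever is current at this moment,
            # so they are always slightly out of date.
            inbound_share = sum(
                pr[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            # The page's old value is discarded, not added to.
            pr[page] = (1 - d) + d * inbound_share
    return pr


# Pages A and B link only to each other.
ranks = iterate_pagerank({"A": ["B"], "B": ["A"]})
print(round(ranks["A"], 4), round(ranks["B"], 4))
```

In this sketch both values creep toward 1.0 without ever quite reaching it. After 50 passes the remaining difference is too small to matter.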

One thing to bear in mind is that the results we get from the calculations are proportions. The figures must then be set against a scale (known only to Google) to arrive at each page's actual PageRank. Even so, we can use the calculations to channel the PageRank within a site around its pages so that certain pages receive a higher proportion of it than others.

You may come across explanations of PageRank where the same equation is stated but the result of each iteration of the calculation is added to the page's existing PageRank. The new value (result + existing PageRank) is then used when sharing PageRank with other pages. These explanations are wrong for the following reasons:

1. They quote the same, published equation - but then change it from PR(A) = (1-d) + d(......) to PR(A) = PR(A) + (1-d) + d(......) – this isn't correct, and it isn't necessary.

2. We will be looking at how to organize links so that certain pages end up with a larger proportion of the PageRank than others. Adding to the page's existing PageRank through the iterations produces different proportions than when the equation is used as published. Since the addition is not a part of the published equation, the results are wrong and the proportioning isn't accurate.

According to the published equation, the page being calculated starts from scratch at each iteration. It relies solely on its inbound links. The 'add to the existing PageRank' idea doesn't do that, so its results are necessarily wrong.
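The difference between the two approaches can be seen in a small sketch, using two pages that link only to each other. The function name, the starting values of 0 and the choice of 10 passes are assumptions for illustration.

```python
def run(add_to_existing, iterations=10, d=0.85):
    """Iterate two pages, A and B, that link only to each other.

    add_to_existing=False: the published equation (start from scratch).
    add_to_existing=True:  the altered version that adds each result
                           to the page's existing PageRank.
    """
    pr = {"A": 0.0, "B": 0.0}
    for _ in range(iterations):
        for page, other in (("A", "B"), ("B", "A")):
            # Each page has one inbound link, from a page with one outbound link.
            result = (1 - d) + d * pr[other]
            if add_to_existing:
                pr[page] = pr[page] + result
            else:
                pr[page] = result
    return pr


published = run(add_to_existing=False)
altered = run(add_to_existing=True)
print(published)
print(altered)
```

In this small case the published equation settles toward 1.0 for both pages, while the altered version keeps growing with every pass and never settles.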

Confused? We've only just begun! Stay Tuned for More...

Copyright © 2024 Adnet Media. All Rights Reserved. XBIZ is a trademark of Adnet Media.
Reproduction in whole or in part in any form or medium without express written permission is prohibited.
