
Multilingual Sites and Search Engines

The web has become a truly global medium, with the result that the content on many websites is either not in English, or contains a mixture of English and other languages. How do search engines handle such pages? More importantly, if you have a website that fits that description, how can you be sure that your site will turn up in the search results?

One of the characteristics of the global world is its diversity — cultural, religious, and linguistic. The internationalization and globalism of today’s world can be seen on the web as well. Although the dominant language for web content is definitely English, a large percentage of websites are written in a human language other than English.

Consider the fact that about 60 to 70 percent of the total global population is poorly educated in English and cannot use it as a language for research and communication.

Although it is true that not all of those people are online, they are the ones who will join the web in the next few years to contribute to the further growth of the Internet. This fact alone implies that there is a vast market for web content in at least ten more major languages (not to mention the hundreds of small national languages and dialects), which, although not as widespread as English, globally cover millions of people in total. And when this multilingual content exists, the next big question is how to make it retrievable by search engines.

APPROACHES TO CREATING MULTILINGUAL SITES

Very often websites are in more than one language. One approach to developing and maintaining sites for audiences who speak different languages is to make a separate site (and often a separate domain) for each language. This is the general rule for global multinational corporations, which have a primary site (most often in English) and country-specific websites for each country they operate in. This approach requires more resources, because every language version has to be created and maintained separately.

The second possibility is to have one site with versions in different languages, or even one site in which part of the content is in one language and part of it is translated into another one. In this case it is likely that the site will have several languages on the same page. There is nothing wrong with this. What is more, sometimes this is the wisest choice.

Search engines do not deliberately treat multilingual sites differently from other sites, but if such sites are not optimized for being searched in each of their languages, they become more difficult to find. In other words, when you have a site in multiple languages, consider optimizing it in all of them. There are many SEO companies that offer optimization of sites in several target languages.

Most search engines provide options to search according to language and country preferences, thus giving users the chance to narrow down the search results to fewer but (supposedly) more relevant ones. But as practice shows, users rarely use this option when searching for non-English results. Instead, most users simply keep searching with the default settings and do not bother changing them. Or maybe users believe that limiting a search to a particular language carries a risk of omitting useful results.

In any case, search engines do provide options for language-specific searches. So how do they distinguish languages? Search engines are not humans; they do not recognize a language when they see it. Instead they use other signals to classify a page as being in a particular language. One place from which they get language information is the lang="xx" attribute of the html tag.

THE HTML LANG="XX" ATTRIBUTE, SEARCHING BY LANGUAGE

Using the lang attribute of the html tag to set the language of the page is not mandatory, because search engines will find the page even if you leave the value set to "en," but it is better to indicate the actual language, because this makes it easier for them to establish which language the page is in.

Of course, search engines do crawl the text itself and will notice the foreign language, but it will be far easier and much more effective if you provide them with more information, instead of relying on their ability to work it out. Providing the language information in the lang="xx" attribute of the html tag increases your chances of appearing near the top of the search results when users search for results in a particular language only. On the other hand, the presence of this attribute is not a guarantee that your page will top the lists of search results, because Google shows search results for pages where the search string has been found, no matter what the language of the page is set to.
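As a rough illustration of what reading the declared language involves, here is a minimal Python sketch that pulls the lang value out of a page's html tag using only the standard library. The class and function names are made up for this example, and nothing here is meant to describe how any real search engine actually does it.

```python
from html.parser import HTMLParser


class LangAttributeReader(HTMLParser):
    """Records the lang attribute of the first <html> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.declared_lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "html" and self.declared_lang is None:
            self.declared_lang = dict(attrs).get("lang")


def declared_language(markup):
    """Return the language code declared on the html tag, or None."""
    reader = LangAttributeReader()
    reader.feed(markup)
    return reader.declared_lang


# Hypothetical page that declares Bulgarian as its language.
page = '<html lang="bg"><head><title>Начало</title></head><body></body></html>'
print(declared_language(page))
```

A page that omits the attribute yields None here, which is the case where a crawler would have to fall back on the text itself.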

Since Google is the No. 1 (and often only) search engine for the majority of users, it makes sense to discuss its options for searching for pages by their geographic location and how these options affect search results.

Searching by country is the second option, in addition to searching for pages in a particular language only, that Google offers for language (or dialect) specific content. In a sense, this feature allows Google to imitate a local search engine, where results are displayed only if they come from sites hosted in the local Internet space. The criterion for “local” is the IP of the hosting server — if the IP is registered as belonging to the local Internet space, then all pages hosted on it are local.

Needless to say, restricting a search query to local pages cuts the number of search results to a fraction of what the same query returns globally. What is worse, SEO experts rarely have control over where the site is hosted, but in those cases when it is vital to achieve presence in the search results for a particular language, hosting it on a server in the target country seems a must.

As I already mentioned, global corporations as a rule have separate sites (and most often separate domains) for the countries they operate in. One reason for this is the IP and TLD filtering mechanism of search engines just described. Google might not be the most dreadful one because, by default, search by IP and domain is turned off, but many local search engines (for instance some French ones) do not list sites that do not have .fr as a top-level domain and/or are not hosted on a server with a French-allocated IP address.
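The kind of "local" test described above can be sketched in a few lines of Python. Everything specific in this sketch is an assumption made for illustration: the function name, the choice to accept a site when either the TLD or the IP matches, and above all the address table, which uses a reserved documentation range rather than any real French allocation.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical allocation table. 198.51.100.0/24 is a reserved documentation
# range standing in for "addresses registered to France"; a real engine
# would consult registry data instead.
LOCAL_NETWORKS = {
    "fr": [ipaddress.ip_network("198.51.100.0/24")],
}


def is_local(url, server_ip, country):
    """Treat a site as local if its TLD or its server IP points to the country."""
    host = urlsplit(url).hostname or ""
    tld_matches = host.endswith("." + country)
    address = ipaddress.ip_address(server_ip)
    ip_matches = any(address in net for net in LOCAL_NETWORKS.get(country, []))
    return tld_matches or ip_matches


print(is_local("http://example.fr/", "203.0.113.5", "fr"))
print(is_local("http://example.com/", "198.51.100.7", "fr"))
print(is_local("http://example.com/", "203.0.113.5", "fr"))
```

A stricter engine could require both conditions by replacing `or` with `and`, which is exactly the situation in which a .com site hosted abroad disappears from local results.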

Although character set issues relate more to alphabets than to languages, they are worth mentioning in this article. It is not enough to specify only the language of the page; its encoding must be specified as well. The general rule is that one encoding can be used for more than one language (e.g., Windows-1251 is for Cyrillic, and it can be used for Russian, Bulgarian, and other pages). The opposite is also true: there can be more than one encoding (ISO, Windows, Mac, and so forth) for a language. Of course, there is Unicode, but it often causes more problems (in the proper displaying of pages) than it solves. Because of this, web developers are reluctant to use it as a universal approach.
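To make the relationship between languages and encodings concrete, the following Python sketch encodes a Russian and a Bulgarian word with the same Cyrillic charset, and then encodes the same Russian word with several different charsets. The sample words and variable names are chosen only for this example; the codec names are the ones Python's standard library uses.

```python
russian = "Привет"
bulgarian = "Здравей"

# One encoding, several languages: both words fit in Windows-1251.
ru_bytes = russian.encode("cp1251")
bg_bytes = bulgarian.encode("cp1251")

# One language, several encodings: the same Russian word in four charsets.
charsets = ("cp1251", "koi8_r", "mac_cyrillic", "utf-8")
variants = {name: russian.encode(name) for name in charsets}
for name, raw in variants.items():
    print(name, raw.hex())

# Reading the bytes with the wrong charset produces different, unreadable text,
# which is why the encoding has to be declared as well as the language.
misread = ru_bytes.decode("koi8_r")
print(misread == russian)
```

The byte sequences printed for each charset differ even though the word is the same, so a program that guesses the wrong charset sees a different string from the one the author wrote.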

Since encoding is more about display than search, is there a relationship between encoding and search results? Yes, there is. First, it affects indexing. Although most major search engines index pages in any encoding, there are still search engines (starting with national ones) that index only a limited number of charsets.

Second, there are search engines that index and retrieve pages in less common encodings by recoding the character set (i.e. converting it to a different set). This operation (performed back and forth) can also influence search results. This is especially true for languages that have special symbols, for instance accented characters.
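A small Python sketch shows how recoding can damage accented characters. It assumes a hypothetical indexer that converts everything to Windows-1251 and substitutes a placeholder for any character the target charset lacks; the function name and that replacement policy are illustrative choices, not a description of a particular engine.

```python
def recode(text, source_charset, target_charset):
    """Convert text from one charset to another, replacing unmappable characters."""
    raw = text.encode(source_charset)
    decoded = raw.decode(source_charset)
    converted = decoded.encode(target_charset, errors="replace")
    return converted.decode(target_charset)


query = "café"

# The accented letter has no slot in Windows-1251, so it is replaced.
indexed = recode("café", "cp1252", "cp1251")
print(indexed)
print(query == indexed)

# Text that the target charset can represent survives the conversion intact.
print(recode("Привет", "utf-8", "cp1251"))
```

After the conversion the stored word no longer equals what the user types, so an exact-match lookup for the accented query would miss the page.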

Third, for those search engines that allow wildcard symbols and truncation, very often these functions are not fully supported for non-Latin charsets.

CONTENT REVEALS LANGUAGE

It is hardly surprising that when servicing requests for pages in a particular language only, Google determines the language based on the content on the page and on the context in which the search string occurs. How do search engines know so many languages?

Well, the answer is simple: they use NLP (natural language processing), i.e. they have some type of database that contains words in different languages, together with some grammar and structural rules specific to each language, which allows them to analyze the text and determine the dominant language of a page.
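The word-database idea can be sketched in a toy form: keep a short list of very common words per language and pick the language whose list matches the page most often. The word lists, the function name and the scoring rule below are assumptions made for this example; real systems use far larger vocabularies and statistical models.

```python
import re

# Tiny, hand-picked lists of frequent words; purely illustrative.
STOPWORDS = {
    "en": {"the", "and", "of", "is", "to", "in"},
    "fr": {"le", "la", "et", "les", "des", "est"},
    "de": {"der", "die", "und", "das", "ist", "nicht"},
}


def guess_language(text):
    """Return the language whose common words occur most often, or None."""
    words = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(word in vocab for word in words)
        for lang, vocab in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_language("Le chat est sur la table et les chiens sont dans le jardin"))
print(guess_language("The cat is in the garden and the dog is asleep"))
```

On a page that mixes languages, the same scores give a rough idea of which language dominates, which matches the "dominant language" notion described above.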

Marco Cevoli is tech manager for Translationsxxx.com. He can be reached at tech@translationsxxx.com


Copyright © 2024 Adnet Media. All Rights Reserved. XBIZ is a trademark of Adnet Media.
Reproduction in whole or in part in any form or medium without express written permission is prohibited.
