No one knew the Internet
would become as popular as it has. When it started out as a way for
academics to communicate, it was built on ASCII text. ASCII
(American Standard Code for Information Interchange) is a 7-bit code
of 128 characters that covers the unaccented Latin alphabet, numbers,
punctuation marks and a few mathematical symbols (like = and +).
Accented letters (like é and ü) came only with later 8-bit
extensions. Of these characters, only the standard alphabet, numbers,
periods and hyphens are used in Web site addresses. So, we ended up
with an Internet with only Latin characters and numbers in the DNS
(Domain Name System) address.
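The character restriction described above can be sketched in a few lines of Python. This is an illustration only: the function name is_ascii_hostname is invented for this example, and the 63-character cap on each dot-separated part comes from the DNS specifications rather than from this article.

```python
import re

# One dot-separated part ("label") of an address: letters, digits and
# hyphens only, 1 to 63 characters, not starting or ending with a hyphen.
LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ascii_hostname(name: str) -> bool:
    """Return True if every label of the name uses only the permitted ASCII characters."""
    return all(LABEL.fullmatch(label) is not None for label in name.split("."))


for candidate in ["www.egyptair.com.eg", "café.com", "مصر.com"]:
    print(candidate, is_ascii_hostname(candidate))
```

Only the first of the three candidates passes, which is the whole problem in miniature: accented and non-Latin names are rejected before any lookup takes place.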
Web sites are not actually
identified by names, but by IP numbers. When you type a name into
your browser, the name is sent to a remote server, which translates
it into numbers; your browser then uses those numbers to send a
request to the host of that site. You may have seen these
numbers when you set up a dial-up connection, or sometimes they
briefly appear at the bottom of your browser as the Internet looks
for the Web site you requested.
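As a rough sketch of that name-to-number step, the Python below uses a small in-memory table in place of a real name server. Everything in it is hypothetical: the table, the resolve function and the addresses, which are taken from the 192.0.2.0/24 range reserved for documentation.

```python
# Hypothetical lookup table standing in for a remote name server.
# The addresses are from the block reserved for documentation examples.
NAME_TABLE = {
    "www.example.com": "192.0.2.10",
    "www.example.org": "192.0.2.20",
}


def resolve(name: str) -> str:
    """Translate a site name into its IP number, ignoring case and a trailing dot."""
    key = name.lower().rstrip(".")
    try:
        return NAME_TABLE[key]
    except KeyError:
        raise LookupError(f"no address known for {name!r}") from None


print(resolve("WWW.Example.com"))
```

On a connected machine, the standard library call socket.gethostbyname(name) performs the real lookup; the table version is used here only so the example runs without a network.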
In the early days of the
Internet, few computing systems could deal with more than one set of
characters. Arabic Word didn't exist, and Chinese systems worked
only with Chinese characters. In short, a computer system dealt
with only one language, and couldn't work with another system in a
different set of characters. The introduction of Unicode meant many
more languages could be included in computing. Unicode, which is
supported in Windows 98, is a superset of ASCII. Its original 16-bit
design allows more than 65,000 characters, compared with 128 in
ASCII and 256 in its 8-bit extensions.
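The gap between the two character sets is easy to see in Python, whose strings are Unicode. The sketch below is illustrative; the helper name fits_in_ascii is made up for this example.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character of the text has an ASCII code."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


for sample in ["egypt", "café", "مصر"]:
    # Each character has a Unicode number (code point); ASCII covers only 0 to 127.
    code_points = [f"U+{ord(ch):04X}" for ch in sample]
    print(sample, fits_in_ascii(sample), code_points)
```

Every character gets a Unicode number, but only those below 128 survive the trip into ASCII, so the accented é and the Arabic letters are both shut out.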
An English-language Internet
would be fine if everyone spoke English (or French or Spanish--but
without the accents), but they don't. The fastest growing use of the
Internet is outside the United States and Europe. Asia, Africa and
the Middle East have some of the most promising and populous
countries. But there's a barrier to how fast the Internet can
penetrate, and that's the language barrier.
SPEAKING ARABIC
Of Egypt's 65 million people,
only a small fraction have good enough English (or other European
language) skills to be able to surf the Internet fully. There are
around 500 million Arabic speakers in the world. The amazing
increase in Arabic Web sites has helped, with alternatives for almost
every popular English site, but the need to type the address in
Latin characters will always limit Internet use in
non-English-speaking countries. We have Arabic operating systems and
office suites, and even Web site authoring tools, but we still have
to deal with the legacy of the ASCII Internet. Countries like China
and Japan have huge populations excluded from easy access to the
Internet by the current Web address system.
INTERNATIONAL PROBLEMS
The main problem is a
technical one. To change all the servers in the world to accept
other languages could take up to 10 years. The Internet rose to what
it is in less than five years--a decade is too long to wait for it
to become international.
There are also various
linguistic problems involved, which differ with each language. Of
course, we're more interested in Arabic, which has its own set of
difficulties, but Chinese, Japanese and Russian have their own
problems.
It's not just the name of the
site. TLDs (top-level domains), such as .com, .net and .org, also
need to be rendered somehow in the other language. One company (see
below) continues to use this part of the Web address in English.
This is only a partial solution; for the Internet to be fully
international, it should be usable by someone entirely in their
native language. Other companies decided to shorten the TLD to one
letter or character. For example, the Arabic letter sheen represents
the word shirka (meaning "company") for .com. Another company
decided to spell out the word in its entirety in Arabic. The
organization that oversees new TLDs, ICANN (Internet Corporation for
Assigned Names and Numbers), recently accepted several new TLDs, and
more will be available soon, exacerbating the
problem.
Country-code top-level domains
(ccTLDs) are the part of a Web address that denotes the country (.eg
is Egypt, .uk is the UK, .de is Germany). This could cause problems
in the Middle East, as Arabic uses few acronyms, so any country name
would have to be spelled out in full (imagine Central African
Republic), which could make Internet addresses impossibly long for
anyone to type. But they need to be included, as some Web sites share
a name but belong to different organizations--www.egyptair.com.eg is
the Egyptian airline, but www.egyptair.com is a site about improving
the air quality in Egypt. An AINC conference (see below) decided on the
country domain names for Egypt, Bahrain and Saudi
Arabia.
Another potential problem is
for Web site administrators: an Arabic (or any other non-English)
domain name is secondary to the English-language address. If a
company wants to be "international," it must pay for each language
(at around $35 a name). You also have to remember to renew your
registration in each of those languages.
There is also cybersquatting,
where an individual registers a name in a certain language, then
sells it to the company that uses the name--at an inflated price.
Now the problem is multilingual. How many enterprising Arabic speakers will
try to register names like Amazon, eBay, Ford or Microsoft? All of
the domain name companies we looked at said they were not
responsible for the legal issues of cybersquatting.
LOST IN TRANSLATION
There is also the problem of
translation/transliteration of names into Arabic (or any other
language). The registrars don't do this for free, so you may need to
hire someone to get the name right. If you're an Egyptian company
wanting to export to China, you may want to register your domain in
Chinese, but do you know how to write it? And is your computer set
up to type these languages?
There is also the dilemma of
transliterating back. For example, the Arabic word for house could be
written in Latin letters as Bayt, Beit, Bet or Beyt, but it can be
registered only once in Arabic.
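The many-to-one collision can be sketched with a small table in Python. The table and the function spellings_competing_for are invented for illustration; no registrar's actual transliteration rules are implied.

```python
# Hypothetical transliteration table: four Latin spellings, one Arabic word.
LATIN_TO_ARABIC = {
    "bayt": "بيت",
    "beit": "بيت",
    "bet": "بيت",
    "beyt": "بيت",
}


def spellings_competing_for(arabic_name: str) -> list[str]:
    """List the Latin-letter spellings that would all map to one Arabic name."""
    return sorted(
        latin for latin, arabic in LATIN_TO_ARABIC.items() if arabic == arabic_name
    )


print(spellings_competing_for("بيت"))
```

Four separate Latin-letter registrations could exist side by side, but all four owners would be competing for a single Arabic name.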
BIG BUSINESS
Companies developing
multilingual DNS systems are competing on shaky ground. The
authorities that oversee Internet standards are keen that the
operation of the Internet remains the same. If multilingual Web
addresses adversely affect the operation of the Internet, everyone
loses.
One of the problems is knowing
which, if any, of the technologies will become mainstream. If you
are thinking of registering a domain name in Arabic, be warned: it
could change. You might register with one company, then find another
system is adopted. Some of the companies guarantee that if theirs is
not the standard adopted, they will migrate your name to the other
system. But your name may already be taken on that system. Some of
the systems are already up and running, but there's no guarantee
surfers will be able to find your site, or know how to use Arabic
Web addresses on their computer.
For many it's a waiting game.
They're holding back until one system is in place, then hoping the
name they want is still available. Of course, if there's a case of
cybersquatting, that will be something the courts have to
decide.
THE PLAYERS
WALID INC.
Walid (www.walid.com) offers
several different languages: Japanese, Chinese, Hindi and Arabic.
The user downloads a small application, WorldConnect, which then
translates the foreign-language domain name to the numbers it
represents, and sends it through the normal Internet infrastructure.
This is known as a client-side solution, and is available for use
immediately. However, anyone who wants to surf in a non-English
language has to download the (free) software. Unfortunately, it
works only on Windows platforms with multilingual capabilities
(other operating systems, like Linux and Unix, should be supported
soon). Walid hopes
ISPs will distribute WorldConnect with their Internet application
packs, so users won't have to download it. This system also works
with FTP and telnet applications.
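To give a feel for what a client-side translation can look like, here is a minimal Python sketch. It does not reproduce WorldConnect, whose internals are not described here. It swaps in Python's built-in "idna" codec, which implements the IDNA standard: each non-ASCII part of a name is rewritten on the user's machine into an ASCII form beginning with "xn--", so that existing servers can carry it unchanged. The function names and the .example addresses are illustrative.

```python
def to_ascii_form(name: str) -> bytes:
    """Rewrite a multilingual name into the ASCII-only form sent over the network."""
    return name.encode("idna")


def to_display_form(wire_name: bytes) -> str:
    """Turn the ASCII-only form back into the name the user typed."""
    return wire_name.decode("idna")


for name in ["bücher.example", "مصر.example"]:
    wire = to_ascii_form(name)
    print(name, "->", wire.decode("ascii"), "->", to_display_form(wire))
```

The attraction of doing the conversion on the client is visible in the output: the servers along the way see only letters, digits, hyphens and periods.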
MILLENNIUM INC.
This is a server-side system,
where a server on the Internet translates the address. It uses a
multilingual name for the site name, and English for the TLD (.com,
.org or .net). One advantage of this system is that it offers 39
character sets, which represent 250 languages. Users can set their
DNS address on Windows so the computer contacts the server that
translates the address and sends the request to the Web site.
Verisign's subsidiary, Network Solutions (NSI), is currently testing
the system at www.verisign-grs.com/idn/index.html. The company is
working with the IETF (see below), and if testing is successful, the
IETF will approve the system for the Internet. You can reserve
domain names for this service on a number of Web sites:
www.nativenames.net, www.arabicdomainname.com, www.arabicnames.com
and www.any-dns.com.
I-DNS
This system works at both the
client level and the server level. The company (www.i-dns.net) has
formed alliances with ISPs around the world for them to install
i-DNS software, through which DNS names can be resolved. There is
also a client application called iClient (a free download from the
company's site), which will translate the address on the user's
computer, and then send the request to a normal server. It is a
plug-in, and works with Internet Explorer, Netscape and e-mail
programs that support multilingual domains. The company uses a
single letter to denote the TLD. This system is up and running, and
perhaps offers the best of both worlds, as a user can surf
multilingual domains, even if their ISP does not support
them.
THE POWERS THAT BE
The IETF (Internet Engineering
Task Force) is the Internet standards-setting body. It works mainly
on technical issues. Its IDN (Internationalized Domain Name) working
group (www.i-d-n.net) works on multilingual domain names. One of the
issues it is discussing is which code standard should be used to
replace ASCII. The most popular choice is the UTF-8 (8-bit Unicode
Transformation Format) standard, which encodes each Unicode
character as a sequence of one or more 8-bit bytes.
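A short Python sketch shows what UTF-8 does to individual characters. The helper name utf8_hex is invented for this illustration; the encoding itself is Python's standard "utf-8" codec.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of a character as space-separated hexadecimal."""
    return " ".join(f"{byte:02x}" for byte in ch.encode("utf-8"))


# A Latin letter, an accented letter, the Arabic letter sheen and a Chinese character.
for ch in ["A", "é", "ش", "中"]:
    print(f"U+{ord(ch):04X}", utf8_hex(ch), len(ch.encode("utf-8")), "byte(s)")
```

Plain ASCII letters keep their single familiar byte, which is one reason UTF-8 appeals to anyone trying to stay compatible with existing systems; other characters take two or more bytes each.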
ICANN (www.icann.org) is the
Internet Corporation for Assigned Names and Numbers, a nonprofit
organization that coordinates the Internet's naming and numbering
systems. One of its main roles
is to supervise domain names. It is the body that decides whether we
will have a .biz or a .info to augment the other TLDs. ICANN
recommends that any multilingual system be open (not monopolized
by any company), be compatible with the existing system and allow any
domain name to be resolved by any system anywhere. ICANN is monitoring the
NSI testbed.
MINC is the Multilingual
Internet Names Consortium (www.minc.org). It is made up of companies
wanting to work with multilingual domains, and it works with the
IETF.
AINC is the Arabic working
group of MINC (www.minc.org/WG/arabic/). It looks into the specific
linguistic and technical problems of Arabic. It met in Amman, Jordan
in the spring to discuss some of the issues. Many of the founding
members are also senior management in the companies that are
offering Arabic domain names. The group will meet again this month
in Cairo.