Indic Language on a Webpage

This page explains how to use an Indic language on a webpage. With this free advice, you too can use multiple Indic languages on any webpage.

Language Examples

Here are some examples of Indic languages displayed on a webpage.

  • Sanskrit संस्कृतम्
  • Hindi इन भाषाओं में
  • Bengali বাংলা
  • Telugu తెలుగు
  • Marathi मराठी
  • Tamil தமிழ்
  • Gujarati ગુજરાતી
  • Kannada ಕನ್ನಡ
  • Malayalam മലയാളം
  • Punjabi ਪੰਜਾਬੀ

UTF-8

Use UTF-8 character encoding everywhere. More specifically, use UTF-8 without the byte-order mark (BOM). Set the character encoding to UTF-8 in your HTML, your editor, your database, and your web server (Apache).
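As a small illustration of what UTF-8 does with Indic text, here is a minimal Python sketch. Python is an assumption here (the article names no programming language of its own), and the helper name utf8_roundtrip is hypothetical. It encodes a word to UTF-8 bytes without a BOM and decodes it back.

```python
def utf8_roundtrip(text):
    """Encode text as UTF-8 (no BOM) and decode it back again."""
    raw = text.encode("utf-8")       # Python's "utf-8" codec never writes a BOM
    return raw, raw.decode("utf-8")


word = "मराठी"                        # "Marathi" in Devanagari
raw, back = utf8_roundtrip(word)
# Each Devanagari code point takes three bytes in UTF-8.
print(len(word), "code points ->", len(raw), "bytes")
```

The decoded text is identical to the input, which is the property you want at every step between editor, database, and browser.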

Set UTF-8 in Your Editor

Save all your files - .html, .php, etc. - as UTF-8 without the BOM.
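If you are not sure whether a file was saved with a BOM, you can check its first three bytes. The following is a minimal Python sketch, not part of any editor; the function name strip_utf8_bom is hypothetical.

```python
import codecs


def strip_utf8_bom(data):
    """Return the bytes without a leading UTF-8 byte-order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):      # BOM_UTF8 is b"\xef\xbb\xbf"
        return data[len(codecs.BOM_UTF8):]
    return data


with_bom = codecs.BOM_UTF8 + "తెలుగు".encode("utf-8")
clean = strip_utf8_bom(with_bom)
print(clean.decode("utf-8"))
```

To clean a real file, you would read it in binary mode, pass the bytes through this function, and write the result back.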

UltraEdit

In the status bar, to the right of the Ln and Col numbers, you will see either DOS or U8-DOS, to indicate the encoding as either ASCII or UTF-8 respectively.

To change an open ASCII file to UTF-8, click File -> Conversions -> ASCII to UTF-8 (Unicode editing).

By default, every new file you create in UltraEdit uses ASCII encoding. To change the default, click Advanced -> Configuration -> Editor -> New File Creation, and select the option Create new files as Unicode.

To open existing files correctly, configure UltraEdit like this: click Advanced -> Configuration -> File Handling -> Unicode/UTF-8 Detection, and check Auto detect UTF-8 files.
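The idea behind UTF-8 detection can be sketched in a few lines of Python. This is only an illustration of the general approach (try to decode the bytes strictly as UTF-8), not a description of how UltraEdit implements its option; the function name looks_like_utf8 is hypothetical.

```python
def looks_like_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(looks_like_utf8("বাংলা".encode("utf-8")))   # Bengali text saved as UTF-8
print(looks_like_utf8(b"\xe9t\xe9"))              # bytes from a Latin-1 file
```

Note that plain ASCII text also passes this check, because ASCII is a subset of UTF-8.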

Dreamweaver

Click Modify -> Page Properties -> Title/Encoding, then use the Encoding drop-down to select UTF-8.

Set UTF-8 in HTML

Add the charset meta-tag to the head section of every HTML webpage. Make this the first line after the <head> tag, even before the title tag.

<meta http-equiv="content-type" content="text/html; charset=UTF-8">

Or the new shorter version for HTML5:

<meta charset="UTF-8">
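If your pages are generated by a script, the same rule applies to the generated output. Here is a minimal Python sketch, assuming a hypothetical helper named build_page. It puts the charset meta tag first in the head section and returns the page as UTF-8 bytes. The header tuple shows the matching HTTP Content-Type value a server would send.

```python
import html


def build_page(title, body_text):
    """Build a small HTML5 page with the charset meta tag first in <head>."""
    page = (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        '<meta charset="UTF-8">\n'
        "<title>" + html.escape(title) + "</title>\n"
        "</head>\n"
        "<body><p>" + html.escape(body_text) + "</p></body>\n"
        "</html>\n"
    )
    return page.encode("utf-8")   # bytes to write to disk or send to the browser


# HTTP header that should accompany the page.
CONTENT_TYPE_HEADER = ("Content-Type", "text/html; charset=utf-8")

print(build_page("ಕನ್ನಡ", "ಕನ್ನಡ").decode("utf-8"))
```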

Set UTF-8 in Database

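Recent versions of MySQL (8.0 and later) default to utf8mb4, which is full UTF-8, and PostgreSQL normally defaults to UTF-8 when it is initialized under a UTF-8 locale. Older MySQL versions defaulted to latin1. If your installation does not default to UTF-8, then set the charset option when creating the database.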
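As a rough sketch of what setting the charset option looks like, the Python helper below returns a CREATE DATABASE statement for each server. The function name create_database_sql and the database name indic_demo are hypothetical, and the collation utf8mb4_unicode_ci is only one reasonable choice. Depending on your PostgreSQL locale settings, you may also need to specify LC_COLLATE and LC_CTYPE.

```python
import re


def create_database_sql(engine, name):
    """Return a CREATE DATABASE statement that requests UTF-8 storage."""
    # Allow only simple identifiers, because the name is placed directly in the SQL.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError("unsafe database name: %r" % name)
    if engine == "mysql":
        # utf8mb4 is MySQL's full UTF-8 character set.
        return ("CREATE DATABASE %s CHARACTER SET utf8mb4 "
                "COLLATE utf8mb4_unicode_ci;" % name)
    if engine == "postgres":
        # template0 lets you choose an encoding different from the default template's.
        return "CREATE DATABASE %s ENCODING 'UTF8' TEMPLATE template0;" % name
    raise ValueError("unknown engine: %r" % engine)


print(create_database_sql("mysql", "indic_demo"))
print(create_database_sql("postgres", "indic_demo"))
```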

Keyboard Input

After changing the input language in Windows, bring up the on-screen keyboard:

Control panel -> Ease of Access Center -> Start on-screen keyboard

An alternative to installing the Windows language keyboards:
http://www.branah.com/

Sanskrit and Malayalam Language Tools

Alphabets

Sanskrit is one of the oldest recorded Indo-European languages, a family that also includes Latin, English, Spanish, French, German and Italian. Sanskrit is also the basis of most languages spoken in India, especially in the north.

Hindi, the most common Indian language, shares the same written script as Sanskrit, although the sentence structure and grammar are quite different. Pure Hindi, called "shuddh Hindi," draws directly on Sanskrit and thus shares many words with it. Hindustani is the everyday blend of Hindi and Urdu, and Urdu takes much of its vocabulary from Persian. Hindustani is the most common form of Hindi spoken in India today.

It is important to distinguish between a written script and a language. A language is a way of communicating, and has a grammar which defines its word and sentence formation. For example, "I eat soup" is a simple sentence in the English language. To write a language on paper, you need a script. In English, we use what is called Roman script. The Sanskrit language is usually written in a script called Devanagari.
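The difference between script and language also shows up in Unicode, which assigns characters to scripts, not to languages. The Python sketch below is a heuristic under stated assumptions: the standard library has no direct script lookup, so it takes the first word of each character's Unicode name as the script name. The function name scripts_in is hypothetical.

```python
import unicodedata


def scripts_in(text):
    """List the scripts used in text, judged by the first word of each character name."""
    found = []
    for ch in text:
        # Look only at letters (L*) and combining marks (M*); skip spaces, digits, etc.
        if unicodedata.category(ch)[0] not in ("L", "M"):
            continue
        name = unicodedata.name(ch, "")
        script = name.split(" ")[0] if name else "UNKNOWN"
        if script not in found:
            found.append(script)
    return found


print(scripts_in("I eat soup"))   # English language, Roman (Latin) script
print(scripts_in("संस्कृतम्"))      # Sanskrit language, Devanagari script
print(scripts_in("yoga योग"))     # the same word written in two scripts
```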

Terms

language: a way of communicating, with a grammar that defines its word and sentence formation. For example, "I eat soup" is a simple sentence in the English language.

script: a way of writing a language on paper

glyph: the written shape of a letter; each letter in a script is a glyph

diacritical marks: little marks added above or below letters. Other Indo-European languages use the Roman script as well, sometimes with diacritical marks added above or below certain letters. For example, the French acute accent or the German umlaut.
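Unicode can represent a letter with a diacritical mark as a base letter followed by a separate combining mark. The minimal Python sketch below uses the standard unicodedata module to split the two apart; the function name split_diacritics is hypothetical.

```python
import unicodedata


def split_diacritics(text):
    """Return the base letters and the names of the combining marks in text."""
    decomposed = unicodedata.normalize("NFD", text)   # letter + combining mark
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    marks = [unicodedata.name(ch) for ch in decomposed if unicodedata.combining(ch)]
    return base, marks


print(split_diacritics("\u00e9"))   # e with the French acute accent
print(split_diacritics("\u00fc"))   # u with the German umlaut
```

Unicode names the umlaut mark "diaeresis", so that is the name reported for the second example.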

Most of the time, one particular language is written in one particular script. Sanskrit is an exception: it has been written in several different scripts.

Sanskrit uses the Devanagari script, the same as Hindi, Marathi, and Nepali. But some Vedic documents use additional characters and accents that are not in this script, and not in Unicode. This is a special issue being addressed by committees.

Assumption: script = alphabet

Language Families

			Indo-European
				Greek
				Italic
				Romance, developed from Latin in the 6th through 9th centuries
				Spanish, Portuguese, French, Italian, Romanian, and 18 others
			
			proto-Germanic, iron age language
				Germanic
				German, English, Swedish, Dutch
			
			Greek
				Early Cyrillic, 9th century AD
				Cyrillic
				Slavic, Russian
			
			Arab
				Persian
			
			Indic
				Sanskrit
			

Latin Alphabet

  • Also known as the Roman Alphabet.
  • Originated from the Greek alphabet.
  • First used to write Latin.
  • Much of the world now uses the Latin alphabet.
  • Used by most European languages, including English, German, and Spanish.
  • Also used for Czech, Polish, Romanian, Vietnamese, and Igbo.
  • When the Soviet Union broke up, several former Soviet republics switched from the Cyrillic alphabet to the Latin alphabet.
  • These include Turkic-speaking countries such as Turkmenistan, Uzbekistan, and Azerbaijan, whose languages had earlier been written in Arabic, Persian or Cyrillic alphabets.

Indic Alphabets

Many Indic languages share the same alphabetic sound structure. Each script uses a different written glyph for the same sound. Because the sounds are the same, you can write many different languages using the same script, or the same language using different scripts. For example, T. Krishnamacarya, the teacher of BKS Iyengar, Pattabhi Jois and TKV Desikachar, wrote Sanskrit verses using his native Telugu script. The Sanskrit language is usually written in a script called Devanagari. Indic scripts have no capitalization. The Sanskrit alphabet in Devanagari is shown below.
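The shared sound structure described in this section is also visible in Unicode: the Devanagari block starts at U+0900 and the Telugu block at U+0C00, and most letters for the same sound sit at the same position within their block. The Python sketch below uses that layout to rewrite Devanagari text in Telugu script. It is an approximation, not a real transliteration tool: a handful of positions do not correspond between the two blocks, so the sketch only shifts the core letter and vowel-sign range and leaves everything else unchanged. The function name devanagari_to_telugu is hypothetical.

```python
import unicodedata

DEVANAGARI_START = 0x0900
TELUGU_START = 0x0C00
CORE_FIRST, CORE_LAST = 0x01, 0x4D   # signs, vowels, consonants, vowel signs, virama


def devanagari_to_telugu(text):
    """Shift Devanagari characters to the same position in the Telugu block."""
    out = []
    for ch in text:
        offset = ord(ch) - DEVANAGARI_START
        if CORE_FIRST <= offset <= CORE_LAST:
            candidate = chr(TELUGU_START + offset)
            # Keep the original character if Telugu has nothing at this position.
            if unicodedata.name(candidate, None) is not None:
                ch = candidate
        out.append(ch)
    return "".join(out)


print(devanagari_to_telugu("योग"))   # the word "yoga" in Devanagari
```

Text outside the Devanagari block, such as Latin letters, passes through untouched.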
