South Asia

Table of Major South Asian languages and their Wikipedias

| Wiki code | Language | Primary country | Number of speakers (millions) | Potential users (millions) | Number of articles (7-09) | # of articles >1500 bytes (7-09) | Articles, 1 year growth rate (5/08-5/09) | # of 5+ editors (5-09) | 5+ editors, 1 year growth rate (5/08-5/09) | 5+ editors, 2 year growth rate (5/07-5/09) | Article to editor ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| hi | Hindi | India | *302 | 18.1 | 34,614 | 3,461 | 65% | 72 | 60% | 112% | 481 |
| bn | Bengali | Bangladesh | *251 | 15.1 | 20,049 | 2,406 | 14% | 53 | 33% | 104% | 378 |
| ur | Urdu | Pakistan | *101 | 6 | 10,667 | 1,707 | 36% | 39 | 22% | 50% | 274 |
| pa | Punjabi | Pakistan | 91 | 5.5 | 1,408 | 113 | 380% | 18 | 157% | 260% | 78 |
| te | Telugu | India | 70 | 4.2 | 43,556 | 3,049 | 8% | 45 | -21% | 32% | 968 |
| mr | Marathi | India | 68 | 4.1 | 23,651 | 1,419 | 32% | 47 | 38% | 74% | 503 |
| ta | Tamil | India | 66 | 3.9 | 18,951 | 4,359 | 31% | 61 | 22% | 65% | 311 |
| gu | Gujarati | India | 47 | 2.8 | 7,454 | 373 | 490% | 25 | 39% | 178% | 298 |
| ml | Malayalam | India | 36 | 2.2 | 10,631 | 3,508 | 60% | 83 | 17% | 84% | 128 |
| kn | Kannada | India | 35 | 2.1 | 6,812 | 1,022 | 19% | 32 | 28% | 100% | 213 |
| or | Oriya | India | 32 | 1.9 | 538 | 5 | 79% | 7 | 40% | N/A | 77 |
| ps | Pushto | Pakistan | 20 | 1.2 | 1,363 | 654 | 19% | 20 | 33% | 82% | 68 |
| si | Sinhala | Sri Lanka | *18 | 1.1 | 1,847 | 55 | 161% | 16 | 33% | 167% | 115 |
| as | Assamese | India | 17 | 1 | 249 | 960 | 15% | 5 | -17% | -38% | 50 |
| ne | Nepali | Nepal | 14 | 0.8 | 2,746 | 247 | 4% | 14 | -7% | -22% | 196 |
*Includes second language speakers
[1]
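
The "Article to editor ratio" column is not defined in the text. As a minimal sketch, assuming it is the number of articles divided by the number of 5+ editors and rounded to the nearest whole number, the following Python snippet recomputes it for a few rows. The function name and the choice of rows are illustrative only; the formula is an inference from the table values, not something the source states.

```python
# (number of articles, number of 5+ editors) copied from the table above.
ROWS = {
    "Hindi": (34614, 72),
    "Bengali": (20049, 53),
    "Telugu": (43556, 45),
}


def article_to_editor_ratio(articles, editors):
    """Assumed definition: articles per 5+ editor, rounded to a whole number."""
    return round(articles / editors)


for language, (articles, editors) in ROWS.items():
    print(f"{language}: {article_to_editor_ratio(articles, editors)} articles per editor")
```

Under that assumption the recomputed values match the table entries for these three rows (481, 378, and 968).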

Table showing comparison of article depth and quality of top Indian language Wikipedias (as of 2009 August)

| Language | Official article count | Articles over 0.5 KB (percentage) | Articles over 2 KB (percentage) | Average bytes per article | Edits per article | Database size (MB) | Words (M) | Images | Page depth |
|---|---|---|---|---|---|---|---|---|---|
| Bengali | 20,022 | 54 | 12 | 1244 | 15.8 | 74 | 3.7 | 1197 | 64 |
| Hindi | 33,497 | 34 | 10 | 1162 | 7.9 | 128 | 7.9 | 3386 | 21 |
| Kannada | 6,685 | 55 | 15 | 1381 | 12.4 | 28 | 1.2 | 1413 | 16 |
| Malayalam | 10,271 | 83 | 33 | 2590 | 24.9 | 80 | 3.0 | 6112 | 173 |
| Marathi | 23,448 | 24 | 6 | 716 | 12.8 | 56 | 2.4 | 1872 | 16 |
| Tamil | 18,625 | 81 | 23 | 1840 | 15.8 | 100 | 4.1 | 3533 | 26 |
| Telugu | 43,370 | 21 | 7 | 714 | 7.7 | 82 | 3.9 | 4947 | 5 |

 

 
Figure: Growth of articles greater than 1.5 KB

South Asian languages

  • There are over 900 million native language speakers of the 15 South Asian languages listed in the table, comprising 57% of the population of India, Pakistan, Bangladesh, Sri Lanka, and Afghanistan.
  • All 15 languages listed in the table are official languages of one or more South Asian countries or states, and are used extensively in the local print media.
  • While many of these languages have a long written history, there are limited digital resources in these languages on science, technology, and world history.
  • While many other languages are spoken in South Asia, these fifteen were chosen for their large numbers of speakers, their importance as official languages, and their use as mediums of educational instruction.

South Asia and Internet access and usage

  • Despite the large number of speakers of these languages, most lack access to computers and the Internet, with Internet use rates of 6% of the population for the region as a whole. Additionally, most people in South Asia with Internet access can also read English.[2]
  • Although English is still the dominant language on the web for India, there are active and growing online communities in Indian languages. For example, www.maayboli.com, a Marathi-language Internet community, reports 100,000 hits monthly. Many of these online communities use a Drupal-based platform that facilitates typing in Indian language fonts.[3]

South Asian languages and education

  • All of the languages listed are used as mediums of instruction at the elementary and high school level in different states and countries of South Asia.
  • Many private high schools and some elite public high schools teach in English, as do some advanced high school science classes.
    • For example, in the Indian state of Maharashtra in 2009, approximately 1,500,000 students graduated from the 10th grade in the state high school system. Of these, 750,000 went on to study the humanities, which are taught in the official state language, Marathi; 300,000 went on to study the sciences, which are taught in English; and 450,000 ended their studies at the 10th grade.[4]
  • English is the primary language for higher education in South Asia. However, some universities in India use local languages at the undergraduate level.

South Asian language Wikipedias

  • There are Wikipedias in all 15 of the South Asian languages listed in the table. Some have shown moderate growth over time, while others have grown very slowly.
    • Three South Asian languages with more than 3 million speakers each, Kashmiri, Sindhi, and Bhojpuri, have Wikipedias that are not listed in the table. None of the three has more than 2,500 articles.


Barriers to growth of the South Asian language Wikipedias

  • There are several major barriers to the growth of South Asian Wikipedias:
    • Low awareness of tools for typing Indian scripts on Western-style keyboards, combined with outdated computers and operating systems that do not allow people to read and type in South Asian language scripts
    • Lack of Wikipedia tools to facilitate editing in Indian languages and a lack of editors who have the technical skills to address problems and fix bugs
    • Low awareness of the existence of Indian language Wikipedias
    • Strong emphasis on English for advanced education and professional advancement


Updating of Potential user calculation

Notes

  1. Information on languages is from Ethnologue 2009, http://www.ethnologue.com. Potential users is calculated by multiplying the number of language speakers by the national or regional Internet use rate. Internet use rates are from the International Telecommunication Union, 2008.
  2. Information on Internet use is from the International Telecommunication Union, 2008.
  3. Wikimediaindia mailing list post. Note that this e-mail uses the word "lack" (also spelled "lakh"), a word taken from Hindi that means 100,000. So 2.5 lakh monthly hits means 250,000 monthly hits.
  4. Letter from the Wikimediaindia mailing list. Note that this letter uses the Hindi word "lack" (also spelled "lakh") to refer to the number 100,000. Therefore, 7.5 lakh students means 750,000 students.
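
The potential-user calculation described in note 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the original spreadsheet: the function name is hypothetical, and the 6% figure is the region-wide Internet use rate quoted in the text, which happens to reproduce the table values for the rows shown. The source does not say which rate (national or regional) was applied to each language.

```python
# Assumption: the region-wide Internet use rate of 6% quoted in the text.
REGIONAL_INTERNET_USE_RATE = 0.06

# Number of speakers in millions, copied from the first table.
SPEAKERS_MILLIONS = {
    "Hindi": 302,
    "Bengali": 251,
    "Punjabi": 91,
    "Telugu": 70,
}


def potential_users(speakers_millions, rate=REGIONAL_INTERNET_USE_RATE):
    """Potential users (millions) = speakers (millions) x Internet use rate."""
    return speakers_millions * rate


for language, speakers in SPEAKERS_MILLIONS.items():
    print(f"{language}: {potential_users(speakers):.1f} million potential users")
```

Passing a different `rate` (for example, a national figure) shows how sensitive the estimate is to the Internet use rate chosen.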