Author: Dr. Subi Chaturvedi, President, YES Global Institute, YES BANK
Today, with over 500 million internet users, India is the second largest online market in the world, behind only China. Thanks to a strong technology industry base, rapid growth in the availability of affordable smartphones, cheap mobile data plans, and a sound policy push by governments at all levels, India’s wireless telephony and broadband subscribers have touched ~1189 mn and 463.66 mn, respectively. The Government’s BharatNet program, which aims to connect 250,000 Gram Panchayats (GPs) through a network of optical fiber, has nearly touched the halfway mark. At the same time, the Government’s DigiGaon scheme has taken early steps towards bringing high-speed internet connectivity to over 700 villages. As India’s digital infrastructure challenges are gradually addressed, the country’s ‘Digital India’ vision needs to be supported by a strong Indic Internet ecosystem.
Shaping a digital society that integrates all social spheres is crucial for India’s inclusive development. Although English continues to have a high aspirational value in India, only around 10 percent of Indians are estimated to speak the language. This makes it necessary to localize the Internet in vernacular languages to empower the country’s vast non-English-speaking majority.
A FAST EVOLVING INDIC INDUSTRY LANDSCAPE
While 2018 witnessed considerable momentum towards the creation of a 5G ecosystem in the country, the Indic language Internet ecosystem also took several large strides, with both technology majors and new-age startups gearing up to tap the potential in this space:
Technology giants are leading the way
Global technology giants have announced major initiatives to bring Indian languages to the forefront. Google, which already supports 9 Indian Languages in ‘Google Search’, unveiled ‘Project Navlekha’ this year, to bring India’s 135,000 Indic language publications online. Microsoft India, on the other hand, announced the availability of new phonetic keyboards in ten Indian languages to members of its Windows Insider Program.
WhatsApp, the world’s most popular messaging app, and its parent Facebook already allow users in the country to change the app language, supporting 11 and 13 Indian languages respectively. Similarly, Uber and Ola, the leading cab aggregators operating in India, have taken the vernacular route to strengthen their market position with localized offerings, making their platforms available in multiple Indian languages. Amazon Web Services, a subsidiary of Amazon.com, has launched Hindi language support for ‘Amazon Polly’, a machine learning service that turns text to speech and supports both Indian English and Hindi.
Startups are stepping in
With a surge in smartphone usage, coupled with rapid growth in internet penetration driven by telecom operators offering cheap data across India, a new breed of startups, the vernacular startups, is coming to the fore. From news publications (e.g. Dailyhunt), localization platforms (e.g. Process9), e-payments (e.g. Bijlipay), Language-as-a-Service (e.g. Reverie), regional operating systems (e.g. Indus OS), text-to-speech (e.g. IndianTTS) and networking platforms (e.g. ShareChat) to typing software (e.g. Lipikaar), these startups are building products in diverse areas, drawing significant attention from marquee investors and raising millions of dollars in funding.
India’s Indic language content ecosystem is still nascent and faces numerous barriers, particularly in the areas of contextual conversion and content monetization:
Natural Language Processing
Natural Language Processing (NLP), powered by Artificial Intelligence (AI) and Machine Learning (ML), is instrumental in the proliferation of local languages online. However, current NLP methods convert text into data and learn from patterns in that data, which does not always result in contextual conversion. Adding to the problem is the fact that most Indic languages are written in Brahmic scripts, which are not easy for NLP systems to process.
Monetizing Vernacular Content
Content monetization has been one of the foremost challenges. Despite the existence of a huge pool of local language users, merely 1 percent of online content (text) is in Indic languages, and advertisers haven’t shown the required willingness to explore this segment. Publishers, in turn, have fewer revenue generation options and think twice before creating Indic language content online. While emerging platforms such as Google’s Navlekha provide Indian language publishers with free web hosting, there is a need for more such platforms.
GOVERNMENT LEADING FROM THE FRONT
The Central Government’s phenomenal thrust in building a strong Indian language Internet ecosystem is evident from a number of initiatives taken in this direction. From mandating Indic-language support for mobile devices to supporting multiple Indian languages in key mobile platforms such as UMANG (Unified Mobile Application for New-Age Governance) and BHIM (Bharat Interface for Money), the government has taken several encouraging steps to promote vernacular online content in the country.
C-DAC (Centre for Development of Advanced Computing), the premier R&D organization of the Ministry of Electronics and Information Technology (MeitY), has enabled Indian languages on pagers, set-top boxes, dot matrix printers, line printers, handheld devices, digital cameras, etc. It has also been working on creating language corpora, dictionaries and tools. MeitY has also made enormous efforts to include India’s 22 constitutionally recognized languages in the Unicode Standard.
Further, NITI Aayog, in its ‘AI for All’ program articulated in its National AI Strategy, has committed to supporting AI-based developments in speech recognition and natural language processing, and the creation of a variety of new applications.
THE ROAD AHEAD – MAINSTREAMING THE BIG IDEAS
According to an analysis of India’s 2011 census data, released this year, over 19,500 languages or dialects are spoken in India as mother tongues. At the same time, 96.71 percent of India’s population speaks one of the 22 Scheduled Languages as their mother tongue.
With an estimated 234 million users, Indian language internet users have already outnumbered English language internet users (175 million). Digital content localization therefore makes complete business sense for all players in the online business, from platform creators, service providers and OEMs to online publishers.
It is about time that we get over the age-old ‘chicken or egg’ conundrum of who should start first, and craft a sustainable inclusive ecosystem model that enables creation, contextual conversion, discoverability, consumption as well as monetization of Indian language content online. Every digital content ecosystem player must pass all litmus tests to be able to tap the tremendous potential unleashed by this burgeoning vernacular digital user base.
As Natural Language Processing (NLP) technology evolves to go beyond text to include voice and speech synthesis, language localization needs will evolve to Speech-to-Text and will find wide adoption across sectors such as Healthcare, Education, Ecommerce, Retail, Transportation, Agriculture, Legal, Banking and Payments, and Entertainment, among others. This will further open up a plethora of opportunities for marketers, driving ad revenues from both urban and rural markets.
While technology giants such as Google, Microsoft, Facebook and Amazon have made the first move, vernacular startups too have made significant advances by building localized apps and services from the ground up rather than adding language support as an afterthought. Success for these startups will largely depend on adopting different revenue models, including in-app payments and gifts, and not restricting themselves to advertisement revenues alone.
Going forward, Industry-Academia collaboration will prove to be a real game-changer, leading to a true synergy between Technology and Liberal Arts for problem solving. At the same time, a continuous policy push towards building capacity, enhancing last-mile connectivity, taking connections beyond Common Service Centers (CSCs) and relying increasingly on renewable sources of energy to power homes through the light of education, self-development and learning will allow India to unlock the full potential of a true Digital Bharat. As we celebrate 150 years of the birth of the Father of the Nation, India must continue with its relentless efforts towards true inclusive development by becoming digitally inclusive.
Source: Telecom Subscription Data as on August 31, 2018, TRAI
Source: Google – KPMG report “Indian Languages – Defining India’s Internet”
This article originally appeared as a part of the 4th edition of the FICCI-ILIA Newsletter, December 2018 to January 2019