Machine Learning Transforms Google Translate Overnight
Compare these two translations of Ernest Hemingway’s “The Snows of Kilimanjaro”. The passage was first translated into Japanese by Jun Rekimoto, a professor of human-computer interaction at the University of Tokyo. Google Translate was then used to convert it back to English [1].
From Google Translate:
Kilimanjaro is a mountain of 19,710 feet covered with snow and is said to be the highest mountain in Africa. The summit of the west is called “Ngaje Ngai” in Masai, the house of God. Near the top of the west there is a dry and frozen dead body of leopard. No one has ever explained what leopard wanted at that altitude.
From Google Translate 24 hours prior to the above translation:
Kilimanjaro is 19,710 feet of the mountain covered with snow, and it is said that the highest mountain in Africa. Top of the west, “Ngaje Ngai” in the Maasai language, has been referred to as the house of God. The top close to the west, there is a dry, frozen carcass of a leopard. Whether the leopard had what the demand at that altitude, there is no that nobody explained.
You will notice a substantial improvement from the earlier translation to the later one. How could this possibly occur overnight? The answer is machine learning. Google is now well known for boldly stating that it is “AI First”, a drastic change from being “Mobile First” less than a decade ago. One of the first steps Google took in this transition was establishing Google Brain, a research team dedicated to integrating deep learning with Google’s suite of computing products and services [2]. The motto on the team’s website is “Make machines intelligent. Improve people’s lives.” This motto is scary to some, such as Elon Musk, who stated that “AI is a fundamental existential risk for human civilization” [3], and it has huge implications for the development of technology over the next few decades.
Google Translate was released in 2006 and is now one of Google’s most used products worldwide. In America, I seldom find myself using Translate, taking for granted that the internet is largely based on my native language: English. It rarely occurs to me that most of the world interacts with multiple languages far more often. The European Union, for example, has 24 officially recognized languages [4]. Google Translate serves over 500 million users per month and translates over 140 billion words per day. The patterns of global usage can also provide useful geopolitical insights. During the ongoing Syrian refugee crisis and mass exodus to Europe, Google Translate has seen a five-fold increase in translations from Arabic to German.
The Translate team worked tirelessly throughout 2016 to integrate machine learning into their product. Within nine months, they had made huge improvements to translations between English and eight of the most widely used languages. Their goal in 2017 was to convert the rest of their languages to machine learning at a rate of eight per month.
So how have these improvements occurred? Google Translate has improved just as a human would: by reading. Machine learning is a subfield of artificial intelligence that gives “computers the ability to learn without being explicitly programmed”, according to Arthur Samuel in 1959 [5]. Language is a perfect application of machine learning because of its organic development and fluid complexity. A toddler learns to speak not by studying grammatical rules, but merely by listening to the environment around him or her. A toddler’s grammar, syntax, cadence, and accent are emulations of the speech heard from parents, daycare friends, television, and possibly now even artificial intelligence such as Alexa or Siri.
Similarly, we can train Google Translate or other Natural Language Processing (NLP) applications to learn colloquially, by experience rather than by rules. Translate is essentially fed a huge training data set of translations. English, along with many other languages, is riddled with as many exceptions as there are rules. Basic pre-programmed artificial intelligence has been great at performing rigid tasks bounded by a specific set of rules. Language, however, has slight nuances such as sarcasm, bizarre prepositions, and awkward synonyms. A basic translation program could easily confuse “minister of agriculture” with “priest of farming”, while any person with a modest grasp of English would know the correct version.
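The contrast between translating by rules and translating from examples can be sketched in a few lines of Python. This is only a toy illustration, not how Google Translate works: the French phrase, the word-for-word dictionary, the example pairs, and every function name below are made up for this sketch. It simply counts which English rendering appears most often in a handful of example translations and prefers that over a word-by-word lookup.

```python
from collections import Counter, defaultdict

# Hypothetical word-for-word dictionary standing in for a rigid, rule-based system.
WORD_RULES = {"ministre": "priest", "de": "of", "l'agriculture": "farming"}

# Tiny made-up training set of (source phrase, English translation) pairs.
# One example is deliberately wrong to show that the majority wins.
EXAMPLES = [
    ("ministre de l'agriculture", "minister of agriculture"),
    ("ministre de l'agriculture", "minister of agriculture"),
    ("ministre de l'agriculture", "priest of farming"),
]


def translate_by_rules(phrase):
    """Translate one word at a time, ignoring context."""
    return " ".join(WORD_RULES.get(word, word) for word in phrase.split())


def learn_phrase_table(examples):
    """For each source phrase, keep the English rendering seen most often."""
    counts = defaultdict(Counter)
    for source, target in examples:
        counts[source][target] += 1
    return {source: seen.most_common(1)[0][0] for source, seen in counts.items()}


def translate_learned(phrase, table):
    """Use the learned rendering if one exists, else fall back to the rules."""
    return table.get(phrase, translate_by_rules(phrase))


if __name__ == "__main__":
    table = learn_phrase_table(EXAMPLES)
    print(translate_by_rules("ministre de l'agriculture"))
    print(translate_learned("ministre de l'agriculture", table))
```

The more example translations such a system sees, the less it depends on hand-written rules. Real systems replace this simple counting with far more capable statistical and neural models.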
We are slowly starting to see artificial intelligence permeate the technology we interact with. This is both exciting and daunting, as the fear of AI replacing menial jobs slowly becomes a reality. Machine learning is just one of the methods that will be used to train AI to become both more competent and more general in its abilities. Leveraging AI is an option that we cannot ignore, and it will undoubtedly have a substantial positive impact on humanity. Hopefully, the morality of man and, inevitably, the regulations of government will allow AI to be developed and implemented responsibly.
[1] https://www.nytimes.com/2016/12/14/magazine/the-great-ai-awakening.html?_r=0
[2] https://research.google.com/teams/brain/
[3] http://www.npr.org/2017/07/17/537686649/elon-musk-warns-governors-artificial-intelligence-poses-existential-risk
[4] https://www.theguardian.com/news/datablog/2014/sep/26/europeans-multiple-languages-uk-ireland
[5] https://www.cims.nyu.edu/~munoz/files/ml_optimization.pdf
One comment on “Machine Learning Transforms Google Translate Overnight”
Hi,
This was a great post, Jacob. It is really amazing to see how artificial intelligence powered by big data and machine learning can change our lives for the better. Little by little, our experiences are becoming more technological (the expansion of the Internet of Things). Gadgets such as the Apple Watch, Nest, and Amazon Echo are really inventions “from the future”, and the future is already here. However, there are many concerns about the new uses that this technology can bring. I believe that it is up to the people to implement and control a responsible development of AI.
Best,
Arthur