The key to cracking long-dead languages?

Broken and scorched black by fire, the dense, wedge-shaped marks etched into the ancient clay tablets are only just visible under the soft light at the British Museum. These tiny signs are the remains of the world’s oldest writing system: cuneiform.

Developed more than 5,000 years ago in Mesopotamia, the land between the Tigris and Euphrates rivers where Iraq lies today, cuneiform captured life in a complex and fascinating civilisation for some three millennia. From furious letters between warring royal siblings to rituals for soothing a fractious baby, the tablets offer a unique insight into a society at the dawn of history.


They chronicle the rise and fall of Akkad, Assyria and Babylonia, the world’s first empires. An estimated half a million of them have been excavated, and more are still buried in the ground.


However, since cuneiform was first deciphered by scholars around 150 years ago, the script has only yielded its secrets to a small group of people who can read it. Some 90% of cuneiform texts remain untranslated.


That could change thanks to a very modern helper: machine translation.


“The influence that Mesopotamia has on our own culture is something that people don’t know much about,” says Émilie Pagé-Perron, a researcher in Assyriology at the University of Toronto. Mesopotamia gave us the wheel, astronomy, the 60-minute hour, maps, the story of the flood and the ark, and the first work of literature, the Epic of Gilgamesh. But its texts are mainly written in Sumerian and Akkadian, languages that relatively few scholars can read.

Pagé-Perron is coordinating a project to machine translate 69,000 Mesopotamian administrative records from the 21st Century BC. One of the aims is to open up the past to new research.


“We have information about so many different aspects of the lives of Mesopotamian people, and we can’t really profit from the expertise of people in different fields like economics or politics, who if they had access to the sources, could help us tremendously to understand those societies better,” says Pagé-Perron.


Apart from the clay tablets, there are also more than 50,000 Mesopotamian engraved seals scattered in collections around the world. For millennia, the people of Mesopotamia used seals made of engraved stone that were pressed into wet clay to mark doors, jars, tablets and other objects. Only some 10% of these have even been catalogued, let alone translated.


“We have more sources from Mesopotamia than we have from Greece, Rome and ancient Egypt together,” says Jacob Dahl, a professor of Assyriology at the University of Oxford. The challenge is finding enough people who can read them.

Pagé-Perron and her team are training algorithms on a sample of 4,000 ancient administrative texts from a digitised database. Each records transactions or deliveries of sheep, reed bundles or beer to a temple or an individual. Originally impressed into the clay with a reed stylus, the texts have already been transliterated into our alphabet by modern scholars. The Sumerian word for big, for example, can be written in cuneiform signs, or it can be written in our alphabet as “gal”.


The wording in these administrative texts is simple: “11 nanny goats for the kitchen on the 15th day”, for example. This makes them particularly suitable for automation. Once these algorithms have learned to translate the sample texts into English, they will then automatically translate the other transliterated tablets.
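To see why such formulaic records suit automation, consider a minimal sketch in Python. It is not the project's method, which is described here only as "training algorithms"; it is a plain word-by-word dictionary lookup, shown purely for illustration. Only the gloss of "gal" as "big" comes from the text above. The other lexicon entries and the helper name `gloss_line` are assumptions made for this example.

```python
# Toy lexicon. "gal" = "big" is taken from the article; the remaining
# glosses are illustrative assumptions, not data from the project.
LEXICON = {
    "gal": "big",
    "udu": "sheep",
    "lugal": "king",
}


def gloss_line(transliteration, lexicon=LEXICON):
    """Gloss a transliterated line word by word.

    Numbers pass through unchanged. Unknown words are kept in square
    brackets so a human reader can see what still needs attention.
    """
    glossed = []
    for token in transliteration.split():
        if token.isdigit():
            glossed.append(token)
        else:
            glossed.append(lexicon.get(token.lower(), f"[{token}]"))
    return " ".join(glossed)


print(gloss_line("11 udu gal"))
```

A lookup like this ignores word order and grammar, which is one reason a trained model is needed even for simple records.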


“The texts we’re working on are not very interesting individually, but they’re extremely interesting if you take them as groups of texts,” says Pagé-Perron, who expects the English versions to be online within the next year. The records give us a picture of day-to-day life in ancient Mesopotamia, of power structures and trading networks, but also of other aspects of its social history, such as the role of female workers. Searchable translations would enable researchers from other areas to explore these rich facets of life in the ancient world.


“These people are so different and so remote from us, but at the same time, they have the same basic problems,” explains Pagé-Perron. “Understanding Mesopotamia is a way of understanding what it means to be human.”


She hopes machine analysis will also clarify certain features of Sumerian that still puzzle modern academics. This extinct language is not related to any modern language but has been preserved in inscriptions written in cuneiform. It may be our last remaining link to even older, unrecorded societies.


“Sumerian is probably the last member of what must have been a large family of languages that goes back thousands and thousands of years,” says Irving Finkel, the curator in charge of the 130,000 cuneiform tablets stored at the British Museum. “Writing appeared in the world just in time to rescue Sumerian… We’re just lucky that we had some ‘microphone’ that picked it up before it went away with all the others.”

Finkel is one of the world’s leading cuneiform experts. In his book-filled office at the British Museum, he explains how the script was slowly deciphered thanks to a multi-lingual inscription about a king, just like the Rosetta Stone that helped researchers make sense of Egyptian hieroglyphs.


“It’s actually rather astonishing how interesting it is when you find a human mind across millennia, where it is like talking to them on the telephone,” he says. “It’s the most exciting thing in the world when you meet one of these people.”


Ancient access


Few of us will ever cradle a 5,000-year-old tablet in our palm. But thanks to advanced imaging techniques, anyone with an internet connection can now access treasures such as the world’s oldest surviving royal library, which is being digitised. It was built in Nineveh by Ashurbanipal, a powerful and book-loving Assyrian king. Some of the surviving tablets from his library are displayed at the British Museum as part of a special exhibition on Ashurbanipal. Although blackened and hardened by fire when Nineveh was sacked in 612 BC, the text they carry can still be read.


New imaging techniques are making it easier to work with such ancient, often damaged texts. With highly detailed images, it is possible to pick out marks that may be too faint to see with the naked eye.


Dahl and his colleagues have been digitising tablets and seals stored in collections in Teheran, Paris and Oxford for a project known as the Cuneiform Digital Library Initiative. This vast online database already contains about a third of the world’s cuneiform texts, as well as some undeciphered written languages, such as Proto-Elamite from ancient Iran. Without sprawling digital resources like this, training machines to do translation would not even be possible.

Digitisation is also helping researchers to piece together links between texts scattered in collections around the world. Dahl, along with researchers at the University of Southampton and the University of Paris-Nanterre, has digitised 3D images of about 2,000 stone seals from Mesopotamia. In a pilot project, they then used AI algorithms to examine a group of six tablets and identify matching seal impressions found elsewhere in the world. The algorithm correctly selected a tablet that is currently stored in Italy, and another that is stored in the United States; both had been stamped by the same seal.
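The article does not say which algorithm the pilot used. As a hedged illustration of the general idea, the Python sketch below ranks candidate impressions by cosine similarity to a seal. The feature vectors, the tablet names, and the functions `cosine_similarity` and `best_match` are all invented for this example; real work would derive features from the 3D scans.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(query, candidates):
    """Return (name, score) for the candidate most similar to the query."""
    scored = ((name, cosine_similarity(query, vec))
              for name, vec in candidates.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical feature vectors, invented purely for illustration.
impressions = {
    "tablet_A": [0.9, 0.1, 0.3],
    "tablet_B": [0.1, 0.8, 0.5],
}
seal = [0.88, 0.12, 0.31]

name, score = best_match(seal, impressions)
print(name, round(score, 3))
```

Once every seal and impression is reduced to comparable numbers, items stored thousands of miles apart can be compared without moving them.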


Matching seals and impressions has been notoriously difficult in the past, as many are stored thousands of miles apart. Dahl estimates that all seals could be digitised within about five years, which would then make it possible to trace other patterns. There is some indication, for example, that certain types of stone were favoured by women.


“That is the kind of question you could not answer unless you had large numbers of seals imaged in the way we’re doing, and applying techniques like algorithms or machine learning,” Dahl says. He hopes that as artificial intelligence evolves, it will help us unravel the full potential of the rich information contained in collections around the world.


“I want Assyriology, which covers half of human history and a very endangered cultural heritage, to be at the forefront of this.”


Cracking codes


Imaging is also changing research into undeciphered scripts. Humans tend to be better than machines at this type of decipherment, which typically involves small amounts of text, creative mental leaps, and an understanding of how people lived and organised themselves. It also involves a great deal of intellectual flexibility.


Early cuneiform signs, for example, were not even arranged in a linear text, but simply placed together with a box drawn around them. Proto-Elamite is three-dimensional: a shallow impression of a circle has a different meaning than a deeper one. However, technology has helped the decipherment process by providing detailed pictures that can be magnified, shared and compared.


“The crucial problem is first and foremost to get proper images,” says Dahl, who is working on deciphering the mysterious script. “That’s lacking for the first 100 years of study of Proto-Elamite.”


Such advances go beyond the field of Assyriology. Philippa Steele, a senior research fellow at Cambridge University, is an expert in the early writing systems of ancient Crete and Greece. These include ‘Linear A’, an undeciphered script, and ‘Linear B’, which was used to write an ancient form of Greek.

Thanks to techniques that take sophisticated images of ancient tablets that feature these scripts, Steele has discovered new details.


“You can make out features that are very difficult to make out with the naked eye,” she says. “And often those features might correspond to the ways in which the person writing the document interacted with the document. So for Linear B, for example… you can make out erasures. Sometimes you can tell when the person writing the document has worked something out and then written something over the top.”


Pagé-Perron hopes that machines will eventually be able to translate more complex Sumerian tablets, and other languages like Akkadian. “There’s a lot more to discover about ancient cultures,” she says.


Perhaps one day, we will be able to read all of our earliest texts in translation, though many of Mesopotamia’s riddles are likely to outlive us, not least because many missing cuneiform fragments are still in the ground, waiting to be excavated.


The kings of ancient Mesopotamia thought deeply about the past and the future. They revered cuneiform texts from previous eras, and buried special inscriptions recording their names and achievements, promising rewards for a later ruler who would honour them.


In some ways their wish came true. Their battles and conquests may be forgotten by most. But their most powerful invention, writing, has helped humanity develop ideas and technologies over millennia and, now, train machines to learn from the past.


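This article was published on 语料库 by Tmxchina. Copyright belongs to 语料库. The content reflects the author’s personal views and does not imply that 语料库 agrees with or endorses them. If you repost it, please credit the source.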