Series – Papyrus maximus, or how to read an ancient text with the help of AI

Secret codes, dead languages, the jargon of factions and the pidgins of castes: this series looks at how obscure languages are decoded, from yesterday to today. Today: how artificial intelligence helps to read and translate ancient papyri.

Ask Google's machine translation service to render the first verses of Genesis from Hebrew into French, and you get: “In the beginning God created the heavens and the earth. And the earth was in confusion and confusion (sic), and darkness, on the face of the deep; And the spirit of God, hovering over the face of the water.”

Not bad. In any case, the rendering comes close to the human translation by Zadoc Kahn, approved by the rabbinate, which reads as follows: “In the beginning God created the heavens and the earth. But the earth was only solitude and chaos; darkness covered the face of the abyss, and the breath of God hovered over the face of the waters.”

Google Translate already builds bridges of this quality, and better, between 150 languages, including Latin and Greek. Software now in development aims to make it just as easy to work through very old texts written in dead languages on stone, papyrus or parchment. Dozens of languages disappeared thousands of years ago, including Sumerian, Elamite, Luwian, Ugaritic and Umbrian.

A team from Tel Aviv University announced in May in PNAS Nexus, a specialized journal from Oxford University, that it had developed a program capable of automatically translating Akkadian, a language widely used in Mesopotamia 5000 years ago.

“I have watched Google Translate evolve,” Isabelle Marthot-Santaniello, of the Department of Antiquity Sciences at the University of Basel, Switzerland, tells Le Devoir. “Ten years ago, when I was teaching middle school, students would insult a classmate who expressed himself poorly by saying that he was ‘doing Google Translate’. It is not like that anymore. I use DeepL and the quality of the translations is impressive. Many things are therefore already possible, and many more will come in the field of translation assisted by artificial intelligence, including in my own field of ancient languages.”

The papyrus code

Professor Marthot-Santaniello developed a passion for the languages and civilizations of Antiquity at a very young age, as is often the case in her highly specialized field. She speaks of the mythologies she discovered in children's books, and of a trip to Egypt with her parents at age 10 that decided her vocation. “When I came back, I announced that I wanted to be an Egyptologist,” she says.

No sooner said than done. She began learning Latin and Greek at age 13, in middle school in France. Mastery of hieroglyphs, hieratic, demotic and Coptic followed. “The Egyptian language existed for three thousand years, so inevitably there were stages and transformations,” explains the polyglot.

The Egyptologist ended up branching off a little from her early dreams, since the specialty she chose for her doctorate in Paris concerns the post-Pharaonic period, that of Greco-Roman Egypt. “I study manuscripts written in Greek, which are very difficult to read. It is a very interesting period, covering a millennium between Alexander the Great and the Arab conquest.”

She arrived at the Universität Basel, the oldest university in Switzerland, founded in 1460, to join a project to publish a collection of Greek papyri. She is currently working on the Der Papyri-Code project, which relies on computer vision, an artificial intelligence technique that uses deep learning to analyze images, understand them and process the information they contain. Computer vision is what makes self-driving cars and facial recognition possible, for example.

The programs easily manage to read modern ink writing on a blank page. Papyri pose an entirely different challenge. They are badly deteriorated and the ink offers little contrast against the fibrous surface. “You have to imagine brown ink on a damaged brown support,” sums up the professor. In the first case, computer vision reads almost perfectly. In the second, reliability drops by half, for now.

“My project started in 2018 and I really felt the revolution brought by artificial intelligence,” explains the papyrologist. “My computer science colleagues talked of nothing but neural networks and deep learning. It is a tool that must be used well and that can do extremely interesting things. But it is still a tool. We are still clearly needed: at the beginning, to prepare the data and guide the research; in the middle, to check the settings; and at the end, to interpret the results. I am not at all worried about my job, though it is bound to change.”

A global puzzle

About 80,000 Greek papyri have already been published. There are probably more than half a million left to process in the collections. Trafficking in these cultural treasures, often through Cairo's antiquities market, has scattered them in fragments all over the world. One end of a text may be in Boston and the other end of the same text in Berlin.

Professor Marthot-Santaniello wants to develop tools to track and analyze the shapes of letters. Searches of digitized documents would then be based on the image rather than on the transcription of the texts.

“With fragments here and there, you can't get much out of them. But if we manage to put them together end to end, all of a sudden these fragments take on much more meaning. This technique would make it possible to discover fragments of unknown literature, a lost Greek tragedy or a work by Aristotle, for example, of which we no longer had a copy.”

The material studied covers a thousand years, during which writing evolved. The objective of the Basel research is to find the most similar hands, with the same alpha, the same beta and so on. In short, the same letter shapes. These clues make it possible to identify the approximate provenance of a given papyrus, and perhaps one day its exact provenance and even its author.

“We won't necessarily have his name,” says the professor. “We may end up calling him Anonymous 003 or 004, but it will already be interesting to know whether the same scribe copied the Iliad and the Bible in the late Greco-Roman period. There are plenty of questions still to be answered.”

Mass digitization will also make searching the texts easier and more powerful, so that new questions can finally be put to them. Ms. Marthot-Santaniello has just carried out an experiment with colleagues that made it possible to process five million occurrences with ease and build a panorama showing how the language changed over the centuries.

She has also worked on abbreviations in the papyri. At the beginning of the corpus, in the Ptolemaic period (that of Cleopatra, from 323 to 30 BCE), scribes used symbols, which were later replaced by abbreviations (very common in Latin) in the Roman and then the Byzantine period. “We had sensed this phenomenon, and we can now quantify it and follow it over time.”

Developments never cease to amaze in this fast-moving field. In March, the University of Kentucky unveiled the Vesuvius Challenge, a global competition to read and decipher two charred scrolls from Herculaneum, a city destroyed in 79 CE at the same time as Pompeii.

The documents, discovered in 1750, cannot be unrolled. Artificial intelligence and scanner images could reveal their mysteries. The new tools would then make it possible to tackle the decipherment of the complete library discovered at Herculaneum. Interested parties, take note: the winner of the competition will receive US$700,000…
