Translated, a company specialising in language solutions based on artificial intelligence, and Cineca, one of the world’s largest computing centres, have announced the start of a research project to create an artificial intelligence capable of translating as well as the best professional translators.
The project represents a step forward in the evolution of translation technologies and in the global accessibility of languages. It aims to train the most advanced language model developed by Translated on a translation dataset that is unique in the world, using Leonardo, the seventh most powerful supercomputer in the world and the third in Europe, managed by Cineca at the Tecnopolo of Bologna. The result will be the world’s most advanced language model. The version for the Italian-English and English-Italian language combinations will be released as open source with open weights, marking a further significant step towards the realisation of a universal translator.
Cineca will support the training of Translated’s language model with 10 million GPU hours. This enormous computing capacity will significantly accelerate training and ensure rapid progress. The dataset provided by Translated is the result of 15 years of meticulous collection. It differs significantly from commonly used datasets in that it includes entire documents with extensive context, including mistranslations, revisions, and the reasoning of translators and reviewers in cases of disagreement. It is precisely this rich and sophisticated data that now enables Translated, with the help of Cineca, to train an artificial intelligence model capable of deep language understanding.
“Language has driven human evolution, allowing us to understand each other, collaborate for a better future, and develop faster than any other species,” Marco Trombetti, CEO of Translated, said in a note. “By combining our 25 years of experience in artificial intelligence research with the computational capacity managed by Cineca, we will create an AI that will have a very strong impact on the lives of millions of people and that will one day allow anyone to understand and be understood in their own language. Together, we are taking a decisive step towards the next stage of human evolution.”
“We are thrilled to be involved in a project that takes such a significant step in research on artificial intelligence applied to language, for the benefit of the national and global community,” said Francesco Ubertini, president of Cineca. “This collaboration is an example of how supercomputing can drive innovation, have a great social impact, and benefit people globally.”
The research project was presented during an event entitled The Power of Languages – Toward the Universal Translator. It follows a series of public-private partnerships with universities and research centres, supported by grants from the European Union, in which Translated has participated since the early 2000s. These partnerships have made it possible to develop ever more advanced technologies to support translation and to make them available to society.
The language model to be trained on the Leonardo supercomputer was developed by Translated using an innovative chain-of-thought technique and was tested with a small group of companies in real production contexts over the course of 2024. It has demonstrated the ability to translate conversational data with fewer than three errors per thousand words, a lower error rate than that of professional translators on the same content and four times lower than that of the most advanced machine translation systems. Thanks to Leonardo’s computing power, Translated and Cineca expect to reduce the error rate to one error per thousand words, achieving an accuracy comparable to that of the top one per cent of professional translators.
After the initial release for the Italian-English and English-Italian combinations, Translated plans to extend the new model to all 200 languages supported by its current AI, further expanding the global impact of the technology.