By GPT AI Team

Is ChatGPT Translation Better than DeepL?

To cut straight to the chase: ChatGPT translations are surprisingly good, but not as good as DeepL or even Google Translate. Still, this emerging technology opens up intriguing possibilities for translation, especially where a natural, conversational style matters.

ChatGPT has taken the world by storm, reaching an estimated 100 million users just two months after OpenAI launched it. News articles, tech blogs, and LinkedIn discussions regularly highlight new use cases for ChatGPT, including translation. As a translation agency, we couldn’t resist testing how ChatGPT performs against established players like DeepL and Google Translate for different language pairs. Assessments of such models vary widely, so it’s worth digging into the details to understand where ChatGPT shines and where it falters.

ChatGPT Translations

The translation landscape is evolving quickly as large language models capture the spotlight. According to Intento’s latest report on machine translation, GPT-4 and ChatGPT have made their way into the top ten machine translation systems, at least for some language pairs. Google, in contrast, has taken a somewhat more cautious view of earlier versions such as GPT-3. Our own tests with ChatGPT revealed a middle ground: stylistically, ChatGPT often outperforms DeepL and Google Translate, but it sometimes takes too many liberties with the content, which can make the text harder to understand.

The Quality of ChatGPT Translations: A Case Study

To test ChatGPT, we translated a 763-word excerpt from a German online shop, including product names and descriptions. To make full use of ChatGPT’s capabilities, we wrote three different prompts:

  1. A simple translation from German to English.
  2. A pre-edit prompt focusing on optimizing the source text before translating it.
  3. A combination of pre-editing and post-editing, wherein we aimed to fine-tune stylistic and conceptual content after translation.
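The three prompting strategies above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions and does not reproduce the prompts used in the test: the prompt wording is hypothetical, each step is shown as a separate model call, and `complete` is a placeholder for whatever function sends a prompt to the model and returns its reply.

```python
from typing import Callable

# Hypothetical: any function that sends a prompt to a chat model and returns its reply.
Complete = Callable[[str], str]


def translate_simple(text: str, complete: Complete) -> str:
    """Prompt 1: translate the German text into English in one step."""
    return complete(f"Translate the following German text into English:\n\n{text}")


def translate_pre_edit(text: str, complete: Complete) -> str:
    """Prompt 2: first optimize the German source text, then translate the result."""
    improved_source = complete(
        "Rewrite the following German text to make it clear and consistent, "
        f"without changing its meaning:\n\n{text}"
    )
    return translate_simple(improved_source, complete)


def translate_pre_and_post_edit(text: str, complete: Complete) -> str:
    """Prompt 3: pre-edit and translate, then revise the English draft for style and terminology."""
    draft = translate_pre_edit(text, complete)
    return complete(
        "Revise the following English translation for style and terminology, "
        f"without leaving out any information:\n\n{draft}"
    )


# Usage with a hypothetical stand-in "model" that returns the text after the instructions,
# so the pipeline can be tried without any API access.
def echo_model(prompt: str) -> str:
    return prompt.split("\n\n", 1)[1]


print(translate_pre_and_post_edit("Weiches T-Shirt aus Baumwolle", echo_model))
# prints the input unchanged, after three calls to the stand-in model
```

Keeping the model call behind a single function parameter makes it easy to swap in a real API client, or to compare strategies on the same text.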

We used DeepL as a benchmark since it’s widely acknowledged as the go-to machine translation system, particularly for German-English translations. After completing both DeepL and ChatGPT translations, we performed a quality check to count the types and frequency of errors encountered in each version.

How Does ChatGPT Work?

For context, let’s break down how ChatGPT works. It is a generative language model, also known as a large language model (LLM). During training on a vast body of text, including millions of web pages and books, the model adjusts billions of internal parameters. As a result, ChatGPT can generate remarkably human-like text by predicting, one word (or word fragment) at a time, the most plausible continuation of the input it receives.

Imagine reading a gripping crime novel. As the plot unfolds, you gather clues and weigh up the suspects, trying to anticipate how the story will end. Similarly, transformers (the neural network architecture underlying ChatGPT) recognize patterns among words and calculate the probability of each possible next word based on the preceding context. The key advantage of this architecture is that it does not look only at the current sentence: it can relate words across longer passages, capturing connections that earlier systems often missed.
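The way a transformer weighs earlier context can be illustrated with a toy version of self-attention, its core operation. This is a simplified sketch, not ChatGPT’s actual implementation: the word vectors are made up, there is a single attention head, and the same vectors are reused as queries, keys, and values. Each position computes a weighted average of itself and the words before it, giving higher weights to words whose vectors are more similar. In a full model, many such layers feed into a final step that scores every possible next word.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    largest = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(vectors: list[list[float]]) -> list[list[float]]:
    """Toy single-head causal self-attention.

    For simplicity, the input vectors serve directly as queries, keys and values;
    real transformers use learned projections for each role.
    """
    dim = len(vectors[0])
    outputs = []
    for i, query in enumerate(vectors):
        visible = vectors[: i + 1]  # causal mask: a word sees only itself and earlier words
        # Scaled dot-product score between this word and each visible word
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in visible]
        weights = softmax(scores)
        # New representation: weighted average of the visible words' vectors
        outputs.append([sum(w * v[d] for w, v in zip(weights, visible)) for d in range(dim)])
    return outputs


# Three hypothetical 2-dimensional word vectors
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for vector in causal_self_attention(words):
    print([round(x, 3) for x in vector])
# The first word can only attend to itself, so its vector comes out unchanged.
```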

Interestingly, ChatGPT does not always choose the most probable word; it samples from a range of plausible alternatives, including some less obvious choices. This makes the output feel more varied and human-like. It also means that the same prompt can produce noticeably different results each time it is run.
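The difference between always taking the most probable word and sampling among alternatives can be sketched in a few lines. This is a simplified illustration with hypothetical candidate words and scores, not ChatGPT’s internal code, although temperature-scaled sampling from a softmax distribution is a standard decoding technique for language models. Lower temperatures concentrate probability on the top candidate; higher ones spread it across the alternatives, which is one reason repeated runs can differ.

```python
import math
import random
from typing import Optional


def softmax(scores: list[float], temperature: float = 1.0) -> list[float]:
    """Convert raw next-word scores into probabilities; lower temperature sharpens them."""
    scaled = [s / temperature for s in scores]
    largest = max(scaled)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(words: list[str], scores: list[float]) -> str:
    """Always return the highest-scoring word, so the output never varies."""
    return max(zip(words, scores), key=lambda pair: pair[1])[0]


def sample_pick(words: list[str], scores: list[float], temperature: float = 1.0,
                rng: Optional[random.Random] = None) -> str:
    """Pick a word at random, in proportion to its softmax probability."""
    chooser = rng if rng is not None else random
    probs = softmax(scores, temperature)
    return chooser.choices(words, weights=probs, k=1)[0]


# Hypothetical candidates for the next English word and their model scores
words = ["soft", "smooth", "gentle"]
scores = [2.0, 1.0, 0.5]

print(greedy_pick(words, scores))  # always prints soft
print([sample_pick(words, scores, temperature=1.0) for _ in range(5)])  # varies between runs
```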

The Quality of ChatGPT Translations: Weighing Errors

Stepping back to look at the overall error picture, it is striking how many mistakes both DeepL and ChatGPT made. Depending on the version, we found between 67 and 101 errors in just three pages of text, which would be unacceptable in any professional context. The need for human post-editing cannot be overstated.

In our study, ChatGPT slightly reduced the number of major errors compared to DeepL. The caveat was a notable increase in minor and medium errors, a reminder that freer, more elaborate output also creates more opportunities for things to go wrong. Pre-editing helped reduce medium errors, but serious mistakes persisted in every version.

We weighted the errors by severity, since terminology errors (such as incorrect product names) are typically far more consequential than grammatical slip-ups. With this weighting, ChatGPT’s translations contained fewer serious errors than DeepL’s, but the many remaining minor inaccuracies still created considerable additional post-editing work.
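A severity-weighted error score of this kind can be computed as follows. This is a minimal sketch: the weights used in the study are not given, so the weights below (1 for minor, 3 for medium, 10 for major errors) are hypothetical, and so are the error counts in the usage example. They only illustrate how a version with fewer major errors can still score worse overall because of many smaller ones.

```python
# Hypothetical severity weights; the study's actual weighting is not published here.
SEVERITY_WEIGHTS = {"minor": 1, "medium": 3, "major": 10}


def weighted_error_score(counts: dict[str, int],
                         weights: dict[str, int] = SEVERITY_WEIGHTS) -> int:
    """Multiply each error count by its severity weight and add up the results."""
    return sum(weights[severity] * n for severity, n in counts.items())


# Hypothetical error counts for two translation versions of the same text
version_a = {"minor": 30, "medium": 20, "major": 5}
version_b = {"minor": 50, "medium": 25, "major": 3}

print(weighted_error_score(version_a))  # 140
print(weighted_error_score(version_b))  # 155: fewer major errors, but a higher total
```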

Style and Text Flow

When it comes to style and flow, ChatGPT impresses with its adaptability, maintaining a conversational tone that often sounds more natural to native speakers than DeepL’s output. Remarkably, with the simple prompt almost 40% of the segments translated by ChatGPT were identical to DeepL’s output, and nearly 80% were very similar. The prompt also mattered: with pre-editing, the share of similar segments dropped to 67%, and with post-editing to 55%.
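How segment similarity is measured matters for figures like these. The sketch below shows one simple way to compute identical and similar shares. It assumes aligned segments, uses the similarity ratio from Python’s standard-library `difflib.SequenceMatcher`, and applies a hypothetical threshold of 0.8 for “similar”; it is not necessarily the method behind the figures above, and professional QA tools may use edit distance or other metrics instead.

```python
from difflib import SequenceMatcher


def compare_segments(reference: list[str], candidate: list[str],
                     similar_threshold: float = 0.8) -> tuple[float, float]:
    """Return (share identical, share similar) for two aligned lists of translated segments.

    Segment i in both lists must translate the same source segment.
    "Similar" counts every pair whose similarity ratio reaches the threshold,
    so identical pairs (ratio 1.0) are included.
    """
    if not reference or len(reference) != len(candidate):
        raise ValueError("expected two non-empty, aligned lists of segments")
    identical = similar = 0
    for ref, cand in zip(reference, candidate):
        if ref == cand:
            identical += 1
        # ratio() is 2*M/T: M = matching characters, T = total characters in both strings
        if SequenceMatcher(None, ref, cand).ratio() >= similar_threshold:
            similar += 1
    return identical / len(reference), similar / len(reference)


# Hypothetical segments from two machine translation outputs
reference = ["Soft cotton T-shirt", "Machine washable at 30 degrees", "Available in three colors"]
candidate = ["Soft cotton T-shirt", "Machine-washable at 30 degrees", "Comes in three colours"]
print(compare_segments(reference, candidate))  # 1 of 3 identical, 2 of 3 similar
```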

This strength is particularly evident in longer texts such as product descriptions, where ChatGPT’s translations read more fluently and elegantly than DeepL’s occasionally awkward phrasing. The post-editing prompt improved language quality even further, suggesting that while ChatGPT requires oversight, its output can be refined effectively.

However, ChatGPT tends to simplify and streamline translations by omitting sentences it treats as redundant, and this can cause crucial information to be lost. In a web shop, where every detail helps inform customers about products, missing content is a serious problem. Capitalization issues also appeared in ChatGPT’s translations, particularly in numbered lists. These inconsistencies were easy to fix, but they added noticeably to post-editing time.
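Both problems can be caught early with simple automated checks before human post-editing. The following sketch is hypothetical and was not part of the study. It flags segments whose translation is much shorter than the source, a possible omission; the 0.6 length ratio is an assumed threshold that would need tuning, since English text is typically somewhat shorter than German. It also flags numbered list items whose first letter is capitalized differently from most other items. Such heuristics only point reviewers to suspicious segments; a human still has to judge them.

```python
import re

# Hypothetical threshold: flag translations shorter than 60% of the source length.
MIN_LENGTH_RATIO = 0.6

# A numbered list item such as "1. Text" or "2) Text"; captures the first character of the text.
NUMBERED_ITEM = re.compile(r"^\s*\d+[.)]\s+(\S)")


def flag_possible_omissions(pairs: list[tuple[str, str]],
                            min_ratio: float = MIN_LENGTH_RATIO) -> list[int]:
    """Return indices of (source, target) pairs whose target is suspiciously short."""
    return [
        i for i, (source, target) in enumerate(pairs)
        if source and len(target) / len(source) < min_ratio
    ]


def flag_inconsistent_list_caps(lines: list[str]) -> list[int]:
    """Return indices of numbered list items whose first letter's case differs from the majority."""
    items = []  # (line index, starts with an uppercase letter?)
    for i, line in enumerate(lines):
        match = NUMBERED_ITEM.match(line)
        if match and match.group(1).isalpha():
            items.append((i, match.group(1).isupper()))
    if not items:
        return []
    majority_upper = 2 * sum(is_upper for _, is_upper in items) >= len(items)
    return [i for i, is_upper in items if is_upper != majority_upper]


# Hypothetical examples
pairs = [
    ("Farbe: Blau", "Colour: blue"),
    ("Das Material ist atmungsaktiv und pflegeleicht.", "Breathable."),
]
print(flag_possible_omissions(pairs))  # [1]

lines = ["Features:", "1. Machine washable", "2. made of cotton", "3. Available in blue"]
print(flag_inconsistent_list_caps(lines))  # [2]
```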

ChatGPT vs. DeepL: Style and Terminology

Both the number and the nature of errors have a major impact on the cost and effort required to bring ChatGPT’s translations up to a professional standard. Many minor errors add up to considerable effort, while other errors require rewriting entire sentences or paragraphs, which drastically increases the workload for human post-editors.

In our comparison, ChatGPT often performed very well stylistically, producing slightly fewer serious errors and, with pre-editing, fewer medium-severity mistakes. However, it produced a higher number of minor errors. Product names were usually translated better by ChatGPT, although the names and properties of materials were not always rendered correctly. This raises a question: how valuable is a translation that reads well but loses meaning or detail? For some businesses, accuracy matters far more than polished language.

Conclusion

So, is ChatGPT translation better than DeepL? The answer is nuanced. ChatGPT has a surprising stylistic edge, making it an appealing choice where tone and a natural, conversational feel matter. But when it comes to accuracy and faithfulness to the source content, DeepL remains the front-runner. In the fast-moving field of machine translation, the right choice ultimately depends on the specific needs of each project. Human oversight remains indispensable, and combining these technologies with skilled reviewers may well shape the future of translation.

In the end, however impressive these language models are, they are tools rather than complete solutions. Drawing on their strengths while skilled human editors offset their weaknesses is likely the most effective way to bridge language barriers and foster communication among diverse audiences.
