Google Translate idea

03.08.2009 18:40

Google Translate is currently pretty useless for any serious translation (regardless of what translators of user manuals for consumer electronics think).

However, I sometimes find myself searching for, say, a Slovene equivalent to a specific English technical term or phrase. I'm fluent in both languages; it's just that the term may be too far outside my expertise for me to know the correct translation, although I can usually spot the correct one when I see it in context.

Scientific terminology is usually very strict, with one specific name for a phenomenon. And if you want the translation to appear correct in the eyes of someone knowledgeable in that field, a simple by-the-dictionary translation of that name may make you a popular target of in-jokes (take note, authors of subtitles on Slovenian television networks).

So, if I don't have someone fluent in that terminology at hand, what I usually do is first check for equivalent pages in other language editions of Wikipedia. More often than not, that approach fails miserably. Then I'm off to Google, where I search for the term I want to translate (in quotes) plus some terms in the target language that I estimate must appear nearby in the translation.

However, that's not really a good task for a general search engine. Translations may, for example, appear on different web pages (sites often keep their English and Slovenian sections on separate pages). My search, on the other hand, will only return results when a single page contains both English and Slovene text, so a lot of potentially useful results are missed.
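For illustration, the kind of query I end up typing can be sketched in a few lines of Python. The function name and the example words are made up for this sketch; the context words are just guesses at what might appear near the translation.

```python
def build_query(term, context_words):
    """Build a search query: the exact term in quotes, followed by
    guessed target-language words expected near its translation."""
    quoted = '"' + term.strip() + '"'
    return " ".join([quoted, *context_words])


if __name__ == "__main__":
    # Hypothetical example: an English term plus guessed Slovene context words.
    print(build_query("eddy current", ["tok", "magnetno"]))
```

Such a query can only match pages that contain the English term and the guessed Slovene words together, which is exactly the limitation described above.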

Here's my idea: machine translation tools like Google Translate already recognize pairs of texts on the web that are direct translations of each other. That's the input for their machine learning algorithms, where they learn how to (badly) translate free text.

Wouldn't it be nice if you could enter just an English phrase and select a language, and get back a list of English texts containing that phrase, plus the matching texts in Slovene? It's not even necessary to point out where exactly in the text the equivalent phrases are; paragraph-level precision would be more than enough. I can find the exact spot myself while reading the context (which I must, to make sure I'm using the correct translation).
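A minimal sketch of what such a lookup might look like, assuming the paragraph pairs have already been collected and aligned somewhere. Everything here is hypothetical: the `AlignedParagraph` record, the `find_in_context` function and the two sample pairs are made up for illustration and say nothing about how Google actually stores its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedParagraph:
    """One paragraph and its translation (hypothetical record type)."""
    source: str  # English paragraph
    target: str  # matching Slovene paragraph
    origin: str  # where the pair came from, e.g. a page address


def _normalize(text):
    # Case-insensitive comparison with runs of whitespace collapsed.
    return " ".join(text.casefold().split())


def find_in_context(phrase, corpus, limit=10):
    """Return up to `limit` pairs whose English side contains the phrase.

    Only paragraph-level matches are reported; locating the equivalent
    phrase inside the translation is left to the human reader.
    """
    needle = _normalize(phrase)
    hits = []
    for pair in corpus:
        if needle in _normalize(pair.source):
            hits.append(pair)
            if len(hits) >= limit:
                break
    return hits


# Made-up sample data, not taken from any real corpus.
SAMPLE_CORPUS = [
    AlignedParagraph(
        source="A black hole is a region of space from which not even light can escape.",
        target="Črna luknja je območje v prostoru, iz katerega ne more uiti niti svetloba.",
        origin="example-1",
    ),
    AlignedParagraph(
        source="The weather is nice today.",
        target="Danes je lepo vreme.",
        origin="example-2",
    ),
]

if __name__ == "__main__":
    for hit in find_in_context("black hole", SAMPLE_CORPUS):
        print(hit.source)
        print(hit.target)
        print()
```

A linear scan like this is only good for a toy corpus; anything web-sized would need a proper index, but the interface (phrase in, paragraph pairs out) would stay the same.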

So in effect you would be using the infrastructure that most likely already exists, but for human instead of machine learning. I'm sure that would be a most useful tool for people translating technical texts. At least until machine translation becomes a little more accurate.

Posted by Tomaž | Categories: Ideas


Classic beginner's comment on language processing! We found a logical conflict after implementing the third Toporisic rule in Slovenian. A big dictionary helps these days.
