Proposal:Preparing a translation machine

Status (see valid statuses)

The status of this proposal is:
Request for Discussion / Sign-Ups

Every proposal should be tied to one of the strategic priorities below.

Edit this page to help identify the priorities related to this proposal!


  1. Achieve continued growth in readership
  2. Focus on quality content
  3. Increase Participation
  4. Stabilize and improve the infrastructure
  5. Encourage Innovation



Summary

I propose that we prepare the building blocks needed for a future translation machine.

Proposal

  1. Accepting OmegaWiki as a Wikimedia Foundation project and improving it.
  2. Writing a robot to extract as much information as possible from the Wiktionaries and import it into OmegaWiki after a human check: the person entering an expression into OmegaWiki could accept fields from Wiktionary by ticking checkboxes, and could modify and complete those fields before inclusion.
  3. Finding a way to involve the general public in a WikiGrammar that is written rigorously, like OmegaWiki (and unlike the Wiktionaries), so that this grammar information can later be used by a translation machine.
  4. Contacting the Moses¹, OpenLogos, Apertium and/or other communities working on open-source translation machines to see whether they would be interested in collaborating with the Wikimedia Foundation.
  5. Identifying one or two candidate machine translation approaches.

Additions

¹ Moses is a statistical machine translation system that allows you to automatically train translation models for any language pair. All you need is a collection of translated texts (parallel corpus).

Motivation

  • There is no high-quality translation machine on this planet. The least bad ones are proprietary, or only work between similar languages (typically between Romance languages).
  • Translation machines have two parts: a lexical part (a dictionary that a program can read and use easily) and a programmatic part, which implements the grammar and translation rules. Up to now, both parts have been regarded as too technical for the general public to be involved in. OmegaWiki has taken a significant step towards a machine-readable, highly structured dictionary built from the knowledge of the general public. Further steps are needed in the lexical part, as discussed in OmegaWiki's community pages.
  • I think it's possible to use the grammatical knowledge of the general public too, though how to do so still has to be thought through. It is not very difficult in some areas, such as inflections (conjugations, declensions etc., which are already recorded in the Wiktionaries) and word order in some languages (for instance, whether a French adjective is placed before or after the noun).
  • I don't think we're ready now to make a good translation machine. If we begin too quickly, the risk of major errors is high. This is why I propose that we first build the bricks that will make it possible to build a translation machine later.
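
To make the two-part idea concrete, here is a minimal sketch of how a lexical part and one grammar rule might fit together, using the French adjective example above. The record layout, the field names (`masculine`, `feminine`, `position`, `gender`) and the function `translate_noun_phrase` are assumptions made up for this illustration; they are not OmegaWiki's or any Wiktionary's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adjective:
    masculine: str   # masculine singular form
    feminine: str    # feminine singular form
    position: str    # "before" or "after" the noun (hypothetical field)


@dataclass(frozen=True)
class Noun:
    lemma: str
    gender: str      # "m" or "f"


# Lexical part: hypothetical entries keyed by their English gloss.
ADJECTIVES = {
    "small": Adjective("petit", "petite", "before"),
    "green": Adjective("vert", "verte", "after"),
}
NOUNS = {
    "house": Noun("maison", "f"),
    "book": Noun("livre", "m"),
}


def translate_noun_phrase(adjective_en: str, noun_en: str) -> str:
    """Programmatic part: apply gender agreement, then the word-order rule."""
    adjective = ADJECTIVES[adjective_en]
    noun = NOUNS[noun_en]
    form = adjective.feminine if noun.gender == "f" else adjective.masculine
    if adjective.position == "before":
        words = [form, noun.lemma]
    else:
        words = [noun.lemma, form]
    return " ".join(words)
```

With these entries, `translate_noun_phrase("small", "house")` gives "petite maison" and `translate_noun_phrase("green", "book")` gives "livre vert". The point of the sketch is that both the inflected forms and the placement rule are plain data that contributors could enter and check without programming.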

Key Questions

  1. How to make a highly structured machine-readable WikiGrammar?
  2. What should be the internal structure of such a wiki: relational database (MySQL, for instance), XML, Prolog knowledge base, or other?
  3. How to interconnect or merge the Wiktionaries with OmegaWiki?
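
As one possible starting point for question 2, the sketch below tries the relational option with SQLite, used here only because it ships with Python; the proposal itself mentions MySQL as an example. The table names (`meaning`, `expression`), their columns and the helper functions are hypothetical and are not taken from OmegaWiki's real schema. The idea shown is that expressions in different languages are linked through a shared, language-independent meaning.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one meaning and three expressions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE meaning (
            id    INTEGER PRIMARY KEY,
            gloss TEXT NOT NULL
        );
        CREATE TABLE expression (
            id         INTEGER PRIMARY KEY,
            meaning_id INTEGER NOT NULL REFERENCES meaning(id),
            language   TEXT NOT NULL,
            spelling   TEXT NOT NULL
        );
        """
    )
    conn.execute("INSERT INTO meaning (id, gloss) VALUES (1, 'domestic feline')")
    conn.executemany(
        "INSERT INTO expression (meaning_id, language, spelling) VALUES (?, ?, ?)",
        [(1, "en", "cat"), (1, "fr", "chat"), (1, "de", "Katze")],
    )
    return conn


def translate(conn: sqlite3.Connection, spelling: str, source: str, target: str) -> list:
    """Return target-language spellings that share a meaning with the source word."""
    rows = conn.execute(
        """
        SELECT t.spelling
        FROM expression AS s
        JOIN expression AS t ON t.meaning_id = s.meaning_id
        WHERE s.spelling = ? AND s.language = ? AND t.language = ?
        ORDER BY t.spelling
        """,
        (spelling, source, target),
    ).fetchall()
    return [row[0] for row in rows]
```

For example, after `conn = build_demo_db()`, the call `translate(conn, "cat", "en", "fr")` returns `["chat"]`. An XML or Prolog representation could carry the same links; the relational form is shown only because it is easy to query.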

Potential Costs

References

Web services


Community Discussion

Do you have a thought about this proposal? A suggestion? Discuss this proposal by going to Proposal talk:Preparing a translation machine.

Want to work on this proposal?

  1. .. Sign your name here!