Thursday 20 September 2007

Hierarchical representation under Postgres

Oracle implements hierarchical queries with its CONNECT BY clause, which is not part of the SQL standard. Postgres has declined to adopt this syntax, and the PostgreSQL developers are currently working on a better approach.

In the meantime, the contrib directory ships the tablefunc module, whose connectby() function offers functionality somewhat similar to Oracle's.
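
To give a feel for what connectby() returns, here is a rough Python sketch that walks an in-memory parent/child list in the same spirit: starting from a given key, it emits one row per descendant with the parent key, the depth level and the branch path joined by a delimiter, treating max_depth = 0 as "no limit". The function name connect_by and the list-of-pairs input are hypothetical stand-ins for a database table; this mimics the shape of connectby()'s output, not the module's actual C implementation, and child order here simply follows input order.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# One output row: (keyid, parent_keyid, level, branch)
Row = Tuple[str, Optional[str], int, str]


def connect_by(
    edges: Iterable[Tuple[str, Optional[str]]],
    start: str,
    max_depth: int = 0,
    delim: str = "~",
) -> Iterator[Row]:
    """Hypothetical helper: depth-first walk of (keyid, parent_keyid) pairs.

    Yields rows shaped like connectby() output; max_depth=0 means no limit.
    """
    # Build an adjacency list: parent key -> child keys, in input order.
    children: Dict[str, List[str]] = defaultdict(list)
    for key, parent in edges:
        if parent is not None:
            children[parent].append(key)

    def walk(key: str, parent: Optional[str], level: int,
             path: List[str]) -> Iterator[Row]:
        branch = path + [key]
        yield (key, parent, level, delim.join(branch))
        if max_depth and level >= max_depth:
            return  # depth limit reached; do not descend further
        for child in children[key]:
            if child in branch:
                # A key reappearing on its own branch means the data has a cycle.
                raise ValueError(f"infinite recursion detected at {child!r}")
            yield from walk(child, key, level + 1, branch)

    # The starting row is reported without a parent.
    yield from walk(start, None, 0, [])


if __name__ == "__main__":
    tree = [("row1", None), ("row2", "row1"), ("row3", "row1"),
            ("row4", "row2"), ("row5", "row2"), ("row6", "row4"),
            ("row7", "row3"), ("row8", "row6"), ("row9", "row5")]
    for row in connect_by(tree, "row2"):
        print(row)
```

Starting at row2, the sample tree comes back in depth-first order: row2, row4, row6, row8, row5, row9, at levels 0, 1, 2, 3, 1, 2, each with its full branch such as row2~row4~row6~row8.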

Wednesday 19 September 2007

On-the-fly Language Translation

For my project I need a system that dynamically translates natural-language text passages into multiple languages (basically a machine translation (MT) or computer-assisted translation (CAT) system). Since the language of the source content is not known in advance, the first thing I need is a language identification tool. There are many available (try the online one from Xerox).

I tried text_cat, which is written in Perl and is used by SpamAssassin. The results were disappointing: it confused Spanish and Catalan, even on text containing ñ, a letter standard Catalan spelling does not use. I need to research this a bit more.
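
As I understand it, text_cat works with the character n-gram rank-order method of Cavnar and Trenkle: each language gets a profile of its most frequent n-grams, ranked by frequency, and an unknown text is assigned to the language whose profile is closest by an "out-of-place" distance. The Python sketch below is my own minimal reconstruction of that idea, not text_cat's code; the function names and the defaults (n-grams up to length 5, top 300 per profile) are illustrative assumptions. It may also hint at the Spanish/Catalan confusion: the score sums over hundreds of frequent n-grams the two languages share, so one distinctive letter like ñ carries little weight.

```python
import re
from collections import Counter
from typing import Dict


def ngram_profile(text: str, max_n: int = 5, top: int = 300) -> Dict[str, int]:
    """Map each of the `top` most frequent character n-grams (1..max_n) to its rank."""
    counts: Counter = Counter()
    # Split into words made of letters only (Unicode-aware, so ñ, à, ç count).
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Most frequent first; ties broken alphabetically so ranks are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
    return {gram: rank for rank, (gram, _) in enumerate(ranked)}


def out_of_place(doc: Dict[str, int], lang: Dict[str, int], max_penalty: int) -> int:
    """Sum of rank differences; n-grams missing from `lang` get the maximum penalty."""
    return sum(abs(rank - lang[g]) if g in lang else max_penalty
               for g, rank in doc.items())


def identify(text: str, profiles: Dict[str, Dict[str, int]], top: int = 300) -> str:
    """Return the key of the language profile closest to `text`."""
    doc = ngram_profile(text, top=top)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang], top))
```

In practice each profile would be built from a sizeable training corpus per language, e.g. profiles = {"es": ngram_profile(spanish_corpus), "ca": ngram_profile(catalan_corpus)} with hypothetical corpus strings, after which identify(passage, profiles) returns the closest language key.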

Once the original text's language has been identified, I need a translation service. AltaVista's Babelfish is probably the most common one, but its programmatic interface is limited (although writing a Python client for it is fairly easy), using it would demand too much bandwidth, and it limits the number of translation requests per client within a given time frame.

Google's Language Tools and the Google Toolbar offer an interface for translating text into multiple languages (either whole websites or text passages).

The Spanish newspaper 'El Periódico' uses proprietary software to publish its daily editions in both Catalan and Spanish.

Systran is a commercial solution that offers both online and local translation services; it is currently used by AltaVista, Google and many others (try translating a difficult, slangy passage with Babelfish and then with Systran, and see for yourself).

Since there are so many resources available, I will leave this post as an introduction to this fascinating subject until my next post, which will hopefully bring some clear and promising results.

Wednesday 22 August 2007

Open source search engines

Nutch (see its wiki page) is an open source project that offers a set of tools for setting up a search engine. A number of projects are built on top of Nutch.