| Author |
Message |
Scott tutorialtoday.com

Joined: 24 Mar 2005 Posts: 2582 Location: Mississauga, Ontario
|
Posted: Sat Mar 08, 2008 8:53 pm Post subject: Different Languages in PHP |
|
|
I am working on making my script multilingual. It is currently only in English, but people have been asking to translate it into other languages.
First of all (although it is not a PHP issue), do I need to change the charset or something in the <head> for it to work with different languages?
Second, my regex currently only checks for a-zA-Z when validating letters. If I use \w (word character) instead, will it match letters from other languages correctly? _________________ Tutorial Management Script - Version 1.3 Released
TutorialToday - Up and running, submit your tutorials!
Linux Tutorials - Coming Soon |
|
|
| |
krt ...

Joined: 11 Jan 2005 Posts: 4607 Location: Australia
|
Posted: Sat Mar 08, 2008 10:10 pm Post subject: |
|
|
I'd use Unicode, and send both a header and a meta tag for the best cross-browser compatibility. Some browsers don't cope with just one of the two.
In PHP: header("Content-Type: text/html; charset=utf-8");
(obviously changing text/html if needed)
Then mimic this in HTML:
| Code: | | <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> |
In the default locale, \w in regex only matches [a-zA-Z0-9_]. Note that "word characters" include digits and the underscore, which most people don't expect. |
|
|
| |
Scott tutorialtoday.com

Joined: 24 Mar 2005 Posts: 2582 Location: Mississauga, Ontario
|
Posted: Sun Mar 09, 2008 8:39 am Post subject: |
|
|
How would I go about validating titles with regex if they are in a different language? |
|
|
| |
Pie32 Not Banned

Joined: 17 Mar 2005 Posts: 1411 Location: Lost in 84
|
Posted: Sun Mar 09, 2008 8:52 am Post subject: |
|
|
I'm not sure what version of PHP 6 we have on L2P servers, but that's something you may want to look into. One of the big features they are working on for it is support for languages with non-Latin characters. |
|
|
| |
Scott tutorialtoday.com

Joined: 24 Mar 2005 Posts: 2582 Location: Mississauga, Ontario
|
Posted: Sun Mar 09, 2008 9:19 am Post subject: |
|
|
| Pie32 wrote: | | I'm not sure what version of PHP 6 we have on L2P servers, but that's something you may want to look into. One of the big features they are working on for it is support for languages with non-Latin characters. |
This won't just be for my site; it will run on other servers, which will most likely have PHP 4. |
|
|
| |
LP-SolidRaven Dictator of the Dump

Joined: 06 Jun 2004 Posts: 7015 Location: The cheese is made out of moon
|
Posted: Tue Mar 11, 2008 7:43 am Post subject: |
|
|
You could use \xhexhere in the regex,
replacing hexhere with the two-digit hex code of the character (for example, \xE9 for é in ISO-8859-1). _________________
| Quote: |
<bart416> I just realized something
<bart416> we celebrate the fact that this piece of rock made one rotation around a glowing ball of plasma that is kept together due to its own gravity well
<njsg> HAPPY NEW YEAR
<Easter> ^^
|
|
|
|
| |
leontius Novice Poster
Joined: 26 Mar 2008 Posts: 2
|
Posted: Wed Mar 26, 2008 5:04 am Post subject: |
|
|
'\w' in regex depends on the locale set on the server. The PHP 5 manual says:
| Quote: | | A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w. |
So you can probably use regular expressions in combination with the setlocale() function. _________________ Leap On! |
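A rough sketch of the setlocale() idea, for what it's worth. The helper name is_word_title() is made up for illustration, and the locale you pass in is an assumption: it has to be installed on the server, and as far as I know the locale tables only help with single-byte encodings such as ISO-8859-1, not UTF-8.

```php
<?php
// Hypothetical helper: checks a title with \w under a given locale.
// Returns true/false for the match, or null if the locale is unavailable.
function is_word_title($title, $locale)
{
    // Passing "0" only reads the current LC_CTYPE setting without changing it.
    $previous = setlocale(LC_CTYPE, '0');

    // setlocale() returns false when the requested locale is not installed.
    if (setlocale(LC_CTYPE, $locale) === false) {
        return null; // the caller decides how to fall back
    }

    // \w now follows the character tables of the active locale.
    $matched = preg_match('/^[\w ]+$/', $title) === 1;

    // Restore the locale so the rest of the script is unaffected.
    setlocale(LC_CTYPE, $previous);

    return $matched;
}
```

You would call it with something like is_word_title($title, 'fr_FR.ISO-8859-1'), where the locale name is only an example and differs between systems. A null result means the locale is missing, so the script can fall back to a plain a-zA-Z check.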
|
|
| |
ClickFanatic Est. 2005

Joined: 18 Jan 2005 Posts: 3857
|
Posted: Wed Mar 26, 2008 7:24 am Post subject: |
|
|
The problem with locales is that they have to be present on the server. Some servers simply have the en_US locale and nothing else.
But as I don't really see alternatives (unless you want to manually create a regex for every language, using \x## patterns) it's probably worth the risk. _________________ Captain Jell-O Buster from the Future
[img]http://feeds.feedburner.com/sparepencil.1.gif[/img] |
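One other option that might avoid per-language patterns is PCRE's Unicode properties. This is only a sketch under assumptions: the titles must be UTF-8, the server's PCRE build must have UTF-8 and Unicode property support, and is_valid_title() and the allowed punctuation are made up for illustration.

```php
<?php
// Hypothetical helper: accepts letters and digits from any script,
// plus spaces and a little punctuation. Assumes $title is UTF-8.
function is_valid_title($title)
{
    // \p{L} matches any letter and \p{N} any number;
    // the u modifier switches the pattern and subject to UTF-8 mode.
    // preg_match() returns false on invalid UTF-8, which is rejected here too.
    return preg_match('/^[\p{L}\p{N} .,\'-]+$/u', $title) === 1;
}
```

Because the check is on character properties rather than byte values, the same pattern should cover accented Latin letters, Cyrillic, and so on without any locale being installed on the server.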
|
|
| |
linuxdoctor Infallible Persona

Joined: 23 Apr 2005 Posts: 1203 Location: Ottawa, Canada
|
Posted: Wed Mar 26, 2008 8:45 am Post subject: |
|
|
The e107 content management system has an interesting way of handling different languages. It's written entirely in PHP, so it might be worth a look.
http://e107.org
http://e107coders.org
Here is a quick description of what they do for different languages.
First, they put all of the text that is to be output into strings in separate language files.
| Code: |
/* This is for English and is placed in English/lang.php */
define("GOOD_MORNING", "Good morning.");
define("GOOD_BYE", "See you.");
/* This is for French and is placed in French/lang.php */
define("GOOD_MORNING", "Bonjour.");
define("GOOD_BYE", "Au revoir.");
/* This is for German and is placed in German/lang.php */
define("GOOD_MORNING", "Guten Morgen.");
define("GOOD_BYE", "Auf Wiedersehen.");
|
Notice that these files are all called 'lang.php' but live in different directories named after the language. This lets you have multiple language files, all in the same place per language. It may also be convenient to place all these directories in a directory of their own, called 'languages' for instance.
Second, in your application configuration file, have two variables to deal with the languages: the first is the path to where the language directories are located, and the second is the name of the language.
| Code: |
/* config.php entry for languages */
$LANGUAGES_DIRECTORY = "languages";
$DEFAULT_LANGUAGE = "German";
|
So, whatever language you want to set as default, just change the '$DEFAULT_LANGUAGE' variable. In your app, you may want to have this as a variable per-user in the database. So, when the user logs in, you can simply assign this variable whatever his default language is.
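Since the language name ends up in an include() path, it is probably wise to check it against a list of installed languages before using it. A rough sketch, where pick_language() and the $available list are hypothetical names for illustration and not part of e107:

```php
<?php
// Hypothetical helper: returns the language directory name to load.
// $requested might come from a cookie or a per-user database column.
function pick_language($requested, array $available, $default)
{
    // Only accept names on the list, so raw user input never reaches include().
    return in_array($requested, $available, true) ? $requested : $default;
}

// Assumed list of installed language directories.
$available = array('English', 'French', 'German');

// A known language is accepted; anything else falls back to the default.
$language = pick_language('French', $available, 'English');
```

The result can then be assigned to $DEFAULT_LANGUAGE before the language file is included.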
Now the third step is where the magic begins. In your app, load 'config.php' and then load the particular language you want. Then, whenever you want to output a string, use the constant for the text from the 'lang.php' file rather than the actual text. Magically, your chosen language will be used.
| Code: |
/* app.php */
// Include anything you need to have done first
include('config.php');
// Do anything else you might need here, like changing the default language
// from a cookie or the database
// Now include the default language:
include($LANGUAGES_DIRECTORY . '/' . $DEFAULT_LANGUAGE . '/lang.php');
// Now use it.
echo GOOD_MORNING;
echo GOOD_BYE;
|
The output will be in your selected language. The e107 documentation and code will give you the exact implementation; I've necessarily only sketched very briefly how they do it. It is actually done very elegantly, and that is what good programmers always strive for.
This example should work, with the possible exception of having to make LANGUAGES_DIRECTORY an absolute path rather than a relative one. As written, the languages directory needs to be in the same directory as the main 'app.php'. _________________ Misanthrope: someone who realizes that humans really are as stupid as they appear.
If you think I'm 'politically' incorrect you have the wrong politics. |
|
|
| |
LP-SolidRaven Dictator of the Dump

Joined: 06 Jun 2004 Posts: 7015 Location: The cheese is made out of moon
|
Posted: Wed Mar 26, 2008 10:45 am Post subject: |
|
|
That's a good idea until you start with an extensive template system... |
|
|
| |
ClickFanatic Est. 2005

Joined: 18 Jan 2005 Posts: 3857
|
Posted: Wed Mar 26, 2008 11:49 am Post subject: |
|
|
Defining constants is quite a good solution, but not very elegant in my opinion, especially when the number of different phrases becomes really big (consider a constant like USER_PASSWORD_CHARACTER_MAXLENGTH_EXCEEDED).
Also, if the translations are outdated, or certain phrases remain untranslated, what will happen? You will see the constant (which is undefined) in plain text. Not really nice.
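If you do stay with constants, one way to soften this is to look the constant up by name and fall back to a default text. A minimal sketch, assuming a hypothetical helper called phrase() (it is not part of e107 or PHP itself). Note that on PHP 8 and later, using an undefined constant directly raises an Error instead of printing its name.

```php
<?php
// Hypothetical helper: returns the translated phrase if the language file
// defined the constant, otherwise a fallback text instead of the bare name.
function phrase($name, $fallback)
{
    return defined($name) ? constant($name) : $fallback;
}

// Stand-in for what a language file would define.
define('GOOD_MORNING', 'Guten Morgen.');

echo phrase('GOOD_MORNING', 'Good morning.'), "\n"; // constant is defined
echo phrase('GOOD_BYE', 'See you.'), "\n";          // not defined, so the fallback is used
```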
A better solution is the PHP implementation of gettext: http://savannah.nongnu.org/projects/php-gettext/
It is used in WordPress, among others. It basically works like this:
Throughout the code the programmer can use the default language to write messages. For example: Your password exceeds the maximum length
To make it possible to translate this text, it will be wrapped in a function.
| Code: | | <?php echo _e('Your password exceeds the maximum length'); ?> |
What this function does is rather simple. It looks in the loaded language file for a translation of the given string. If one exists, it is returned; if not, the default string (which is of course the string passed to the function) is returned.
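The lookup described above can be sketched roughly as follows. This is only an illustration under assumptions: translate() and the $translations array are hypothetical names, a plain PHP array stands in for the loaded gettext file, and this is not the actual php-gettext or WordPress code.

```php
<?php
// Hypothetical stand-in for a loaded language file: original => translation.
$translations = array(
    'Your password exceeds the maximum length'
        => 'Votre mot de passe dépasse la longueur maximale',
);

// Hypothetical lookup: return the translation if there is one,
// otherwise return the string that was passed in.
function translate($text, array $translations)
{
    return isset($translations[$text]) ? $translations[$text] : $text;
}

// Translated phrase, and an untranslated one that comes back unchanged.
$known   = translate('Your password exceeds the maximum length', $translations);
$unknown = translate('Log in', $translations);
```

The point of the design is that an untranslated phrase degrades to readable default-language text rather than to a constant name.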
Translations are stored in gettext format, which is very useful because software exists to easily edit such files. |
|
|
| |
linuxdoctor Infallible Persona

Joined: 23 Apr 2005 Posts: 1203 Location: Ottawa, Canada
|
Posted: Wed Mar 26, 2008 4:29 pm Post subject: |
|
|
That could be a good way. The particular gettext you pointed to is not GNU gettext but something different. PHP supports the standard GNU gettext.
I'm not that familiar with WordPress, but if it does use gettext it would most likely be the GNU version.
The biggest disadvantage is the requirement to use .po files. These files have a fairly complex structure with a learning curve, which might be somewhat of a deterrent for people not familiar with Unix conventions.
Another disadvantage is speed. Gettext in PHP is slow. Everything in PHP is slow. PHP is an interpreted language. First the Zend compiler reads all the source files in and converts them into an intermediate form before executing them. The constant-based implementation is very fast, since the translation is done in situ by the Zend compiler as the source is read in. With gettext, the translations would not actually be done until the execution phase, and the net result might be a noticeable slowness in response or a jerkiness in rendering pages.
While gettext provides the most general solution, it was designed for programs compiled to object code. In PHP it is cumbersome and slow. In my view, that hardly qualifies it as elegant. |
|
|
| |
LP-SolidRaven Dictator of the Dump

Joined: 06 Jun 2004 Posts: 7015 Location: The cheese is made out of moon
|
Posted: Thu Mar 27, 2008 3:05 am Post subject: |
|
|
There is a fast way to use GNU gettext in PHP.
You might want to read this. I've used it before and it's quite fast. The only issue is that most servers don't have it compiled in. I haven't checked whether LLP has it, and if it doesn't, I don't think Trel would mind installing it. Oh yeah, keep in mind that the documentation of this module is terrible in several ways. If I find some time I might write a small tutorial on it.
And if the site is intended to work under extremely high loads, you could always use a Squid cache server combined with an accelerator. |
|
|
| |
linuxdoctor Infallible Persona

Joined: 23 Apr 2005 Posts: 1203 Location: Ottawa, Canada
|
Posted: Thu Mar 27, 2008 7:21 am Post subject: |
|
|
You're sort of making my point for me. I agree with you that gettext is a very general solution to the problem of multi-lingual translations for applications. Gettext is widely used in Linux apps and there is a lot to say for it.
However, there are also weaknesses, and we've pointed them out. Another weakness of a lot of GNU software, including gettext, is that in its endeavour to be general it also becomes extremely complex. Some critics have said needlessly complex. On the other hand, being less complex would also mean being less general and less complete, requiring the developer to add other libraries and modules to supply the functionality that a more general solution would have included. In the end, the application becomes even more complex than it otherwise would have been, and almost certainly much larger and slower.
What I suggested was a simple approach. Certainly it's not the best, but it's simple and it works. While one programmer is spending a lot of time trying to get gettext to work, another programmer could be up and running with my proposed solution. |
|
|
| |
ClickFanatic Est. 2005

Joined: 18 Jan 2005 Posts: 3857
|
Posted: Thu Mar 27, 2008 11:47 am Post subject: |
|
|
The PHP gettext that I referred to is indeed not the GNU gettext extension that may be installed on servers.
Portability would be a valid reason to use the PHP gettext implementation instead of the GNU gettext extension for PHP. WordPress doesn't rely on the PHP extension, but uses a modified version of PHP gettext. |
|
|
| |
|
|
|