Related Product Enhancement/Bug IDs: CS-9813
14 Replies Last post: Nov 21, 2008 8:27 AM by Sean  
robbono Novice 22 posts since
Jul 30, 2008
Currently Being Moderated

Nov 5, 2008 6:39 PM

Alphabetical Ordering of Chinese Characters

Hello Jivers,

 

We're having an issue with alphabetical ordering of Chinese characters. Each Chinese character has a corresponding romanization (a way of writing its pronunciation in the Latin alphabet) called pinyin. When listing things in Chinese (for example, the phone book in a mobile phone), names beginning with Chinese characters are ordered according to their pinyin in a regular A-Z format. So, for example, the character "我" is pronounced "wo", and would come after the character "你", pronounced "ni".

 

However, when Clearspace is listing things in Chinese (for example, users), this alphabetical ordering isn't being obeyed. I created two users with the above listed names, added them as friends, and then ordered my friends list by name. The user named "我我我" (wo) is showing up before "你你你" (ni) in the friends listing.

 

I'm wondering if, since Chinese is expressed using multibyte characters (two English characters per Chinese character), Clearspace might be ordering Chinese characters according to their multibyte equivalent instead of according to their pinyin equivalent.

 

My team tells me that Java is capable of ordering Chinese according to its pinyin. I'm wondering what it would take for us to enable this functionality within Clearspace's listings, that is, the lists of content, people, etc. that can be alphabetically ordered.

 

Thanks as always!

 

-Rob-

Sean JiveSupport 1,682 posts since
Dec 10, 2007
Nov 6, 2008 3:00 PM in response to: robbono
Re: Alphabetical Ordering of Chinese Characters

Hey Rob,

 

In Clearspace we use Java's String.compareTo() method. Unfortunately, this method compares strings lexicographically, which means the character with the lowest Unicode value comes first. As far as I can tell, there is no way to have Java convert Chinese characters to pinyin before comparing them. The list appears out of order because, as you mentioned, Java is sorting the names by their multibyte Unicode values.

 

I can file this as a bug for you if you'd like. Unfortunately, sorting Chinese correctly would take a fairly involved code change, so I'm not sure how long it would take for this fix to be implemented.
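
To make the difference concrete, here is a minimal Java sketch contrasting String.compareTo() ordering with a collator-based ordering. It is an illustration only, not Clearspace code. The class name PinyinOrderDemo and the four-character rule string are assumptions invented for this example. The sample adds 阿 (a) and 一 (yi) to the names from this thread because their code point order is the reverse of their pinyin order. The characters are written as Unicode escapes so the file compiles regardless of source encoding.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class PinyinOrderDemo {
    // Hypothetical toy rules listing four characters in pinyin order:
    // U+963F (a) < U+4F60 (ni) < U+6211 (wo) < U+4E00 (yi).
    // Real rules would have to cover every character that can appear in a name.
    static final String TOY_RULES = "< \u963f < \u4f60 < \u6211 < \u4e00";

    // Builds a collator from the toy rules, wrapping the checked exception.
    static Collator toyPinyinCollator() {
        try {
            return new RuleBasedCollator(TOY_RULES);
        } catch (ParseException e) {
            throw new IllegalStateException("Invalid collation rules", e);
        }
    }

    // Sorts the way String.compareTo() does: by UTF-16 code unit value.
    static List<String> sortByCodeUnit(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        Collections.sort(copy);
        return copy;
    }

    // Sorts using the toy collator instead of String.compareTo().
    static List<String> sortByToyPinyin(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(toyPinyinCollator());
        return copy;
    }

    public static void main(String[] args) {
        // wo-wo-wo, yi-yi, ni-ni-ni, a-a
        List<String> names = Arrays.asList(
                "\u6211\u6211\u6211", "\u4e00\u4e00", "\u4f60\u4f60\u4f60", "\u963f\u963f");
        System.out.println("compareTo order:  " + sortByCodeUnit(names));
        System.out.println("toy pinyin order: " + sortByToyPinyin(names));
    }
}
```

The JDK also provides locale-specific collators through Collator.getInstance(Locale). Whether the rules shipped for a Chinese locale match the pinyin order a given community expects is something to verify against a known-good list rather than assume.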

Sean JiveSupport 1,682 posts since
Dec 10, 2007
Nov 7, 2008 8:32 AM in response to: robbono
Re: Alphabetical Ordering of Chinese Characters

Hey Robert,

 

Which page were you looking at when using sorting? The admin console? Could you give me the URL you're visiting so that I can be sure to look at the correct portion of our code?

ChadPank Novice 24 posts since
Aug 13, 2008
Nov 7, 2008 8:53 AM in response to: Sean
Re: Alphabetical Ordering of Chinese Characters

Hi Sean,

 

Rob has gone home for the weekend, so I don't know the exact pages, but we are concerned about anywhere customers/users have access in the community:

 

- people lists (a-z etc)

- blogs lists

- groups lists

 

We need the users to be able to sort these lists to find people. 99% of the users of this community will have a Chinese name, blogs will be in Chinese, and group names will also be in Chinese.

 

I would say the admin panel is lower priority, as we can use that in English until a fix. Our main concern is that the community will be unusable (and unlaunchable) if people, blogs and groups cannot even be sorted.

 

I hope this helps.

Sean JiveSupport 1,682 posts since
Dec 10, 2007
Nov 7, 2008 8:59 AM in response to: ChadPank
Re: Alphabetical Ordering of Chinese Characters

Hey Chad,

 

Sorry for the back and forth here; I should have asked this earlier. Could you tell me which version of Clearspace you're currently using or plan to go live on?

ChadPank Novice 24 posts since
Aug 13, 2008
Nov 7, 2008 9:21 AM in response to: Sean
Re: Alphabetical Ordering of Chinese Characters

No problem.  We are working with 2.5.3 at the moment. 

Sean JiveSupport 1,682 posts since
Dec 10, 2007
Nov 7, 2008 3:17 PM in response to: ChadPank
Re: Alphabetical Ordering of Chinese Characters

Hey Chad,

 

I've traced through our source for the use cases you mentioned above: people lists, blog lists, and group lists. The ordering of these elements in a list is done in a different location for each feature.

 

For people lists, we use a Lucene SortComparator. Lucene is the indexing API we use to index all of our content. SortComparatorSource is an interface with a single method, newComparator, which returns an object that can be used to compare two elements. For an example of this interface being implemented, have a look at the class named SortComparator in the Lucene source. We use Lucene's default SortComparator, which relies on String.compareTo(), which, as I mentioned above, doesn't handle Chinese characters correctly. To get the people list sorting correctly, you'll need to implement your own PeopleAction that extends the default PeopleAction. The main thing you'd want to change is the getSortOrder() method. Here you'll want to return your own Lucene Sort object that uses a SortComparator which compares with a RuleBasedCollator (java.text.RuleBasedCollator) instead of String.compareTo().

 

For blog lists, we also use Lucene, but in a different way. We use the DbSearchQueryManager to ultimately query Lucene. The object you'll want to modify is the SearchQueryResultRelevenceComparator. By default, this class's compare() method uses String.compareTo() to compare objects. You'll want to modify this to use your RuleBasedCollator.

 

Finally, group lists don't actually use Lucene; they query the database directly. That means the sort order is ultimately determined by SQL. So in order for groups to be sorted by pinyin, your database needs to be able to sort by pinyin in its ORDER BY clause. I'm not sure if this is possible in any database, but that is a different discussion. If you need to modify the query that returns these items, it can be found in SocialGroupDAOImpl on line 500.
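
For the two Lucene-backed cases (people and blog lists), the core of the customization is replacing String.compareTo() with a collator. The sketch below shows only that comparison step in plain Java, using precomputed CollationKey objects so that each name is analysed once per sort. The class name CollationKeySort is an assumption made up for this example, and the wiring into a Lucene Sort, PeopleAction, or SearchQueryResultRelevenceComparator is deliberately left out because it depends on the Clearspace and Lucene versions in use.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class CollationKeySort {
    // Returns a new list sorted by the given collator's rules.
    // CollationKey is Comparable, so its natural order follows the collator.
    static List<String> sort(List<String> names, Collator collator) {
        List<CollationKey> keys = new ArrayList<>(names.size());
        for (String name : names) {
            keys.add(collator.getCollationKey(name));
        }
        Collections.sort(keys);
        List<String> sorted = new ArrayList<>(keys.size());
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        // Assumption: the JDK's collator for Locale.CHINA is close enough to
        // pinyin order for the names in use; check this against real data.
        Collator zh = Collator.getInstance(Locale.CHINA);
        List<String> names = Arrays.asList("\u6211\u6211\u6211", "\u4f60\u4f60\u4f60");
        System.out.println(sort(names, zh));
    }
}
```

A RuleBasedCollator built from custom rules can be passed in the same way, since it is also a Collator.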

ChadPank Novice 24 posts since
Aug 13, 2008
Nov 7, 2008 11:03 PM in response to: Sean
Re: Alphabetical Ordering of Chinese Characters

Sean,

 

Thank you for your very detailed research and response. 

 

But I find it surprising that you are suggesting that we do this work. We expect Jive Software to provide a full localization, which means that all functions of the English version already work in the local language, in this case Chinese.

Sean JiveSupport 1,682 posts since
Dec 10, 2007
Nov 10, 2008 7:48 AM in response to: ChadPank
Re: Alphabetical Ordering of Chinese Characters

Hey Chad,

 

I've actually filed this as a bug on our end. I just thought you'd like to know where the customizations would take place, in case you didn't want to wait for them to be implemented in a bug-fix release.

Sean JiveSupport 1,682 posts since
Dec 10, 2007
Nov 11, 2008 2:13 PM in response to: ChadPank
Re: Alphabetical Ordering of Chinese Characters

Hey Chad,

 

Just wanted to add a bit more information on the bug I filed. The ID for this issue is CS-9813, and I've requested that it be fixed for CS 2.5.5, which is scheduled to be released Dec. 15th.

 

If you have any other questions let me know.

ChadPank Novice 24 posts since
Aug 13, 2008
Nov 16, 2008 7:05 PM in response to: Sean
Re: Alphabetical Ordering of Chinese Characters

Thanks Sean.

Sean JiveSupport 1,682 posts since
Dec 10, 2007
Nov 19, 2008 11:19 AM in response to: ChadPank
Re: Alphabetical Ordering of Chinese Characters

Hey Chad,

 

One of our core engineers believes he has a fix for some of these issues; however, he'd rather not check it in until he can validate it. Unfortunately, no one here speaks or reads Chinese, so we're unable to validate it ourselves. Would it be possible for you to provide us with a list of names in Chinese sorted by Unicode value (as our instance does now) as well as sorted by pinyin (the way it should be)? This would help us validate the changes we've made.

 

The changes that we're trying to test only affect the sorting of content based on Lucene. If the content is to be sorted by the database (as mentioned above), the only way to handle that currently is to set the DB to the proper locale and collation. The problem with this approach is that on a multi-locale instance, all users will be affected. We realize this is a less than ideal solution, but the scope of the issue is so large that we cannot make these changes in a point release.
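
The validation requested in this post, checking a comparator against a list that a Chinese reader has already put in the right order, could be sketched roughly as follows. This is not the engineer's actual test. The class name SortOrderCheck and the method reproduces are hypothetical, and the sample list in main is a placeholder for a real reference list.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

final class SortOrderCheck {
    // Returns true when sorting the names with the comparator reproduces the
    // expected order. The copy is reversed first so that a comparator treating
    // everything as equal cannot pass by leaving the list untouched. Note that
    // a reference list containing names the comparator considers equal will
    // therefore be reported as a mismatch.
    static boolean reproduces(List<String> expected, Comparator<? super String> comparator) {
        List<String> actual = new ArrayList<>(expected);
        Collections.reverse(actual);
        actual.sort(comparator);
        return actual.equals(expected);
    }

    public static void main(String[] args) {
        // Placeholder reference list in pinyin order: ni-ni-ni, then wo-wo-wo.
        List<String> expectedPinyinOrder =
                Arrays.asList("\u4f60\u4f60\u4f60", "\u6211\u6211\u6211");
        // Assumption: the collator under test is the JDK's Locale.CHINA collator.
        Collator candidate = Collator.getInstance(Locale.CHINA);
        System.out.println("matches reference list: "
                + reproduces(expectedPinyinOrder, candidate));
    }
}
```

A check like this is only as good as the reference list, so the list should include characters whose Unicode order and pinyin order disagree.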

Sean JiveSupport 1,682 posts since
Dec 10, 2007
Nov 21, 2008 8:27 AM in response to: robbono
Re: Alphabetical Ordering of Chinese Characters

Hey Rob,

 

Using the list you attached, our engineer was able to validate his fix. He's checked those fixes into 2.5.5, which will be released Dec. 15th. However, these fixes won't cover all cases where items are sorted lexicographically. Specifically, the cases that require the DB to do the sorting are ones where a fix was too large in scope for a point release. If this site is set to be a single-locale site, setting the collation on the database might provide the correct sorting order. However, this order would be incorrect for any characters that are not Chinese.
