18 June 2014

Auto Translation: A Myth (Sitecore)

Hi Friends,

After a long time I am back to technical writing, and it feels good.
We do have the best CMS for content authors to draft their content, but once the main website is built, every author craves "some tool which can help them translate their main website into as many language versions as possible without any human intervention".
We developers have the same fancy of building such a tool. But in our hearts we know that no technology exists that can take human emotion written in one language and translate it into another perfectly, without losing its sense.

So, is there any way out?
I respect the challenge. I know that 100% automation is not the answer; the answer is building something that helps authors translate content with ease and control. At the end of the day we may be able to automate perhaps 60-70% of content translation. Whatever is translated still needs to go to the relevant author for moderation. Translating a poem or a phrase with any existing auto-translation technique may not be the way, but basic information can still be translated, and that may reduce the workload of the translation agency.

So what is the technique?
I will use the Google Translation Service and provide a simple tool in Sitecore, as a simple custom-built add-on, to attempt this in my own way (which may not be the correct one).

Let's start!
First I need to create a Sitecore solution in which the web project lives outside the web root directory; I will use the File publish method to update the Sitecore web root directory. The same solution also has a library project, where most of the action happens.
So my Visual Studio solution looks like this:
Figure 1.1
I will not list all the code here, but I will include the important snippets where needed.
The approach is to create two action buttons that translate the given text. They are as follows:

  1. Figure 1.2 :- A button in the Rich Text Editor toolbar, to be clicked while some text is selected. The author selects the text he wants to translate (text only; Ctrl + A will not work) and clicks the button. The button raises a jQuery Ajax request in the background to our own web service, which passes the baton to the Google API service. Google detects the source language automatically; the target language is passed as a parameter, taken from the language currently selected in Sitecore on the Versions tab. This approach makes sure that most of the text is translated manually and carefully by the author or translator, and can be moderated at the same time.
  2. Figure 1.3 :- A button next to Translate under the Versions tab of the Sitecore ribbon. This button targets only the Single-Line Text and Multi-Line Text fields and translates all of them on the current context item. Through settings you can opt out fields that should not be translated, such as ProductID or SKU. This approach makes sure that all simple text fields get translated and can still be moderated, since it is done only for the particular item in context.
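The field selection in item 2 can be sketched as follows. This is only an illustration of the filtering logic, written in JavaScript to match the client-side code in this post; the real module does this on the server against the Sitecore item. The `fields` records and the function name `selectTranslatableFields` are hypothetical.

```javascript
// Field types the ribbon button is meant to translate (as described in item 2).
var TRANSLATABLE_TYPES = ["Single-Line Text", "Multi-Line Text"];

// Hypothetical helper: keep only simple text fields that are not on the exclude list.
function selectTranslatableFields(fields, excludedNames) {
    var excluded = excludedNames.map(function (name) {
        return name.toLowerCase();
    });
    return fields.filter(function (field) {
        var isSimpleText = TRANSLATABLE_TYPES.indexOf(field.type) !== -1;
        var isExcluded = excluded.indexOf(field.name.toLowerCase()) !== -1;
        return isSimpleText && !isExcluded;
    });
}

// Made-up sample data standing in for the fields of the context item.
var fields = [
    { name: "Title", type: "Single-Line Text", value: "Welcome" },
    { name: "SKU", type: "Single-Line Text", value: "AB-123" },
    { name: "Summary", type: "Multi-Line Text", value: "Short intro" },
    { name: "Body", type: "Rich Text", value: "<p>Hello</p>" }
];

var picked = selectTranslatableFields(fields, ["ProductID", "SKU"]);
console.log(picked.map(function (f) { return f.name; })); // [ 'Title', 'Summary' ]
```

Rich Text fields are left out on purpose here, since they are handled by the toolbar button from item 1.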
You may also notice in Figure 1.2, near the German flag icon, another faded button named 'Copy From'. This package is one of the recommended add-ons to install; it lets you create a new language version from a language of your choice, which eventually becomes the fodder for your translation. Another point: you can translate from German to Danish or between any other languages, but the most recommended way is to translate from English into whatever language you need. One more point: rich-text-based translation may not work well for a few East Asian languages such as Japanese, because they do not separate words with spaces.
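To see the point about Japanese, here is a small sketch of whitespace splitting. The helper name `splitIntoWords` is mine and the sample sentences are made up; it only shows that splitting on spaces yields a single token when the text contains no spaces, so a word-by-word approach has nothing to work with.

```javascript
// Split a piece of text into words on runs of whitespace.
function splitIntoWords(text) {
    var trimmed = text.trim();
    return trimmed.length === 0 ? [] : trimmed.split(/\s+/);
}

var english = splitIntoWords("Welcome to our website");
var japanese = splitIntoWords("私たちのウェブサイトへようこそ");

console.log(english.length);  // 4
console.log(japanese.length); // 1
```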

Following is the way it should look after installation of our package.
Figure 1.2
Both approaches use a common service in the background. The way the data is delivered differs, because the location where the output is expected differs.
The trickiest part is building the selection-based translation tool, which still has a few shortcomings. For example, while translating, you should select chunks of text from top to bottom and keep moving the same way until you reach the end of the content. In other words, don't select at random and then try to translate.
This is because of the limited client-side API functionality exposed by the RadEditor rich text editor. Maybe someone can overcome this issue by writing extensive jQuery-based functions.

Figure 1.3
Can I select an image and text together and then try to translate?
- No, the tool will give you an error saying "please select text only".
In the background, during selection we get the text (plain, simple text; the RadEditor API helps with that), then we break that text into words on spaces and send that string array for translation. Once we receive the translated text, we find and replace word by word in the whole content of the editor, but the key is that we do it only for the length of text the user selected. For example, the following function does the server-side job.
public string Translate(string sourceText, string targetLanguage)
        {
            string result = string.Empty;
            try
            {
                string key = GetAPIKey();
                if (key == null || key.Length <= 0)
                {
                    result = "error: Google API Key is Not Defined";
                    return result;
                }
                // Create the service.
                var service = new Google.Apis.Translate.v2.TranslateService(new BaseClientService.Initializer()
                {
                    ApiKey = key,
                    ApplicationName = base.itmSettings.Fields["GoogleApplicationName"].Value,
                    MaxUrlLength = System.Convert.ToUInt32(base.itmSettings.Fields["MaxUrlLength"].Value)
                });
                //Splitting string for better translation
                string[] srcText = sourceText.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
                TranslationsResource.ListRequest lr = service.Translations.List(srcText, GetLanguage(targetLanguage));
                lr.Format = TranslationsResource.ListRequest.FormatEnum.Html;
                var response = lr.Execute();
                // Collect the translated words and join them with the "|||"
                // delimiter that the client-side script splits on.
                var parts = new System.Collections.Generic.List<string>();
                foreach (Google.Apis.Translate.v2.Data.TranslationsResource translation in response.Translations)
                {
                    parts.Add(translation.TranslatedText);
                }
                result = string.Join("|||", parts).Trim();
            }
            catch (Exception ex)
            {
                result = "error:" + ex.Message;
                if (ex.InnerException != null)
                {
                    result += System.Environment.NewLine + ex.InnerException.Message;
                }
            }
            return result;
        }
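The server function above returns the translated words as one string joined by "|||", and the client only applies the result when the number of translated words matches the number of source words. A minimal sketch of that contract is below. The names `packTranslations` and `unpackTranslations` are hypothetical, and the German words are hard-coded sample values, not real API output.

```javascript
var DELIMITER = "|||";

// What the server side does: join the translated words into one string.
function packTranslations(words) {
    return words.join(DELIMITER);
}

// What the client side does: split again and accept the result only if
// there is exactly one translated word per source word.
function unpackTranslations(payload, sourceWords) {
    var target = payload.trim().split(DELIMITER);
    return target.length === sourceWords.length ? target : null;
}

var source = "good morning friends".split(/\s+/);
var payload = packTranslations(["guten", "Morgen", "Freunde"]);

console.log(payload);                                     // guten|||Morgen|||Freunde
console.log(unpackTranslations(payload, source));         // [ 'guten', 'Morgen', 'Freunde' ]
console.log(unpackTranslations("guten Morgen", source));  // null (count mismatch)
```

A count mismatch is the case in which the tool shows the "Translation is not possible" message instead of changing the content.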
This is the client-side action, which is configured on the button in the toolbar of the rich text editor.
RadEditorCommandList["Translate"] = function (commandName, editor, tool) {
    var selection = editor.getSelection();
    var selectedText = jQuery.trim(selection.getText());
    var wholeText = editor.get_html(true);
    if (selectedText.length > 1 && scLanguage.length > 1) {
        jQuery.ajax({
            type: "post",
            url: "/Sitecore/Service/Translate.aspx/TranslateText",
            contentType: "application/json; charset=utf-8",
            dataType: "json",
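            // Build the payload with JSON.stringify so quotes, backslashes and
            // line breaks in the selected text cannot break the JSON.
            data: JSON.stringify({ text: selectedText, lang: scLanguage }),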
            success: function (result) {
                if (result != null) {
                    var resultData = result.d;
                    if (resultData != null) {                      
                        var source = selectedText.split(/\s+/);
                        resultData = jQuery.trim(resultData);
                        var target = resultData.split("|||");
                        if (source.length == target.length) {
                            for (var i = 0; i < source.length; i++) {
                                wholeText = wholeText.replace(source[i], target[i]);
                            }
                            editor.set_html(wholeText);
                        }
                        else {
                            alert("Translation is not possible. Please select only text.");
                        }
                    }
                    else {
                        alert("Translation is not possible.");
                    }
                }
                else {
                    alert("Translation is not possible.");
                }
            },
            error: function (xhr, status, error) {
                alert('Error: ' + error + ' (' + status + ')');
            }
        });
    }
    else {
        alert("Please select some text of at least 2 characters.");
    }
};
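The loop in the script above calls `replace` with a plain string, which only replaces the first occurrence in the whole HTML. That is one reason the selection has to move from top to bottom: a word that also appears earlier in the content gets translated in the wrong place. The sketch below shows the problem and one possible way around it, searching forward from a start offset. Both function names are hypothetical, and how to obtain the start offset of the selection from the editor is left open here; I am not claiming RadEditor exposes it directly.

```javascript
// Same idea as the loop in the toolbar command: first occurrence wins.
function replaceFirstOccurrences(html, source, target) {
    for (var i = 0; i < source.length; i++) {
        html = html.replace(source[i], target[i]);
    }
    return html;
}

// Alternative sketch: search forward from a moving cursor, so text before
// the selection and words that are already translated are never matched.
function replaceWordsInOrder(html, source, target, startIndex) {
    var cursor = startIndex || 0;
    for (var i = 0; i < source.length; i++) {
        var at = html.indexOf(source[i], cursor);
        if (at === -1) {
            continue; // word not found after the cursor: leave it untouched
        }
        html = html.slice(0, at) + target[i] + html.slice(at + source[i].length);
        cursor = at + target[i].length;
    }
    return html;
}

var html = "<p>red car</p><p>red house</p>";
var source = ["red", "house"];   // the user selected "red house" in the second paragraph
var target = ["rotes", "Haus"];  // hard-coded sample translation

console.log(replaceFirstOccurrences(html, source, target));
// <p>rotes car</p><p>red Haus</p>

console.log(replaceWordsInOrder(html, source, target, html.indexOf("red house")));
// <p>red car</p><p>rotes Haus</p>
```

Neither version looks at markup, so a selected word such as "p" or "class" could still match inside a tag; a fuller solution would have to work on text nodes only.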
For the ribbon button, we find all fields on the current item whose type is Single-Line Text or Multi-Line Text, subtract the fields listed in the exclude list under settings, and then translate field by field. Here we don't split the text into words. The exclude list and settings for this module are defined as follows:

You can find the complete source code at the following GitHub link.
You can download the package from the Sitecore Marketplace.
The following concepts are not part of this post, but I used them in this endeavour, so I am referencing them here.
How to add a button to the rich text editor of Sitecore?
- Xcentium Blog and Brian Pedersen's Blog.
How to add a simple button to the ribbon of the Content Editor?
- Huge Inc. Blog (I followed it from 'Next, we will describe how to add a button to Sitecore client... ')
To use this package in your Sitecore installation there are a few lengthy steps to follow, which you can find in the ReadMe of the package and in the documentation on the Sitecore Marketplace.

Let me know, folks, how appropriate this approach is for auto translation. Your feedback will help improve this package and, at the end of the day, the Sitecore community.