Fixing UNICODE-ASCII-Arabic conversion problems
UNICODE-ASCII-Arabic and conversion problems? Use this algorithm
So, you've used Microsoft Office applications like Excel or Access and SQL Server to store and retrieve Arabic text. There are times though, when your data seems to be corrupt or you get dummy Latin characters when you're expecting the correct Arabic text. Providing configuration information regarding how to set SQL Server, Excel or Access to hold Arabic information correctly is not the topic of interest for this blog post. Instead, I'd like to provide you with an algorithm and sample VBA code to help you convert the Latin text back to Arabic.
There are many reasons for the Arabic text to appear in Latin. One of the reasons may be your SQL Server Collation settings, or the fact that you have used a VARCHAR (or in this context CHAR, TEXT, etc.) when you're supposed to use NVARCHAR (or NTEXT, NCHAR, etc.). The N prefix in the type makes sure that the information in the column is stored in UNICODE and hence, your Arabic text will surely display exactly as it is everywhere. However, if you are getting something like ÇáßãÈíæÊÑ when you are supposed to get الكمبيوتر , then it's already too late.
Good news! I have figured out the conversion table between such Latin characters and the corresponding Arabic text in Unicode. The algorithm is simple, there is a difference in the ASCII codes between the Latin text and those in Arabic (based on code page 1256: Arabic Windows). Unfortunately, this is not a constant value, as the order of characters is not the same either. Hence, in order to convert those Latin letters into UNICODE, all you have to do is scan the text taking one character at a time, then get the UNICODE value for the character and add the deficit corresponding the range value from the table below:
Update on July 16th, 2011: Table has been updated with more characters. I have no time at the moment to update the code. But you can easily add the missing ones. I have also attached an Excel Sheet with the conversions.
Conversion Table
ASCII range for Latin Text | ASCII range for Latin text (in hex) | Deficit between ASCII & Unicode | ||
---|---|---|---|---|
From | To | From | To | |
129 | 129 | 81 | 81 | 1533 |
138 | 138 | 8A | 8A | 1519 |
141 | 141 | 8D | 8D | 1529 |
142 | 142 | 8E | 8E | 1546 |
143 | 143 | 8F | 8F | 1529 |
144 | 144 | 90 | 90 | 1567 |
152 | 152 | 98 | 98 | 1553 |
154 | 154 | 9A | 9A | 1527 |
159 | 159 | 9F | 9F | 1563 |
170 | 170 | AA | AA | 1556 |
191 | 191 | BF | BF | 1376 |
192 | 192 | C0 | C0 | 1537 |
193 | 214 | C1 | D6 | 1376 |
216 | 219 | D8 | DB | 1375 |
220 | 223 | DC | DF | 1380 |
225 | 225 | E1 | E1 | 1379 |
227 | 230 | E3 | E6 | 1378 |
236 | 237 | EC | ED | 1373 |
240 | 243 | F0 | F3 | 1371 |
245 | 246 | F5 | F6 | 1370 |
248 | 248 | F8 | F8 | 1369 |
250 | 250 | FA | FA | 1368 |
Here's a VBA code you can use within Excel or Access to convert the Latin Text to Arabic. I have used Excel as an example:
Sub convert2ara()
For Each s In Selection
strNew = ""
For i = 1 To Len(s.Text)
j = AscW(Mid(s.Text, i, 1))
Select Case j
Case &HFA: j = j + 1368
Case &HF8: j = j + 1369
Case &HF5 To &HF6: j = j + 1370
Case &HF0 To &HF3: j = j + 1371
Case &HEC To &HED: j = j + 1373
Case &HC0 To &HD6, &HBF, &HC1: j = j + 1376
Case &HD8 To &HDB: j = j + 1375
Case &HE1: j = j + 1379
Case &HE3 To &HE6: j = j + 1378
Case &HDC To &HDF: j = j + 1380
Case &H8A: j = j + 1519
Case &H9A: j = j + 1527
Case &H8D, &H8F: j = j + 1529
Case &H81: j = j + 1533
Case &HC0: j = j + 1537
Case &H8E: j = j + 1546
Case &H98: j = j + 1553
Case &HAA: j = j + 1556
Case &H9F: j = j + 1563
Case &H90: j = j + 1567
End Select
strnew = strnew & ChrW(j)
Next
Cells(s.Row, s.Column + 1) = strnew
Next
End Sub
Here's also a T-SQL Stored Procedure that does the same thing:
CREATE PROCEDURE Convert2Ara
@Latin VarChar(100),
@Arabic NVarChar(100) = N'' OUTPUT
AS
BEGIN
DECLARE @ind int, @Len int, @Src int
SET @Len = Len(@Latin)
SET @ind = 1
WHILE @ind <= @Len
BEGIN
SET @Src = ASCII(SUBSTRING(@Latin, @ind, 1))
Set @Src = CASE
WHEN @Src = 250 THEN @Src + 1368
WHEN @Src = 248 THEN @Src + 1369
WHEN @Src BETWEEN 245 AND 246 THEN @Src + 1370
WHEN @Src BETWEEN 240 AND 243 THEN @Src + 1371
WHEN @Src BETWEEN 236 AND 237 THEN @Src + 1373
WHEN @Src BETWEEN 216 AND 219 THEN @Src + 1375
WHEN @Src BETWEEN 193 AND 214 OR @Src=191 THEN @Src + 1376
WHEN @Src BETWEEN 227 AND 230 THEN @Src + 1378
WHEN @Src = 225 THEN @Src + 1379
WHEN @Src BETWEEN 220 AND 223 THEN @Src + 1380
WHEN @Src = 138 THEN @Src + 1519
WHEN @Src = 154 THEN @Src + 1527
WHEN @Src = 141 OR @Src=143 THEN @Src + 1529
WHEN @Src = 129 THEN @Src + 1533
WHEN @Src = 192 THEN @Src + 1537
WHEN @Src = 142 THEN @Src + 1546
WHEN @Src = 152 THEN @Src + 1553
WHEN @Src = 170 THEN @Src + 1556
WHEN @Src = 159 THEN @Src + 1563
WHEN @Src = 144 THEN @Src + 1567
END
SET @Arabic = @Arabic + NCHAR(@Src)
SET @ind = @ind + 1
END
END
Furthermore, below is a T-SQL Scalar-Valued, User-Defined Function that does the same thing:
CREATE FUNCTION [dbo].[Convert2Ara]
(
@Latin VarChar(255)
)
RETURNS NVarChar(255)
AS
BEGIN
DECLARE @ind int, @Len int, @Src int, @Arabic NVarChar(255) = N''
SET @Len = Len(@Latin)
SET @ind = 1
WHILE @ind <= @Len
BEGIN
SET @Src = ASCII(SUBSTRING(@Latin, @ind, 1))
Set @Src = CASE
WHEN @Src = 250 THEN @Src + 1368
WHEN @Src = 248 THEN @Src + 1369
WHEN @Src BETWEEN 245 AND 246 THEN @Src + 1370
WHEN @Src BETWEEN 240 AND 243 THEN @Src + 1371
WHEN @Src BETWEEN 236 AND 237 THEN @Src + 1373
WHEN @Src BETWEEN 216 AND 219 THEN @Src + 1375
WHEN @Src BETWEEN 193 AND 214 OR @Src=191 THEN @Src + 1376
WHEN @Src BETWEEN 227 AND 230 THEN @Src + 1378
WHEN @Src = 225 THEN @Src + 1379
WHEN @Src BETWEEN 220 AND 223 THEN @Src + 1380
WHEN @Src = 138 THEN @Src + 1519
WHEN @Src = 154 THEN @Src + 1527
WHEN @Src = 141 OR @Src=143 THEN @Src + 1529
WHEN @Src = 129 THEN @Src + 1533
WHEN @Src = 192 THEN @Src + 1537
WHEN @Src = 142 THEN @Src + 1546
WHEN @Src = 152 THEN @Src + 1553
WHEN @Src = 170 THEN @Src + 1556
WHEN @Src = 159 THEN @Src + 1563
WHEN @Src = 144 THEN @Src + 1567
END
SET @Arabic = @Arabic + NCHAR(@Src)
SET @ind = @ind + 1
END
RETURN @Arabic
END