Arabic names parsing

We are looking for a configurable option to re-parse the token based on culture and confidence.

Arabic names are not parsed completely. For example, in the call below, Muhammad Hafiz is compared with Hafiz and KHAN with KHAN, and the result comes back as no match.

name_match_why("Muhammad Hafiz KHAN","Hafiz KHAN")

The IBM suggestion to manually configure HAFIZ to always be treated as a GN does not sound feasible, for two reasons.

- First, it is not reasonable to ask the client to configure all possible middle names.
- Second, assume the client follows your advice. What happens if he then tries to search "Muhammad Hafeeze KHAN" (middle name spelled differently)?
Hafeeze is not configured anywhere, so he will again get GN: "Muhammad" SN: "Hafeeze KHAN".
Now he will have 2 missing stems, one in the GN and one in the SN, and won't get a hit…
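
To make the second point concrete, here is a toy sketch of an override list. It is not the product's parser or its configuration format; `toy_parse` and `FORCED_GIVEN_NAMES` are hypothetical names, and the default of attaching the middle token to the SN is assumed only because that is the parse described above.

```python
# Toy model only: not the product's parser or configuration format.
FORCED_GIVEN_NAMES = {"HAFIZ"}  # hypothetical manual override list


def toy_parse(full_name, forced_given=FORCED_GIVEN_NAMES):
    """Split a name into (GN, SN).

    Assumption: middle tokens go to the SN unless they are on the override list.
    """
    tokens = full_name.split()
    if len(tokens) < 2:
        return (full_name, "")
    middle = tokens[1:-1]
    # Leading middle tokens found in the override list are moved to the GN.
    split = 0
    while split < len(middle) and middle[split].upper() in forced_given:
        split += 1
    given = [tokens[0]] + middle[:split]
    surname = middle[split:] + [tokens[-1]]
    return (" ".join(given), " ".join(surname))


print(toy_parse("Muhammad Hafiz KHAN"))    # ('Muhammad Hafiz', 'KHAN')
print(toy_parse("Muhammad Hafeeze KHAN"))  # ('Muhammad', 'Hafeeze KHAN')
```

The override only helps for the exact spelling that was configured; any variant falls back to the original parse.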

The client wants to be able to get both parses, e.g.

When he provides "Muhammad Hafiz KHAN"

GN:"Muhammad Hafiz" SN: "KHAN"
GN:"Muhammad" SN: "Hafiz KHAN"

So, he can get a hit when comparing with either:

- "Muhammad KHAN"
- "Hafiz KHAN"
Currently, even if we set the re-parse threshold high enough that the name is re-parsed, we cannot guarantee to the client that he will get the second parse of the name.
That is because the confidence of the second parse must be higher than the confidence of the first parse for us to get the second parse.
I think it is reasonable to return the second parse with its confidence and let us decide (via internal configuration, of course) what to do with it.
I think that when middle names are involved, especially in 3-token names, it makes sense to always do the following:
- If the middle name in the first parse was attached to the GN, return it in the SN.
- If the middle name in the first parse was attached to the SN, return it in the GN.

And return both parses and their confidences.
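
The rule above could look roughly like the sketch below. This is only my illustration of the request, not an existing API: `ScoredParse` and `with_alternate` are hypothetical names, and the confidence values are placeholders, since how the second confidence is computed is up to the parser.

```python
from typing import List, NamedTuple


class ScoredParse(NamedTuple):
    given: str         # GN field
    surname: str       # SN field
    confidence: float  # parser confidence (placeholder values below)


def with_alternate(first: ScoredParse, alt_confidence: float) -> List[ScoredParse]:
    """For a 3-token name, return the first parse plus the parse with the
    middle token moved to the other field, whatever its confidence is."""
    gn, sn = first.given.split(), first.surname.split()
    if len(gn) + len(sn) != 3 or not gn or not sn:
        return [first]  # rule only sketched for 3-token names
    if len(gn) == 2:
        # Middle name was attached to the GN: return it in the SN.
        alt = ScoredParse(gn[0], " ".join([gn[1]] + sn), alt_confidence)
    else:
        # Middle name was attached to the SN: return it in the GN.
        alt = ScoredParse(" ".join(gn + [sn[0]]), sn[1], alt_confidence)
    return [first, alt]


# Placeholder confidences; the alternate is returned even though it is lower.
for p in with_alternate(ScoredParse("Muhammad", "Hafiz KHAN", 0.8), alt_confidence=0.6):
    print(p)
```

The point is that the second parse comes back even when its confidence is lower than the first, so the caller can decide what to do with it.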

Needed by Date

Apr 3, 2021

Guest | Mar 9, 2021

Thank you, Ronen, for the submission. We will investigate and follow up here with status.
