Enhance GNM to screen 'Initials' in the name on both the SN and the GN side of processing

Needed By: Month


Guest

Aug 19, 2025
Hi Arik,
GNM has a special rule for three-part European names:
1. For Hispanic or Portuguese names, the middle token is strongly favored as part of the surname.
2. For other European names, the middle token is strongly favored as part of the given name.
This explains why "Donald J. Trump" is parsed the way it is. The culture of the name is "ANGLO", and the new multiple-parse option does not touch that logic.
The new multiple-parse option is independent of the alternative reparsing. It works from the same original tokens, not the rotated ones used in alternative reparsing, so it is not affected by the alternative parsing threshold.
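The three-part rule above can be illustrated with a minimal sketch. The `Culture` enum, the `SimpleParse` struct and the `parseThreeTokenEuropean` function are hypothetical and are not GNM types. The sketch also assumes the tokens arrive in given-name-first order. GNM itself only "strongly favors" one break statistically; it does not apply a hard rule like this.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical culture labels for illustration only (not GNM's culture codes).
enum class Culture { Hispanic, Portuguese, Anglo, OtherEuropean };

// Hypothetical result type: one given-name / surname break.
struct SimpleParse {
    std::string givenName;
    std::string surname;
};

// Applies the favored break for a three-token European name.
// Assumes given-name-first order, e.g. {"DONALD", "J", "TRUMP"}.
SimpleParse parseThreeTokenEuropean(const std::vector<std::string>& tokens, Culture culture) {
    assert(tokens.size() == 3);
    const std::string& first = tokens[0];
    const std::string& middle = tokens[1];
    const std::string& last = tokens[2];
    if (culture == Culture::Hispanic || culture == Culture::Portuguese) {
        // Middle token favored as part of the surname (e.g. "GARCIA LOPEZ").
        return {first, middle + " " + last};
    }
    // Other European cultures: middle token favored as part of the given name.
    return {first + " " + middle, last};
}
```

Under this sketch, {"DONALD", "J", "TRUMP"} with `Culture::Anglo` yields given name "DONALD J" and surname "TRUMP". That single break corresponds to what Arik observed.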
Thanks,
Randy

Guest

Aug 17, 2025

Hi Randy,
1. Correction for "Vladimir V. Putin": it does parse into 2 names when parser.multipleParsedNames(true) is set.
2. I thought that maybe the second parse of initials (like the regular second parse) is provided only if the first break confidence is lower than the reparse threshold.
But even setting ReparseThreshold to 1 does not produce a second parse of "Donald J. Trump" or "John F. Kennedy", whose break confidence is 90.

Regards,
Arik


Guest

Aug 17, 2025

Hi Randy,
I am using gnm_7.0.0.0.LAiFix009.
Can you please explain the difference between the breaking of "BUSARGIN R. VIKTORVOICH" and the breaking of "Donald J. Trump"?
The first, as expected, is broken into 2 names with "parser.multipleParsedNames(true)" and into a single name with "parser.multipleParsedNames(false)".
The second is broken into a single name (see below) regardless of whether parser.multipleParsedNames(true) or parser.multipleParsedNames(false) is called.
The same happens with "Donald J Trump", "John F. Kennedy", "Vladimir V. Putin", etc.
Why does this feature work with some names but not with others?
NAME_MATCH_WHY("BUSARGIN R. VIKTORVOICH", "BUSARGIN VIKTORVOICH")
----------------------------------------
[
comparing_names : data_gn : BUSARGIN data_sn : VIKTORVOICH query_gn : BUSARGIN query_sn : R VIKTORVOICH
cultures : data_gn_culture : 0 data_sn_culture : 6 query_gn_culture : 0 query_sn_culture : 6
bitmap_keys : match : true
comparing_fields : data : VIKTORVOICH query : R VIKTORVOICH
comparing_tokens : data : VIKTORVOICH query : R
score_without_left_bias : score : 0.0
comparing_tokens : data : VIKTORVOICH query : VIKTORVOICH
exact_string_match : score : 1.0
applying_oops_factor : factor : 0.950 score : 0.950
applying_missing_stem_factor : factor : 0.850 score : 0.80750
comparing_tokens : data : VIKTORVOICH query : RVIKTORVOICH
score_without_left_bias : score : 0.880
using_compressed_score : factor : 1.0 maximum : 0.950 score : 0.880
comparing_fields : data : BUSARGIN query : BUSARGIN
exact_string_match : score : 1.0
scores : gn_score : 1.0 gn_weight : 0.80 name_score : 0.933333333333 sn_score : 0.880 sn_weight : 1.0
]
----------------------------------------
[
comparing_names : data_gn : BUSARGIN data_sn : VIKTORVOICH query_gn : BUSARGIN R query_sn : VIKTORVOICH
cultures : data_gn_culture : 0 data_sn_culture : 6 query_gn_culture : 0 query_sn_culture : 6
bitmap_keys : match : true
comparing_fields : data : VIKTORVOICH query : VIKTORVOICH
exact_string_match : score : 1.0
comparing_fields : data : BUSARGIN query : BUSARGIN R
comparing_tokens : data : BUSARGIN query : BUSARGIN
exact_string_match : score : 1.0
comparing_tokens : data : BUSARGIN query : R
score_without_left_bias : score : 0.0
applying_oops_factor : factor : 0.950 score : 0.0
applying_missing_stem_factor : factor : 0.850 score : 0.850
comparing_tokens : data : BUSARGIN query : BUSARGINR
score_without_left_bias : score : 0.842105263158
scores : gn_score : 0.850 gn_weight : 0.80 name_score : 0.933333333333 sn_score : 1.0 sn_weight : 1.0
]
----------------------------------------
Names match.

NAME_MATCH_WHY("Donald J. Trump", "Donald Trump")
----------------------------------------
[
comparing_names : data_gn : DONALD data_sn : TRUMP query_gn : DONALD J query_sn : TRUMP
regularized_names : data_gn : DONOLD data_sn : TRUMP query_gn : DONOLD J query_sn : TRUMP
cultures : data_gn_culture : 1 data_sn_culture : 1 query_gn_culture : 1 query_sn_culture : 1
bitmap_keys : match : true
comparing_fields : data : TRUMP query : TRUMP
exact_string_match : score : 1.0
comparing_fields : data : DONALD query : DONALD J
comparing_tokens : data : DONALD query : DONALD
exact_string_match : score : 1.0
comparing_tokens : data : DONALD query : J
score_without_left_bias : score : 0.0
applying_oops_factor : factor : 0.950 score : 0.0
applying_missing_stem_factor : factor : 0.90 score : 0.90
comparing_tokens : data : DONALD query : DONALDJ
score_without_left_bias : score : 0.80
comparing_regularized_fields : data : TRUMP query : TRUMP
exact_string_match : score : 1.0
applying_regularize_score_max : score : 0.980
using_original_sn_score : score : 1.0
comparing_regularized_fields : data : DONOLD query : DONOLD J
comparing_regularized_tokens : data : DONOLD query : DONOLD
exact_string_match : score : 1.0
comparing_regularized_tokens : data : DONOLD query : J
score_without_left_bias : score : 0.0
applying_oops_factor : factor : 0.950 score : 0.0
applying_missing_stem_factor : factor : 0.90 score : 0.90
comparing_regularized_tokens : data : DONOLD query : DONOLDJ
score_without_left_bias : score : 0.80
using_original_gn_score : score : 0.90
scores : gn_score : 0.90 gn_weight : 0.80 name_score : 0.955555555556 sn_score : 1.0 sn_weight : 1.0
]
----------------------------------------
Names match.
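All three name_score values in the traces above are consistent with a weighted average of the GN and SN scores. For example, (1.0 × 0.80 + 0.880 × 1.0) / 1.80 = 0.9333. Each field score is consistent with taking the better of the token-by-token score and the "compressed" (concatenated-token) score. The sketch below reproduces these numbers under that assumption. `fieldScore` and `nameScore` are hypothetical names, and the formulas are inferred from this trace output only, not taken from GNM documentation. The 0.950 cap on the compressed score is visible only in the first trace.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical: a field score is the better of the token-level score and the
// compressed-token score, with the compressed score capped at `maximum`.
double fieldScore(double tokenScore, double compressedScore, double maximum) {
    return std::max(tokenScore, std::min(compressedScore, maximum));
}

// Hypothetical: the name score is the weighted average of GN and SN scores.
double nameScore(double gnScore, double gnWeight, double snScore, double snWeight) {
    assert(gnWeight + snWeight > 0.0);
    return (gnScore * gnWeight + snScore * snWeight) / (gnWeight + snWeight);
}
```

For the "R VIKTORVOICH" surname parse, the token-level SN score is 0.95 × 0.85 = 0.8075 (oops factor, then missing-stem factor). The compressed score is 0.880, so the SN score is 0.880 and `nameScore(1.0, 0.80, 0.880, 1.0)` gives about 0.9333. For "Donald J. Trump", `nameScore(0.90, 0.80, 1.0, 1.0)` gives about 0.9556.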
Thanks,
Arik


Guest

Jul 29, 2025

We have the initial implementation, and it is going through the release process to get the fix pack ready.

There is a new option, "multipleParsedNames", introduced in the Parser and high-level APIs, similar to the "ibmgnr::parser::NameParser.reparseSouthwestAsianNames" option. Here is sample code with the option enabled:

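// Also include the GNM parser header(s) from your installation.
#include <iostream>
#include <string>

using namespace ibmgnr;  // parser::NameParser resolves to ibmgnr::parser::NameParser
using std::cout;
using std::endl;
using std::string;

void testParser()
{
    // Enable the new option so that alternative breaks around initials are returned.
    parser::NameParser nameParser;
    nameParser.multipleParsedNames(true);

    string name = "BUSARGIN R. VIKTORVOICH";
    cout << name << endl;

    // Print the first parse of each returned alternate with its confidence.
    parser::ParseData parseData = nameParser.parseName(name, 1.0);
    for (parser::ParseAlternate parsedAlt : parseData.getNames())
    {
        parser::ParseName parsedName = parsedAlt.getParses()[0];
        cout << parsedName.getConfidence() << ": "
             << parsedName.getSurname().getText() << " , " << parsedName.getGivenName().getText() << endl;
    }
}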

and here is the output of the above code:

BUSARGIN R. VIKTORVOICH 
80: VIKTORVOICH , BUSARGIN R 
19: R VIKTORVOICH , BUSARGIN

The current implementation focuses on initials, because handling other tokens that have no given-name or surname statistics in the linguistic database would take much more time.

It can handle multiple initial tokens as well. Here is the output with an extra initial in the middle:

BUSARGIN K. R. VIKTORVOICH 
80: VIKTORVOICH , BUSARGIN K R 
50: R VIKTORVOICH , BUSARGIN K 
19: K R VIKTORVOICH , BUSARGIN
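The breaks listed in this output can be reproduced by moving 0, 1, 2, ... of the trailing initials from the given name to the front of the surname. The following is a minimal sketch under that assumption. `SplitCandidate`, `isInitial`, `stripPeriod` and `initialSplits` are hypothetical names, not GNM APIs. The sketch assumes given-name-first token order and treats an initial as one letter optionally followed by a period. The confidence values (80, 50, 19) come from GNM's own scoring and are not modeled here.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: one candidate break into given-name and surname tokens.
struct SplitCandidate {
    std::vector<std::string> givenName;
    std::vector<std::string> surname;
};

// An initial is assumed to be a single letter, optionally followed by '.'.
bool isInitial(const std::string& token) {
    if (token.empty() || token.size() > 2) return false;
    if (!std::isalpha(static_cast<unsigned char>(token[0]))) return false;
    return token.size() == 1 || token[1] == '.';
}

std::string stripPeriod(const std::string& token) {
    if (!token.empty() && token.back() == '.') return token.substr(0, token.size() - 1);
    return token;
}

// Enumerates breaks, starting with all initials in the given name and moving
// one more trailing initial into the surname each step.
std::vector<SplitCandidate> initialSplits(const std::vector<std::string>& tokens) {
    assert(tokens.size() >= 2);
    std::vector<std::string> clean;
    for (const auto& t : tokens) clean.push_back(stripPeriod(t));

    // Find the run of initials just before the last token; keep >= 1 GN token.
    std::size_t runStart = tokens.size() - 1;
    while (runStart > 1 && isInitial(tokens[runStart - 1])) --runStart;

    std::vector<SplitCandidate> out;
    for (std::size_t split = tokens.size() - 1;; --split) {
        SplitCandidate c;
        c.givenName.assign(clean.begin(), clean.begin() + split);
        c.surname.assign(clean.begin() + split, clean.end());
        out.push_back(c);
        if (split == runStart) break;
    }
    return out;
}
```

For {"BUSARGIN", "K.", "R.", "VIKTORVOICH"} this yields three candidates in the same order as the output above. Because "Donald J. Trump" contains a single initial, the same enumeration would yield two candidates for it.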

Please let us know if there are other concerns or questions.

Thanks,

Randy


Guest

Jun 10, 2025

Hi Randy / Manoj,
As discussed in the call today, please find attached a list of sample names for which we are missing hits due to a parsing issue on the GNM side. The initials in these names are not being considered for parsing as part of either the GN or the SN.
Thanks & Regards
Amish Agarwal

Initial Parsing GNM Issue.xlsx


Guest

Apr 2, 2025

Hi Team,
Could you please provide an update on the mentioned enhancement? Is there a tentative timeline for its implementation?
Thanks & Regards
Amish Agarwal
NICE Actimize
