Hi, Arik,
There has been a special rule for three-phase European names in GNM:
This explains why "Donald J. Trump" is parsed that way: the culture of the name is "ANGLO", and the new multiple parse option does not touch that logic.
The new multiple parse option is independent of the alternative reparsing. It works from the same original tokens, not the rotated ones used in alternative reparsing, and is therefore not affected by the alternative parsing threshold.
Thanks,
Randy
Hi Randy,
1. Correction for "Vladimir V. Putin": it does parse into 2 names when parser.multipleParsedNames(true) is set.
2. I thought that maybe the second parse of initials (same as the regular second parse) is provided only if the first break confidence is lower than the reparse threshold.
But even setting ReparseThreshold to 1 does not provide a second parse of "Donald J. Trump" or "John F. Kennedy", whose break confidence is 90.
Regards,
Arik
Hi Randy,
I am using gnm_7.0.0.0.LAiFix009.
Can you please explain the difference between the breaking of "BUSARGIN R. VIKTORVOICH" and the breaking of "Donald J. Trump"?
The first, as expected, is broken into 2 names with "parser.multipleParsedNames(true)" and into a single name with "parser.multipleParsedNames(false)".
The second is broken into a single name (see below) regardless of whether parser.multipleParsedNames(true) or parser.multipleParsedNames(false) is called.
The same happens with "Donald J Trump", "John F. Kennedy", "Vladimir V. Putin", etc.
Why does this feature work with some names but not with others?
NAME_MATCH_WHY("BUSARGIN R. VIKTORVOICH", "BUSARGIN VIKTORVOICH")
----------------------------------------
[
comparing_names : data_gn : BUSARGIN data_sn : VIKTORVOICH query_gn : BUSARGIN query_sn : R VIKTORVOICH
cultures : data_gn_culture : 0 data_sn_culture : 6 query_gn_culture : 0 query_sn_culture : 6
bitmap_keys : match : true
comparing_fields : data : VIKTORVOICH query : R VIKTORVOICH
comparing_tokens : data : VIKTORVOICH query : R
score_without_left_bias : score : 0.0
comparing_tokens : data : VIKTORVOICH query : VIKTORVOICH
exact_string_match : score : 1.0
applying_oops_factor : factor : 0.950 score : 0.950
applying_missing_stem_factor : factor : 0.850 score : 0.80750
comparing_tokens : data : VIKTORVOICH query : RVIKTORVOICH
score_without_left_bias : score : 0.880
using_compressed_score : factor : 1.0 maximum : 0.950 score : 0.880
comparing_fields : data : BUSARGIN query : BUSARGIN
exact_string_match : score : 1.0
scores : gn_score : 1.0 gn_weight : 0.80 name_score : 0.933333333333 sn_score : 0.880 sn_weight : 1.0
]
----------------------------------------
[
comparing_names : data_gn : BUSARGIN data_sn : VIKTORVOICH query_gn : BUSARGIN R query_sn : VIKTORVOICH
cultures : data_gn_culture : 0 data_sn_culture : 6 query_gn_culture : 0 query_sn_culture : 6
bitmap_keys : match : true
comparing_fields : data : VIKTORVOICH query : VIKTORVOICH
exact_string_match : score : 1.0
comparing_fields : data : BUSARGIN query : BUSARGIN R
comparing_tokens : data : BUSARGIN query : BUSARGIN
exact_string_match : score : 1.0
comparing_tokens : data : BUSARGIN query : R
score_without_left_bias : score : 0.0
applying_oops_factor : factor : 0.950 score : 0.0
applying_missing_stem_factor : factor : 0.850 score : 0.850
comparing_tokens : data : BUSARGIN query : BUSARGINR
score_without_left_bias : score : 0.842105263158
scores : gn_score : 0.850 gn_weight : 0.80 name_score : 0.933333333333 sn_score : 1.0 sn_weight : 1.0
]
----------------------------------------
Names match.
NAME_MATCH_WHY("Donald J. Trump", "Donald Trump")
----------------------------------------
[
comparing_names : data_gn : DONALD data_sn : TRUMP query_gn : DONALD J query_sn : TRUMP
regularized_names : data_gn : DONOLD data_sn : TRUMP query_gn : DONOLD J query_sn : TRUMP
cultures : data_gn_culture : 1 data_sn_culture : 1 query_gn_culture : 1 query_sn_culture : 1
bitmap_keys : match : true
comparing_fields : data : TRUMP query : TRUMP
exact_string_match : score : 1.0
comparing_fields : data : DONALD query : DONALD J
comparing_tokens : data : DONALD query : DONALD
exact_string_match : score : 1.0
comparing_tokens : data : DONALD query : J
score_without_left_bias : score : 0.0
applying_oops_factor : factor : 0.950 score : 0.0
applying_missing_stem_factor : factor : 0.90 score : 0.90
comparing_tokens : data : DONALD query : DONALDJ
score_without_left_bias : score : 0.80
comparing_regularized_fields : data : TRUMP query : TRUMP
exact_string_match : score : 1.0
applying_regularize_score_max : score : 0.980
using_original_sn_score : score : 1.0
comparing_regularized_fields : data : DONOLD query : DONOLD J
comparing_regularized_tokens : data : DONOLD query : DONOLD
exact_string_match : score : 1.0
comparing_regularized_tokens : data : DONOLD query : J
score_without_left_bias : score : 0.0
applying_oops_factor : factor : 0.950 score : 0.0
applying_missing_stem_factor : factor : 0.90 score : 0.90
comparing_regularized_tokens : data : DONOLD query : DONOLDJ
score_without_left_bias : score : 0.80
using_original_gn_score : score : 0.90
scores : gn_score : 0.90 gn_weight : 0.80 name_score : 0.955555555556 sn_score : 1.0 sn_weight : 1.0
]
----------------------------------------
Names match.
Thanks,
Arik
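The "scores" lines in the traces above appear consistent with a weighted average of the given-name and surname scores. The following is a minimal sketch that checks this arithmetic against the three reported values. It is inferred from the trace output only; the function name combined_name_score is hypothetical and is not part of the GNM API.

```python
def combined_name_score(gn_score, sn_score, gn_weight=0.80, sn_weight=1.0):
    """Weighted average of given-name and surname scores.

    Assumption: this formula is inferred from the trace output in this
    thread, not taken from GNM documentation.
    """
    return (gn_score * gn_weight + sn_score * sn_weight) / (gn_weight + sn_weight)


# (parse shown in the trace, gn_score, sn_score, name_score reported in the trace)
traces = [
    ("BUSARGIN | R VIKTORVOICH", 1.0, 0.880, 0.933333333333),
    ("BUSARGIN R | VIKTORVOICH", 0.850, 1.0, 0.933333333333),
    ("DONALD J | TRUMP", 0.90, 1.0, 0.955555555556),
]

for label, gn, sn, reported in traces:
    computed = combined_name_score(gn, sn)
    print(f"{label}: computed={computed:.12f} reported={reported}")
```

Both parses of "BUSARGIN R. VIKTORVOICH" reach the same name_score because the 0.850 given-name score in one parse and the 0.880 surname score in the other happen to produce the same weighted sum (1.68 out of a total weight of 1.80).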
We have the initial implementation, and it is going through the release process to get the fix pack ready.
There is a new option, "multipleParsedNames", introduced in the Parser and high-level APIs, similar to the "ibmgnr::parser::NameParser.reparseSouthwestAsianNames" option. Here is the sample code with the option enabled:
and here is the output of above code:
The current implementation focuses on initials, as it would take much more time to handle other tokens that have no given name or surname stats in the linguistic database.
It is able to handle multiple tokens as well. Here is the output with an extra initial in the middle:
Please let us know if there are other concerns or questions.
Thanks,
Randy
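To illustrate the idea behind the option described above, here is a minimal sketch of producing more than one given-name/surname split when the middle tokens are initials. It is not GNM code and does not use the GNM API: the names is_initial and candidate_parses are hypothetical, and the handling of several initials (enumerating every split point) and of non-initial middle tokens (attaching them to the given name) are assumptions made for illustration only. The special rule for ANGLO names mentioned elsewhere in this thread is not modeled.

```python
from typing import List, Tuple


def is_initial(token: str) -> bool:
    """A single letter counts as an initial (periods are stripped beforehand)."""
    return len(token) == 1 and token.isalpha()


def candidate_parses(full_name: str, multiple: bool = True) -> List[Tuple[str, str]]:
    """Return (given_name, surname) candidates for a whitespace-separated name.

    Hypothetical illustration only; real GNM parsing uses cultural and
    linguistic data that this sketch does not have.
    """
    # Normalize the way the traces display tokens: upper case, no periods.
    tokens = [t.rstrip(".").upper() for t in full_name.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return []
    if len(tokens) == 1:
        return [("", tokens[0])]

    first, middle, last = tokens[0], tokens[1:-1], tokens[-1]

    # Assumption: only runs of initials yield multiple parses.
    if multiple and middle and all(is_initial(t) for t in middle):
        parses = []
        for k in range(len(middle) + 1):
            gn = " ".join([first] + middle[:k])
            sn = " ".join(middle[k:] + [last])
            parses.append((gn, sn))
        return parses

    # Assumption: otherwise the middle tokens stay with the given name.
    return [(" ".join([first] + middle), last)]


for gn, sn in candidate_parses("BUSARGIN R. VIKTORVOICH"):
    print(f"GN: {gn!r}  SN: {sn!r}")
```

For "BUSARGIN R. VIKTORVOICH" the sketch yields the same two splits that appear in the NAME_MATCH_WHY trace in this thread: the initial attached to the surname, and the initial attached to the given name.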
Hi Randy / Manoj,
As discussed in the call today, please find attached a list of sample names for which we are missing hits due to a parsing issue on the GNM side. The initials in the name are not being considered for parsing as part of both GN and SN.
Thanks & Regards
Amish Agarwal
Hi Team,
Could you please provide an update on the mentioned enhancement? Is there a tentative timeline for its implementation?
Thanks & Regards
Amish Agarwal
NICE Actimize