
Strcmp95 - Jaro/Winkler weighted string comparison

use NED::Strcmp95;
NED::Strcmp95::strcmp95(str1, str2,
                        fixed_len,
                        case_match,
                        exact_match)

strcmp95 returns a number between 0.0 and 1.0 that indicates the similarity between
two character strings.
str1 and str2 are the two strings to be compared.
len is the length of the strings. When comparing strings of unequal length, pad
the shorter string with blanks. NOTE: strcmp95 returns 0.0 if either string is blank.
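
The padding rule above can be sketched as follows. This is a minimal illustration, not part of the module: pad_pair is a hypothetical helper, and the sample strings are arbitrary. Whether the XS wrapper already pads on the caller's behalf is not stated here, so the sketch pads explicitly. The call to strcmp95 follows the argument order shown in the synopsis and runs only if the module is installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of NED::Strcmp95): blank-pad both strings
# on the right to the length of the longer one.
sub pad_pair {
    my ($str1, $str2) = @_;
    my $len = length($str1) > length($str2) ? length($str1) : length($str2);
    return (sprintf('%-*s', $len, $str1), sprintf('%-*s', $len, $str2));
}

my ($str1, $str2) = pad_pair('MARTHA', 'MARHTA JR');

# Call the comparator only if the module is available. All three option
# arguments are zero, which leaves every adjustment active.
if (eval { require NED::Strcmp95; 1 }) {
    my $score = NED::Strcmp95::strcmp95($str1, $str2, 0, 0, 0);
    printf "similarity: %.3f\n", $score;
}
```

Padding on the right keeps the leading characters aligned, so the two strings are still compared from their first characters.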
fixed_len, case_match, and exact_match control three optional adjustments. Each
adjustment is active when its argument is zero; a nonzero value deactivates it.
The options are:
- fixed_len
  Increase the probability of a match when the number of matched characters
  is large. This option allows for a little more tolerance when the strings
  are long. It is not an appropriate test when comparing fixed-length fields
  such as phone and social security numbers.
- case_match
  All lower case characters are converted to upper case prior to the
  comparison. Disabling this feature means that the lower case string
  ``code'' will not be recognized as the same as the upper case string
  ``CODE''. Also, the adjustment for similar characters applies only
  to upper case characters.
- exact_match
  Counts similar characters in addition to the exact matches found in the
  main loop.
The suggested values are all zeros for character strings such as names.
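
One way to keep the zero-means-active convention straight is to name the flag combinations. The sketch below is illustrative only: the preset names and the flags_for helper are hypothetical, and the fixed_field preset is simply one reading of the remark that the long-string adjustment is not appropriate for fixed-length fields.

```perl
use strict;
use warnings;

# Hypothetical presets. Each list is (fixed_len, case_match, exact_match);
# zero leaves the corresponding adjustment active, nonzero deactivates it.
my %PRESET = (
    name        => [0, 0, 0],   # suggested values for names
    fixed_field => [1, 0, 0],   # phone numbers, SSNs: no long-string adjustment
);

# Return the three option values for a given kind of field.
sub flags_for {
    my ($kind) = @_;
    my $flags = $PRESET{$kind} or die "unknown field kind: $kind\n";
    return @$flags;
}

my @flags = flags_for('fixed_field');
print join(',', @flags), "\n";
```

The returned list can be passed as the last three arguments to strcmp95, after str1 and str2.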

Keith Gorlen
Division of Computer Research and Technology
National Institutes of Health
Federal Building, Room 816A
7550 Wisconsin Ave MSC 9100
BETHESDA MD 20892-9100
Phone: 301-496-1111, FAX: 301-594-1151
Email: kg2d@nih.gov

Winkler, W. E. (1990), ``String Comparator Metrics and Enhanced Decision
Rules in the Fellegi-Sunter Model of Record Linkage'',
Proceedings of the Section on Survey Research Methods, American
Statistical Association, 472-477.
Winkler, W. E. (1994), ``Advanced Methods of Record Linkage'',
Proceedings of the Section on Survey Research Methods, American
Statistical Association, 467-472.

Strcmp95 is an XS interface to C-language code written by William Winkler,
Statistical Research Division, U. S. Bureau of the Census.