
 

NAME

Strcmp95 - Jaro/Winkler weighted string comparison

 

 

SYNOPSIS

  use NED::Strcmp95;
  my $similarity = NED::Strcmp95::strcmp95($str1, $str2,
                                           $fixed_len,
                                           $case_match,
                                           $exact_match);

 

 

DESCRIPTION

strcmp95 returns a number between 0.0 and 1.0 that indicates the similarity between two character strings.
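As a rough illustration of where that number comes from, here is a simplified C sketch of the Jaro similarity with Winkler's common-prefix boost. It is not the NED::Strcmp95 or Winkler implementation: the function name jaro_winkler is hypothetical, and it omits the three options, blank stripping, and the similar-character credit. It follows the published formulas as we understand them: characters match when equal and within a window of max(len)/2 - 1 positions, and the prefix boost (up to 4 characters) is applied only when the Jaro score exceeds 0.7.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified Jaro-Winkler: 0.0 for empty input, 1.0 for identical strings. */
double jaro_winkler(const char *s1, const char *s2)
{
    size_t len1 = strlen(s1), len2 = strlen(s2);
    if (len1 == 0 || len2 == 0)
        return 0.0;

    /* Two characters "match" if equal and at most `window` positions apart. */
    size_t maxlen = len1 > len2 ? len1 : len2;
    size_t window = (maxlen / 2 > 0) ? maxlen / 2 - 1 : 0;

    char *flag1 = calloc(len1, 1);   /* 1 = character of s1 already matched */
    char *flag2 = calloc(len2, 1);   /* 1 = character of s2 already matched */
    if (flag1 == NULL || flag2 == NULL) {
        free(flag1);
        free(flag2);
        return 0.0;
    }

    size_t common = 0;
    for (size_t i = 0; i < len1; i++) {
        size_t lo = i > window ? i - window : 0;
        size_t hi = (i + window < len2 - 1) ? i + window : len2 - 1;
        for (size_t j = lo; j <= hi; j++) {
            if (!flag2[j] && s1[i] == s2[j]) {
                flag1[i] = flag2[j] = 1;
                common++;
                break;
            }
        }
    }

    double weight = 0.0;
    if (common > 0) {
        /* Count matched characters that appear in a different order. */
        size_t k = 0, out_of_order = 0;
        for (size_t i = 0; i < len1; i++) {
            if (!flag1[i])
                continue;
            while (!flag2[k])
                k++;
            if (s1[i] != s2[k])
                out_of_order++;
            k++;
        }
        double m = (double)common;
        double t = (double)(out_of_order / 2);   /* integer halving */
        weight = (m / len1 + m / len2 + (m - t) / m) / 3.0;

        /* Winkler boost: reward up to 4 agreeing leading characters. */
        if (weight > 0.7) {
            size_t minlen = len1 < len2 ? len1 : len2;
            size_t limit = minlen < 4 ? minlen : 4;
            size_t prefix = 0;
            while (prefix < limit && s1[prefix] == s2[prefix])
                prefix++;
            weight += prefix * 0.1 * (1.0 - weight);
        }
    }

    free(flag1);
    free(flag2);
    return weight;
}
```

With this sketch, jaro_winkler("MARTHA", "MARHTA") gives about 0.961: six matching characters with one transposition yield a Jaro score of 17/18, which the three-character prefix "MAR" then boosts.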

str1 and str2 are the two strings to be compared.

In the underlying C routine, len is the length of the strings; when comparing strings of unequal length, the shorter string should be padded with blanks. NOTE: strcmp95 returns 0.0 if either or both strings are blank.
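The padding rule and the blank-string case can be illustrated with two small C helpers. The names is_blank and pad_with_blanks are hypothetical; they are not part of NED::Strcmp95 or Winkler's code and only show, under that assumption, how two inputs might be brought to a common length and how an all-blank input could be detected.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if s is empty or contains only blanks. */
int is_blank(const char *s)
{
    for (; *s; s++)
        if (*s != ' ')
            return 0;
    return 1;
}

/* Hypothetical helper: returns a newly allocated copy of s, right-padded
 * with blanks to len characters (truncated if s is longer).
 * The caller must free() the result. */
char *pad_with_blanks(const char *s, size_t len)
{
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    size_t n = strlen(s);
    if (n > len)
        n = len;
    memcpy(out, s, n);
    memset(out + n, ' ', len - n);
    out[len] = '\0';
    return out;
}
```

For example, pad_with_blanks("AB", 4) yields "AB  ", the padded form of the shorter string when it is compared against a four-character string.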

fixed_len, case_match, and exact_match are flags that control optional adjustments. Note the inverted sense: a value of 0 activates an option, and a nonzero value deactivates it.

The options are:

 

fixed_len
Increases the probability of a match when the number of matched characters is large, allowing a little more tolerance when the strings are long. This adjustment is not appropriate when comparing fixed-length fields such as phone and Social Security numbers; pass a nonzero value for such fields.

 

case_match
All lowercase characters are converted to uppercase before the comparison. Disabling this option means that the lowercase string ``code'' will not be recognized as the same as the uppercase string ``CODE''. Also, the similar-character adjustment (see exact_match) applies only to uppercase characters.

 

exact_match
Counts similar (but not identical) characters as partial matches, in addition to the exact matches found by the main comparison loop.
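The similar-character credit can be sketched in C as follows, based on our reading of Winkler's code: after the main loop, each still-unmatched character of the first string may pair with one unmatched, similar character of the second, and each such pair adds 0.3 to the matched-character count used in the first two Jaro terms. The function names and the three-pair table are hypothetical stand-ins; strcmp95 uses its own, much larger table of uppercase pairs.

```c
#include <stddef.h>

/* Illustrative table only; not strcmp95's actual list of similar pairs. */
static int similar(char a, char b)
{
    static const char pairs[][2] = { {'O', '0'}, {'I', '1'}, {'S', '5'} };
    for (size_t k = 0; k < sizeof pairs / sizeof pairs[0]; k++)
        if ((a == pairs[k][0] && b == pairs[k][1]) ||
            (a == pairs[k][1] && b == pairs[k][0]))
            return 1;
    return 0;
}

/* Hypothetical helper: Jaro weight from the main loop's results.
 * flag1/flag2 mark characters matched exactly (nonzero = matched);
 * trans is the already-halved transposition count.
 * When exact_match is 0, similar unmatched pairs add 0.3 each. */
double jaro_with_similar(const char *s1, const char *flag1, size_t len1,
                         const char *s2, char *flag2, size_t len2,
                         size_t common, size_t trans, int exact_match)
{
    if (common == 0)
        return 0.0;
    size_t minlen = len1 < len2 ? len1 : len2;
    double sim = (double)common;

    if (exact_match == 0 && common < minlen) {
        for (size_t i = 0; i < len1; i++) {
            if (flag1[i])
                continue;
            for (size_t j = 0; j < len2; j++) {
                if (!flag2[j] && similar(s1[i], s2[j])) {
                    sim += 0.3;
                    flag2[j] = 2;   /* credit each character at most once */
                    break;
                }
            }
        }
    }

    double m = (double)common;
    return (sim / len1 + sim / len2 + (m - trans) / m) / 3.0;
}
```

Comparing "B0B" with "BOB", the two B's match exactly; with exact_match set to 0 the 0/O pair raises the weight from about 0.778 to about 0.844.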

 

For character strings such as names, the suggested value for all three flags is 0, which activates every option.
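The long-string adjustment controlled by fixed_len can be sketched in C as below. The condition and formula follow Winkler's C code as it is commonly reproduced; treat them as our reading rather than a verified copy, and check against that source. The helper name and parameter list are hypothetical. In Winkler's code this step runs only when the Jaro score exceeded 0.7, right after the common-prefix boost.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: optional long-string adjustment.
 *   weight     score after the common-prefix boost
 *   common     number of matching characters
 *   prefix     length of the agreeing prefix (at most 4)
 *   len1, len2 lengths of the two strings
 *   first      first character of the first string
 *   fixed_len  0 enables the adjustment, nonzero disables it */
double long_string_adjust(double weight, size_t common, size_t prefix,
                          size_t len1, size_t len2, char first, int fixed_len)
{
    size_t minlen = len1 < len2 ? len1 : len2;

    /* Beyond the prefix, at least two more characters must agree, and they
       must cover at least half of the shorter string's remaining characters.
       The adjustment is skipped when the first string starts with a digit. */
    if (fixed_len == 0 && minlen > 4 && common > prefix + 1 &&
        2 * common >= minlen + prefix && !isdigit((unsigned char)first)) {
        weight += (1.0 - weight) *
                  ((double)(common - prefix - 1) /
                   (double)(len1 + len2 - 2 * prefix + 2));
    }
    return weight;
}
```

For two 10-character strings with all 10 characters matching and a 4-character prefix, a weight of 0.9 rises to about 0.936; passing a nonzero fixed_len leaves it at 0.9.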

 

 

AUTHOR

  Keith Gorlen
  Division of Computer Research and Technology
  National Institutes of Health
  Federal Building, Room 816A
  7550 Wisconsin Ave MSC 9100
  BETHESDA MD 20892-9100
  Phone: 301-496-1111, FAX: 301-594-1151
  Email: kg2d@nih.gov

 

 

SEE ALSO

Winkler, W. E. (1990), ``String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage'', Proceedings of the Section on Survey Research Methods, American Statistical Association, 472-477.

Winkler, W. E. (1994), ``Advanced Methods of Record Linkage'', Proceedings of the Section on Survey Research Methods, American Statistical Association, 467-472.

 

 

ACKNOWLEDGEMENTS

Strcmp95 is an XS interface to C-language code written by William Winkler, Statistical Research Division, U.S. Bureau of the Census.