NAME
SYNOPSIS
DESCRIPTION
RLESearch Public Instance Methods
Detailed description of search
AUTHOR
SEE ALSO
Related Classes
Books and Journals
Related WWW Links
ACKNOWLEDGEMENTS

 

NAME

RegSearch - Registration search using probabilistic record linking

 

 

SYNOPSIS

  use NED::RegSearch;

 

  $rsh = RegSearch->new($dbh);
  $ary_ref = $rsh->search($query_string);
  $ary_ref = $rsh->search($query_string, $low_cutoff);
  $str = $rsh->searchSQL;

 

 

DESCRIPTION

RegSearch extends class RLESearch (see the RLESearch manpage) to search the NIH Electronic Directory (NED) on the following nihInetOrgPerson directory attributes:

 
nihSSN Social Security Number (SSN)
sn Surname
givenName
middleName
nihAliasSn Alias Surname
nihAliasGivenName Alias Given Name
nihAliasMiddleName Alias Middle Name
nihDateOfBirth
nihCityOfBirth
nihGender

 

RLESearch Public Instance Methods

new
  $rsh = RegSearch->new($dbh);

Returns a handle to a new RegSearch object, initialized to search the database specified by the DBI database handle $dbh (see the DBI manpage).

 

search
  $ary_ref = $rsh->search($query);
  $ary_ref = $rsh->search($query, $low_cutoff);

The search method is inherited from class RLESearch (see the RLESearch manpage).

The argument $query is a hash reference or a character string that specifies the search criteria. If $query is a hash reference, it is used as-is and is assumed to contain all-uppercase keys naming the nihInetOrgPerson attributes listed above, each mapped to the attribute value to search for.

If $query is a string, it must use Perl's syntax for initializing a hash, excluding the enclosing braces.

The query must include the sn (surname) attribute. All other attributes are optional.

Case and whitespace are ignored.

The value of nihDateOfBirth is in YYYY-MM-DD format. The '-'s are required, but YYYY, MM, or DD may be empty.
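As an illustration of the partial-date format, here is a minimal sketch that splits such a value into its components. The helper split_dob is hypothetical and not part of RegSearch. It assumes that each component is either empty or fully specified (four digits for the year, two for the month and day), which the text above does not state explicitly.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of RegSearch): split a possibly partial
# nihDateOfBirth value into (year, month, day). Both '-' separators are
# required; any of the three components may be empty.
sub split_dob {
    my ($dob) = @_;
    $dob =~ s/\s+//g;    # whitespace is ignored
    my ($y, $m, $d) = $dob =~ /^(\d{4})?-(\d{2})?-(\d{2})?$/
        or return;       # malformed: returns an empty list
    return map { $_ // '' } ($y, $m, $d);
}

print join('|', split_dob('1986-03-')), "\n";    # prints 1986|03|
```

A value without both separators, such as '1986-03', yields an empty list in this sketch.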

For the multi-valued attributes nihAliasSn, nihAliasGivenName, and nihAliasMiddleName, the attribute value is a reference to an array of strings. In a query string this is written as a comma-separated list of strings enclosed in square brackets. These three attributes must always have the same number of values specified.

Example $query string:

 

  "nihSSN => '123456789', nihGender => 'M',
   sn => 'DOE', givenName => 'John', middleName => 'X',
   nihAliasSn =>         [ 'DOE',  'DOUGH' ],
   nihAliasGivenName =>  [ 'Jack', 'Jay' ],
   nihAliasMiddleName => [ '',     'X' ],
   nihDateOfBirth => '1986-03-',
   nihCityOfBirth => 'Chicago'"

This example query specifies two alias names: ``DOE, Jack'' and ``DOUGH, Jay X''.
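The same query can also be passed as a hash reference. The sketch below writes the example in that form, with all-uppercase keys as required for hash references, and adds a small pre-flight check of the two constraints stated above (sn is required; the three alias attributes must have the same number of values). The function check_query is a hypothetical helper, not part of RegSearch or RLESearch, and the source does not say whether values in a hash reference need any normalization, so they are left as in the string example.

```perl
use strict;
use warnings;

# The example query as a hash reference (keys must be all-uppercase).
my $query = {
    NIHSSN             => '123456789',
    NIHGENDER          => 'M',
    SN                 => 'DOE',
    GIVENNAME          => 'John',
    MIDDLENAME         => 'X',
    NIHALIASSN         => [ 'DOE',  'DOUGH' ],
    NIHALIASGIVENNAME  => [ 'Jack', 'Jay' ],
    NIHALIASMIDDLENAME => [ '',     'X' ],
    NIHDATEOFBIRTH     => '1986-03-',
    NIHCITYOFBIRTH     => 'Chicago',
};

# Hypothetical helper: return a list of problems found in the query.
sub check_query {
    my ($q) = @_;
    my @errors;
    push @errors, 'SN (surname) is required'
        unless defined($q->{SN}) && length($q->{SN});

    # Collect the distinct value counts of the three alias attributes.
    my %count;
    for my $key (qw(NIHALIASSN NIHALIASGIVENNAME NIHALIASMIDDLENAME)) {
        my $v = $q->{$key};
        $count{ ref($v) eq 'ARRAY' ? scalar(@$v) : 0 }++;
    }
    push @errors, 'alias attributes must have the same number of values'
        if scalar(keys %count) > 1;
    return @errors;
}

my @problems = check_query($query);
print @problems ? join("\n", @problems) . "\n" : "query looks well-formed\n";
```

With a checked query in hand, the call itself is simply $rsh->search($query), as shown in the synopsis.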

See the RLESearch manpage for a description of the search results.

 

searchSQL
  $str = $rsh->searchSQL;

Returns a string containing the SQL that was used to retrieve the search records for the last search.

 

 

Detailed description of search

The query argument specifies a record linking operation against a subset of DIR_PERSON records in the NED relational database.

The query record set consists of one record for the surname, plus one record for each nihAliasSn. All other attributes specified are included on all query records. Thus, for the example query above the query record set is:

 

  (1, 123456789, DOE,   JOHN, X, 1986, 03, , CHICAGO, , , M)
  (2, 123456789, DOE,   JACK,  , 1986, 03, , CHICAGO, , , M)
  (3, 123456789, DOUGH, JAY,  X, 1986, 03, , CHICAGO, , , M)

(The empty fields after CHICAGO are reserved for future implementation of nihStateOfBirth and nihCountryOfBirth.)
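The expansion into a query record set can be sketched as follows. This is an illustration of the rule described above, not the RLESearch implementation: the function query_records is hypothetical, and normalizing values by upper-casing and trimming surrounding whitespace is an assumption about how "case and whitespace are ignored" is applied.

```perl
use strict;
use warnings;

# Hypothetical sketch: expand a query hash (uppercase keys) into one
# record for the surname plus one record per alias surname.
sub query_records {
    my ($q) = @_;
    my $norm = sub {
        my $s = uc($_[0] // '');
        $s =~ s/^\s+|\s+$//g;    # assumption: trim surrounding whitespace
        return $s;
    };
    my ($year, $month, $day) = split /-/, $norm->($q->{NIHDATEOFBIRTH}), -1;

    # One (surname, given, middle) triple for the primary name,
    # followed by one triple per alias.
    my @names    = ([ @{$q}{qw(SN GIVENNAME MIDDLENAME)} ]);
    my @alias_sn = @{ $q->{NIHALIASSN} // [] };
    for my $i (0 .. $#alias_sn) {
        push @names, [ $alias_sn[$i],
                       $q->{NIHALIASGIVENNAME}[$i],
                       $q->{NIHALIASMIDDLENAME}[$i] ];
    }

    my @records;
    for my $name (@names) {
        push @records, [
            scalar(@records) + 1,               # record number
            $norm->($q->{NIHSSN}),
            (map { $norm->($_) } @$name),
            $year // '', $month // '', $day // '',
            $norm->($q->{NIHCITYOFBIRTH}),
            '', '',                             # reserved: state, country
            $norm->($q->{NIHGENDER}),
        ];
    }
    return @records;
}

my $query = {
    NIHSSN => '123456789', NIHGENDER => 'M',
    SN => 'DOE', GIVENNAME => 'John', MIDDLENAME => 'X',
    NIHALIASSN         => [ 'DOE',  'DOUGH' ],
    NIHALIASGIVENNAME  => [ 'Jack', 'Jay' ],
    NIHALIASMIDDLENAME => [ '',     'X' ],
    NIHDATEOFBIRTH => '1986-03-', NIHCITYOFBIRTH => 'Chicago',
};
print '(', join(', ', @$_), ")\n" for query_records($query);
```

For the example query this prints the three records listed above, without the column-alignment padding.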

The search record set consists of ALL distinct DIR_PERSON records meeting ANY of the following criteria:

 
SOUNDEX(sn) = SOUNDEX(DIR_PERSON.sn)

 

SOUNDEX(sn) = SOUNDEX(PERSON_ALIAS.nihAliasSn)

 

SOUNDEX(nihAliasSn) = SOUNDEX(DIR_PERSON.sn)

 

SOUNDEX(nihAliasSn) = SOUNDEX(PERSON_ALIAS.nihAliasSn)

 

nihSSN = DIR_PERSON.nihSSN

 

nihSSN matches DIR_PERSON.nihSSN in all but one digit

 

nihSSN matches DIR_PERSON.nihSSN after two adjacent digits are transposed

 

nihSSN matches DIR_PERSON.nihSSN after one digit is inserted

 

nihSSN matches DIR_PERSON.nihSSN after one digit is deleted
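The five nihSSN criteria above amount to accepting two digit strings that are equal or differ by a single edit (substitution, adjacent transposition, insertion, or deletion). The sketch below expresses that test in Perl purely as an illustration. The function ssn_near_match is hypothetical; the source does not show how RegSearch evaluates these criteria, and the SQL actually used can be inspected with searchSQL.

```perl
use strict;
use warnings;

# Hypothetical helper: true if two digit strings are equal or differ by
# exactly one substitution, adjacent transposition, insertion, or deletion.
sub ssn_near_match {
    my ($x, $y) = @_;
    return 1 if $x eq $y;
    my ($lx, $ly) = (length($x), length($y));

    if ($lx == $ly) {
        # Positions at which the two strings differ.
        my @diff = grep { substr($x, $_, 1) ne substr($y, $_, 1) } 0 .. $lx - 1;
        return 1 if @diff == 1;                  # all but one digit match
        return 1 if @diff == 2                   # two adjacent digits swapped
            && $diff[1] == $diff[0] + 1
            && substr($x, $diff[0], 1) eq substr($y, $diff[1], 1)
            && substr($x, $diff[1], 1) eq substr($y, $diff[0], 1);
        return 0;
    }

    # Insertion or deletion: removing one digit from the longer string
    # must yield the shorter one.
    return 0 unless abs($lx - $ly) == 1;
    my ($long, $short) = $lx > $ly ? ($x, $y) : ($y, $x);
    for my $i (0 .. length($long) - 1) {
        my $cut = $long;
        substr($cut, $i, 1, '');
        return 1 if $cut eq $short;
    }
    return 0;
}

print ssn_near_match('123456789', '213456789') ? "candidate\n" : "no match\n";
```

Here the two values differ only by a transposition of the first two digits, so the record would qualify as a search candidate under the third nihSSN criterion.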

 

Attributes are compared using the following RLESearch comparators:

 

  Attribute                       Comparator
  ---------                       ----------
  nihSSN                          _compareNumeric
  sn, nihAliasSn                  _compareFreq
  givenName, nihAliasGivenName    _compareFreq
  middleName, nihAliasMiddleName  _compareFreq (middle initial only)
  year, month, and day of birth   _compareExact
  nihCityOfBirth                  _compareOrdinary
  nihGender                       _compareExact

 

 

AUTHOR

  Keith Gorlen
  Center for Information Technology
  National Institutes of Health
  Federal Building, Room 816A
  7550 Wisconsin Ave MSC 9100
  BETHESDA MD 20892-9100
  Phone: 301-496-1111, FAX: 301-594-1151
  Email: kg2d@nih.gov

 

 

SEE ALSO

 

Related Classes

  DBI - Database independent interface for Perl
  NED::RLESearch - Registration Search using RLESearch
  NED::AbsFreqTbl - Absolute frequency tables for record linking

 

 

Books and Journals

  1. Gill, L. E., and Baldwin, J. A. (1987), ``Methods and technology of record linkage: some practical considerations'' in J. Baldwin, E. D. Acheson, and W. Graham (eds.) Textbook of Medical Record Linkage, Oxford: Oxford University Press, pp. 39-54. M- and U- probabilities for place of birth and gender were calculated from Table 2.4, Match weights used by the Oxford Record Linkage Study for the 16-year (1963-1978) file.

     

  2. Newcombe, H. B. (1988), Handbook of Record Linkage: Methods for Health and Statistical Studies, Administration, and Business, Oxford: Oxford University Press, pp. 51-53. M- and U- probabilities for year, month, and day of birth are from Tables 17.1 and 18.1.

     

 

Related WWW Links

  http://www.census.gov/srd/www/reclink/reclink.html

 

 

ACKNOWLEDGEMENTS

Many thanks to William Winkler, Statistical Research Division, U. S. Bureau of the Census, for his software, help, and advice.