Skip to content

pramsey/pgsql-addressing-dictionary

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Jul 13, 2022
bf3590e · Jul 13, 2022

History

32 Commits
Feb 5, 2015
Feb 5, 2015
Mar 30, 2015
Mar 31, 2015
Nov 1, 2019
Dec 21, 2019
Jul 13, 2022
Mar 30, 2015
Feb 5, 2015
Nov 1, 2019
Mar 30, 2015
Sep 8, 2020
Feb 5, 2015
Feb 5, 2015
Mar 30, 2015

Repository files navigation

TSearch Address Dictionaries

Motivation

Trying to use full-text indexing for address data with the default language configurations doesn't work well: if you use an 'english' configuration, you end up getting street names stemmed by the english dictionary. Using the 'simple' configurations is better but still not optimal: there is no handling of obvious abbreviations, so you might fail to match "12 N Oak st" to "12 North Oak Street", or vice versa.

This extension provides an 'addressing_en' configuration that has common (Canada Post enumerated) street type abbreviations, direction tokens (n, s, e, w, etc) and numeric street variants (first/1st) handled. There's sure to be piles more, as well as many cases that cannot be handled with a simple full-text tokenizing strategy (st = street or saint?). However, using full-text features for basic geocoding is too convenient to ignore.

Using full-text indexing for nationwide address data might still be a bad idea, since there are so many possible false or multiple matches to find, but for a single city, county or even state, this approach can generate a quick'n'dirty address lookup routine perfect to tying to an autocomplete form field on a web page.

Currently there is only support for English (really, North American English) addressing -- pull requests gratefully received.

Installation

Clone the repository, ensure that pg_config is on your path, and run make install to copy the dictionary files into place.

Examples

> CREATE EXTENSION addressing_dictionary;

> SELECT to_tsvector('addressing_en', '1234 n main st');

                 to_tsvector                  
----------------------------------------------
 '1234':1 'main':4 'n':2 'north':3 'street':5

Caveats

If using v1.1, then cardinal directions are tokenized as both the single letter abbreviation, and the word. This is to avoid ambiguity. See example above.

About

TSearch dictionaries for addresses

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published