Applied AI and Machine Learning – Address Parsing – Old problem, new solution
Machine learning and AI are rapidly moving from the realm of research into business and consumer applications. They already power many critical functions at large businesses like Google, Facebook and Amazon. AI can generally be defined as software that perceives its environment and adapts to new information, predicts outcomes to make decisions and take action, and continually learns from every interaction to optimize towards its goals.
The type of AI that we focus on today generally falls within the category of Weak or Narrow AI: software that equals or exceeds human effectiveness or efficiency at specific tasks. In the context of process automation, this can be extremely powerful, since almost all of the tasks that make up a process are domain-bound and deal with problems in a very specific context.
Address Parsing: Address parsing is an age-old problem and can be fairly complex. It is usually solved using a combination of pattern matching and string parsing. The task becomes even more complex when you do not know whether the text to be parsed contains an address at all, or whether it contains other extraneous information. There are commercial services that provide address correction and standardization, but sending text to them raises privacy and security concerns (especially when you don't know what is in the text being analyzed).
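To make the pattern-matching approach concrete, here is a minimal sketch in Python. The regular expression, the list of street types and the function name `match_address` are illustrative assumptions and are not taken from the project described here. It handles one tidy address layout and fails on anything outside it, which is the brittleness that makes the problem hard.

```python
import re

# Hypothetical pattern for ONE common US layout:
# number, street name, street type, city, two-letter state, ZIP.
ADDRESS_RE = re.compile(
    r"(?P<number>\d+)\s+"
    r"(?P<street>[A-Za-z ]+?)\s+"
    r"(?P<type>St|Ave|Rd|Blvd|Dr)\.?,?\s+"
    r"(?P<city>[A-Za-z ]+?),?\s+"
    r"(?P<state>[A-Z]{2})\s+"
    r"(?P<zip>\d{5}(?:-\d{4})?)"
)

def match_address(text):
    """Return the matched address fields as a dict, or None if no match."""
    m = ADDRESS_RE.search(text)
    return m.groupdict() if m else None

# A string with a recipient in front and extra data at the end.
found = match_address("John Smith 123 Main St, Springfield, IL 62701 Ref 4471")

# A perfectly valid address in a layout the pattern does not anticipate.
missed = match_address("PO Box 12, Springfield IL 62701")
```

In this sketch `found` holds the six named fields, while `missed` is `None` even though the text contains a real address. Every new layout needs another rule, and the rules start to interfere with one another.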
In our case, the task required parsing a string that contained an address, along with one or two recipients at the beginning and possibly some extraneous data tacked on to the end. Our solution used an open source Machine Learning algorithm (Conditional Random Fields) that was pre-trained on US addresses. Combined with some simple techniques, it gave us almost 95% accuracy when parsing addresses in text, without the data ever leaving the local machine. (Icing on the cake: we were also able to use an ML library pre-trained on western names to distinguish an individual from a company name in the recipient list.)
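One of the "simple techniques" can be sketched as a post-processing step over the tagger's output. This is an assumption about how such a step might look, not the code used in the project. It assumes the pre-trained tagger returns a list of (token, label) pairs. The label names below are borrowed from the open source `usaddress` package purely for illustration, since the library actually used is not named here. The function `split_tagged_tokens` is hypothetical.

```python
from typing import Dict, List, Tuple

TaggedToken = Tuple[str, str]  # (token, label) as produced by a sequence tagger

# Assumed labels for tokens that are not part of the address itself.
NON_ADDRESS_LABELS = {"Recipient", "NotAddress"}

def _join(pairs: List[TaggedToken]) -> str:
    """Rebuild a plain string from a slice of tagged tokens."""
    return " ".join(token for token, _ in pairs)

def split_tagged_tokens(tagged: List[TaggedToken]) -> Dict[str, str]:
    """Split tagger output into leading recipients, address, trailing extras."""
    # Positions of every token the tagger labelled as an address component.
    idx = [i for i, (_, label) in enumerate(tagged)
           if label not in NON_ADDRESS_LABELS]
    if not idx:
        # No address components found: treat the whole text as extraneous.
        return {"recipients": "", "address": "", "extra": _join(tagged)}
    first, last = idx[0], idx[-1]
    return {
        "recipients": _join(tagged[:first]),    # everything before the address
        "address": _join(tagged[first:last + 1]),
        "extra": _join(tagged[last + 1:]),      # everything after the address
    }

# Hand-written stand-in for what a pre-trained CRF tagger might return.
example = [
    ("John", "Recipient"), ("Smith", "Recipient"),
    ("123", "AddressNumber"), ("Main", "StreetName"),
    ("St", "StreetNamePostType"), ("Springfield", "PlaceName"),
    ("IL", "StateName"), ("62701", "ZipCode"),
    ("Ref", "NotAddress"), ("#4471", "NotAddress"),
]
result = split_tagged_tokens(example)
```

Because the tagger runs locally, the text never has to be sent to an outside service. The recipient portion (`result["recipients"]`) could then be handed to a separate name classifier to decide whether it is a person or a company.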