Identifying Lists Based on Definitions

Lists, by KWB standards, are defined as a group of related nouns, pronouns, or items that share a relational connection, such as a common “Topic”.

Each “member” of a list must also have a clear definition.

To explore this idea, I took several examples and ran them through a script I wrote over the weekend.

I’ve found that the best way to play around with these ideas is to scrape dictionary.com for the definitions. Parsing the definitions and putting each one in a separate file helps break the process down logically.

Ambiguity is the number one problem in assembling keyword lists, so a separate folder will be allocated for each end user, primarily because of the diverse interests that can exist, and also to assist in debugging and development.

The definitions will live in a _definitions folder for each end user, and each filename will take the form $topic.$keyword.$x, where $x is a number from 1 to N and N is the number of definitions for a given word.
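
A minimal sketch of how this naming scheme could look in Python. The function names (definition_path, save_definitions) and the choice of Python are assumptions made for illustration; this is not the exploratory script itself.

```python
from pathlib import Path


def definition_path(user_dir, topic, keyword, index):
    """Build <user_dir>/_definitions/<topic>.<keyword>.<index> (hypothetical helper)."""
    return Path(user_dir) / "_definitions" / f"{topic}.{keyword}.{index}"


def save_definitions(user_dir, topic, keyword, definitions):
    """Write each definition to its own file, numbered from 1 to N."""
    paths = []
    for index, text in enumerate(definitions, start=1):
        path = definition_path(user_dir, topic, keyword, index)
        # Create the per-user _definitions folder on first use.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths
```

One caveat with this layout: a topic or keyword that itself contains a dot would make the filename ambiguous to split later, so such names would probably need escaping.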

This way, we can write a script to cross-reference definitions and look for common terms among them. With this logic, we can determine whether one or more definitions fit into a “group” of definitions.

For example, the definitions of most fruits are likely to mention the word “fruit”, or some other common term that logically categorizes them.
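
The cross-referencing idea can be sketched roughly as follows. This is only an illustration under assumptions: the stopword list, the function names, and the min_share threshold are all hypothetical, and the sample definitions are made up rather than scraped from dictionary.com.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a complete one).
STOPWORDS = {"a", "an", "the", "of", "or", "and", "in", "to", "is",
             "that", "with", "its", "it", "as", "by", "for", "on"}


def terms(definition):
    """Lowercase a definition and return its distinct non-stopword terms."""
    words = re.findall(r"[a-z]+", definition.lower())
    return {w for w in words if w not in STOPWORDS}


def common_terms(definitions_by_keyword, min_share=1.0):
    """Return terms found in the definitions of at least min_share of the keywords."""
    counts = Counter()
    for definitions in definitions_by_keyword.values():
        seen = set()
        for definition in definitions:
            seen |= terms(definition)
        # Count each term once per keyword, however many definitions use it.
        counts.update(seen)
    needed = min_share * len(definitions_by_keyword)
    return {term for term, n in counts.items() if n >= needed}


# Made-up sample data standing in for scraped definitions.
sample = {
    "apple": ["the round fruit of a tree of the rose family", "a computer brand"],
    "banana": ["a long curved fruit with yellow skin"],
    "cherry": ["a small round stone fruit"],
}
print(common_terms(sample))
```

Lowering min_share below 1.0 lets a term qualify even when a few keywords lack it, which may matter when some words carry unrelated definitions.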

My exploratory script (which I will publish later) is intended to discover these relationships using real definitions scraped from http://dictionary.com, in the hope that the subset of terms gathered from the site is large enough to make automating list data worthwhile for as many use cases as possible.

I will publish the results in the next few days.
