On KWB Theory and Practice
The theory behind Keyword Buttons, and a critical aspect of its potential to succeed as a Big Data Content Management System model, depends on whether we can programmatically assemble large lists of keywords within a given topic, and whether those subordinate keywords can also be of the "long tail" variety.
As described earlier, this is the main hurdle facing Keyword Buttons today.
Several ideas have come to mind since the inception of KWB, and I want to share the latest one in this post.
If we can assert that keyword information (long tail or otherwise) is scattered throughout the web, then certainly many "lists" of things that people are interested in also exist across these domains.
With the popularity of deep learning algorithms, and of libraries that let the average developer explore deep learning (at least at a basic level, using Python libraries for example), we can envision finally solving the problem of finding these hidden lists on the web, at least insofar as Keyword Buttons development is concerned.
Many dictionary sites do a good job of surfacing content related to a word's definition, but not at the more abstract level of a "connected list": a list of nouns, proper nouns, or even concepts, categorized under a given "list title".
So we invoke search on the web, where various domains host many lists, presumably of identical or near-identical composition. If a person is interested in dogs, for example, it is likely that this dog lover will have a list of dogs on their site (or maybe not), and that other dog lovers will also have lists, or links to lists, of dogs.
What is asserted here is that, assuming there are X+1 lists on the web where a list is duplicated across one or more sites, the confidence that the list is a legitimate list increases by a factor N, where N represents the degree of similarity between these collective lists across the sites hosting identical or near-identical copies.
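As a minimal sketch of this corroboration idea, suppose we have already pulled candidate list entries from a few sites (the site names and entries below are hypothetical). An entry that appears on two or more independent sites earns more confidence than one that appears on only a single site:

```python
from collections import Counter

# Hypothetical candidate lists, one set of entries per site.
site_lists = {
    "dogsite-a.example": {"beagle", "poodle", "husky", "corgi"},
    "dogsite-b.example": {"beagle", "poodle", "husky", "terrier"},
    "dogsite-c.example": {"beagle", "husky", "corgi"},
}

def corroboration(site_lists):
    """Count, for each entry, how many independent sites list it."""
    counts = Counter()
    for entries in site_lists.values():
        counts.update(entries)
    return counts

counts = corroboration(site_lists)
# Entries seen on two or more sites are "corroborated" keywords.
corroborated = {entry for entry, n in counts.items() if n >= 2}
print(sorted(corroborated))  # ['beagle', 'corgi', 'husky', 'poodle']
```

Here "terrier" appears on only one site, so it stays a weak candidate, while the other entries are mutually reinforcing; this is the "it really only takes two" intuition in miniature.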
Also key to this theory is the "proximity" at which a list resides in the site's code (or in terms of its rendered location), which helps authenticate a region of code as potentially being of the "list flavor". Logic suggests, then, that related websites will carry duplicate or near-duplicate list entities, with variations in keywords, in identifiable areas of their pages.
As a caveat, the entire list may or may not be exposed via HTML, depending on the technology used by the particular site hosting it. The hope is that enough sites out there will expose their lists via the underlying HTML, and it really only takes two. The exposure may also be partial, but that is a concern for a later point.
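Where a list is exposed in the underlying HTML, the obvious structural signal is the `<ul>`/`<ol>`/`<li>` markup itself. A rough sketch of harvesting those candidates, using only Python's standard-library parser (the sample markup is made up for illustration):

```python
from html.parser import HTMLParser

class ListExtractor(HTMLParser):
    """Collect the text of <li> items, grouped by enclosing <ul>/<ol>."""

    def __init__(self):
        super().__init__()
        self.lists = []     # one entry per <ul>/<ol> encountered
        self._depth = 0     # nesting depth of list containers
        self._in_li = False

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._depth += 1
            self.lists.append([])
        elif tag == "li" and self._depth:
            self._in_li = True

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self._depth:
            self._depth -= 1
        elif tag == "li":
            self._in_li = False

    def handle_data(self, data):
        if self._in_li and data.strip():
            self.lists[-1].append(data.strip())

# Hypothetical page fragment: a "list title" heading followed by a list.
html = "<h2>Dog Breeds</h2><ul><li>Beagle</li><li>Poodle</li><li>Husky</li></ul>"
parser = ListExtractor()
parser.feed(html)
print(parser.lists)  # [['Beagle', 'Poodle', 'Husky']]
```

In a fuller system the heading text near each list would also be captured as the candidate "list title", and JavaScript-rendered lists would need a headless browser rather than raw HTML; both are outside this sketch.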
It is proposed, then, that by identifying keywords in areas of one website and comparing them against other websites, a percentage of "similarity" can be computed, yielding a probability that the lists on the disparate sites are in fact one and the same, or at least highly likely to be relevant in the construct of what they represent (lists of dogs, for example).
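One simple way to put a number on that "percentage of similarity" is Jaccard similarity, shared entries divided by total distinct entries. This is my choice of metric for the sketch, not something the theory mandates, and the two sample lists are hypothetical:

```python
def similarity(list_a, list_b):
    """Jaccard similarity: shared entries / total distinct entries."""
    a, b = set(list_a), set(list_b)
    if not (a or b):
        return 0.0  # two empty lists tell us nothing
    return len(a & b) / len(a | b)

site_a = ["beagle", "poodle", "husky", "corgi"]
site_b = ["beagle", "poodle", "husky", "terrier"]

score = similarity(site_a, site_b)
print(f"{score:.0%}")  # 60% -- 3 shared entries out of 5 distinct
```

A threshold on this score (say, anything above 50%) could then flag two lists as "one and the same" for KWB purposes, with near-duplicate keyword matching left as a refinement.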