Scrub the Distributed File Systems
OK, I’ve scrubbed the idea of using Hadoop and Ceph. Although I believe both are great solutions for what they’re intended to do, neither fits the ideas I have in mind for keywordbuttons.com.
Since the I/O to KWB comes in tiny chunks, Hadoop doesn’t work well. I knew this before I embarked on the journey of using Hadoop, but I thought I could mitigate it with low-level searching on the disks – that is, by optimizing how look-ups are done, inserting KWB data in the form of embedded <myowntag> references. I may use this idea later, but it really doesn’t make sense to go to all that trouble. Hadoop is really for managing huge files (optionally with a large number of replicas).
For Ceph, the initial setup was easy, but setting up CephFS was overly complicated, IMHO. The documentation was sufficient, and I do think I could have finished the install, but I abandoned it because it too seemed like overkill.
Finally, I’m back to seven Linux boxes with passwordless SSH set up, using scp to copy scripts to six of them (under the control of a master server). I’ve been able to demonstrate a “hello world” application on each server by writing a PHP script to the remote file system and then executing it locally on each respective node.
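To make that concrete, here is a minimal sketch of the master-side push. The hostnames (node1..node6), the script name hello.php, and the remote path are placeholders of mine, not the actual KWB setup:

```shell
#!/bin/sh
# Sketch of the master-side push: copy a PHP script to each worker
# node over passwordless ssh, then run it there. The hostnames
# (node1..node6), script name, and remote path are placeholders.
SCRIPT=hello.php
REMOTE_DIR=/tmp

deploy_cmds() {
    # Emit the scp/ssh command pair for each node. Piping the output
    # to sh (or dropping the echo) actually runs the deployment.
    for host in "$@"; do
        echo "scp $SCRIPT $host:$REMOTE_DIR/$SCRIPT"
        echo "ssh $host php $REMOTE_DIR/$SCRIPT"
    done
}

deploy_cmds node1 node2 node3 node4 node5 node6
```

Generating the commands first, rather than executing them directly, makes it easy to eyeball or dry-run the whole deployment before letting it loose on the cluster.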
In this way I can distribute the I/O work across six servers. The goal is to perform lookups for KWB-initiated requests on six separate servers (on the “backend”).
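One simple way to spread those lookups from the master is plain round-robin. A minimal sketch, with placeholder node names; a real dispatcher would follow each pick with something like `ssh $node php lookup.php …` (lookup.php being a hypothetical name, not an actual KWB script):

```shell
#!/bin/sh
# Round-robin picker: each call returns the next backend node,
# wrapping around after the sixth. Node names are placeholders.
NODES="node1 node2 node3 node4 node5 node6"
i=0

next_node() {
    set -- $NODES        # load the node list as positional params
    shift $(( i % $# ))  # skip to the (i mod n)-th node
    echo "$1"
    i=$(( i + 1 ))
}

next_node   # prints node1
next_node   # prints node2
```

Round-robin needs no shared state beyond one counter on the master, which keeps the backend servers interchangeable.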
Right now KWB does the lookups as needed, and there is a slight delay before the content areas are propagated.
So that’s where I am at now in my development of KWB.
I already have a “queue” of tasks waiting to run. The trick will be to determine where in the system the content references will live.
Do I want to funnel them down to one server, or distribute them among the six additional servers?
This question should be answered in the next post.