Onward – 5TB’s
So, rather than expounding on history, I’ve decided to dive right into what I’ve been working on recently: Hadoop.
It was a bit of a learning curve to install on my COTS hardware, but in the end I’m happy to have accomplished the install. As we speak, I am initializing Hadoop for use with my Keyword Buttons application.
It has been six hours so far, writing out folder after folder of 1GB sections of data. This is the big data area that I will be writing to.
These servers are old hardware with 2TB hard drives per Linux box, so it isn’t costing me an arm and a leg to allocate large amounts of storage. Hadoop is known to handle large files much better than small ones, so even though 1GB isn’t that large by today’s standards, I decided on that size.
I set the replication factor to two, since I want to make the most of my 10TB of available storage. That basically gives me the equivalent of RAID1: two copies of every block, on 5TB of usable storage.
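The arithmetic behind that 5TB figure can be sketched in a few lines of Python. This is only a rough illustration: the function name is made up for this post, and it ignores the space that the operating system and Hadoop itself take up on each drive.

```python
def usable_capacity_tb(raw_tb: float, replication: int) -> float:
    """Rough usable capacity when every block is stored `replication` times.

    Hypothetical helper for illustration; it ignores OS and Hadoop overhead.
    """
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return raw_tb / replication


# 10TB raw with a replication factor of two, as described above.
print(usable_capacity_tb(10, 2))

# For comparison, a replication factor of three on the same raw storage.
print(usable_capacity_tb(10, 3))
```

A replication factor of three would leave only about 3.3TB usable on the same hardware, which is why I settled on two.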
I won’t have to use AWS APIs, but rather native Hadoop functions, to access the data. The goal is to integrate Keyword Buttons I/O with Hadoop, so as to be able to write “Big Data”.
My goal, then, is to optimize the Hadoop cluster for use with Keyword Buttons, hereafter KWB.
The layout I am using on the cluster has the format /user/cfleshner/folder/block, where folder is a numbered folder and block is a numbered block. Each folder contains 25 blocks, and each block is a single 1GB file.
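To make the layout concrete, here is a minimal Python sketch that maps a running block number to a path in that format. The function names are hypothetical, and so are two details not spelled out above: that numbering starts at zero, and that the numbers are not zero-padded. It only builds path strings and does not touch the cluster.

```python
BASE_DIR = "/user/cfleshner"
BLOCKS_PER_FOLDER = 25
BLOCK_SIZE_GB = 1


def block_path(index: int, base: str = BASE_DIR) -> str:
    """Return the path for the index-th 1GB block.

    Assumption: folders and blocks are numbered from zero without padding.
    """
    if index < 0:
        raise ValueError("block index must be non-negative")
    folder, block = divmod(index, BLOCKS_PER_FOLDER)
    return f"{base}/{folder}/{block}"


def folders_needed(total_gb: int) -> int:
    """Number of folders needed to hold total_gb of 1GB blocks (rounded up)."""
    gb_per_folder = BLOCKS_PER_FOLDER * BLOCK_SIZE_GB
    return -(-total_gb // gb_per_folder)


# First block, last block of the first folder, first block of the second folder.
for i in (0, 24, 25):
    print(i, block_path(i))

# Folder count if 5TB is counted as 5000GB (an assumption about units).
print(folders_needed(5000))
```

Counting 5TB as 5000GB, filling the whole bucket works out to 200 folders of 25 blocks each.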
The idea is to use the entire cluster (5TB) as a bucket for the URLs, which will have metadata appended. It is cheaper to do it this way than to use AWS or other cloud-based services.