In this tutorial we are going to stream Kafka data, but we need somewhere to store it. Options include local files, another database, or S3, to name just a few. In this case, we’re going to persist the data to an HDFS cluster.
Deploy via the Juju GUI
The easiest way to get going is via the GUI.
Go to jaas.ai and enter the phrase “Anssr Hadoop” in the search bar. In the results you’ll see a bundle. Select it and click “Add to new model” at the top of the bundle page.
Once the bundle is added to a new model, you can see its charms laid out on the canvas. The bundle consists of 5 applications, which by default are deployed across 5 machines.
The applications are Namenode, Resource Manager, Workers, Client, and Plugin. Together they form a fully working Hadoop stack which, once deployed, spins up 3 workers to distribute the processing load. Since we’re only testing here, we can keep costs down by reducing this to 1. To do this, click the Worker charm (the leftmost icon), which should bring up its details in the top left.
Click Units, tick the checkboxes for 2 of the units, and click the remove button.
Once you’ve done this, click “Commit changes” in the bottom right-hand corner of the screen.
If you haven’t logged in to the charm store, at this point you will be asked to log in or sign up to Juju. This uses Ubuntu One, so if you already have an Ubuntu One account, you can use it here.
Next you will be asked where you want to deploy your Hadoop cluster. Depending on your preference, you can select AWS, Azure or GCP, and you will need to enter your credentials for that cloud. You can add your SSH key by entering it manually, or import it with the GitHub or Launchpad key installers. Make sure to click the “Add key” button before moving on.
Once that’s done, click the “Commit” button.
As machines start and applications are deployed, the charms on the canvas get different coloured outlines to indicate their status. You can find out more about their current state by clicking the Status tab. When deployment has finished, all the charms should be in the Active state with a ready status message.
Deploy via CLI
If, on the other hand, you prefer the CLI, this is how to do it.
First you need to add a model:
juju add-model streaming <cloud name>
More details on adding models can be found in the Juju documentation.
Then you can deploy anssr-hadoop to this model:
juju deploy cs:~spiculecharms/anssr-hadoop
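If you set this stack up regularly, the two commands above can be wrapped in a small shell function. This is a minimal sketch: `deploy_hadoop` and the `JUJU` variable are hypothetical names introduced here, not part of Juju or the bundle. Setting `JUJU=echo` prints the commands instead of running them, which is handy for a dry run.

```shell
#!/bin/sh
# Minimal sketch: create the "streaming" model and deploy the anssr-hadoop
# bundle into it. JUJU (a hypothetical override) defaults to the real juju
# binary; set JUJU=echo to print the commands instead of running them.
JUJU="${JUJU:-juju}"

deploy_hadoop() {
    # The cloud name, e.g. one listed by `juju clouds`, is required.
    if [ -z "$1" ]; then
        echo "usage: deploy_hadoop <cloud name>" >&2
        return 1
    fi
    # Stop if the model cannot be created; otherwise deploy the bundle.
    "$JUJU" add-model streaming "$1" || return 1
    "$JUJU" deploy cs:~spiculecharms/anssr-hadoop
}
```

After sourcing the file, run `deploy_hadoop aws`, or `JUJU=echo deploy_hadoop aws` to preview the commands first.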
Then scale the workers down to a single unit for this tutorial:
juju remove-unit worker/1 worker/2
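The unit names above assume a fresh deployment in which the workers are numbered worker/0 to worker/2. The sketch below builds the list of names to remove from a unit count instead of hard-coding them. `units_to_remove`, `scale_down_workers` and the `JUJU` override are hypothetical helpers, and the sketch assumes units are numbered contiguously from 0. Juju does not reuse unit numbers after a removal, so check `juju status` for the actual names if you have already added or removed units.

```shell
#!/bin/sh
# Hypothetical helper: print the unit names to remove so that only the first
# KEEP units of an application remain. Assumes units are numbered
# contiguously from 0 (worker/0, worker/1, ...).
units_to_remove() {
    app="$1"; total="$2"; keep="$3"
    i="$keep"
    names=""
    while [ "$i" -lt "$total" ]; do
        names="$names $app/$i"
        i=$((i + 1))
    done
    # Strip the leading space before printing.
    echo "${names# }"
}

# JUJU defaults to the real binary; set JUJU=echo for a dry run.
JUJU="${JUJU:-juju}"

scale_down_workers() {
    # Keep 1 of the 3 workers the bundle deploys by default.
    units=$(units_to_remove worker 3 1)
    [ -n "$units" ] || return 0
    # Unquoted on purpose so each unit name becomes a separate argument.
    "$JUJU" remove-unit $units
}
```

With the default counts, `JUJU=echo scale_down_workers` prints `remove-unit worker/1 worker/2`, matching the command above.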
To keep an eye on what’s going on, run:
juju status
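Rather than re-running `juju status` by hand, you can poll it until everything is up. This sketch assumes that `juju status --format=oneline` prints one line per unit containing `workload:<status>`, for example `- worker/0: 10.0.0.2 (agent:idle, workload:active)`; `all_units_active`, `wait_for_active` and the `JUJU` override are hypothetical names. It only checks for the active workload status, not the status message.

```shell
#!/bin/sh
# Read oneline status output on stdin. Succeed only if at least one unit line
# is present and every unit line reports workload:active.
all_units_active() {
    awk '
        /workload:/ { units++; if ($0 !~ /workload:active/) bad++ }
        END { exit !(units > 0 && bad == 0) }
    '
}

# JUJU defaults to the real binary; it can be swapped for a stub in testing.
JUJU="${JUJU:-juju}"

wait_for_active() {
    # Poll interval in seconds; defaults to 30.
    interval="${1:-30}"
    until "$JUJU" status --format=oneline | all_units_active; do
        sleep "$interval"
    done
    echo "All units are active."
}
```

Call `wait_for_active` (or `wait_for_active 60` for a 60-second interval) after deploying. The loop has no timeout, so interrupt it with Ctrl+C if a unit gets stuck in an error state.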