So in this tutorial we are going to be streaming Kafka data, but we need somewhere to store it. This could be local files, another database, or S3, to name a few. In this case, we're going to persist the data into an HDFS cluster.
Deploy via the Juju GUI
The easiest way to get going is via the GUI.
Go to jaas.ai and, in the Search the store box on the toolbar, enter the phrase "Anssr Hadoop". In the results you'll see a bundle; select it and click Add to new model at the top of the bundle page.
Once added to a new model, you can see the charms laid out on the canvas. The bundle consists of 5 applications, and by default 5 machines are used to deploy it.
The applications are the Namenode, Resource Manager, Workers, Client and Plugin. This is a fully working Hadoop stack, so once deployed it will spin up 3 workers to distribute the processing load. As we're just testing here, we can keep costs down and reduce this to 1. To do this, click on the Worker charm (the left-most icon); some details should then appear in the top left.
Click on Units, check 2 of the checkboxes, and click the Remove button.
Once you’ve done this click Commit changes in the bottom right of the screen.
If you've not logged into the charm store at this point, you will be asked to log in or sign up to Juju. This uses Ubuntu One, so if you already have an account you can use it here.
Next you will be asked where you want to deploy your Hadoop cluster. Depending on your cloud choices, you can select from AWS, Azure or GCP. You will need to enter your cloud credentials, and it's advised you upload your SSH key using the manual SSH key entry or the GitHub or Launchpad key importers. Make sure you click the Add key button before moving on.
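If you'd rather prepare these from a terminal instead, the same credentials and SSH keys can be added with the Juju CLI. A minimal sketch, assuming AWS as the cloud and a placeholder account name:

```shell
# Add cloud credentials interactively (prompts for access key and secret)
juju add-credential aws

# Import a public SSH key from GitHub (gh:) or Launchpad (lp:)
# "your-username" is a placeholder - substitute your own account
juju import-ssh-key gh:your-username

# Or add a local public key directly
juju add-ssh-key "$(cat ~/.ssh/id_rsa.pub)"
```

These commands need a bootstrapped Juju controller to talk to, so run them after `juju bootstrap` (or rely on the GUI flow above, which handles the same steps for you).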
From there you then need to click the Commit button.
As machines get started and applications deployed, the charms on the canvas should get different coloured outlines to indicate their status. You can also find out more about their current state by clicking on the Status tab. When finished all the charms should be in the Active state with a ready status message.
Deploy via the CLI
If, on the other hand, you prefer using the CLI, this is how you do it.
First you need to add a model:
juju add-model streaming <cloud name>
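For example, assuming you've already bootstrapped a controller against AWS, the command might look like this (the model name "streaming" is from this tutorial; the cloud name is whatever your controller knows your cloud as):

```shell
# Create a model called "streaming" on the aws cloud
juju add-model streaming aws
```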
More details can be found here
Then you can deploy the model:
juju deploy ~spiculecharms/anssr-hadoop
And scale down the workers for this tutorial:
juju remove-unit worker/1 worker/2
To keep an eye on what's going on, run: