How to Deploy MSK Data Generator in a Container running in ECS

Overview

This example uses a publicly available container and CloudFormation template to deploy MSK Data Generator running in Elastic Container Service.

It requires an existing MSK cluster configured to allow unauthenticated, plaintext access (port 9092).

The CloudFormation template is found in the deploy directory of this repo.

Before running the CFN template, consider the following requirements.

Requirements

You will need the following:

  • Bootstrap Server string of your existing MSK Cluster

  • Existing Security Group ID with access to the MSK broker endpoints (this is the minimum; more on this below)

  • Subnet ID of where to deploy MSK Data Generator
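
If you have the AWS CLI configured, the bootstrap server string can be looked up instead of copied from the console. The following is a minimal sketch, not part of this repo: the function name `msk_bootstrap_plaintext` is made up for illustration, and it assumes your cluster exposes the plaintext listener described in the overview.

```shell
# Hypothetical helper: print the plaintext (port 9092) bootstrap string
# for the MSK cluster whose ARN is passed as $1.
msk_bootstrap_plaintext() {
  aws kafka get-bootstrap-brokers \
    --cluster-arn "$1" \
    --query BootstrapBrokerString \
    --output text
}

# Usage (MSK_CLUSTER_ARN is a placeholder for your own cluster ARN):
#   msk_bootstrap_plaintext "$MSK_CLUSTER_ARN"
```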

Here's a walk-through example of one way to get started:

  1. In the CloudFormation console, choose Create Stack, upload the ../deploy/cloudformation.yml file as shown, and click Next

    Create Stack

  2. Complete the form for your environment. For example:

    • Stack Name: anything you want (the example below uses "MSK-Data-Gen")

    • BootstrapServer: bootstrap server string of your cluster

    • EC2KeyName: existing EC2 key pair, in case you want to ssh to the ECS EC2 instance

    • SecurityGroupId: existing security group with access to MSK cluster AND port 8083. More on this below.

    • SubnetID: where the ECS EC2 instance will be deployed; presumes access to MSK Cluster

    Stack Details

    Note on SecurityGroupId

    At minimum, the specified Security Group must have access to the MSK broker ports. In this example, it also allows inbound access on port 8083 so that you can configure MSK Data Generator from your environment.

    For example, the following shows that the Security Group specified in the example above has ports 22, 8083, and 9000 open from my laptop IP

    Security Group 1

    In turn, my MSK Security Group allows access from this Security Group as shown

    Security Group 2

    Notice that my security group, whose ID starts with "sg-0f4", is allowed access to the MSK-related ports and in turn allows access to port 8083 from my laptop's IP address. These values depend on your environment, and you may simply use one security group if desired. This is just an example.
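
    The port 8083 rule described in this note can also be added from the command line. This is a sketch under assumptions: the function name `allow_connect_rest` is hypothetical, and the group ID and IP address are whatever applies in your environment.

    ```shell
    # Hypothetical helper: allow inbound TCP 8083 (Kafka Connect REST) from one IP.
    # $1 = security group ID, $2 = your public IPv4 address (without a mask)
    allow_connect_rest() {
      aws ec2 authorize-security-group-ingress \
        --group-id "$1" \
        --protocol tcp \
        --port 8083 \
        --cidr "$2/32"
    }

    # Usage (both values are placeholders):
    #   allow_connect_rest "$SECURITY_GROUP_ID" "$MY_PUBLIC_IP"
    ```

    Restricting the rule to a single /32 address keeps the REST endpoint from being open to the internet.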

    To continue, click Next, then Next again on the following screen.

  3. Create Stack

    On the final review screen, acknowledge the creation of IAM resources and click the Create Stack button.

    Example

    Create Stack
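
    Steps 1 through 3 can likely be reproduced with the AWS CLI. The sketch below is an assumption-heavy illustration: the function name is hypothetical, the parameter keys are copied from the form labels above and may not match the template exactly, and the template path assumes you run it from the repo root.

    ```shell
    # Hypothetical helper mirroring the console form.
    # $1 = bootstrap servers, $2 = EC2 key pair name,
    # $3 = security group ID, $4 = subnet ID
    create_msk_data_gen_stack() {
      # Escape commas so the CLI shorthand parser keeps the broker list as one value.
      bootstrap=$(printf '%s' "$1" | sed 's/,/\\,/g')
      # CAPABILITY_NAMED_IAM is the CLI counterpart of the IAM acknowledgement checkbox.
      aws cloudformation create-stack \
        --stack-name MSK-Data-Gen \
        --template-body file://deploy/cloudformation.yml \
        --capabilities CAPABILITY_NAMED_IAM \
        --parameters \
          "ParameterKey=BootstrapServer,ParameterValue=$bootstrap" \
          "ParameterKey=EC2KeyName,ParameterValue=$2" \
          "ParameterKey=SecurityGroupId,ParameterValue=$3" \
          "ParameterKey=SubnetID,ParameterValue=$4"
    }
    ```

    If the call is rejected with an unknown parameter error, check the Parameters section of the template for the exact key names.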

  4. MSK Data Generator Access

    After a few minutes, the CloudFormation template will complete and you should have a new EC2 instance called "msk-data-generator" in your EC2 Console. For example

    MSK Data Generator

    Note the public IP address or public DNS name for the next step.
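
    The public DNS name can also be fetched with the AWS CLI. This sketch assumes the name "msk-data-generator" shown in the EC2 console is the instance's Name tag; the function name is hypothetical.

    ```shell
    # Hypothetical helper: print the public DNS name of the running instance
    # tagged Name=msk-data-generator.
    msk_data_gen_public_dns() {
      aws ec2 describe-instances \
        --filters "Name=tag:Name,Values=msk-data-generator" \
                  "Name=instance-state-name,Values=running" \
        --query 'Reservations[].Instances[].PublicDnsName' \
        --output text
    }
    ```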

  5. Sanity Check

    Confirm that you can query the Kafka Connect REST endpoint on port 8083.

    In my example, the URL is http://ec2-3-239-203-236.compute-1.amazonaws.com:8083/connector-plugins/ and I would expect a JSON response similar to the following:

    [{"class":"com.amazonaws.mskdatagen.GeneratorSourceConnector","type":"source","version":"0.4"},
    {"class":"org.apache.kafka.connect.file.FileStreamSinkConnector","type":"sink","version":"2.7.0"},
    {"class":"org.apache.kafka.connect.file.FileStreamSourceConnector","type":"source","version":"2.7.0"},
    {"class":"org.apache.kafka.connect.mirror.MirrorCheckpointConnector","type":"source","version":"1"},
    {"class":"org.apache.kafka.connect.mirror.MirrorHeartbeatConnector","type":"source","version":"1"},
    {"class":"org.apache.kafka.connect.mirror.MirrorSourceConnector","type":"source","version":"1"}]
    

    Assuming this sanity check succeeded, you are now ready to configure and start the MSK Data Generator.
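
    The same check can be scripted so it fails loudly when the generator plugin is missing. A minimal sketch; the function name `has_generator_plugin` is an illustration, not something shipped with this repo.

    ```shell
    # Hypothetical helper: succeed only if the MSK Data Generator plugin is listed.
    # $1 = base URL of the Kafka Connect REST API, e.g. http://<public-dns>:8083
    has_generator_plugin() {
      curl -fsS "$1/connector-plugins" |
        grep -qF 'com.amazonaws.mskdatagen.GeneratorSourceConnector'
    }

    # Usage (CONNECT_URL is a placeholder):
    #   has_generator_plugin "$CONNECT_URL" && echo "plugin found"
    ```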

  6. Start Generating Data into MSK

    To start generating data, POST a connector configuration to the Kafka Connect REST API. (See the ./examples directory for examples.)

    For example, in my environment, I could run the following command to POST the new-orders.json example and start generating data

    curl -X POST -H "Content-Type: application/json" -d @./examples/new-orders.json http://ec2-3-239-203-236.compute-1.amazonaws.com:8083/connectors

    and if successful, you'll see a response similar to the following

    {
         "name": "msk-data-generator",
         "config": {
           "connector.class": "com.amazonaws.mskdatagen.GeneratorSourceConnector",
           "genkp.customer.with": "#{Code.isbn10}",
           "genv.customer.name.with": "#{Name.full_name}",
           "genv.customer.gender.with": "#{Demographic.sex}",
           "genv.customer.favorite_beer.with": "#{Beer.name}",
           "genv.customer.state.with": "#{Address.state}",
           "genkp.order.with": "#{Code.isbn10}",
           "genv.order.product_id.with": "#{number.number_between '101','109'}",
           "genv.order.quantity.with": "#{number.number_between '1','5'}",
           "genv.order.customer_id.matching": "customer.key",
           "global.throttle.ms": "2000",
           "global.history.records.max": "1000",
           "name": "msk-data-generator"
         },
         "tasks": [],
         "type": "source"
       }
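
    The response above lists no tasks, which is typical right after creation. To see whether the connector and its tasks actually started, the standard Kafka Connect status endpoint can be queried. A minimal sketch with a hypothetical function name:

    ```shell
    # Hypothetical helper: print the status JSON for a connector.
    # $1 = Kafka Connect base URL, $2 = connector name
    connector_status() {
      curl -fsS "$1/connectors/$2/status"
    }

    # Usage (CONNECT_URL is a placeholder):
    #   connector_status "$CONNECT_URL" msk-data-generator
    ```

    A healthy connector should report a RUNNING state for the connector and for each task; a FAILED task usually includes a stack trace in the same response.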
    
  7. Confirm Data Generation

    At this point, you can confirm you are generating data. If you used the above example, you'll see events in the order and customer topics now. For example, if we run the console consumer

    bin/kafka-console-consumer.sh --topic order --bootstrap-server b-2.XXXcluster-msk.nb7mmr.c21.kafka.us-east-1.amazonaws.com:9092,b-3.XXXcluster-msk.nb7mmr.c21.kafka.us-east-1.amazonaws.com:9092,b-1.XXXcluster-msk.nb7mmr.c21.kafka.us-east-1.amazonaws.com:9092
    

    we should see events such as

    {"quantity":"2","product_id":"114","customer_id":"e45bb3bb-35e7-4314-90e7-e0adf8df8c57"}
    {"quantity":"2","product_id":"140","customer_id":"b78a7812-32fb-40c0-b028-f2111589fc61"}


    Nice.  We see two events in the example above.  
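
    The console consumer runs until interrupted. For a quick spot check, it can be told to read a fixed number of records and exit. This is a sketch: the function name and the `KAFKA_HOME` variable are assumptions, with `KAFKA_HOME` defaulting to the current directory to match the relative path used above.

    ```shell
    # Hypothetical helper: read a few records from a topic, then exit.
    # $1 = bootstrap server string, $2 = topic, $3 = number of records
    peek_topic() {
      "${KAFKA_HOME:-.}/bin/kafka-console-consumer.sh" \
        --bootstrap-server "$1" \
        --topic "$2" \
        --from-beginning \
        --max-messages "$3"
    }

    # Usage (BOOTSTRAP_SERVERS is a placeholder):
    #   peek_topic "$BOOTSTRAP_SERVERS" customer 5
    ```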

    Next steps are learning more about configuration options so you can customize
    the data being generated.  Also, you'll benefit from knowing more about how to operate the
    data generator.


## MSK Data Generation Operations

  In the example above, MSK Data Generator is deployed in a single-node Kafka Connect cluster running
  in distributed mode.  This means data generation configurations are created and updated
  through the standard Kafka Connect REST API.

  Examples

  * To update the configuration of a running data generator, use a PUT call. As with any Kafka Connect connector, omit the `config` wrapper when updating

    Example of updating an already running data generator (notice that the PUT URI is /connectors/msk-data-generator/config and that the JSON has no `config` wrapper element)

    ```
    # The quoted heredoc passes the JSON to curl on stdin, so the single
    # quotes inside the generator expressions need no shell escaping.
    curl -X PUT -H "Content-Type: application/json" \
      http://ec2-3-239-203-236.compute-1.amazonaws.com:8083/connectors/msk-data-generator/config \
      --data @- <<'EOF'
    {
      "name": "msk-data-generator",
      "connector.class": "com.amazonaws.mskdatagen.GeneratorSourceConnector",

      "genv.impressions.bid_id.with": "#{Code.isbn10}",
      "genv.impressions.i_timestamp.with": "#{date.past '10','SECONDS'}",
      "genv.impressions.campaign_id.with": "#{Code.isbn10}",
      "genv.impressions.creative_details.with": "#{Color.name}",
      "genv.impressions.country_code.with": "#{Address.countryCode}",

      "genkp.clicks.with": "#{Code.isbn10}",
      "genv.clicks.c_timestamp.with": "#{date.past '10','SECONDS'}",
      "genv.clicks.correlation_id.matching": "impressions.value.bid_id",
      "genv.clicks.tracker.with": "#{Lorem.characters '15'}",

      "global.throttle.ms": "5000",
      "global.history.records.max": "10"
    }
    EOF
    ```
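
    When the configuration lives in a file, the update can be wrapped in a small function. This is a sketch with a hypothetical function name; the file is assumed to hold the flat key/value map, without the `config` wrapper used by the POST examples.

    ```shell
    # Hypothetical helper: replace a connector's configuration from a JSON file.
    # $1 = Kafka Connect base URL, $2 = connector name,
    # $3 = path to a JSON file containing the flat config map
    update_connector() {
      curl -fsS -X PUT -H "Content-Type: application/json" \
        --data @"$3" "$1/connectors/$2/config"
    }

    # Usage (CONNECT_URL and the file name are placeholders):
    #   update_connector "$CONNECT_URL" msk-data-generator ./my-config.json
    ```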

  * List running connectors

    Example
    `curl http://ec2-3-239-203-236.compute-1.amazonaws.com:8083/connectors/`

  * Stop and delete a running connector (this removes the connector and its configuration)

    Example

    `curl -X DELETE -H "Content-Type: application/json"  http://ec2-3-239-203-236.compute-1.amazonaws.com:8083/connectors/msk-data-generator/`
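
    DELETE removes the connector entirely. To halt data generation temporarily and keep the configuration, the Kafka Connect REST API also provides pause and resume endpoints. A minimal sketch with hypothetical function names, assuming the Kafka Connect version in the container supports these endpoints:

    ```shell
    # Hypothetical helpers: pause or resume a connector without deleting it.
    # $1 = Kafka Connect base URL, $2 = connector name
    pause_connector() {
      curl -fsS -X PUT "$1/connectors/$2/pause"
    }

    resume_connector() {
      curl -fsS -X PUT "$1/connectors/$2/resume"
    }

    # Usage (CONNECT_URL is a placeholder):
    #   pause_connector "$CONNECT_URL" msk-data-generator
    #   resume_connector "$CONNECT_URL" msk-data-generator
    ```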