Setting up CircleCI to automate collector runs

This page provides basic guidance on using CircleCI to automate data.world collector runs. You can refine the process further with the advanced features that CircleCI offers. The examples on this page use the Tableau collector, but the same steps apply to all collectors.

Important

As a prerequisite, you should have already set up CircleCI and GitHub and should be familiar with using these tools.

STEP 1: Set up the collectors

  1. Follow the instructions for generating a command for running the collectors. For parameters that you want to supply through environment variables, enter variables instead of the actual values. As a safeguard, the system already suggests using variables for sensitive data such as passwords.

  2. Note down the environment variables and directories to be created for the collector.

  3. Download the YAML file for running the collector.

  4. At the end, note the command for running the YAML file.

STEP 2: Store the collector YAML file in a GitHub repository

  • Store the YAML file you downloaded from data.world in the GitHub repository. The CircleCI job reads the collector configuration from this file later on.

    If the collector requires any additional files, add them to the same directory. For example, the AWS Glue collector requires an additional credentials file.

STEP 3: Set up the environment variables

  • In the CircleCI project settings, go to Environment Variables and create the environment variables that the collector requires. Make sure the environment variable names are in uppercase, for example, DW_AUTH_TOKEN.

STEP 4: Set up the CircleCI config.yml file for running the collectors

  1. Open your CircleCI project files in GitHub.

  2. In the .circleci folder, browse to config.yml. In the file, set up a job to run the collector.

  3. Provide a description for the job.

  4. In the machine section, provide the details of the machine you want to use for running the collector job. We recommend using the ubuntu-2004 Linux machine.

     machine:
       image: ubuntu-2004:current  # Use Ubuntu so the job works with the dwcc Docker image.
  5. In the steps section, do the following:

    1. If the collector requires a specific driver, first add a step to install that driver on the Ubuntu machine. This step must run before the collector step. For example, the MySQL collector requires a specific driver to be installed on the machine.

    2. Next, check out the code for the repository in GitHub. You need to do this to get access to the YAML file you downloaded from data.world. Then create a directory on the Linux machine and add the command to copy the collector YAML file to that directory.

      If you have any other collector-specific files that need to be copied, add the commands to copy those files as well.

      steps:
      - checkout  # Check out the repo to get the dwcc config file.
      - run:
          name: Create dwcc folder, copy config file from mca folder to dwcc folder
          command: |
            # -p avoids an error if the folder already exists.
            mkdir -p ${HOME}/dwcc
            echo "Copying config file created in previous step to ${HOME}/dwcc folder..."
            cp dwcc/tableau/config.yml ${HOME}/dwcc/config.yml
            echo "Displaying the config file for debugging..."
            cat ${HOME}/dwcc/config.yml
    3. Next, add the run step that runs the collector. Provide a name, description, timeout, and the command you copied from data.world to run the collector.

      - run:
          name: Run dwcc for Tableau
          description: |
            The code below is generated by the Collector Wizard in data.world.
          no_output_timeout: 20m
          command: |
            docker run -it --rm --mount type=bind,source=${HOME}/dwcc,target=/dwcc-output \
              --mount type=bind,source=${HOME}/dwcc,target=${HOME}/dwcc \
              -e DW_AUTH_TOKEN=${DW_AUTH_TOKEN} -e DW_TABLEAU_NAME=${DW_TABLEAU_NAME} -e DW_TABLEAU_SECRET=${DW_TABLEAU_SECRET} \
              datadotworld/dwcc:2.149 --config-file=/dwcc-output/config.yml
    4. Commit the change.
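
The collector command in STEP 4 depends on the environment variables created in STEP 3, so an optional step placed before the collector step can stop the job early when one of them is missing. The following is a minimal sketch, not part of the generated collector command. It assumes the three variable names used in the Tableau example on this page and the default bash shell of the CircleCI Linux machine; adjust the list of names for other collectors.

```yaml
- run:
    name: Verify collector environment variables
    command: |
      # Assumption: these are the variables the Tableau example on this page uses.
      missing=0
      for var in DW_AUTH_TOKEN DW_TABLEAU_NAME DW_TABLEAU_SECRET; do
        # ${!var} is bash indirect expansion: the value of the variable named by $var.
        if [ -z "${!var}" ]; then
          echo "Missing environment variable: ${var}"
          missing=1
        fi
      done
      # A non-zero exit code fails the step before the collector starts.
      exit "${missing}"
```

The check prints only the variable names, never their values, so secrets do not appear in the job log.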

STEP 5: Run the CircleCI Job for collectors

  1. In the config.yml file, add a workflow to run the job you created for the collector. Optionally, add a branch filter that ignores the main branch if you want to test your changes first.

  2. Save your changes and push the commit, which starts the job run in CircleCI. Whether a push triggers the job automatically depends on how your CircleCI environment is configured. To set up automatic triggering of a workflow, consult the CircleCI documentation.

  3. Go to the CircleCI console and check the job steps to confirm that they all ran properly.

  4. Next, you can go to your data.world installation and check the dataset to verify that the output is visible there.
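
The workflow from the first item of this step could look roughly like the sketch below. The workflow name run-collectors and the job name run-tableau-collector are hypothetical; use the name of the job you defined in STEP 4. The filters block is the optional part that skips the main branch while you test your changes on another branch.

```yaml
workflows:
  run-collectors:                 # hypothetical workflow name
    jobs:
      - run-tableau-collector:    # hypothetical job name; must match the job defined in STEP 4
          filters:
            branches:
              ignore: main        # optional: do not run the job for commits on main
```

Remove the filters block once testing is complete if the job should also run for commits on the main branch.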

STEP 6: Schedule automatic runs for jobs