Community docs

Connecting to an AWS RDS database through an SSH tunnel

If you have a database hosted on Amazon Web Services(AWS) that is not publicly accessible, you can allow data.world to connect to it by using an SSH tunnel

In this scenario, you will launch a publicly accessible SSH server (sometimes called a bastion server) in the same AWS Virtual Private Cloud (VPC) as your AWS database.

You will then configure data.world to connect to your SSH server instead of directly to the database. The public SSH server will forward data.world's requests to the private database.

By using this type of connection, you can keep your database hidden from the public internet and instead rely on your SSH server to handle the security and access control for connections to that database.

Find your database's VPC and Port

Login to your AWS Management Console.

For RDS databases(including SQL Server, MySQL, PostgreSQL, or Oracle), click on Database>RDS in the center of the page.

Click on Databases on the left side of the Amazon RDS page.

Click the DB Instances link in the middle of the page to see a list of active databases.

Click on the name of the database in the DB identifier column to open its details page.

Within the Connectivity and Security section at the bottom of the page, take note of the following information:

  • Endpoint & port > Endpoint

  • Endpoint & port > Port

  • Networking > VPC

Create an SSH server on an Amazon EC2 instance

Go to the AWS Management Console main page and click Compute>EC2 from the center of the page.

In the middle of the page, select Launch Instance.

The Instance configuration requires the following steps:

  1. Choose AMI (Amazon Machine Image): Any Linux based AMI will be appropriate for this setup - SSH is the only program required.

  2. Choose Instance Type: This will be a low memory and storage application. Depending on your usage requirements, a free tier t2.micro instance may be sufficient.

  3. Configure Instance: ensure the following settings are configured:

    • Network: select your database's VPC from Task 1

    • Subnet: Choose a public subnet

    • Auto-assign Public IP: Use subnet setting(Enable)

  4. Add Storage: Accept defaults

  5. Add Tags: Accept defaults

  6. Configure Security Group:

    • Assign a security group: Select Create a new security group

    • Security group name: data.world to SSH

    • Description: Bastion server for forwarding requests from data.world to private database

      Modify the first line of the security rules to show the following values:

    • Type: SSH

    • Protocol: TCP

    • Port Range: 22

    • Source: Enter your public IP address followed by /32. You can use a Google search to find your IP

    • Description: My IP for configuration

      Click Add Rule and enter:

    • Type: SSH

    • Protocol: TCP

    • Port Range: 22

    • Source: 52.3.83.134/32

    • Description: data.world inbound connection

      Click Add Rule and enter:

    • Type: SSH

    • Protocol: TCP

    • Port Range: 22

    • Source: 52.205.195.10/32

    • Description: data.world inbound connection

      Click Add Rule and enter:

    • Type: SSH

    • Protocol: TCP

    • Port Range: 22

    • Source: 52.205.207.86/32

    • Description: data.world inbound connection

      AWS_task2_1.png
  7. Review: Verify the above settings were entered correctly and create the instance.

  8. When launching the instance, you will be prompted to select an existing key pair or create a new key pair. If you are using a previous key pair, you will need to have a copy of the key .pem file stored on your local computer from when you created the keys. If you're creating a new key pair, download that .pem file now.

    aws_keys.png
Enable forwarding from the SSH server to the database

Go to the AWS Management Console and select Compute>EC2.

Select Instance>Instances on the left side.

Find the instance you just created from the list in the main section of the page. Click on it to load its details in the lower part of the page.

You'll need to know the Public DNS and Private IP addresses shown for the next step - so keep them handy by creating a new, duplicate browser tab to complete the next set of steps.

AWS_task3_1.png
Create a new security group
  1. In your new tab, select Network & Security>Security Groups on the left side of the EC2 page.

  2. Click Create Security Group

  3. Enter the following values:

    • Security group name: SSH to your database name

    • Description: "allows traffic from the SSH server to database"

    • VPC: this is the same VPC used for the SSH server and database

  4. With the Inbound tab selected at the bottom of the window, click Add Rule

  5. A new row will populate in the list of rules. Enter the following values:

    • Type: Custom TCP Rule

    • Protocol: TCP

    • Port Range: the port number of the database

    • Source: Custom; in the blank box to the right enter the Private IP address with /32 added to the end

Add the security group to your database
  1. From the AWS Management Console, click on Database>RDS

  2. Select Databases on the left side

  3. Click on the link to your database in the DB identifier column

  4. On the database details page, click on the Modify button on the top right

  5. Scroll down to the Network & Security section; from the Security group drop down menu, add the security group that you created in the previous section (e.g. "SSH to your database name")

  6. Save the changes by scrolling to the bottom of the page and clicking the Continue button

  7. On the following page, choose when to apply the changes, then click Modify DB instance

Configure an SSH user for data.world to use to connect
  1. On MacOS or Linux, open a new Terminal window. For Windows, use an SSH client such as Putty or OpenSSH

  2. Within the terminal, navigate to the location where you downloaded the .pem key file you generated

  3. Set the permissions for the key file to be not publicly viewable, using the appropriate name if your key pair is different than our example of ssh_tunnel:

    chmod 400 ssh_tunnel.pem
  4. Connect to your AWS SSH server from the terminal using the default user for your EC2 instance. AWS will generate the command tailored to your specific instance - to find that command, you can navigate to your EC2 Instance details page and click the Connect button at the top.

    AWS_task4_1.png

    The general form will be:

    ssh -i "your key file.pem" <Your default EC2 user>@<Your EC2 Public DNS>

    For other options, please see additional guidance from Amazon at https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstances.html

  5. Once you're connected to your server, create a user group named datadotworld:

    sudo groupadd datadotworld 
  6. Create a user named ddw:

    sudo useradd -m -g datadotworld ddw
  7. Switch to the user named ddw

    sudo su - ddw
  8. Create a hidden directory called .ssh to upload your public key, setting its permissions appropriately:

    mkdir ~/.ssh
    chmod 700 ~/.ssh
  9. Enable read and write permissions for the owner on that file

  10. Create an empty file called authorized_keys and provide read and write access to its owner:

    touch authorized_keys
    chmod 600 authorized_keys
  11. Get the public key value from data.world by doing the following:

    • Open a new browser tab

    • While logged into data.world, go to https://data.world/integrations/categories/database and click on the tile of the database type you'd like to connect

    • If that integration has not yet been enabled, click the Enable Integration button (otherwise go the Manage tab and choose Add new connection)

    • In the new window that opens, click the Advanced Settings tab and check the Use SSH Tunnel box

    • Copy the SSH public key

      AWS_task4_2.png
    • Leave this window open as you'll return to it to complete the configuration

  12. Back in the terminal, add the public key to your authorized_keys file with the following command:

    echo "<Your Public Key>" >> ~/.ssh/authorized_keys

    Include the quotation marks but replace <Your Public Key> with the key you just copied from data.world in the previous step

Configure the database connection in data.world
  1. Return to the database configuration tab on data.world that you opened in the previous step

  2. Within the Advanced Settings, enter the following values:

    • SSH host: this is the Public DNS(IPv4) value in your EC2 instance created in Task 2

    • SSH user: the name of the SSH user you created in Task 4 - that's "ddw" if you followed our suggestion

  3. Go to the General Settings tab and enter the following values:

    • Display name: your choice - this is how your database connection will appear in data.world

    • Host/IP: the Endpoint value

    • Port: the Port value

    • Connection username: a valid user in your database instance

    • Connection password: the database password for that user

  4. Test the configuration and save.

With your database configured, you can now use any of the Add data mechanisms to import data into a dataset or project on data.world.