Enterprise docs

Java/JDBC

The JDBC API is a Java API that can access any kind of tabular data, especially data stored in a relational database.data.world’s JDBC driver allows any program written in Java or in a JVM-based language - such as Scala, Clojure, or Kotlin - to query thousands of datasets on data.world using SQL and SPARQL. In addition, this driver makes it possible to pull data from data.world into JDBC-compatible tools - such as RapidMiner, Spark, Drill, TIBCO, or Pentaho - to support and enhance ETL and business intelligence implementations.

JDBC URLs

JDBC connects to data source based on a provided JDBC url. data.world JDBC urls have the form

jdbc:data:world:[language]:[user id]:[dataset id]

where:

  • [language] is either sql or sparql

  • [user id] is the data.world id of the dataset owner

  • [dataset id] is the data.world id of the dataset

You can extract these ids from the dataset home page url: https://data.world/[user id]/[dataset id].

Sample code (Java 8)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;


final String QUERY = "select * from HallOfFame where playerID = ? order by yearid, playerID limit 10";
final String URL = "jdbc:data:world:sql:dave:lahman-sabremetrics-dataset";


try (final Connection connection =    // get a connection to the database, which will automatically be closed when done
         DriverManager.getConnection(URL, "<your user name>", "<your API token>");
     final PreparedStatement statement = // get a connection to the database, which will automatically be closed when done
         connection.prepareStatement(QUERY)) {
    statement.setString(1, "alexape01"); //bind a query parameter
    try (final ResultSet resultSet = statement.executeQuery()) { //execute the query
        ResultSetMetaData rsmd = resultSet.getMetaData();  //print out the column headers
        int columnsNumber = rsmd.getColumnCount();
        for (int i = 1; i <= columnsNumber; i++) {
            if (i > 1) System.out.print(",  ");
            System.out.print(rsmd.getColumnName(i));
        }
        System.out.println("");
        while (resultSet.next()) { //loop through the query results
            for (int i = 1; i <= columnsNumber; i++) { //print out the column headers
                if (i > 1) System.out.print(",  ");
                String columnValue = resultSet.getString(i);
                System.out.print(columnValue);
            }
            System.out.println("");

            // Note: when calling ResultSet.getObject() prefer the version that takes an explicit Class argument:
            // Integer n = resultSet.getObject(param, Integer.class);
        }
    }
}

Using dw-jdbc in your project

If using Maven, you can use dw-jdbc by just including the following in your pom.xml file:

<dependency>
    <groupId>world.data</groupId>
    <artifactId>dw-jdbc</artifactId>
    <version>0.4.1</version>
</dependency>

See this link at Maven Central to find the latest version number for the JDBC driver.

For some database tools it's easier to install the jdbc driver if it's a single jar. For this reason we also provide dw-jdbc bundled with all its dependencies under the following:

<dependency>
    <groupId>world.data</groupId>
    <artifactId>dw-jdbc</artifactId>
    <classifier>shaded</classifier>
    <version>0.4.1</version>
</dependency>
Finding your Token
  1. Visit https://data.world

  2. Visit your user settings, and click the advanced tab.

  3. Copy your token.

Features
  • JDBC 4.2

  • The driver only supports read-only queries. It does not support INSERT/UPDATE/DELETE, DDL, or transactions.

  • Queries can be written in SPARQL 1.1 or in the SQL dialect described at https://docs.data.world/tutorials/dwsql/.

  • [SQL-only] Table and column metadata via java.sql.DatabaseMetaData.

  • [SQL-only] Support for positional parameters via java.sql.PreparedStatement.

  • [SPARQL-only] Support for named parameters via java.sql.CallableStatement.

    • For example, CallableStatement.setString("name", "value") will bind the string value to ?name within the query.

  • The DataWorldStatement.setJdbcCompatibilityLevel(JdbcCompatibility) method can be used to adjust how the JDBC driver maps query results to Java objects in java.sql.ResultSetMetaData. This is particularly relevant to SPARQL queries where result types in a column can vary from row to row.

    • JdbcCompatibility.LOW - No assumptions are made about types. ResultSetMetaData.getColumnType() returns java.sql.Types.OTHER and ResultSet.getObject() returns world.data.jdbc.model.Node.

    • JdbcCompatibility.MEDIUM - [SPARQL default] All columns are typed as string. ResultSetMetaData.getColumnType() returns java.sql.Types.NVARCHAR and ResultSet.getObject() returns java.lang.String.

    • JdbcCompatibility.HIGH - [SQL default] Columns are typed based on the underlying data, either using table metadata (SQL) or by inspecting the first row of the response (SPARQL).