Java/JDBC
The JDBC API is a Java API that can access any kind of tabular data, especially data stored in a relational database.data.world’s JDBC driver allows any program written in Java or in a JVM-based language - such as Scala, Clojure, or Kotlin - to query thousands of datasets on data.world using SQL and SPARQL. In addition, this driver makes it possible to pull data from data.world into JDBC-compatible tools - such as RapidMiner, Spark, Drill, TIBCO, or Pentaho - to support and enhance ETL and business intelligence implementations.
JDBC URLs
JDBC connects to data source based on a provided JDBC url. data.world JDBC urls have the form
jdbc:data:world:[language]:[user id]:[dataset id]
where:
[language]
is eithersql
orsparql
[user id]
is the data.world id of the dataset owner[dataset id]
is the data.world id of the dataset
You can extract these ids from the dataset home page url: https://data.world/[user id]/[dataset id]
.
Sample code (Java 8)
import java.sql.Connection; import java.sql.DriverManager; import java.sql.PreparedStatement; import java.sql.ResultSet; import java.sql.ResultSetMetaData; final String QUERY = "select * from HallOfFame where playerID = ? order by yearid, playerID limit 10"; final String URL = "jdbc:data:world:sql:dave:lahman-sabremetrics-dataset"; try (final Connection connection = // get a connection to the database, which will automatically be closed when done DriverManager.getConnection(URL, "<your user name>", "<your API token>"); final PreparedStatement statement = // get a connection to the database, which will automatically be closed when done connection.prepareStatement(QUERY)) { statement.setString(1, "alexape01"); //bind a query parameter try (final ResultSet resultSet = statement.executeQuery()) { //execute the query ResultSetMetaData rsmd = resultSet.getMetaData(); //print out the column headers int columnsNumber = rsmd.getColumnCount(); for (int i = 1; i <= columnsNumber; i++) { if (i > 1) System.out.print(", "); System.out.print(rsmd.getColumnName(i)); } System.out.println(""); while (resultSet.next()) { //loop through the query results for (int i = 1; i <= columnsNumber; i++) { //print out the column headers if (i > 1) System.out.print(", "); String columnValue = resultSet.getString(i); System.out.print(columnValue); } System.out.println(""); // Note: when calling ResultSet.getObject() prefer the version that takes an explicit Class argument: // Integer n = resultSet.getObject(param, Integer.class); } } }
Using dw-jdbc in your project
If using Maven, you can use dw-jdbc by just including the following in your pom.xml file:
<dependency> <groupId>world.data</groupId> <artifactId>dw-jdbc</artifactId> <version>0.4.1</version> </dependency>
See this link at Maven Central to find the latest version number for the JDBC driver.
For some database tools it's easier to install the jdbc driver if it's a single jar. For this reason we also provide dw-jdbc bundled with all its dependencies under the following:
<dependency> <groupId>world.data</groupId> <artifactId>dw-jdbc</artifactId> <classifier>shaded</classifier> <version>0.4.1</version> </dependency>
Visit https://data.world
Visit your user settings, and click the advanced tab.
Copy your token.
JDBC 4.2
The driver only supports read-only queries. It does not support INSERT/UPDATE/DELETE, DDL, or transactions.
Queries can be written in SPARQL 1.1 or in the SQL dialect described at https://docs.data.world/tutorials/dwsql/.
[SQL-only] Table and column metadata via
java.sql.DatabaseMetaData
.[SQL-only] Support for positional parameters via
java.sql.PreparedStatement
.[SPARQL-only] Support for named parameters via
java.sql.CallableStatement
.For example,
CallableStatement.setString("name", "value")
will bind the stringvalue
to?name
within the query.
The
DataWorldStatement.setJdbcCompatibilityLevel(JdbcCompatibility)
method can be used to adjust how the JDBC driver maps query results to Java objects injava.sql.ResultSetMetaData
. This is particularly relevant to SPARQL queries where result types in a column can vary from row to row.JdbcCompatibility.LOW
- No assumptions are made about types.ResultSetMetaData.getColumnType()
returnsjava.sql.Types.OTHER
andResultSet.getObject()
returnsworld.data.jdbc.model.Node
.JdbcCompatibility.MEDIUM
- [SPARQL default] All columns are typed as string.ResultSetMetaData.getColumnType()
returnsjava.sql.Types.NVARCHAR
andResultSet.getObject()
returnsjava.lang.String
.JdbcCompatibility.HIGH
- [SQL default] Columns are typed based on the underlying data, either using table metadata (SQL) or by inspecting the first row of the response (SPARQL).