Standalone functions

load_dataset(dataset_key, force_update=False, auto_update=False)

Load a dataset from the local filesystem, downloading it from data.world first if necessary.

This function returns an object of type LocalDataset. The object gives access to metadata via its describe() method and to all the data via three properties, raw_data, tables and dataframes, all of which are mappings (dict-like structures).

Parameters

  • dataset_key (str) – Dataset identifier, in the form of owner/id or a URL

  • force_update (bool) – Flag indicating whether a new copy of the dataset should be downloaded, replacing any previously downloaded copy (Default value = False)

  • auto_update (bool) – Flag indicating whether the dataset should be updated to the latest version (Default value = False)


Returns

The object representing the dataset

Return type

LocalDataset

Raises

RestApiError – If a server error occurs
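As a rough illustration of the three mappings described above, the sketch below lists the resource names found under raw_data, tables and dataframes. The helper names list_resources and load_and_list are hypothetical, the stand-in object and its contents are made up so the example runs offline, and the sketch assumes that iterating over each mapping yields its keys, as with a plain dict.

```python
from types import SimpleNamespace


def list_resources(dataset):
    """Return the resource names exposed by each mapping of a dataset object."""
    return {
        "raw_data": sorted(dataset.raw_data),
        "tables": sorted(dataset.tables),
        "dataframes": sorted(dataset.dataframes),
    }


def load_and_list(dataset_key):
    """Load a dataset and list its resources.

    Needs the datadotworld package and a configured API token, so it is
    defined here but not called.
    """
    import datadotworld as dw  # imported lazily so list_resources works without it

    dataset = dw.load_dataset(dataset_key)
    return list_resources(dataset)


# Offline stand-in with the same three mapping attributes (hypothetical data).
stand_in = SimpleNamespace(
    raw_data={"test": b"foo,bar\n42,A\n"},
    tables={"test": [{"foo": "42", "bar": "A"}]},
    dataframes={"test": None},
)
print(list_resources(stand_in))
```

With a real account, load_and_list('username/test-dataset') would perform the same listing on a downloaded dataset.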

open_remote_file(dataset_key, file_name, mode='w', **kwargs)

Open a remote file object that can be used to write to or read from a file in a dataset.

Parameters

  • dataset_key (str) – Dataset identifier, in the form of owner/id

  • file_name (str) – The name of the file to open

  • mode (str, optional) – The mode for the file. Must be ‘w’, ‘wb’, ‘r’, or ‘rb’, indicating read or write (‘r’/‘w’) and, optionally, binary handling of the file data (‘b’). (Default value = ‘w’)

  • chunk_size (int, optional) – Size of the byte chunks to return when reading streamed bytes in ‘rb’ mode

  • decode_unicode (bool, optional) – Whether to decode textual responses as unicode when returning streamed lines in ‘r’ mode

  • **kwargs


>>> import datadotworld as dw
>>>
>>> # write a text file
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.txt') as w:
...     w.write("this is a test.")
>>>
>>> # write a jsonlines file
>>> import json
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.jsonl') as w:
...     json.dump({'foo': 42, 'bar': "A"}, w)
...     w.write("\n")
...     json.dump({'foo': 13, 'bar': "B"}, w)
...     w.write("\n")
>>>
>>> # write a csv file
>>> import csv
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.csv') as w:
...     csvw = csv.DictWriter(w, fieldnames=['foo', 'bar'])
...     csvw.writeheader()
...     csvw.writerow({'foo': 42, 'bar': "A"})
...     csvw.writerow({'foo': 13, 'bar': "B"})
>>>
>>> # write a pandas dataframe as a csv file
>>> import pandas as pd
>>> df = pd.DataFrame({'foo': [1, 2, 3, 4], 'bar': ['a', 'b', 'c', 'd']})
>>> with dw.open_remote_file('username/test-dataset',
...                          'dataframe.csv') as w:
...     df.to_csv(w, index=False)
>>>
>>> # write a binary file
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.txt', mode='wb') as w:
...     w.write(bytes([100, 97, 116, 97, 46, 119, 111, 114, 108, 100]))
>>>
>>> # read a text file
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.txt', mode='r') as r:
...     print(r.read())
>>>
>>> # read a csv file
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.csv', mode='r') as r:
...     csvr = csv.DictReader(r)
...     for row in csvr:
...         print(row['column a'], row['column b'])
>>>
>>> # read a binary file
>>> with dw.open_remote_file('username/test-dataset',
...                          'test', mode='rb') as r:
...     data = r.read()

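The chunk_size option described above suggests a streaming download pattern. The following is a minimal sketch, not the library's own recipe: it assumes that iterating over the object returned in ‘rb’ mode yields byte chunks of up to chunk_size bytes. The helper names copy_chunks and download are hypothetical, and a plain list of byte strings stands in for the remote file so the example runs offline.

```python
import io


def copy_chunks(chunks, sink):
    """Write an iterable of byte chunks to a binary sink; return bytes written."""
    total = 0
    for chunk in chunks:
        if chunk:  # skip empty chunks
            sink.write(chunk)
            total += len(chunk)
    return total


def download(dataset_key, file_name, local_path, chunk_size=8192):
    """Stream a remote file to disk.

    Needs the datadotworld package and a configured API token, so it is
    defined here but not called. Assumes iteration in 'rb' mode yields chunks.
    """
    import datadotworld as dw

    with dw.open_remote_file(dataset_key, file_name,
                             mode='rb', chunk_size=chunk_size) as r:
        with open(local_path, 'wb') as out:
            return copy_chunks(r, out)


# Offline stand-in: a list of byte chunks in place of the remote file.
buffer = io.BytesIO()
written = copy_chunks([b"data", b"", b".world"], buffer)
print(written, buffer.getvalue())
```

Keeping the copy loop separate from the network call makes it possible to exercise the loop without credentials.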
query(dataset_key, query, query_type='sql', parameters=None)

Query an existing dataset.

Parameters

  • dataset_key (str) – Dataset identifier, in the form of owner/id or a URL

  • query (str) – SQL or SPARQL query

  • query_type ({'sql', 'sparql'}, optional) – The type of the query. Must be either ‘sql’ or ‘sparql’. (Default value = ‘sql’)

  • parameters (query parameters, optional) – Parameters to the query. For a SPARQL query, this should be a dict containing named parameters; for a SQL query, it should be a list containing positional parameters. Boolean values are converted to xsd:boolean, integer values to xsd:integer, and other numeric values to xsd:decimal. Anything else is treated as a string literal. (Default value = None)


Returns

Object containing the results of the query

Return type

QueryResults

Raises

RuntimeError – If a server error occurs
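The type conversion rule for query parameters can be sketched as follows. This is an illustration of the rule as documented above, not the library's actual implementation; xsd_type_for and run_sql_query are hypothetical names, and the table name test and column foo in the SQL string are placeholders.

```python
from decimal import Decimal
from numbers import Number


def xsd_type_for(value):
    """Name the literal type a parameter value would map to under the documented rule."""
    if isinstance(value, bool):  # checked first: bool is a subclass of int
        return "xsd:boolean"
    if isinstance(value, int):
        return "xsd:integer"
    if isinstance(value, Number):  # float, Decimal, Fraction, ...
        return "xsd:decimal"
    return "string literal"


def run_sql_query(dataset_key, minimum):
    """Run a SQL query with one positional parameter.

    Needs the datadotworld package and a configured API token, so it is
    defined here but not called.
    """
    import datadotworld as dw

    return dw.query(dataset_key,
                    "SELECT * FROM test WHERE foo > ?",
                    query_type='sql',
                    parameters=[minimum])


for sample in (True, 42, 1.5, Decimal("2.75"), "A"):
    print(repr(sample), "->", xsd_type_for(sample))
```

The order of the checks matters in Python: testing for int before bool would classify True as an integer.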