Integrations

Standalone functions

load_dataset(dataset_key, force_update=False, auto_update=False)

Load a dataset from the local filesystem, downloading it from data.world first, if necessary.

This function returns an object of type LocalDataset. The object provides access to metadata via its describe() method and to all of the data via three properties, raw_data, tables, and dataframes, all of which are mappings (dict-like structures).

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id or of a url

  • force_update (bool) – Flag indicating whether a new copy of the dataset should be downloaded, replacing any previously downloaded copy (Default value = False)

  • auto_update (bool) – Flag indicating whether the dataset should be updated to the latest version (Default value = False)
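The combined effect of the two flags can be modeled as a small decision function. This is a simplified, hypothetical model of the documented behavior, not the SDK's caching code: the helper name should_download and the version arguments are assumptions, and "a different remote version means an update is available" is an assumed rule.

```python
from typing import Optional


def should_download(
    is_cached: bool,
    force_update: bool = False,
    auto_update: bool = False,
    local_version: Optional[str] = None,
    remote_version: Optional[str] = None,
) -> bool:
    """Hypothetical model of when load_dataset fetches a fresh copy.

    Assumption: versions are opaque strings (e.g. timestamps or hashes),
    and any mismatch means the remote dataset has changed.
    """
    if force_update:
        # Always replace any previously downloaded copy.
        return True
    if not is_cached:
        # Nothing on the local filesystem yet, so a download is required.
        return True
    if auto_update and remote_version is not None:
        # Re-download only when the remote copy differs from the local one.
        return remote_version != local_version
    # Otherwise reuse the local copy as-is.
    return False
```

With both flags left at their defaults, a dataset is downloaded once and then served from the local copy on every later call.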

Returns

The object representing the dataset

Return type

LocalDataset

Raises

RestApiError – If a server error occurs
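The dataset_key parameter accepts either owner/id or a URL. The sketch below shows one way to normalize both forms into an (owner, id) pair. split_dataset_key is a hypothetical helper, not part of the SDK (which parses keys internally); the accepted URL shape (https://data.world/owner/id) and the allowed characters are assumptions.

```python
import re
from typing import Tuple

# Assumption: owner and id segments consist of letters, digits, '_' and '-'.
_KEY_PATTERN = re.compile(
    r"^(?:https?://(?:www\.)?data\.world/)?"  # optional URL prefix
    r"(?P<owner>[\w-]+)/(?P<id>[\w-]+)/?$"    # owner/id, optional trailing slash
)


def split_dataset_key(dataset_key: str) -> Tuple[str, str]:
    """Return (owner, id) for 'owner/id' or a data.world dataset URL."""
    match = _KEY_PATTERN.match(dataset_key.strip())
    if match is None:
        raise ValueError(f"Invalid dataset key: {dataset_key!r}")
    return match.group("owner"), match.group("id")
```

For example, split_dataset_key('https://data.world/username/test-dataset') and split_dataset_key('username/test-dataset') both yield ('username', 'test-dataset').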

open_remote_file(dataset_key, file_name, mode='w', **kwargs)

Open a remote file object that can be used to write to or read from a file in a data.world dataset.

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id

  • file_name (str) – The name of the file to open

  • mode (str, optional) – The mode for the file; must be ‘w’, ‘wb’, ‘r’, or ‘rb’, indicating read (‘r’) or write (‘w’) access and, with a trailing ‘b’, binary handling of the file data. (Default value = ‘w’)

  • chunk_size (int, optional) – Size, in bytes, of the chunks returned when reading streamed bytes in ‘rb’ mode

  • decode_unicode (bool, optional) – Whether to decode textual responses as Unicode when returning streamed lines in ‘r’ mode

  • **kwargs – Additional keyword arguments; chunk_size and decode_unicode above are passed this way
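The two axes encoded by mode, and the effect of chunk_size on binary reads, can be illustrated in plain Python. describe_mode and read_in_chunks are hypothetical helpers sketching the documented semantics, not the SDK's internals, and an in-memory io.BytesIO stands in for a remote file opened in ‘rb’ mode.

```python
import io
from typing import BinaryIO, Iterator, Tuple

VALID_MODES = {"w", "wb", "r", "rb"}


def describe_mode(mode: str) -> Tuple[str, bool]:
    """Split a mode string into ('read' or 'write', is_binary)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    operation = "read" if mode.startswith("r") else "write"
    return operation, mode.endswith("b")


def read_in_chunks(stream: BinaryIO, chunk_size: int = 1024) -> Iterator[bytes]:
    """Return an iterator over successive chunks of at most chunk_size bytes."""
    # iter() with a sentinel stops as soon as read() returns b'' (end of stream).
    return iter(lambda: stream.read(chunk_size), b"")


# Offline usage: the bytes spell "data.world", as in the binary example.
chunks = list(read_in_chunks(io.BytesIO(b"data.world"), chunk_size=4))
# chunks == [b'data', b'.wor', b'ld']
```

A smaller chunk_size bounds memory use per read at the cost of more iterations.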

Examples

>>> import datadotworld as dw
>>>
>>> # write a text file
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.txt') as w:
...     w.write("this is a test.")
>>>
>>> # write a jsonlines file
>>> import json
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.jsonl') as w:
...     json.dump({'foo': 42, 'bar': "A"}, w)
...     w.write("\n")
...     json.dump({'foo': 13, 'bar': "B"}, w)
...     w.write("\n")
>>>
>>> # write a csv file
>>> import csv
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.csv') as w:
...     csvw = csv.DictWriter(w, fieldnames=['foo', 'bar'])
...     csvw.writeheader()
...     csvw.writerow({'foo': 42, 'bar': "A"})
...     csvw.writerow({'foo': 13, 'bar': "B"})
>>>
>>> # write a pandas dataframe as a csv file
>>> import pandas as pd
>>> df = pd.DataFrame({'foo': [1, 2, 3, 4], 'bar': ['a', 'b', 'c', 'd']})
>>> with dw.open_remote_file('username/test-dataset',
...                          'dataframe.csv') as w:
...     df.to_csv(w, index=False)
>>>
>>> # write a binary file
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.txt', mode='wb') as w:
...     w.write(bytes([100, 97, 116, 97, 46, 119, 111, 114, 108, 100]))
>>>
>>> # read a text file
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.txt', mode='r') as r:
...     print(r.read())
>>>
>>> # read a csv file (columns as written above)
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.csv', mode='r') as r:
...     csvr = csv.DictReader(r)
...     for row in csvr:
...         print(row['foo'], row['bar'])
>>>
>>> # read a binary file
>>> with dw.open_remote_file('username/test-dataset',
...                          'test', mode='rb') as r:
...     data = r.read()

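Because open_remote_file returns a file-like object, the writing patterns above can be factored into helpers that accept any text stream. write_jsonl and write_csv are hypothetical helpers, not SDK functions; io.StringIO stands in for the remote file so the sketch runs offline, and the same calls should work on the object yielded by dw.open_remote_file(..., mode='w').

```python
import csv
import io
import json
from typing import Iterable, Mapping, Sequence, TextIO


def write_jsonl(records: Iterable[Mapping], fileobj: TextIO) -> None:
    """Write one JSON object per line (the JSON Lines format)."""
    for record in records:
        json.dump(record, fileobj)
        fileobj.write("\n")


def write_csv(records: Iterable[Mapping], fileobj: TextIO,
              fieldnames: Sequence[str]) -> None:
    """Write records as CSV, preceded by a header row."""
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)


# Offline usage with an in-memory stand-in for the remote file.
buf = io.StringIO()
write_jsonl([{'foo': 42, 'bar': "A"}, {'foo': 13, 'bar': "B"}], buf)
print(buf.getvalue(), end="")
# prints:
# {"foo": 42, "bar": "A"}
# {"foo": 13, "bar": "B"}
```

Keeping the helpers stream-agnostic means they can be tested locally before pointing them at a real dataset.
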
query(dataset_key, query, query_type='sql', parameters=None)

Query an existing dataset.

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id or of a url

  • query (str) – SQL or SPARQL query

  • query_type ({'sql', 'sparql'}, optional) – The type of the query. Must be either ‘sql’ or ‘sparql’. (Default value = ‘sql’)

  • parameters (query parameters, optional) – Parameters to the query. For a SPARQL query, this should be a dict containing named parameters; for a SQL query, this should be a list containing positional parameters. Boolean values will be converted to xsd:boolean, integer values to xsd:integer, and other numeric values to xsd:decimal. Anything else is treated as a string literal. (Default value = None)
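The parameter rules can be expressed as a small type-dispatch function plus a shape check. xsd_type_for and check_parameters are hypothetical helpers that mirror the documented rules; the SDK performs the real conversion when it sends the query. Reporting string literals as xsd:string is an assumption (plain literals are xsd:string in RDF 1.1). Note that bool must be tested before int, because bool is a subclass of int in Python.

```python
from collections.abc import Mapping
from numbers import Number
from typing import Any, Optional, Union


def xsd_type_for(value: Any) -> str:
    """Return the literal type a query parameter value maps to."""
    # Check bool first: isinstance(True, int) is also True.
    if isinstance(value, bool):
        return "xsd:boolean"
    if isinstance(value, int):
        return "xsd:integer"
    if isinstance(value, Number):
        # Any other numeric value (float, Decimal, ...) becomes a decimal.
        return "xsd:decimal"
    # Everything else is treated as a string literal (assumed xsd:string).
    return "xsd:string"


def check_parameters(query_type: str,
                     parameters: Optional[Union[Mapping, list]] = None) -> None:
    """Raise if parameters do not have the shape expected for query_type."""
    if query_type not in ("sql", "sparql"):
        raise ValueError("query_type must be either 'sql' or 'sparql'")
    if parameters is None:
        return
    if query_type == "sparql" and not isinstance(parameters, Mapping):
        raise TypeError("SPARQL parameters must be a dict of named parameters")
    if query_type == "sql" and not isinstance(parameters, list):
        raise TypeError("SQL parameters must be a list of positional parameters")


# Usage: the documented conversion for a few sample values.
types = [xsd_type_for(v) for v in (True, 42, 2.5, "abc")]
# types == ['xsd:boolean', 'xsd:integer', 'xsd:decimal', 'xsd:string']
```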

Returns

Object containing the results of the query

Return type

QueryResults

Raises

RuntimeError – If a server error occurs