Standalone functions
load_dataset(dataset_key, force_update=False, auto_update=False)
Load a dataset from the local filesystem, downloading it from data.world first, if necessary.
This function returns an object of type LocalDataset. The object provides access to metadata via its describe() method and to all of the data via three properties, raw_data, tables and dataframes, each of which is a mapping (dict-like structure).
- Parameters
dataset_key (str) – Dataset identifier, in the form of owner/id or of a URL
force_update (bool) – Flag indicating whether a new copy of the dataset should be downloaded, replacing any previously downloaded copy (Default value = False)
auto_update (bool) – Flag indicating whether the dataset should be updated to the latest available version (Default value = False)
- Returns
The object representing the dataset
- Return type
LocalDataset
- Raises
RestApiError – If a server error occurs
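Examples
A minimal usage sketch; the dataset key 'username/test-dataset' and the table name 'tablename' are placeholders, not real resources:
>>> import datadotworld as dw
>>>
>>> dataset = dw.load_dataset('username/test-dataset')
>>> dataset.describe()  # metadata for the dataset and its resources
>>> list(dataset.dataframes)  # names of the available tables
>>> df = dataset.dataframes['tablename']  # a pandas DataFrame, loaded on access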
open_remote_file(dataset_key, file_name, mode='w', **kwargs)
Open a remote file object that can be used to write to or read from a file in a data.world dataset.
- Parameters
dataset_key (str) – Dataset identifier, in the form of owner/id
file_name (str) – The name of the file to open
mode (str, optional) – The mode for the file; must be ‘w’, ‘wb’, ‘r’, or ‘rb’, indicating read/write (‘r’/‘w’) and, optionally, “binary” handling of the file data. (Default value = ‘w’)
chunk_size (int, optional) – Size of chunked bytes to return when reading streamed bytes in ‘rb’ mode
decode_unicode (bool, optional) – Whether to decode textual responses as unicode when returning streamed lines in ‘r’ mode
**kwargs –
Examples
>>> import datadotworld as dw
>>>
>>> # write a text file
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.txt') as w:
...     w.write("this is a test.")
>>>
>>> # write a jsonlines file
>>> import json
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.jsonl') as w:
...     json.dump({'foo': 42, 'bar': "A"}, w)
...     w.write("\n")
...     json.dump({'foo': 13, 'bar': "B"}, w)
...     w.write("\n")
>>>
>>> # write a csv file
>>> import csv
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.csv') as w:
...     csvw = csv.DictWriter(w, fieldnames=['foo', 'bar'])
...     csvw.writeheader()
...     csvw.writerow({'foo': 42, 'bar': "A"})
...     csvw.writerow({'foo': 13, 'bar': "B"})
>>>
>>> # write a pandas dataframe as a csv file
>>> import pandas as pd
>>> df = pd.DataFrame({'foo': [1, 2, 3, 4], 'bar': ['a', 'b', 'c', 'd']})
>>> with dw.open_remote_file('username/test-dataset',
...                          'dataframe.csv') as w:
...     df.to_csv(w, index=False)
>>>
>>> # write a binary file
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.txt', mode='wb') as w:
...     w.write(bytes([100, 97, 116, 97, 46, 119, 111, 114, 108, 100]))
>>>
>>> # read a text file
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.txt', mode='r') as r:
...     print(r.read())
>>>
>>> # read a csv file
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.csv', mode='r') as r:
...     csvr = csv.DictReader(r)
...     for row in csvr:
...         print(row['column a'], row['column b'])
>>>
>>> # read a binary file
>>> with dw.open_remote_file('username/test-dataset',
...                          'test', mode='rb') as r:
...     data = r.read()
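The chunk_size and decode_unicode arguments only come into play when iterating over the returned object, as csv.DictReader does above. The following is a sketch under the assumption that iterating a file opened in ‘rb’ mode yields byte chunks of the requested size; the dataset and file names are placeholders:
>>> # stream a binary file in fixed-size chunks (assumes iteration yields chunks)
>>> total = 0
>>> with dw.open_remote_file('username/test-dataset',
...                          'large-file.bin', mode='rb',
...                          chunk_size=1024) as r:
...     for chunk in r:
...         total += len(chunk)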
query(dataset_key, query, query_type='sql', parameters=None)
Query an existing dataset.
- Parameters
dataset_key (str) – Dataset identifier, in the form of owner/id or of a URL
query (str) – SQL or SPARQL query
query_type ({'sql', 'sparql'}, optional) – The type of the query. Must be either ‘sql’ or ‘sparql’. (Default value = “sql”)
parameters (query parameters, optional) – Parameters for the query. For a SPARQL query, this should be a dict containing named parameters; for a SQL query, this should be a list containing positional parameters. Boolean values will be converted to xsd:boolean, Integer values to xsd:integer, and other Numeric values to xsd:decimal. Anything else is treated as a String literal. (Default value = None)
- Returns
Object containing the results of the query
- Return type
QueryResults
- Raises
RuntimeError – If a server error occurs
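Examples
A minimal sketch; the dataset key, table name, column name, and the ‘?’ positional-placeholder syntax are illustrative assumptions, and the dataframe/table accessors are those of the QueryResults object:
>>> import datadotworld as dw
>>>
>>> results = dw.query('username/test-dataset',
...                    'SELECT * FROM tablename WHERE col = ?',
...                    parameters=['some value'])
>>> df = results.dataframe  # results as a pandas DataFrame
>>> rows = results.table    # results as a list of row mappings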