Account removal due to terms violations - FAQ

If an account, or materials created by an account, violate our terms of use or our community guidelines, we may remove the account and/or the materials from data.world. Below are answers to some common questions about this.

Why was my account deleted?

Common reasons for account deletion may include:

  • Programmatic access: We may delete an account if it appears that the owner is accessing our site with robots, spiders, scrapers, or other automated means in violation of our terms of use, in order to bulk-extract ("scrape") content and data from data.world. We provide APIs for interacting with our resources programmatically in approved ways.

  • Spam: We often delete accounts that are spammy or fake, because they can create security risks for us and our users. These types of accounts are against our terms of use and our community guidelines.

  • Account security at risk: If we think an account has been hacked or compromised, we may delete it so it does not put other accounts, users or us at risk. We will work with the owner of the account to secure his or her information on our site.

  • Abusive behavior: We may delete an account if we receive reports from others or notice that a user is being abusive, threatening or engaging in other egregious behavior in violation of our terms of use and community guidelines.
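
If you need data.world content in bulk, use the API rather than a scraper. The Python sketch below only builds an authenticated request for one dataset's metadata; it does not send anything. Three details are assumptions: the base URL `https://api.data.world/v0`, the `/datasets/{owner}/{id}` path, and the Bearer-token header. Confirm them against the official data.world API reference before relying on them. `owner`, `dataset_id`, and the token are placeholders you supply.

```python
import urllib.parse
import urllib.request

# Assumed base URL of data.world's public REST API (v0); verify against the official API docs.
API_BASE = "https://api.data.world/v0"


def build_dataset_request(owner: str, dataset_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request for one dataset's metadata.

    The /datasets/{owner}/{id} path and the Bearer-token header are assumptions made
    for illustration; owner, dataset_id, and token are placeholders you supply.
    """
    # Percent-encode each path segment so unusual characters cannot break the URL.
    path = "/datasets/{}/{}".format(
        urllib.parse.quote(owner, safe=""),
        urllib.parse.quote(dataset_id, safe=""),
    )
    return urllib.request.Request(
        API_BASE + path,
        headers={
            "Authorization": "Bearer " + token,
            "Accept": "application/json",
        },
        method="GET",
    )


if __name__ == "__main__":
    req = build_dataset_request("example-owner", "example-dataset", "YOUR_API_TOKEN")
    print(req.get_method(), req.full_url)
    # Passing req to urllib.request.urlopen(req) would actually send the request.
```

Because building the request is separate from sending it, you can wrap the sending step with your own pacing so that automated access stays within approved use.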

Can I access my account after it has been deleted?

Once we have deleted your account, you will not be able to access it, its datasets, or any comments on them. You will also not be able to contribute to any other datasets or projects, and contributors to your datasets and projects will no longer be able to access them or your user page.

Can I or a family member open a new account if my account is deleted?

You will not be allowed to register a new account after your account has been deleted. We may also delete existing accounts that anyone in your household opens.

Have any other questions?

Contact help@data.world

Aggregation and function support

| Aggregation | Support |
| --- | --- |
| approx_distinct | Natively Supported |
| approx_median | Natively Supported |
| approx_percentile | Natively Supported |
| arbitrary | Natively Supported |
| array_agg | Natively Supported |
| avg | Natively Supported |
| bool_and | Natively Supported |
| bool_or | Natively Supported |
| checksum | Natively Supported |
| correlation | Natively Supported |
| count | Natively Supported |
| count(*) | Natively Supported |
| count_if | Natively Supported |
| covar_pop | Natively Supported |
| covar_samp | Natively Supported |
| group_concat | Natively Supported |
| kurtosis | Natively Supported |
| max | Natively Supported |
| max_by | Natively Supported |
| min | Natively Supported |
| min_by | Natively Supported |
| regr_avgx | Natively Supported (DISTINCT emulated) |
| regr_avgy | Natively Supported (DISTINCT emulated) |
| regr_count | Natively Supported (DISTINCT emulated) |
| regr_intercept | Natively Supported |
| regr_r2 | Natively Supported (DISTINCT emulated) |
| regr_slope | Natively Supported |
| regr_sxx | Natively Supported (DISTINCT emulated) |
| regr_sxy | Natively Supported (DISTINCT emulated) |
| regr_syy | Natively Supported (DISTINCT emulated) |
| skewness | Natively Supported |
| std_pop | Natively Supported |
| std_samp | Natively Supported |
| stdev | Natively Supported |
| sum | Natively Supported |
| var_pop | Natively Supported |
| var_samp | Natively Supported |
| variance | Natively Supported |

Athena function support

| Function | Support |
| --- | --- |
| abs | Natively Supported |
| acos | Natively Supported |
| array | Natively Supported |
| array_append | Natively Supported |
| array_concat | Natively Supported |
| array_contains | Natively Supported |
| array_join | Natively Supported |
| array_length | Natively Supported |
| array_prepend | Natively Supported |
| asin | Natively Supported |
| at_time_zone | Emulated |
| atan | Natively Supported |
| atan2 | Natively Supported |
| attr_of | Emulated |
| ceiling | Natively Supported |
| char | Emulated |
| coalesce | Natively Supported |
| concat | Natively Supported |
| cos | Natively Supported |
| cosh | Natively Supported |
| current_user | Emulated |
| date_add | Emulated |
| date_diff | Natively Supported |
| date_format | Emulated |
| date_parse | Emulated |
| date_part | Emulated |
| date_sub | Emulated |
| date_trunc | Natively Supported |
| day | Natively Supported |
| degrees | Natively Supported |
| element_at | Natively Supported |
| exp | Natively Supported |
| exp10 | Natively Supported |
| floor | Natively Supported |
| get_path | Emulated |
| greatest | Natively Supported |
| hours | Natively Supported |
| iri_of | Emulated |
| json_extract_scalar | Natively Supported |
| label_of | Emulated |
| least | Natively Supported |
| left | Natively Supported |
| length | Natively Supported |
| like | Natively Supported |
| log | Natively Supported |
| log10 | Natively Supported |
| lower | Natively Supported |
| lpad | Natively Supported |
| ltrim | Natively Supported |
| md5 | Natively Supported |
| mid | Natively Supported |
| minutes | Natively Supported |
| mod | Natively Supported |
| month | Natively Supported |
| now | Natively Supported |
| pi | Natively Supported |
| position | Natively Supported |
| pow | Natively Supported |
| radians | Natively Supported |
| rand | Natively Supported |
| random | Natively Supported |
| regex | Emulated |
| regexp_extract | Natively Supported |
| replace | Natively Supported |
| right | Natively Supported |
| round | Natively Supported |
| rpad | Natively Supported |
| rtrim | Natively Supported |
| seconds | Natively Supported |
| sha1 | Natively Supported |
| sha256 | Natively Supported |
| sha384 | Emulated |
| sha512 | Natively Supported |
| sign | Natively Supported |
| sin | Natively Supported |
| sinh | Natively Supported |
| sqrt | Natively Supported |
| string_split | Emulated |
| substring | Natively Supported |
| tan | Natively Supported |
| tanh | Natively Supported |
| trim | Natively Supported |
| upper | Natively Supported |
| url_extract_fragment | Natively Supported |
| url_extract_host | Natively Supported |
| url_extract_parameter | Natively Supported |
| url_extract_path | Natively Supported |
| url_extract_port | Natively Supported |
| url_extract_protocol | Natively Supported |
| url_extract_query | Natively Supported |
| year | Natively Supported |

| Aggregation | Support |
| --- | --- |
| approx_distinct | Emulated |
| approx_median | Emulated |
| approx_percentile | Emulated |
| arbitrary | Natively Supported |
| array_agg | Unavailable |
| avg | Natively Supported |
| bool_and | Natively Supported |
| bool_or | Natively Supported |
| checksum | Natively Supported |
| correlation | Emulated |
| count | Natively Supported |
| count(*) | Emulated |
| count_if | Natively Supported |
| covar_pop | Emulated |
| covar_samp | Emulated |
| group_concat | Emulated |
| kurtosis | Emulated |
| max | Natively Supported |
| max_by | Emulated |
| min | Natively Supported |
| min_by | Emulated |
| regr_avgx | Natively Supported (DISTINCT emulated) |
| regr_avgy | Natively Supported (DISTINCT emulated) |
| regr_count | Natively Supported (DISTINCT emulated) |
| regr_intercept | Natively Supported (DISTINCT emulated) |
| regr_r2 | Natively Supported (DISTINCT emulated) |
| regr_slope | Natively Supported (DISTINCT emulated) |
| regr_sxx | Natively Supported (DISTINCT emulated) |
| regr_sxy | Natively Supported (DISTINCT emulated) |
| regr_syy | Natively Supported (DISTINCT emulated) |
| skewness | Emulated |
| std_pop | Natively Supported |
| std_samp | Emulated |
| stdev | Natively Supported |
| sum | Natively Supported |
| var_pop | Natively Supported |
| var_samp | Emulated |
| variance | Natively Supported |
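
These tables do not say how an emulated aggregation is computed. In the table above, var_samp and std_samp are listed as Emulated, while var_pop and std_pop are natively supported. The Python sketch below is an illustration only, not necessarily how data.world rewrites queries. It shows one standard way to derive a sample statistic from its population counterpart. var_pop divides the sum of squared deviations by n, and var_samp divides it by n - 1, so var_samp = var_pop * n / (n - 1). The function names are hypothetical.

```python
import statistics


def var_samp_from_var_pop(values):
    """Derive the sample variance from the population variance.

    var_pop divides the sum of squared deviations by n and var_samp divides it
    by n - 1, so var_samp = var_pop * n / (n - 1). Needs at least two values.
    """
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    return statistics.pvariance(values) * n / (n - 1)


def std_samp_from_std_pop(values):
    """Same rescaling for standard deviations: std_samp = std_pop * sqrt(n / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("sample standard deviation needs at least two values")
    return statistics.pstdev(values) * (n / (n - 1)) ** 0.5


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    # The derived value and the library's own sample variance should agree.
    print(var_samp_from_var_pop(data), statistics.variance(data))
```

SQL engines typically return NULL for a sample statistic over a single row. This sketch raises an error instead, to keep the n - 1 denominator visible.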

Azure Synapse function support

| Function | Support |
| --- | --- |
| abs | Natively Supported |
| acos | Natively Supported |
| array | Unavailable |
| array_append | Unavailable |
| array_concat | Unavailable |
| array_contains | Unavailable |
| array_join | Unavailable |
| array_length | Unavailable |
| array_prepend | Unavailable |
| asin | Natively Supported |
| at_time_zone | Emulated |
| atan | Natively Supported |
| atan2 | Natively Supported |
| attr_of | Emulated |
| ceiling | Natively Supported |
| char | Natively Supported |
| coalesce | Natively Supported |
| concat | Natively Supported |
| cos | Natively Supported |
| cosh | Natively Supported |
| current_user | Emulated |
| date_add | Natively Supported |
| date_diff | Natively Supported |
| date_format | Emulated |
| date_parse | Emulated |
| date_part | Natively Supported |
| date_sub | Natively Supported |
| date_trunc | Natively Supported |
| day | Natively Supported |
| degrees | Natively Supported |
| element_at | Unavailable |
| exp | Natively Supported |
| exp10 | Natively Supported |
| floor | Natively Supported |
| get_path | Emulated |
| greatest | Natively Supported |
| hours | Natively Supported |
| iri_of | Emulated |
| json_extract_scalar | Emulated |
| label_of | Emulated |
| least | Natively Supported |
| left | Natively Supported |
| length | Natively Supported |
| like | Natively Supported (3-argument version emulated) |
| log | Natively Supported |
| log10 | Natively Supported |
| lower | Natively Supported |
| lpad | Emulated |
| ltrim | Natively Supported |
| md5 | Natively Supported |
| mid | Natively Supported |
| minutes | Natively Supported |
| mod | Natively Supported |
| month | Natively Supported |
| now | Natively Supported |
| pi | Natively Supported |
| position | Natively Supported |
| pow | Natively Supported |
| radians | Natively Supported |
| rand | Natively Supported |
| random | Natively Supported |
| regex | Emulated |
| regexp_extract | Emulated |
| replace | Natively Supported |
| right | Natively Supported |
| round | Natively Supported |
| rpad | Emulated |
| rtrim | Natively Supported |
| seconds | Natively Supported |
| sha1 | Natively Supported |
| sha256 | Natively Supported |
| sha384 | Emulated |
| sha512 | Natively Supported |
| sign | Natively Supported |
| sin | Natively Supported |
| sinh | Natively Supported |
| sqrt | Natively Supported |
| string_split | Emulated |
| substring | Natively Supported |
| tan | Natively Supported |
| tanh | Natively Supported |
| trim | Natively Supported |
| upper | Natively Supported |
| url_extract_fragment | Emulated |
| url_extract_host | Emulated |
| url_extract_parameter | Emulated |
| url_extract_path | Emulated |
| url_extract_port | Emulated |
| url_extract_protocol | Emulated |
| url_extract_query | Emulated |
| year | Natively Supported |
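
In the table above, and on most engines in this reference, the url_extract_* functions are Emulated. To show what each function is generally expected to return, the Python sketch below approximates them with `urllib.parse`. It is modeled on the Presto/Athena functions of the same names, where they are natively supported. It illustrates the semantics only and is not data.world's emulation. Small details may differ from any given engine, such as host case-folding or how repeated query parameters are handled.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def url_extract_protocol(url: str) -> str:
    """Scheme without '://', e.g. 'https'."""
    return urlsplit(url).scheme


def url_extract_host(url: str) -> Optional[str]:
    """Host name (urllib lowercases it); None if the URL has no host."""
    return urlsplit(url).hostname


def url_extract_port(url: str) -> Optional[int]:
    """Explicit port as an integer; None when the URL does not state one."""
    return urlsplit(url).port


def url_extract_path(url: str) -> str:
    return urlsplit(url).path


def url_extract_query(url: str) -> str:
    """Raw query string without the leading '?'."""
    return urlsplit(url).query


def url_extract_fragment(url: str) -> str:
    """Fragment without the leading '#'."""
    return urlsplit(url).fragment


def url_extract_parameter(url: str, name: str) -> Optional[str]:
    """Value of the first query parameter called `name`, or None if absent."""
    values = parse_qs(urlsplit(url).query).get(name)
    return values[0] if values else None


if __name__ == "__main__":
    u = "https://data.world:8443/path/to?a=1&b=2&a=3#frag"
    print(url_extract_host(u), url_extract_port(u), url_extract_parameter(u, "a"))
```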

| Aggregation | Support |
| --- | --- |
| approx_distinct | Natively Supported |
| approx_median | Emulated |
| approx_percentile | Emulated |
| arbitrary | Natively Supported |
| array_agg | Unavailable |
| avg | Natively Supported |
| bool_and | Natively Supported |
| bool_or | Natively Supported |
| checksum | Emulated |
| correlation | Natively Supported |
| count | Natively Supported |
| count(*) | Natively Supported |
| count_if | Natively Supported |
| covar_pop | Natively Supported |
| covar_samp | Natively Supported |
| group_concat | Natively Supported |
| kurtosis | Emulated |
| max | Natively Supported |
| max_by | Emulated |
| min | Natively Supported |
| min_by | Emulated |
| regr_avgx | Natively Supported (DISTINCT emulated) |
| regr_avgy | Natively Supported (DISTINCT emulated) |
| regr_count | Natively Supported (DISTINCT emulated) |
| regr_intercept | Natively Supported (DISTINCT emulated) |
| regr_r2 | Natively Supported (DISTINCT emulated) |
| regr_slope | Natively Supported (DISTINCT emulated) |
| regr_sxx | Natively Supported (DISTINCT emulated) |
| regr_sxy | Natively Supported (DISTINCT emulated) |
| regr_syy | Natively Supported (DISTINCT emulated) |
| skewness | Emulated |
| std_pop | Natively Supported |
| std_samp | Natively Supported |
| stdev | Natively Supported |
| sum | Natively Supported |
| var_pop | Natively Supported |
| var_samp | Natively Supported |
| variance | Natively Supported |
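
In the table above, max_by and min_by are listed as Emulated. In Presto/Athena, where they are native, max_by(x, y) returns the value of x associated with the maximum value of y, and min_by uses the minimum. The Python sketch below shows that meaning over (x, y) pairs. It is an illustration, not data.world's emulation. Two behaviors are this sketch's own assumptions and may differ from a given engine: rows with a null y are skipped, and on ties the first row seen is kept.

```python
from typing import Iterable, Optional, Tuple, TypeVar

X = TypeVar("X")


def _pick_by(rows: Iterable[Tuple[X, Optional[float]]], want_max: bool) -> Optional[X]:
    """Return the x whose paired y is largest (want_max=True) or smallest.

    Rows whose y is None are skipped, and ties keep the first row seen.
    Returns None when no row has a non-null y.
    """
    best_x: Optional[X] = None
    best_y: Optional[float] = None
    for x, y in rows:
        if y is None:
            continue
        better = best_y is None or (y > best_y if want_max else y < best_y)
        if better:
            best_x, best_y = x, y
    return best_x


def max_by(rows: Iterable[Tuple[X, Optional[float]]]) -> Optional[X]:
    """max_by(x, y): the x associated with the maximum y."""
    return _pick_by(rows, want_max=True)


def min_by(rows: Iterable[Tuple[X, Optional[float]]]) -> Optional[X]:
    """min_by(x, y): the x associated with the minimum y."""
    return _pick_by(rows, want_max=False)


if __name__ == "__main__":
    sales = [("alice", 3), ("bob", 7), ("carol", None), ("dave", 1)]
    print(max_by(sales), min_by(sales))
```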

BigQuery function support

| Function | Support |
| --- | --- |
| abs | Natively Supported |
| acos | Natively Supported |
| array | Unavailable |
| array_append | Unavailable |
| array_concat | Unavailable |
| array_contains | Unavailable |
| array_join | Unavailable |
| array_length | Unavailable |
| array_prepend | Unavailable |
| asin | Natively Supported |
| at_time_zone | Natively Supported |
| atan | Natively Supported |
| atan2 | Natively Supported |
| attr_of | Emulated |
| ceiling | Natively Supported |
| char | Emulated |
| coalesce | Natively Supported |
| concat | Natively Supported |
| cos | Natively Supported |
| cosh | Natively Supported |
| current_user | Emulated |
| date_add | Natively Supported |
| date_diff | Natively Supported |
| date_format | Emulated |
| date_parse | Emulated |
| date_part | Natively Supported |
| date_sub | Natively Supported |
| date_trunc | Natively Supported |
| day | Natively Supported |
| degrees | Natively Supported |
| element_at | Unavailable |
| exp | Natively Supported |
| exp10 | Natively Supported |
| floor | Natively Supported |
| get_path | Emulated |
| greatest | Natively Supported |
| hours | Natively Supported |
| iri_of | Emulated |
| json_extract_scalar | Emulated |
| label_of | Emulated |
| least | Natively Supported |
| left | Natively Supported |
| length | Natively Supported |
| like | Natively Supported |
| log | Natively Supported |
| log10 | Natively Supported |
| lower | Natively Supported |
| lpad | Natively Supported |
| ltrim | Natively Supported |
| md5 | Natively Supported |
| mid | Natively Supported |
| minutes | Natively Supported |
| mod | Natively Supported |
| month | Natively Supported |
| now | Natively Supported |
| pi | Natively Supported |
| position | Natively Supported |
| pow | Natively Supported |
| radians | Natively Supported |
| rand | Natively Supported |
| random | Natively Supported |
| regex | Natively Supported (3-argument version emulated) |
| regexp_extract | Natively Supported (3-argument version emulated) |
| replace | Natively Supported |
| right | Natively Supported |
| round | Natively Supported |
| rpad | Natively Supported |
| rtrim | Natively Supported |
| seconds | Natively Supported |
| sha1 | Natively Supported |
| sha256 | Natively Supported |
| sha384 | Emulated |
| sha512 | Natively Supported |
| sign | Natively Supported |
| sin | Natively Supported |
| sinh | Natively Supported |
| sqrt | Natively Supported |
| string_split | Emulated |
| substring | Natively Supported |
| tan | Natively Supported |
| tanh | Natively Supported |
| trim | Natively Supported |
| upper | Natively Supported |
| url_extract_fragment | Emulated |
| url_extract_host | Emulated |
| url_extract_parameter | Emulated |
| url_extract_path | Emulated |
| url_extract_port | Emulated |
| url_extract_protocol | Emulated |
| url_extract_query | Emulated |
| year | Natively Supported |

| Aggregation | Support |
| --- | --- |
| approx_distinct | Emulated |
| approx_median | Emulated |
| approx_percentile | Emulated |
| arbitrary | Emulated |
| array_agg | Unavailable |
| avg | Natively Supported |
| bool_and | Natively Supported |
| bool_or | Natively Supported |
| checksum | Emulated |
| correlation | Emulated |
| count | Natively Supported |
| count(*) | Emulated |
| count_if | Natively Supported |
| covar_pop | Emulated |
| covar_samp | Emulated |
| group_concat | Emulated |
| kurtosis | Emulated |
| max | Natively Supported |
| max_by | Emulated |
| min | Natively Supported |
| min_by | Emulated |
| regr_avgx | Natively Supported (DISTINCT emulated) |
| regr_avgy | Natively Supported (DISTINCT emulated) |
| regr_count | Natively Supported (DISTINCT emulated) |
| regr_intercept | Natively Supported (DISTINCT emulated) |
| regr_r2 | Natively Supported (DISTINCT emulated) |
| regr_slope | Natively Supported (DISTINCT emulated) |
| regr_sxx | Natively Supported (DISTINCT emulated) |
| regr_sxy | Natively Supported (DISTINCT emulated) |
| regr_syy | Natively Supported (DISTINCT emulated) |
| skewness | Emulated |
| std_pop | Natively Supported |
| std_samp | Natively Supported |
| stdev | Natively Supported |
| sum | Natively Supported |
| var_pop | Natively Supported |
| var_samp | Natively Supported |
| variance | Natively Supported |

Denodo function support

| Function | Support |
| --- | --- |
| abs | Natively Supported |
| acos | Natively Supported |
| array | Unavailable |
| array_append | Unavailable |
| array_concat | Unavailable |
| array_contains | Unavailable |
| array_join | Unavailable |
| array_length | Unavailable |
| array_prepend | Unavailable |
| asin | Natively Supported |
| at_time_zone | Emulated |
| atan | Natively Supported |
| atan2 | Natively Supported |
| attr_of | Emulated |
| ceiling | Natively Supported |
| char | Emulated |
| coalesce | Natively Supported |
| concat | Natively Supported |
| cos | Natively Supported |
| cosh | Emulated |
| current_user | Emulated |
| date_add | Natively Supported |
| date_diff | Emulated |
| date_format | Natively Supported |
| date_parse | Emulated |
| date_part | Natively Supported |
| date_sub | Natively Supported |
| date_trunc | Natively Supported |
| day | Natively Supported |
| degrees | Natively Supported |
| element_at | Unavailable |
| exp | Natively Supported |
| exp10 | Natively Supported |
| floor | Natively Supported |
| get_path | Emulated |
| greatest | Natively Supported |
| hours | Natively Supported |
| iri_of | Emulated |
| json_extract_scalar | Emulated |
| label_of | Emulated |
| least | Natively Supported |
| left | Natively Supported |
| length | Natively Supported |
| like | Natively Supported |
| log | Natively Supported |
| log10 | Natively Supported |
| lower | Natively Supported |
| lpad | Emulated |
| ltrim | Natively Supported |
| md5 | Emulated |
| mid | Natively Supported |
| minutes | Natively Supported |
| mod | Natively Supported |
| month | Natively Supported |
| now | Natively Supported |
| pi | Natively Supported |
| position | Natively Supported |
| pow | Natively Supported |
| radians | Natively Supported |
| rand | Natively Supported |
| random | Natively Supported |
| regex | Natively Supported |
| regexp_extract | Emulated |
| replace | Natively Supported |
| right | Natively Supported |
| round | Natively Supported |
| rpad | Emulated |
| rtrim | Natively Supported |
| seconds | Natively Supported |
| sha1 | Emulated |
| sha256 | Emulated |
| sha384 | Emulated |
| sha512 | Emulated |
| sign | Natively Supported |
| sin | Natively Supported |
| sinh | Emulated |
| sqrt | Natively Supported |
| string_split | Emulated |
| substring | Natively Supported |
| tan | Natively Supported |
| tanh | Emulated |
| trim | Natively Supported |
| upper | Natively Supported |
| url_extract_fragment | Emulated |
| url_extract_host | Emulated |
| url_extract_parameter | Emulated |
| url_extract_path | Emulated |
| url_extract_port | Emulated |
| url_extract_protocol | Emulated |
| url_extract_query | Emulated |
| year | Natively Supported |

| Aggregation | Support |
| --- | --- |
| approx_distinct | Emulated |
| approx_median | Emulated |
| approx_percentile | Emulated |
| arbitrary | Emulated |
| array_agg | Unavailable |
| avg | Natively Supported |
| bool_and | Natively Supported |
| bool_or | Natively Supported |
| checksum | Emulated |
| correlation | Natively Supported |
| count | Natively Supported |
| count(*) | Natively Supported |
| count_if | Natively Supported |
| covar_pop | Natively Supported |
| covar_samp | Natively Supported |
| group_concat | Emulated |
| kurtosis | Emulated |
| max | Natively Supported |
| max_by | Emulated |
| min | Natively Supported |
| min_by | Emulated |
| regr_avgx | Natively Supported |
| regr_avgy | Natively Supported |
| regr_count | Natively Supported |
| regr_intercept | Natively Supported |
| regr_r2 | Natively Supported |
| regr_slope | Natively Supported |
| regr_sxx | Natively Supported |
| regr_sxy | Natively Supported |
| regr_syy | Natively Supported |
| skewness | Emulated |
| std_pop | Natively Supported |
| std_samp | Natively Supported |
| stdev | Natively Supported |
| sum | Natively Supported |
| var_pop | Natively Supported |
| var_samp | Natively Supported |
| variance | Natively Supported |

| Function | Support |
| --- | --- |
| abs | Natively Supported |
| acos | Natively Supported |
| array | Unavailable |
| array_append | Unavailable |
| array_concat | Unavailable |
| array_contains | Unavailable |
| array_join | Unavailable |
| array_length | Unavailable |
| array_prepend | Unavailable |
| asin | Natively Supported |
| at_time_zone | Natively Supported |
| atan | Natively Supported |
| atan2 | Natively Supported |
| attr_of | Emulated |
| ceiling | Natively Supported |
| char | Emulated |
| coalesce | Natively Supported |
| concat | Natively Supported |
| cos | Natively Supported |
| cosh | Emulated |
| current_user | Emulated |
| date_add | Natively Supported |
| date_diff | Natively Supported |
| date_format | Natively Supported |
| date_parse | Emulated |
| date_part | Natively Supported |
| date_sub | Natively Supported |
| date_trunc | Natively Supported |
| day | Natively Supported |
| degrees | Natively Supported |
| element_at | Unavailable |
| exp | Natively Supported |
| exp10 | Natively Supported |
| floor | Natively Supported |
| get_path | Emulated |
| greatest | Natively Supported |
| hours | Natively Supported |
| iri_of | Emulated |
| json_extract_scalar | Emulated |
| label_of | Emulated |
| least | Natively Supported |
| left | Natively Supported |
| length | Natively Supported |
| like | Natively Supported |
| log | Natively Supported |
| log10 | Natively Supported |
| lower | Natively Supported |
| lpad | Natively Supported |
| ltrim | Natively Supported |
| md5 | Natively Supported |
| mid | Natively Supported |
| minutes | Natively Supported |
| mod | Natively Supported |
| month | Natively Supported |
| now | Natively Supported |
| pi | Natively Supported |
| position | Natively Supported |
| pow | Natively Supported |
| radians | Natively Supported |
| rand | Natively Supported |
| random | Natively Supported |
| regex | Natively Supported |
| regexp_extract | Emulated |
| replace | Natively Supported |
| right | Natively Supported |
| round | Natively Supported |
| rpad | Natively Supported |
| rtrim | Natively Supported |
| seconds | Natively Supported |
| sha1 | Emulated |
| sha256 | Emulated |
| sha384 | Emulated |
| sha512 | Emulated |
| sign | Natively Supported |
| sin | Natively Supported |
| sinh | Emulated |
| sqrt | Natively Supported |
| string_split | Emulated |
| substring | Natively Supported |
| tan | Natively Supported |
| tanh | Emulated |
| trim | Natively Supported |
| upper | Natively Supported |
| url_extract_fragment | Emulated |
| url_extract_host | Emulated |
| url_extract_parameter | Emulated |
| url_extract_path | Emulated |
| url_extract_port | Emulated |
| url_extract_protocol | Emulated |
| url_extract_query | Emulated |
| year | Natively Supported |

| Aggregation | Support |
| --- | --- |
| approx_distinct | Emulated |
| approx_median | Emulated |
| approx_percentile | Emulated |
| arbitrary | Emulated |
| array_agg | Unavailable |
| avg | Natively Supported |
| bool_and | Natively Supported |
| bool_or | Natively Supported |
| checksum | Emulated |
| correlation | Emulated |
| count | Natively Supported |
| count(*) | Natively Supported |
| count_if | Natively Supported |
| covar_pop | Emulated |
| covar_samp | Emulated |
| group_concat | Emulated |
| kurtosis | Emulated |
| max | Natively Supported |
| max_by | Emulated |
| min | Natively Supported |
| min_by | Emulated |
| regr_avgx | Emulated |
| regr_avgy | Emulated |
| regr_count | Emulated |
| regr_intercept | Emulated |
| regr_r2 | Emulated |
| regr_slope | Emulated |
| regr_sxx | Emulated |
| regr_sxy | Emulated |
| regr_syy | Emulated |
| skewness | Emulated |
| std_pop | Natively Supported |
| std_samp | Natively Supported |
| stdev | Natively Supported |
| sum | Natively Supported |
| var_pop | Natively Supported |
| var_samp | Natively Supported |
| variance | Natively Supported |

Redshift function support

| Function | Support |
| --- | --- |
| abs | Natively Supported |
| acos | Natively Supported |
| array | Unavailable |
| array_append | Unavailable |
| array_concat | Unavailable |
| array_contains | Unavailable |
| array_join | Unavailable |
| array_length | Unavailable |
| array_prepend | Unavailable |
| asin | Natively Supported |
| at_time_zone | Natively Supported |
| atan | Natively Supported |
| atan2 | Natively Supported |
| attr_of | Emulated |
| ceiling | Natively Supported |
| char | Emulated |
| coalesce | Natively Supported |
| concat | Natively Supported |
| cos | Natively Supported |
| cosh | Emulated |
| current_user | Emulated |
| date_add | Natively Supported |
| date_diff | Natively Supported |
| date_format | Natively Supported |
| date_parse | Emulated |
| date_part | Natively Supported |
| date_sub | Natively Supported |
| date_trunc | Natively Supported |
| day | Natively Supported |
| degrees | Natively Supported |
| element_at | Unavailable |
| exp | Natively Supported |
| exp10 | Natively Supported |
| floor | Natively Supported |
| get_path | Emulated |
| greatest | Natively Supported |
| hours | Natively Supported |
| iri_of | Emulated |
| json_extract_scalar | Emulated |
| label_of | Emulated |
| least | Natively Supported |
| left | Natively Supported |
| length | Natively Supported |
| like | Natively Supported |
| log | Natively Supported |
| log10 | Natively Supported |
| lower | Natively Supported |
| lpad | Natively Supported |
| ltrim | Natively Supported |
| md5 | Natively Supported |
| mid | Natively Supported |
| minutes | Natively Supported |
| mod | Natively Supported |
| month | Natively Supported |
| now | Natively Supported |
| pi | Natively Supported |
| position | Natively Supported |
| pow | Natively Supported |
| radians | Natively Supported |
| rand | Natively Supported |
| random | Natively Supported |
| regex | Natively Supported |
| regexp_extract | Emulated |
| replace | Natively Supported |
| right | Natively Supported |
| round | Natively Supported |
| rpad | Natively Supported |
| rtrim | Natively Supported |
| seconds | Natively Supported |
| sha1 | Emulated |
| sha256 | Emulated |
| sha384 | Emulated |
| sha512 | Emulated |
| sign | Natively Supported |
| sin | Natively Supported |
| sinh | Emulated |
| sqrt | Natively Supported |
| string_split | Emulated |
| substring | Natively Supported |
| tan | Natively Supported |
| tanh | Emulated |
| trim | Natively Supported |
| upper | Natively Supported |
| url_extract_fragment | Emulated |
| url_extract_host | Emulated |
| url_extract_parameter | Emulated |
| url_extract_path | Emulated |
| url_extract_port | Emulated |
| url_extract_protocol | Emulated |
| url_extract_query | Emulated |
| year | Natively Supported |

| Aggregation | Support |
| --- | --- |
| approx_distinct | Natively Supported |
| approx_median | Natively Supported (2-argument version emulated) |
| approx_percentile | Natively Supported (3-argument version emulated) |
| arbitrary | Natively Supported |
| array_agg | Unavailable |
| avg | Natively Supported |
| bool_and | Natively Supported |
| bool_or | Natively Supported |
| checksum | Emulated |
| correlation | Natively Supported |
| count | Natively Supported |
| count(*) | Natively Supported (DISTINCT emulated) |
| count_if | Natively Supported |
| covar_pop | Natively Supported |
| covar_samp | Natively Supported |
| group_concat | Natively Supported |
| kurtosis | Emulated |
| max | Natively Supported |
| max_by | Emulated |
| min | Natively Supported |
| min_by | Emulated |
| regr_avgx | Natively Supported |
| regr_avgy | Natively Supported |
| regr_count | Natively Supported |
| regr_intercept | Natively Supported |
| regr_r2 | Natively Supported |
| regr_slope | Natively Supported |
| regr_sxx | Natively Supported |
| regr_sxy | Natively Supported |
| regr_syy | Natively Supported |
| skewness | Emulated |
| std_pop | Natively Supported |
| std_samp | Natively Supported |
| stdev | Natively Supported |
| sum | Natively Supported |
| var_pop | Natively Supported |
| var_samp | Natively Supported |
| variance | Natively Supported |

Snowflake function support

| Function | Support |
| --- | --- |
| abs | Natively Supported |
| acos | Natively Supported |
| array | Unavailable |
| array_append | Unavailable |
| array_concat | Unavailable |
| array_contains | Unavailable |
| array_join | Unavailable |
| array_length | Unavailable |
| array_prepend | Unavailable |
| asin | Natively Supported |
| at_time_zone | Emulated |
| atan | Natively Supported |
| atan2 | Natively Supported |
| attr_of | Emulated |
| ceiling | Natively Supported |
| char | Emulated |
| coalesce | Natively Supported |
| concat | Natively Supported |
| cos | Natively Supported |
| cosh | Natively Supported |
| current_user | Emulated |
| date_add | Natively Supported |
| date_diff | Natively Supported |
| date_format | Emulated |
| date_parse | Emulated |
| date_part | Natively Supported |
| date_sub | Natively Supported |
| date_trunc | Natively Supported |
| day | Natively Supported |
| degrees | Natively Supported |
| element_at | Unavailable |
| exp | Natively Supported |
| exp10 | Natively Supported |
| floor | Natively Supported |
| get_path | Natively Supported |
| greatest | Natively Supported |
| hours | Natively Supported |
| iri_of | Emulated |
| json_extract_scalar | Natively Supported |
| label_of | Emulated |
| least | Natively Supported |
| left | Natively Supported |
| length | Natively Supported |
| like | Natively Supported |
| log | Natively Supported |
| log10 | Natively Supported |
| lower | Natively Supported |
| lpad | Natively Supported |
| ltrim | Natively Supported |
| md5 | Natively Supported |
| mid | Natively Supported |
| minutes | Natively Supported |
| mod | Natively Supported |
| month | Natively Supported |
| now | Natively Supported |
| pi | Natively Supported |
| position | Natively Supported |
| pow | Natively Supported |
| radians | Natively Supported |
| rand | Natively Supported |
| random | Natively Supported |
| regex | Natively Supported |
| regexp_extract | Natively Supported |
| replace | Natively Supported |
| right | Natively Supported |
| round | Natively Supported |
| rpad | Natively Supported |
| rtrim | Natively Supported |
| seconds | Natively Supported |
| sha1 | Natively Supported |
| sha256 | Emulated |
| sha384 | Emulated |
| sha512 | Emulated |
| sign | Natively Supported |
| sin | Natively Supported |
| sinh | Natively Supported |
| sqrt | Natively Supported |
| string_split | Emulated |
| substring | Natively Supported |
| tan | Natively Supported |
| tanh | Natively Supported |
| trim | Natively Supported |
| upper | Natively Supported |
| url_extract_fragment | Emulated |
| url_extract_host | Emulated |
| url_extract_parameter | Emulated |
| url_extract_path | Emulated |
| url_extract_port | Emulated |
| url_extract_protocol | Emulated |
| url_extract_query | Emulated |
| year | Natively Supported |

| Aggregation | Support |
| --- | --- |
| approx_distinct | Emulated |
| approx_median | Emulated |
| approx_percentile | Emulated |
| arbitrary | Natively Supported |
| array_agg | Unavailable |
| avg | Natively Supported |
| bool_and | Natively Supported |
| bool_or | Natively Supported |
| checksum | Natively Supported |
| correlation | Emulated |
| count | Natively Supported |
| count(*) | Natively Supported |
| count_if | Natively Supported |
| covar_pop | Emulated |
| covar_samp | Emulated |
| group_concat | Emulated |
| kurtosis | Emulated |
| max | Natively Supported |
| max_by | Emulated |
| min | Natively Supported |
| min_by | Emulated |
| regr_avgx | Natively Supported (DISTINCT emulated) |
| regr_avgy | Natively Supported (DISTINCT emulated) |
| regr_count | Natively Supported (DISTINCT emulated) |
| regr_intercept | Natively Supported (DISTINCT emulated) |
| regr_r2 | Natively Supported (DISTINCT emulated) |
| regr_slope | Natively Supported (DISTINCT emulated) |
| regr_sxx | Natively Supported (DISTINCT emulated) |
| regr_sxy | Natively Supported (DISTINCT emulated) |
| regr_syy | Natively Supported (DISTINCT emulated) |
| skewness | Emulated |
| std_pop | Natively Supported |
| std_samp | Emulated |
| stdev | Natively Supported |
| sum | Natively Supported |
| var_pop | Natively Supported |
| var_samp | Emulated |
| variance | Natively Supported |

SQL Server function support

| Function | Support |
| --- | --- |
| abs | Natively Supported |
| acos | Natively Supported |
| array | Unavailable |
| array_append | Unavailable |
| array_concat | Unavailable |
| array_contains | Unavailable |
| array_join | Unavailable |
| array_length | Unavailable |
| array_prepend | Unavailable |
| asin | Natively Supported |
| at_time_zone | Emulated |
| atan | Natively Supported |
| atan2 | Natively Supported |
| attr_of | Emulated |
| ceiling | Natively Supported |
| char | Natively Supported |
| coalesce | Natively Supported |
| concat | Natively Supported |
| cos | Natively Supported |
| cosh | Natively Supported |
| current_user | Emulated |
| date_add | Natively Supported |
| date_diff | Natively Supported |
| date_format | Emulated |
| date_parse | Emulated |
| date_part | Natively Supported |
| date_sub | Natively Supported |
| date_trunc | Natively Supported |
| day | Natively Supported |
| degrees | Natively Supported |
| element_at | Unavailable |
| exp | Natively Supported |
| exp10 | Natively Supported |
| floor | Natively Supported |
| get_path | Emulated |
| greatest | Natively Supported |
| hours | Natively Supported |
| iri_of | Emulated |
| json_extract_scalar | Emulated |
| label_of | Emulated |
| least | Natively Supported |
| left | Natively Supported |
| length | Natively Supported |
| like | Natively Supported |
| log | Natively Supported |
| log10 | Natively Supported |
| lower | Natively Supported |
| lpad | Emulated |
| ltrim | Natively Supported |
| md5 | Natively Supported |
| mid | Natively Supported |
| minutes | Natively Supported |
| mod | Natively Supported |
| month | Natively Supported |
| now | Natively Supported |
| pi | Natively Supported |
| position | Natively Supported |
| pow | Natively Supported |
| radians | Natively Supported |
| rand | Natively Supported |
| random | Natively Supported |
| regex | Emulated |
| regexp_extract | Emulated |
| replace | Natively Supported |
| right | Natively Supported |
| round | Natively Supported |
| rpad | Emulated |
| rtrim | Natively Supported |
| seconds | Natively Supported |
| sha1 | Natively Supported |
| sha256 | Natively Supported |
| sha384 | Emulated |
| sha512 | Natively Supported |
| sign | Natively Supported |
| sin | Natively Supported |
| sinh | Natively Supported |
| sqrt | Natively Supported |
| string_split | Emulated |
| substring | Natively Supported |
| tan | Natively Supported |
| tanh | Natively Supported |
| trim | Natively Supported |
| upper | Natively Supported |
| url_extract_fragment | Emulated |
| url_extract_host | Emulated |
| url_extract_parameter | Emulated |
| url_extract_path | Emulated |
| url_extract_port | Emulated |
| url_extract_protocol | Emulated |
| url_extract_query | Emulated |
| year | Natively Supported |

Changing email notifications

Data.world sends out two types of emails:

  • General platform emails

  • Project and dataset updates and requests for access

Modifying the subscription settings for each type of email requires a different method.

General platform emails

This category includes emails that data.world may send to all users, such as feature updates, weekly Data Digests, and our Blog Newsletter.

To unsubscribe from these, click the Update your email preferences link at the bottom of any of these emails. On that page, you can choose the categories of email you want to unsubscribe from, or select Unsubscribe me from all mailing lists at the bottom of the page.

Project and dataset updates and requests for access

For any data.world project or dataset that you are a contributor to, you will by default receive email notifications any time that project or dataset is updated (which can include when a new comment is left, a new Insight is created, or new files are added) or a request for access is submitted.

To have notifications for a specific dataset or project sent to one additional email address besides the dataset/project admins:

  1. Go to the dataset or project of your choosing

  2. Click Settings

  3. Under the General tab, in the section labeled Additional notification recipient, specify an email address (like dataowner@yourcompany.com) to receive dataset notifications. Dataset/project admins will continue to receive emails. Specify only one additional email address.

To have notifications for an entire organization sent to one additional email address besides the organization admins:

  1. Navigate to your Organization page (e.g. https://data.world/YOURORGNAMEHERE)

  2. Click Settings

  3. Click Preferences

  4. In the section labeled Additional notification recipient, specify an email address (like dataowner@yourcompany.com) to receive dataset notifications. Dataset/project admins will continue to receive emails. Specify only one additional email address.

To unsubscribe from these:

  1. While logged into data.world, click your account avatar in the top right corner of the screen

  2. Click Settings

  3. On the left side of the screen, select Notifications

  4. Use the toggle next to each dataset or project to deactivate the notification.

You must unsubscribe from each dataset or project individually; these notifications cannot be turned off in bulk.

Common license types for datasets

Common licenses, in order from most open to most restrictive:
Public Domain
Public Domain Mark

Dedicate your dataset to the public domain: This isn’t technically a license since you are relinquishing all your rights in your dataset by choosing to dedicate your dataset to the public domain. To donate your work to the public domain, you can select “public domain” from the license menu when creating your dataset.

CC-0
Creative Commons Public Domain Dedication

This license is one of the open Creative Commons licenses and is like a public domain dedication. It allows you, as a dataset owner, to use a license mechanism to surrender your rights in a dataset when you might not otherwise be able to dedicate your dataset to the public domain under applicable law.

PDDL
Open Data Commons Public Domain Dedication and License

This license is one of the Open Data Commons licenses and is like a public domain dedication. It allows you, as a dataset owner, to use a license mechanism to surrender your rights in a dataset when you might not otherwise be able to dedicate your dataset to the public domain under applicable law.

CC-BY
Creative Commons Attribution 4.0 International

This license is one of the open Creative Commons licenses and allows users to share and adapt your dataset so long as they give credit to you.

CDLA-Permissive-1.0
Community Data License Agreement – Permissive, Version 1.0

This license is one of the Community Data License Agreement licenses and is similar to permissive open source licenses. It allows users to use, modify and adapt your dataset and the data within it, and to share it so long as they give credit to you. The CDLA-Permissive terms explicitly do not impose any obligations or restrictions on results obtained from users’ computational use of the data.

ODC-BY
Open Data Commons Attribution License

This license is one of the Open Data Commons licenses and allows users to share and adapt your dataset so long as they give credit to you.

CC-BY-SA
Creative Commons Attribution-ShareAlike 4.0 International

This license is one of the open Creative Commons licenses and allows users to share and adapt your dataset so long as they give credit to you and distribute any additions, transformations or changes to your dataset under this license. We consider this license (a.k.a. a viral license) problematic because others may decide not to work with your CC-BY-SA licensed dataset if doing so risks having to share their work on it under this license when they would rather use another license.

CDLA-Sharing-1.0
Community Data License Agreement – Sharing, Version 1.0

This license is one of the Community Data License Agreement licenses and was designed to embody the principles of "copyleft" in a data license. It allows users to use, modify and adapt your dataset and the data within it, and to share the dataset and data with their changes so long as they do so under the CDLA-Sharing and give credit to you. The CDLA-Sharing terms explicitly do not impose any obligations or restrictions on results obtained from users’ computational use of the data.

ODC-ODbL
Open Data Commons Open Database License

This license is one of the Open Data Commons licenses and allows users to share and adapt your dataset so long as they give credit to you and distribute any additions, transformations or changes to your dataset under this license. We consider this license (a.k.a. a viral license) problematic because others may decide not to work with your ODC-ODbL licensed dataset if doing so risks having to share their work on it under this license when they would rather use another license.

CC BY-NC
Creative Commons Attribution-NonCommercial 4.0 International

This license is one of the more restrictive Creative Commons licenses. Users can share and adapt your dataset if they give credit to you and do not use your dataset for any commercial purposes.

CC BY-ND
Creative Commons Attribution-NoDerivatives 4.0 International

This license is one of the more restrictive Creative Commons licenses. Users can share your dataset if they give credit to you, but they cannot make any additions, transformations or changes to your dataset under this license.

CC BY-NC-SA
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International

This license is one of the most restrictive Creative Commons licenses. Users can share your dataset only if they (1) give credit to you, (2) do not use your dataset for any commercial purposes, and (3) distribute any additions, transformations or changes to your dataset under this license. We consider this license a viral license since users will need to share their work on your dataset under this same license and any users of the adapted dataset would likewise need to share their work on the adapted dataset under this license and so on for any other changes to those modified datasets.

CC BY-NC-ND
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International

This license is one of the most restrictive Creative Commons licenses. Users can share only your unmodified dataset if they give credit to you and do not share it for commercial purposes. Users cannot make any additions, transformations or changes to your dataset under this license.

Other
Additional License Coverage Options

If a license is not listed in the data.world menu options, you may select Other and specify the details in the summary of your dataset.

No license specified

No one can use, share, distribute, re-post, add to, transform or change your dataset if you have not specified a license.

These descriptions are only summaries of these licenses. For the actual text of the licenses, which we strongly encourage you to read, click on the links provided.

Summary of common license types:
Public Domain

The work has been dedicated to the public domain by waiving all rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.

Attribution

You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

Share-alike

If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

Non-commercial

You may not use the material for commercial purposes.

Database Only

License applies to the database only and not its contents or data.

No Derivatives

No Derivative Works. You may not alter, transform, or build upon this work.

All licenses in the table above whose names begin with CC-BY or CC BY refer to version 4.0 of those licenses.
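The condition codes inside Creative Commons identifiers map directly onto the summary terms above: BY is Attribution, SA is Share-alike, NC is Non-commercial, and ND is No Derivatives. As a small illustrative sketch (the function name cc_conditions is hypothetical and not part of any data.world API), the following decodes identifiers spelled the way this article writes them, such as "CC-BY-SA" or "CC BY-NC-ND":

```python
# Creative Commons condition codes and the summary terms used in this article.
CONDITIONS = {
    "BY": "Attribution",
    "SA": "Share-alike",
    "NC": "Non-commercial",
    "ND": "No Derivatives",
}


def cc_conditions(code: str) -> list:
    """Return the conditions encoded in a CC identifier such as "CC-BY-SA" or "CC BY-NC-ND".

    CC-0 is a public domain dedication, so it carries no conditions.
    """
    # Treat spaces and hyphens the same, so "CC BY-NC" and "CC-BY-NC" are equivalent.
    parts = code.upper().replace(" ", "-").split("-")
    if parts[0] != "CC":
        raise ValueError(f"not a Creative Commons identifier: {code!r}")
    if parts[1:] == ["0"]:
        return []
    unknown = [p for p in parts[1:] if p not in CONDITIONS]
    if unknown:
        raise ValueError(f"unrecognized condition code(s) {unknown} in {code!r}")
    return [CONDITIONS[p] for p in parts[1:]]


print(cc_conditions("CC BY-NC-SA"))  # ['Attribution', 'Non-commercial', 'Share-alike']
```

Open Data Commons and CDLA identifiers are not encoded this way, so the sketch rejects them rather than guessing.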

Please submit a ticket if you have additional licensing questions.

Data Inspections

When loading your file into data.world, the following warnings may be generated. These warnings will only be visible to the dataset owner and any contributors with write access to the dataset.

Warnings are informational only and may be ignored.

Geospatial

Country name/abbreviation actually exists

This cell doesn't contain a valid country name

State name actually exists

This cell doesn't contain a valid state name

State abbreviation actually exists

This cell doesn't contain a valid state abbreviation

Noise

Mostly numeric columns with rare (non-N/A) text

This value doesn't look like a number

Occasional percentage or currency numbers

This value doesn't look like a number

Numeric

Numeric outlier

This number looks way too [big or small]

Likely noise numbers

This number looks like a placeholder

Numeric truncation

Possible Numeric Truncation

Security / PII

Social security numbers

This column may contain social security numbers

Credit card numbers

This column may contain credit card numbers

Phone numbers

This column may contain phone numbers

Email addresses

This column may contain email addresses

Structural Warnings

Empty Columns

This column is blank

Duplicate Rows

This row is a duplicate of the one above it.

Likely row truncation

Possible row truncation. The number of characters in this row could indicate that some amount of data was clipped.

Likely column truncation

Possible column truncation. The number of characters in a given field could indicate that some amount of data was clipped.

Suspiciously round numbers of rows

Suspiciously round number of rows, e.g., exactly 1000 rows. This may not be the full data, but rather a subset.

Rare blank cells in columns

A column which contains mostly filled values has some small number left blank.

Text

String truncation

Possible string truncation.

Likely noise text

This text looks like a placeholder (e.g., qwerty, asdf)

String length outlier

This text looks [longer or shorter] than the rest
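This article doesn't document how the Data Inspector decides when to raise these warnings. As a rough illustration only, the sketch below implements simple versions of four of them in Python. The specific rules are assumptions, not the inspector's actual logic: Tukey's 1.5 × IQR fences for numeric outliers, exact multiples of 1,000 for round row counts, and at most 5% blank cells for rare blanks. All function names are hypothetical, and rows are assumed to be equal-length lists of strings.

```python
from statistics import quantiles


def numeric_outliers(values, k=1.5):
    """Values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR] (one common outlier test)."""
    if len(values) < 4:
        return []  # too few points for meaningful quartiles
    q1, _, q3 = quantiles(values, n=4)
    spread = k * (q3 - q1)
    return [v for v in values if v < q1 - spread or v > q3 + spread]


def adjacent_duplicate_rows(rows):
    """Indexes of rows identical to the row immediately above them."""
    return [i for i in range(1, len(rows)) if rows[i] == rows[i - 1]]


def suspiciously_round(row_count, base=1000):
    """True for row counts that are exact multiples of `base` (the base is an assumption)."""
    return row_count > 0 and row_count % base == 0


def rare_blank_columns(rows, max_blank_fraction=0.05):
    """Column indexes that are mostly filled but contain a few blank cells."""
    if not rows:
        return []
    flagged = []
    for col in range(len(rows[0])):
        blanks = sum(1 for row in rows if not row[col].strip())
        if 0 < blanks <= max_blank_fraction * len(rows):
            flagged.append(col)
    return flagged


print(numeric_outliers([10, 12, 11, 13, 12, 11, 10, 1000]))  # [1000]
```

Like the warnings themselves, these checks only flag candidates; a flagged value may be perfectly valid.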

Data limits

The size of data files you can store on data.world is set by your account plan. To see your file limits, go to your profile > Settings > Billing. More information on free and paid accounts can be found here. Here's what we currently support:

Dataset Limits:

A dataset ingested by data.world may have a maximum size of 1GB and up to 250 individual files. Datasets from live connections have no size limit, nor do metadata management datasets created by metadata crawling.

Individual File Upload Limits:

The maximum size for an individual file is 1GB. If a file is larger than that, try compressing it to get it under the limit, but note that the compressed file will then only be available for download due to size constraints.

Inference & Preview Limits:

Previewable non-tabular files display a preview only if they are smaller than 40k. Images are displayed beyond that limit where possible.

For xls / xlsx, the file must be less than 100MB uncompressed for us to support query and data preview functionality.

For other supported data files, we will provide data preview and query capabilities up to 1GB.

For deeper details we have tables with specific size limit and timeout information. Please contact us if your application requires a greater number of files or a larger maximum file size.
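Before uploading, you could check a planned upload against the limits quoted above. This is a minimal Python sketch, not a data.world tool: the function name and messages are hypothetical, "1GB" and "100MB" are assumed to mean 2**30 and 100 × 2**20 bytes, and your plan's limits may differ. The 1GB dataset total applies only to data ingested by data.world; live connections and metadata-crawling datasets are exempt. For .xls/.xlsx files, pass the uncompressed size, since the 100MB limit refers to uncompressed data.

```python
# Limits quoted in this article; the byte interpretations below are assumptions.
GB = 1024 ** 3
MB = 1024 ** 2
MAX_DATASET_BYTES = 1 * GB        # total for a dataset ingested by data.world
MAX_FILE_BYTES = 1 * GB           # per individual file
MAX_FILES = 250                   # files per dataset
MAX_SPREADSHEET_BYTES = 100 * MB  # xls/xlsx query and preview limit (uncompressed)


def check_upload(sizes):
    """Return human-readable problems for a planned upload.

    `sizes` maps file name -> size in bytes; for .xls/.xlsx pass the uncompressed size.
    """
    problems = []
    if len(sizes) > MAX_FILES:
        problems.append(f"{len(sizes)} files exceeds the {MAX_FILES}-file limit")
    total = sum(sizes.values())
    if total > MAX_DATASET_BYTES:
        problems.append(f"dataset total of {total} bytes exceeds the 1GB dataset limit")
    for name, size in sizes.items():
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: larger than the 1GB per-file limit")
        elif name.lower().endswith((".xls", ".xlsx")) and size > MAX_SPREADSHEET_BYTES:
            problems.append(f"{name}: over 100MB, so query and preview are not supported")
    return problems


print(check_upload({"sales.csv": 5 * MB, "budget.xlsx": 150 * MB}))
```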

Definitions of common data.world terms

Name

Description

Summary

Administrator

The person in an organization who can manage organization members and access levels, and access all datasets and projects owned by the organization (even private ones).

API

Application Programming Interface

A set of routines, protocols, and tools for building software applications. Basically, an API specifies how software components should interact. Additionally, APIs are used when programming graphical user interface (GUI) components.

Article

Documentation on data.world is divided into four types. One of those types is articles, which are instructional for a specific task or feature and are not hands-on.

Best practices

Best practices is a type of documentation which is instructional, not hands on, and recommends a specific way of doing something.

Bookmarks

You can add a bookmark to any dataset or project that interests you, whether or not it is owned by you or your organization. Search is enabled in your bookmarks section to help you quickly find datasets or projects. If your data project is bookmarked, you can think of it as similar to a "like" on Facebook.

Business glossary

A list of terms defined as they are used in your specific business environment.

Catalog

A catalog is an organized list of information.

CC BY-NC

Creative Commons Attribution-NonCommercial 4.0 International

This license is one of the more restrictive Creative Commons licenses. Users can share and adapt your dataset if they give credit to you and do not use your dataset for any commercial purposes.

CC BY-NC-ND

Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International

This license is one of the most restrictive Creative Commons licenses. Users can share only your unmodified dataset if they give credit to you and do not share it for commercial purposes. Users cannot make any additions, transformations or changes to your dataset under this license.

CC BY-NC-SA

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International

This license is one of the most restrictive Creative Commons licenses. Users can share your dataset only if they (1) give credit to you, (2) do not use your dataset for any commercial purposes, and (3) distribute any additions, transformations or changes to your dataset under this license. We consider this license a viral license since users will need to share their work on your dataset under this same license and any users of the adapted dataset would likewise need to share their work on the adapted dataset under this license and so on for any other changes to those modified datasets.

CC BY-ND

Creative Commons Attribution-NoDerivatives 4.0 International

This license is one of the more restrictive Creative Commons licenses. Users can share your dataset if they give credit to you, but they cannot make any additions, transformations or changes to your dataset under this license.

CC-0

Creative Commons Public Domain Dedication

This license is one of the open Creative Commons licenses and is like a public domain dedication. It allows you, as a dataset owner, to use a license mechanism to surrender your rights in a dataset when you might not otherwise be able to dedicate your dataset to the public domain under applicable law.

CC-BY

Creative Commons Attribution 4.0 International

This license is one of the open Creative Commons licenses and allows users to share and adapt your dataset so long as they give credit to you.

CC-BY-SA

Creative Commons Attribution-ShareAlike 4.0 International

This license is one of the open Creative Commons licenses and allows users to share and adapt your dataset so long as they give credit to you and distribute any additions, transformations or changes to your dataset under this license. We consider this license (a.k.a. a viral license) problematic because others may decide not to work with your CC-BY-SA licensed dataset if doing so risks having to share their work on it under this license when they would rather use another license.

CDLA-Permissive-1.0

Community Data License Agreement – Permissive, Version 1.0

This license is one of the Community Data License Agreement licenses and is similar to permissive open source licenses. It allows users to use, modify and adapt your dataset and the data within it, and to share it so long as they give credit to you. The CDLA-Permissive terms explicitly do not impose any obligations or restrictions on results obtained from users’ computational use of the data.

CDLA-Sharing-1.0

Community Data License Agreement – Sharing, Version 1.0

This license is one of the Community Data License Agreement licenses and was designed to embody the principles of "copyleft" in a data license. It allows users to use, modify and adapt your dataset and the data within it, and to share the dataset and data with their changes so long as they do so under the CDLA-Sharing and give credit to you. The CDLA-Sharing terms explicitly do not impose any obligations or restrictions on results obtained from users’ computational use of the data.

Classroom

A classroom is a type of organization you can set up in data.world so you and your students can upload datasets, create projects, discuss, and share insights. A classroom includes unlimited private projects and datasets, 1GB per project/dataset, and up to 100 members, so it's a perfect way to collaborate with any group that needs to learn together.

Columns

Data in tabular format is arranged into rows and columns. Columns represent data of the same type across all the records.

Community

The data.world community includes every person who uses the platform whether enterprise, educational, or individual.

Content contributor

A Content Contributor is a person in an organization who can create and interact with the organization's projects and datasets.

Contributor

A Contributor is a person who is invited to access a dataset or project. Contributor permissions can be set to Discover only, View only, Edit (view and edit), or Manage (view, edit, and manage).

Created and Updated Date

Created and updated are two operators that can be used to find datasets, projects, insights, users, and organizations based on the date they were added or last updated. Timestamps are in UTC, not your local time, so depending on your time zone, results may be a day off from your local date.

Creator

The creator of a dataset or project is the individual who creates it. The creator can be different from the owner (see owner for more details). The distinction between owner and creator is important for organizations as the owner manages a resource with the same privileges as the creator, but owners can be changed (as personnel changes) while creator is a static entry.

Crowdsourced data

An organization can be configured so that an individual outside the organization can propose that the organization own a dataset created by the individual. Datasets created in this way are called crowdsourced data.

CSV

Comma-Separated Values (CSV) is a plain-text file format for tabular data. Commas separate the values in each record into columns, and line breaks separate the records, or rows.

Data

Data is just information, and it can take many forms from images to spreadsheets. Data in data.world can be in any file format.

Database

A structured set of data held in a computer, especially one that is accessible in various ways.

Data dictionary

The data dictionary contains all the metadata (data about the data) for the files, tables and columns in a dataset. For all files it contains:

The names of all the files in the dataset, a place to add descriptions for each file, and the labels for each file. For tabular files it has: The column names, the format of the data in each column, and a place to add a description for each column.

Data inspector

When data is ingested into data.world, the Data Inspector evaluates it to rapidly diagnose issues. The inspector examines only data uploaded to data.world, not data brought in through a live connection.

Data sources

A data source is any place you can get data from, including databases, local files, cloud-based files, real-time sources like log files, SaaS data, URLs, and corporate networks.

Dataset

Datasets are where all data is stored and documented for later sharing and use in projects. A dataset is the basic repository for data files and associated metadata, documentation, scripts, and any other supporting resources that should be stored alongside the data.

Description fields

Datasets, projects, all the files in each, and all the columns in any structured data files have description fields associated with them. Descriptions are very short and serve as a quick reference for the item they describe.

FAQ

Frequently Asked Question

A document format consisting of questions and answers.

Glossary

A glossary is an alphabetical list of terms or words found in or relating to a specific subject with explanations; a brief dictionary.

Graph database

A graph database is a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data.

Insights

Findings, conclusions, and interesting points for discussion about a project are stored as insights in the project.

Integration

An application or program that connects to data.world in order to transport, manipulate, sync, or share data and analyses of the data.

JSON

JavaScript Object Notation

JSON (pronounced "Jason") is a language-independent, open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and array data types (or any other serializable value).

Resources

Your resources are the datasets and projects owned by you or your organization(s).

license

data.world allows you to specify how others may use data you own.

license type

By providing a license, you are setting expectations about how you want your data to be used. You can think of a license as the Terms of Use for your data.

Markup language

A markup language is a computer language that uses tags to define elements within a document. It is human-readable, meaning markup files contain standard words rather than typical programming syntax. The two most common markup languages are HTML and XML.

Metadata

According to the National Information Standards Organization (NISO), metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information.

Metadata catalog

An organized list containing all the information about your data resources. For example, the source, the type, the location, the owner, the update and creation dates, descriptions of the resource, etc.

Metamap

A graph-based data repository containing the metadata about all public datasets stored in data.world.

ODC-BY

Open Data Commons Attribution License

This license is one of the Open Data Commons licenses and allows users to share and adapt your dataset so long as they give credit to you.

ODC-ODbL

Open Data Commons Open Database License

This license is one of the Open Data Commons licenses and allows users to share and adapt your dataset so long as they give credit to you and distribute any additions, transformations or changes to your dataset under this license. We consider this license (a.k.a. a viral license) problematic because others may decide not to work with your ODC-ODbL licensed dataset if doing so risks having to share their work on it under this license when they would rather use another license.

OKTA

Cloud software that helps companies manage and secure user authentication into modern applications, and helps developers build identity controls into applications, websites, web services, and devices. It provides secure identity management with Single Sign-On, Multi-factor Authentication, and Lifecycle Management (Provisioning).

Organization

A group on data.world that you belong to which determines what data resources you can see and edit.

Owner

When a dataset or project is created the person creating it is the creator, but the owner can be designated as either the person who created it, one of the organizations in which the creator is a member, or an organization that accepts ownership proposals. The owner has all the same permissions for management and editing of the dataset or project that the creator has.

PDDL

Open Data Commons Public Domain Dedication and License

This license is one of the Open Data Commons licenses and is like a public domain dedication. It allows you, as a dataset owner, to use a license mechanism to surrender your rights in a dataset when you might not otherwise be able to dedicate your dataset to the public domain under applicable law.

Platform

The data.world application is also referred to as the platform.

Project

Projects are where all querying, analysis and discussion of data takes place in data.world. Data in different datasets can be used for many different projects, but each project contains all and only the data that is relevant for that project. The information in a project can come from datasets, files attached directly to the project, insights written by the project's team members about the data and the project, and discussions about the project.

Public API

The public API is used to create an integration or application with data.world. The API can also be used to get data out of data.world.

Public Domain

Public Domain License

The work has been dedicated to the public domain by waiving all rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.

Query

A statement written to retrieve information from a dataset on data.world. Queries can be written in SQL or SPARQL.

Quick start guide

A quick-start guide is a short, hands-on type of documentation derived from tutorials and designed to quickly get users comfortable with basic use of the data.world platform.

RDF

Resource Description Framework

RDF represents information using semantic triples, each of which comprises a subject, a predicate, and an object.

RDF triple store

An RDF triple store is similar to a graph database and stores information in semantic triples. It is accessed and manipulated using the SPARQL query language.

Reference

A type of documentation that includes tables, lists, glossaries, appendices, etc. It is informational, not instructional, in format and is not hands-on.

Release notes

Release notes on data.world are compiled in a rolling article that is regularly updated with new features and updates to data.world.

SAML

Security Assertion Markup Language

An open standard for exchanging authentication and authorization data between parties, in particular between an identity provider and a service provider. SAML enables single sign-on (SSO).

Share-alike license

If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

SPARQL

SPARQL Protocol and RDF Query Language

Pronounced "sparkle", SPARQL is an RDF query language—that is, a semantic query language for databases—able to retrieve and manipulate data stored in RDF format.

SQL

Structured Query Language

SQL is a language used to access and manipulate data in relational database management systems.

SSO

Single Sign-on

A property of access control across multiple related, yet independent, software systems: a user logs in with a single ID and password to gain access to any of several related systems.

Streams

Streams are a type of input (JSON Lines, .jsonl) that lets you update and append records to a data file on data.world instead of re-uploading the entire file whenever changes need to be made.

Summary

The summary is one of two documents created with a dataset or project. The summary is where all of the information about the origin of the data, why you created the dataset, further documentation of your work, etc. is found. Use the Summary section to tell your data's story.

Tag

Tags can be used to organize and group your dataset or project by topic, category, source, department, or team. They can be searched for explicitly with the tag search operator, and can also help to filter down more generic search results.

Team

A team is a group of people working on a project. A team could be an organization or a subset of an organization.

Title

The name of the dataset or project. Titles are accessible via search.

Triple

AKA Semantic triples

A triple is a set of three entities that arranges a statement about semantic data in the form of subject–predicate–object expressions. Each item in the triple is expressed as a Web URI.

TTL or Turtle

Terse RDF Triple Language

Terse RDF Triple Language (Turtle) is a syntax and file format for expressing data in the RDF data model. Turtle syntax is similar to that of SPARQL. Turtle provides a way to group three URIs to make a triple, and provides ways to abbreviate such information, for example by factoring out common portions of URIs.

Tutorial

One of our four types of documentation is a tutorial. Tutorials are instructional, in depth, and hands-on. A variation on the tutorial is a quick start which is a shorter, derivative version of a tutorial.

URI

Uniform Resource Identifier

A string of characters that unambiguously identifies a particular resource. To guarantee uniformity, all URIs follow a predefined set of syntax rules, but they also remain extensible through separately defined hierarchical naming schemes (e.g. http://). The most common form of URI is the Uniform Resource Locator (URL), frequently referred to informally as a web address.

White paper

A high-level but very technical document. It is informational, not instructional, in format and is not hands-on.
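The Created and Updated Date entry above notes that timestamps are kept in UTC, which is why a date search can seem a day off. Here is a minimal Python sketch of the effect, using a hypothetical timestamp and a fixed UTC-6 offset (the helper name is hypothetical, and real time zones with daylight saving time would need zoneinfo instead of a fixed offset). It does not show the search operator syntax itself.

```python
from datetime import datetime, timedelta, timezone


def local_date(utc_timestamp, utc_offset_hours):
    """Calendar date of a UTC timestamp as seen in a fixed-offset local time zone."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return utc_timestamp.astimezone(local_zone).date()


# Hypothetical resource created at 02:30 UTC on 31 January 2020.
created = datetime(2020, 1, 31, 2, 30, tzinfo=timezone.utc)
print(created.date())           # 2020-01-31, the UTC date
print(local_date(created, -6))  # 2020-01-30, the local date at UTC-6
```

A user at UTC-6 who created this resource on the evening of 30 January would find it under 31 January when searching by date.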

FAQ

Can I change my username?

We don't currently offer the ability to change your username within data.world; however, there are a couple of workarounds:

  1. Create a new account using a different email address. If you'd like the original account removed once you've created the new one, just submit a request for us to do so. Once it's removed, you can go into your account settings to update your email if desired.

  2. Submit a request for us to delete your account, which will free up your email address so you can create a new account with the preferred username.

Note that both of these options will remove all content and social activity (likes, follows, etc.) associated with the account being deleted. Please be sure to back up your work and be ready to recreate it under the new account.

Can I update a file in my dataset?

Yes! To update a file on data.world simply upload the updated version with the same name and we will overwrite the existing one with the new version.

Note that we also store previous versions of your files, so you can always revert to them if you need to.

How do I delete my account?

To remove or cancel a data.world account, you'll currently need to submit a request for data.world to manually remove it.

We hope to let members manage this themselves in the future. Until then, we're happy to help with your direct request, and we appreciate any final feedback you share with us as part of it.

Note that upon deletion, all content stored under your account will be removed and your username will be back up for grabs by new members.

How much will data.world cost me?

data.world is free for individuals and small teams to discover and use open data, as well as create and collaborate on their own Datasets and Data Projects up to a specific size and number.

In line with data.world's mission to build the most meaningful, collaborative, and abundant data resource in the world, there is no limit to the number of public Datasets or Projects created and we encourage all of our members to help in this mission by adding open datasets they're building or working with!

For members and teams who need additional limits and features beyond what our free tier provides, please see our pricing page for details on the available options.

What are the size limits for data.world?

The data.world team is hard at work extending the boundaries of the platform. Size limits depend on your account plan (free or paid); a list of what we currently support can be found in the article on data size limits.

What file types can I upload?

There is no restriction on file types that can be uploaded or downloaded on data.world, and a dataset can consist of any combination of files added to it. There are some size limitations, and files are handled differently based on their extension.

What's the difference between Open and Private?

When creating a dataset or Data Project, you're given the option between open and private. Open datasets and projects will be visible, in their entirety, to anyone signed into data.world. They could be returned in search results, will be visible under your profile, and will be available for querying and downloads. No other members will be able to change the dataset or project without explicit permission from you, which you grant by adding them as a contributor with edit rights. If you are in an organization, you have additional options for determining who can access and use your data. For more information, see the article on setting dataset permissions.

If you can neither upload nor download data from data.world you might be behind a firewall that's blocking your access. If you think that might be the problem, try performing the same tasks on a different network, such as your home internet connection. You can find information about configuring your network firewall to accept data.world connections in Allowlist for data.world.

File upload status messages

Below is a list of status messages you might encounter when uploading data files to data.world. Please open a support ticket for additional assistance.

  • No data could be extracted from this file **
    This status is displayed when a file type is supported by data.world but the file cannot be previewed. Check for syntax or formatting errors within your file.

  • Want to see data previews? Reupload this file with an extension.
    Currently, data.world depends on file extensions to determine how best to prepare your data. If a file is uploaded with no extension, you will see this status message. If you believe the file's data is actually in a known format (say, .csv), re-upload the file with that extension added.

  • Excel files >100MB may only be downloaded. **
    Due to how Excel files are structured, in some cases we are not able to fully preview the data inside the file. It is, however, still available for sharing and download.

  • This file type >100MB can only be downloaded.
    The file is too large to properly ingest into data.world and is unavailable for queries or previews.

  • This file is shareable, though some advanced features may be unavailable due to its size. **
    This status indicates that a file contains more cells of data than data.world was expecting. In some cases, you might be able to resolve this by removing any unnecessary blank columns, rows, or tabs.

  • Only the first 50 of 111 files were extracted.
    When uploading archived or compressed files (zip, tar, etc.), make sure each archive contains 50 or fewer files. Any files over this limit will not be extracted.

  • 2 files were too large to be extracted from this archive. **
    If a file within an archive exceeds data.world's size limits, we will show this status. Try splitting the file into multiple smaller files within our size limits, then reupload.

  • Sorry, we can't extract the contents of this archive. It may be corrupted.
    The archive cannot be extracted for another reason: it may be an invalid archive or an unsupported file type.

  • No data could be extracted from this file.
    The file is of a supported type but has a structural problem that prevents it from being extracted.

  • This file is shareable, though some advanced features may be unavailable due to the size of this dataset. **
    If uploading a data file causes the total dataset to exceed what data.world can process, this status will be displayed. Check for and remove any unnecessary blank columns, rows, or tabs from all tabular files within the dataset, or contact support for further assistance.

** Note that these errors are related to enhancing tabular and graph data to provide advanced functionality (data previews and queries). The file will still be uploaded to data.world and be available for download.
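Several of the statuses above suggest splitting an oversized file into smaller files within the size limits. For CSV data, that step can be sketched in Python. `split_csv` is a hypothetical helper, not part of data.world; it holds the whole file in memory and repeats the header row in every part. Choose `max_bytes` with some headroom below your plan's per-file limit (100 MB on free plans, 1 GB on paid plans, per the size limits article).

```python
import csv
import io
from typing import List


def split_csv(text: str, max_bytes: int) -> List[str]:
    """Split CSV text into parts of at most max_bytes (UTF-8), each starting with the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    header, body = rows[0], rows[1:]

    def encode_row(row: List[str]) -> str:
        # Re-serialize through csv.writer so quoting and embedded newlines stay valid.
        buf = io.StringIO()
        csv.writer(buf).writerow(row)
        return buf.getvalue()

    header_line = encode_row(header)
    header_size = len(header_line.encode("utf-8"))
    parts: List[str] = []
    current, current_size = [header_line], header_size

    for row in body:
        line = encode_row(row)
        size = len(line.encode("utf-8"))
        if header_size + size > max_bytes:
            raise ValueError("a single row (plus header) is larger than max_bytes")
        if current_size + size > max_bytes:
            # Close the current part and start a new one with the header again.
            parts.append("".join(current))
            current, current_size = [header_line], header_size
        current.append(line)
        current_size += size

    parts.append("".join(current))
    return parts
```

To split a file on disk, read it with `open(path, newline="", encoding="utf-8")` and write each returned part to its own `.csv` file before reuploading.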

Finding help

We offer a number of different help resources for data.world members, including a Slack channel, documentation portal, and a blog with great content. Here we've included details on each.

Slack

The data.world Slack community can be a wealth of knowledge and even includes many of the data scientists and developers building the data.world platform. If you have questions, especially beyond bug reports and functionality requests, you should stop by and engage some of the expert users of the platform.

To request an invite, visit our Slack sign-up page and enter your email address.

Follow the instructions to be invited. When your account is active, you can visit the Slack sign-in page to jump in to participate in the conversation!

Blog: Distinct Values

Our blog Distinct Values is a collection of content related to data catalogs, cultures, and communities. We strive to provide thought leadership, interesting news about our platform, and exciting happenings related to data analysis and visualization.

Documentation portal

data.world has a robust documentation portal that can help with many of the tasks and questions that you may encounter on the platform. In addition to our docs portal we also have documentation on:

Still need help?

If you can’t find what you need, please contact our support team using one of the following methods:

A guide to icons

Here is a list of the icons and the extensions associated with them for popular file types on data.world:

  • Tabular: csv, xlsx, xls, json, jsonl, tsv, txt

  • Graph: ttl, rdf, nt, n3

  • Document: md, doc, docx, txt, rtf, pdf, ppt, gslides, gdoc

  • Image: jpg, jpeg, png, gif, svg, vg.json, vl.json

  • Archive: zip, gz, tar, tgz

  • Script: py, ipynb, r, rmd, sas, js, feather, css, html, rproj, htm, rdata

  • Query: sql and sparql queries (native to data.world)

  • Geo: kml, shp, shx, cpg, prj, geojson, atx

  • Non-tabular data: sqlite, nested json

  • Generic: anything not listed above or when no file type has been given/inferred
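The mapping above can be expressed as a small lookup, for example to check ahead of an upload which icon category a file will likely get, or whether it lacks an extension entirely (which triggers the "Reupload this file with an extension" status). This is a sketch built only from the extension lists in this guide, not data.world's own detection logic. `icon_category` is a hypothetical helper, and because "txt" appears under both Tabular and Document, the first matching category wins.

```python
from pathlib import PurePosixPath

# Two-part extensions from the Image entry; checked before single extensions.
COMPOUND_EXTENSIONS = {"vg.json": "Image", "vl.json": "Image"}

# Copied from the icon guide; dict order decides ties (e.g. "txt" -> Tabular).
ICON_CATEGORIES = {
    "Tabular": {"csv", "xlsx", "xls", "json", "jsonl", "tsv", "txt"},
    "Graph": {"ttl", "rdf", "nt", "n3"},
    "Document": {"md", "doc", "docx", "txt", "rtf", "pdf", "ppt", "gslides", "gdoc"},
    "Image": {"jpg", "jpeg", "png", "gif", "svg"},
    "Archive": {"zip", "gz", "tar", "tgz"},
    "Script": {"py", "ipynb", "r", "rmd", "sas", "js", "feather", "css",
               "html", "rproj", "htm", "rdata"},
    "Geo": {"kml", "shp", "shx", "cpg", "prj", "geojson", "atx"},
    "Non-tabular data": {"sqlite"},
}


def icon_category(filename: str) -> str:
    """Return the icon category for a file name, or "Generic" if nothing matches."""
    name = filename.lower()
    for compound, category in COMPOUND_EXTENSIONS.items():
        if name.endswith("." + compound):
            return category
    ext = PurePosixPath(name).suffix.lstrip(".")  # "" when there is no extension
    for category, extensions in ICON_CATEGORIES.items():
        if ext in extensions:
            return category
    return "Generic"
```

A result of "Generic" for a file you expected to be tabular is a hint to add the correct extension before uploading.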

Icons used on data.world:

  • SPARQL query

  • SQL query

  • Project

  • Dataset

  • Tabular file

  • Image file

  • PDF

  • Insight

  • Document

  • Graph

  • Script

  • Dashboard

  • Tableau file

  • Non-tabular data

  • Archive file

  • Geo

  • Generic file

How to generate an API token for data.world

When you need an API token for a third-party application or data.world's metadata catalog collector, you can get it from your profile settings. Click on your avatar and choose Settings.

Then select Advanced from the sidebar.

Both Read/Write and Admin tokens are provided. For the metadata catalog collector, you can use the Read/Write token if you have write permissions to your organization's ddw-catalogs dataset.
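Once you have a token, a third-party script typically sends it in an HTTP Authorization header. The sketch below assumes data.world's v0 REST API at `https://api.data.world/v0` with bearer-token authentication and a `GET /user` endpoint that returns the profile tied to the token; verify these against the API reference. The environment variable name `DW_AUTH_TOKEN` and both helper functions are hypothetical choices for illustration.

```python
import os
import urllib.request

API_BASE = "https://api.data.world/v0"  # assumption: public REST API base URL


def authorized_request(token: str, path: str) -> urllib.request.Request:
    """Build a GET request that authenticates with a data.world API token."""
    return urllib.request.Request(
        API_BASE + path,
        headers={"Authorization": "Bearer " + token, "Accept": "application/json"},
    )


def token_from_env(var: str = "DW_AUTH_TOKEN") -> str:
    """Read the token from an environment variable instead of hard-coding it."""
    token = os.environ.get(var, "").strip()
    if not token:
        raise RuntimeError(f"set {var} to your Read/Write or Admin token")
    return token


# Usage: GET /user is assumed to return the profile associated with the token.
# req = authorized_request(token_from_env(), "/user")
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Keeping the token in an environment variable avoids committing it to source control; treat an Admin token with particular care, since it grants broader access than the Read/Write token.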

Licensing and data you found

I've found an interesting dataset and want to put it on data.world. Can I do that?

You'll need to check the licensing terms on that dataset to see if you are authorized by the owner to distribute, re-post, re-publish or share it. If those terms allow you to do these things, you'll also need to review and comply with the conditions under which you can do so. We've put together a list of common licenses for datasets with links to the license terms here.

If the dataset is available to the public on the Internet, why do I need to check and comply with the terms?

Even if datasets are publicly available, their owners can continue to have rights in those datasets. Those rights extend to how the data is organized, displayed, described, visualized, etc. and can include the effort in compiling the data. These intellectual property rights need to be respected. To do so, make sure that you read and comply with the license terms on the dataset.

What happens if I don't comply with a dataset's license or terms?

If you don't comply with the license and terms of use on a dataset, you could be found to be in breach of contract and/or in violation of copyright law. For example, if a court finds that you have violated US copyright law, you could have to pay damages set by law without the copyright owner having to prove he or she suffered financially from your actions.

You could also be in violation of our terms of use by not having the right to post a dataset to data.world, including if you don't specify the appropriate license on a dataset, and you and/or the dataset could be removed from our platform.

Where can I find a dataset's licensing terms and conditions?

Sometimes finding the license terms on a dataset can be difficult. You can look for them:

  • On the main webpage

  • On the page where the summary or description of the dataset is located

  • On the download page of the dataset

  • In the terms of use or terms of service located in the footer of the webpage

  • Under "legal" in the footer of the webpage

But I can't find those license terms. Now what?

If, after searching the site where you found the dataset, you can't locate any terms or licenses that cover it, you can reach out to the owner to ask whether he or she will give you permission to use the dataset or will put a license on the dataset on the site. A dataset that does not have any license terms means the owner retains all rights in the dataset and does not authorize anyone else to use, copy, distribute, share, or combine it with other data, or to make any changes to it or derivative works from it.

What about fair use?

Fair use is a tricky area. If you use copyrighted materials in a certain way that complies with the fair use doctrine, you might not be infringing on the copyright. However, courts look at the specific circumstances of the usage, so even if your usage is similar to how others have used copyrighted materials, there is no guarantee that a court will find that you have not violated someone's copyright, since your circumstances may be different.

The US Copyright Office has summarized Section 107 of the US Copyright Act.

Section 107 provides the framework for determining whether something is a fair use and identifies certain types of uses—such as criticism, comment, news reporting, teaching, scholarship, and research—as examples of activities that may qualify as fair use. Section 107 calls for consideration of the following four factors in evaluating a question of fair use:

  • Purpose and character of the use, including whether the use is of a commercial nature or is for nonprofit educational purposes: Courts look at how the party claiming fair use is using the copyrighted work, and are more likely to find that nonprofit educational and noncommercial uses are fair. This does not mean, however, that all nonprofit education and noncommercial uses are fair and all commercial uses are not fair; instead, courts will balance the purpose and character of the use against the other factors below. Additionally, "transformative" uses are more likely to be considered fair. Transformative uses are those that add something new, with a further purpose or different character, and do not substitute for the original use of the work.

  • Nature of the copyrighted work: This factor analyzes the degree to which the work that was used relates to copyright's purpose of encouraging creative expression. Thus, using a more creative or imaginative work (such as a novel, movie, or song) is less likely to support a claim of a fair use than using a factual work (such as a technical article or news item). In addition, use of an unpublished work is less likely to be considered fair.

  • Amount and substantiality of the portion used in relation to the copyrighted work as a whole: Under this factor, courts look at both the quantity and quality of the copyrighted material that was used. If the use includes a large portion of the copyrighted work, fair use is less likely to be found; if the use employs only a small amount of copyrighted material, fair use is more likely. That said, some courts have found use of an entire work to be fair under certain circumstances. And in other contexts, using even a small amount of a copyrighted work was determined not to be fair because the selection was an important part—or the "heart"—of the work.

  • Effect of the use upon the potential market for or value of the copyrighted work: Here, courts review whether, and to what extent, the unlicensed use harms the existing or future market for the copyright owner's original work. In assessing this factor, courts consider whether the use is hurting the current market for the original work (for example, by displacing sales of the original) and/or whether the use could cause substantial harm if it were to become widespread.

In addition to the above, other factors may also be considered by a court in weighing a fair use question, depending upon the circumstances. Courts evaluate fair use claims on a case-by-case basis, and the outcome of any given case depends on a fact-specific inquiry. This means that there is no formula to ensure that a predetermined percentage or amount of a work—or specific number of words, lines, pages, copies—may be used without permission.

Licensing and data you own

Why license your dataset?

If your dataset does not have any license terms, it means you do not authorize anyone else to use, copy, distribute, share, combine it with other data, or make any changes to it or make derivative works from it. This absence of a license greatly reduces the reuse potential and usefulness of your dataset.

We encourage you to pick as open a license as you feel comfortable with to maximize the benefits of your dataset. We believe the more open a license is, the more others will use your dataset. For more information on the details of licenses, see our list of common license types for datasets.

Common license considerations
Choose an established and current license

By choosing an established license like one from our list of common license types, you are choosing a license that is widely adopted. Such licenses were drafted by organizations dedicated to making those licenses functional in many situations as well as making them interoperable, clear and understandable. You'll need to read the actual licenses by clicking on the links we've provided to make sure you've picked the appropriate one for your dataset and how you would like others to interact with your dataset.

Consider how you want others to use your dataset

The more open a license you choose, the more others can use, share and distribute your dataset to get to insights faster. Your dataset could be important to solving a pressing issue. We encourage you to maximize your dataset's potential by choosing an open license.

Consider the results of a data project

When a project involves a number of datasets, each with different licenses, the licenses may conflict and greatly restrict or even prohibit the resulting work. By choosing the most open license, you amplify your dataset's usefulness. Another tip is to review the licenses of the other datasets that may be involved in a project or used in your industry to determine what type of license would allow your dataset to be used alongside those datasets. Usually, two datasets that both carry CC-BY licenses can be combined under those license terms. However, you will still need to pay attention to the different versions of those licenses to make sure they work with one another. In addition, datasets with similar licenses, such as CC-BY and ODC-ODbL, cannot necessarily be combined, because those licenses may conflict.

Our recommendation

We like the current versions of the open Creative Commons licenses, since these licenses are widely adopted, are applicable to databases, and facilitate collaboration. We believe these licenses are becoming more widely accepted for datasets and databases. In addition, Creative Commons has created a tool to help you choose the appropriate license for your dataset.

For instructions on how to set the license type for a dataset, see Setting a license type.

To help determine the license to select, see Common license types for datasets

Find a dataset you'd like to share on data.world? Check out Licensing and data you found.

Notifications

To help you stay on top of what's happening with your data and in your organization, data.world provides a variety of notifications in different formats to various users. To make the notification process more transparent, we have the following tables which lay out the relationships between user and organization permissions, activity in the platform, and notification formats.

Query editor shortcuts

Query editor shortcuts for both SQL and SPARQL are available on data.world. Below is a list of the supported commands for Mac and Windows:

  • command + option + L (ctrl + alt + L on Windows): Automatically reformat your query to make it more readable.

  • command + shift + enter (ctrl + shift + enter on Windows): Automatically reformat AND run your query.

  • command + enter (ctrl + enter on Windows): Run your query.

  • command + S (ctrl + S on Windows): Save your query.

Size limit and timeout specifications

Size limits

Limits are listed per account type. "Free" and "Professional" refer to the Individual/Team Free and Individual/Team Professional plans; "all account types" means the value is the same for Free, Professional, and Enterprise.

  • Dataset ingested to data.world: Free 100 MB; Professional 1 GB; Enterprise 1 GB

  • Metadata management datasets: Free n/a; Professional n/a; Enterprise no limit

  • Project: Free 100 MB of project-specific files, no limit on linked datasets; Professional 1 GB of project-specific files, no limit on linked datasets; Enterprise 1 GB

  • Derived dataset: Free 100 MB; Professional 1 GB; Enterprise 1 GB

  • Virtual dataset (hosted on a remote server): size not limited by data.world (all account types)

  • Size of a file in a dataset: Free 100 MB; Professional 1 GB; Enterprise 1 GB

  • Number of files in a dataset: 250 (all account types)

  • Number of columns in a table: limited by file size (all account types)

  • Number of columns previewed in a table: 50 (all account types)

  • Number of columns previewed in query results: 500 (all account types)

  • Number of rows in a table: limited by file size (all account types)

  • Number of rows previewed in query results: 10,000 (all account types)

  • Rate limiting (burst calls to the Streams API): 5 calls in the first second or after a 5-second idle period, then 1 call per second (all account types)

  • Size of a record streamed: 1 MiB (all account types)

  • Size of a request streamed: Free 100 MB; Professional 1 GB; Enterprise 1 GB

  • Number of JSON objects in a stream: the streamed request size divided by the average record size (Free 100 MB; Professional 1 GB; Enterprise 1 GB, each divided by the average record size)
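The burst rule in the rate-limiting entry behaves like a token bucket that holds 5 calls and refills at 1 call per second: after 5 idle seconds the bucket is full again. The sketch below is a client-side throttle under that reading, which is our interpretation rather than data.world's server implementation. `TokenBucket` is a hypothetical class, and `send` in the usage note stands in for whatever call you use to stream a record.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow up to `capacity` calls at once, refilled at `rate` calls per second."""

    def __init__(self, capacity: float = 5, rate: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def try_acquire(self) -> bool:
        """Take one token if available; return False if the caller should wait."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until the next token becomes available."""
        self._refill()
        return max(0.0, (1 - self.tokens) / self.rate)
```

Usage: create `bucket = TokenBucket()` once, then before each streamed call run `while not bucket.try_acquire(): time.sleep(bucket.wait_time())` followed by `send(record)`. Throttling on the client side keeps bursts within the documented rate instead of relying on rejected calls.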

Timeouts

Timeouts are listed per account type (Individual/Team Free, Individual/Team Professional, and Enterprise).

  • Query timeout before first byte is transmitted: 1 minute for all account types (Enterprise: upgrade to 5 minutes available upon request)

  • Query timeout before last byte is transmitted: 60 minutes for all account types

  • Data upload timeout: none for all account types; as long as packets continue to be passed, the connection will stay open

Supported file types

There is no restriction on file types that can be uploaded or downloaded on data.world, and a dataset can consist of any combination of files added to it. There are some size limitations, and files are handled differently based on the extension as follows:

Database file formats

Formats: sqlite

Database dumps will consist of multiple tables, and a schema that models the type information and the relationships between those tables. Each table will be represented as a data.world table, which can be previewed and queried naturally via our SQL engine.

Tabular files

Formats: csv, tsv, xls, xlsx

Tabular files are presented in a spreadsheet-style preview, and we perform basic analyses on each of the columns.

The data is then queryable using SQL and SPARQL; take a look at Query basics for more info on getting started with querying.

To provide these querying capabilities and in line with our mission to connect the world’s data (by making it linkable), we’re converting it to RDF Triples, or graph data, under-the-hood. To learn more, check out our blog post on the matter and the W3C primer on RDF.

Excel files will include all of the underlying sheets in a tabbed interface. Only the tabular data will be included; other elements like pivot tables and charts will not be shown in the preview but they will still be available in the original file.

In addition to viewing a preview of the data in the table, you can also see the metadata for the table by clicking on Switch to column overview. The ability to switch between the data preview and the column overview persists in the summary even after it's been saved. For more information about column overviews, see our article Column overview.

Structured files

Formats: json, ND-JSON, other 'sufficiently tabular' json files

When a JSON file has a "sufficiently tabular" structure, we will attempt to produce a table of data that represents the contents of the file. Common logging formats that include JSON arrays of simple objects or newline-separated JSON objects will generally work great with this interpretation. If the structure of the file is too hierarchical or inconsistent in nature, the file will instead be treated in its raw form - you can view or download the file, but it’s not queryable through our query engine.
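What counts as "sufficiently tabular" is decided by data.world, but the idea can be illustrated with a deliberately simple stand-in rule: accept either a JSON array or newline-separated JSON objects, and treat the data as a table only if every record is an object with the same keys and no nested values. This is a hypothetical heuristic for checking your own files, not the platform's actual algorithm, which may be more lenient.

```python
import json
from typing import Any, Dict, List, Optional

SCALAR_TYPES = (str, int, float, bool, type(None))


def parse_records(text: str) -> List[Any]:
    """Parse a single JSON document, or fall back to newline-delimited JSON."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Several documents, one per line (ND-JSON / JSON Lines).
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


def to_table(text: str) -> Optional[List[Dict[str, Any]]]:
    """Return the records as rows if they look tabular, otherwise None."""
    records = parse_records(text)
    if not records or not all(isinstance(r, dict) for r in records):
        return None
    keys = set(records[0])
    for record in records:
        if set(record) != keys:
            return None  # inconsistent shape across records
        if not all(isinstance(v, SCALAR_TYPES) for v in record.values()):
            return None  # nested objects or arrays: too hierarchical
    return records
```

If your file fails a check like this, flattening nested fields into plain columns before uploading makes it more likely to be previewed and queried as a table.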

RDF data

Formats: rdf, rdfs, owl, jsonld, nt, ttl, n3

These formats are serializations of RDF data. Since RDF is the native data format for data.world’s platform, the statements in the file are simply loaded into the graph for the dataset or project that the file is added to. When you upload raw RDF data into a dataset or project, that data is searchable via the attached SPARQL endpoint. Take a look at Query basics for more info on getting started with queries. We show a preview of the contents of the file, including summaries of the classes, properties, and namespaces used in the file.

Archive and compressed formats

Formats: zip, tar, tbz2, tbz, bz2, tgz, gz, -gz, z, -z

Archives that contain multiple files can be extracted and the first 50 files are stored in the dataset. Each extracted file is then handled using the criteria established for its extension. Please note that archives are not extracted by default. To do so, a Contributor must click on the ‘Extract’ button on the right-hand side of the archive.

Individual files that are compressed (e.g., foo.csv.gz) are decompressed and then treated as though the uncompressed file had been added directly.
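Because only the first 50 files of an archive are stored, it can help to check a zip archive locally before uploading. The sketch below lists which files would fall past that limit, assuming "first" means the order in which entries appear in the archive, and shows the single-file case where foo.csv.gz becomes foo.csv. `check_zip` and `decompress_single` are hypothetical helpers; tar archives would need the `tarfile` module instead.

```python
import gzip
import io
import zipfile
from typing import List, Tuple

MAX_EXTRACTED = 50  # only the first 50 files in an archive are stored


def check_zip(data: bytes) -> List[str]:
    """Return the names of files that would fall beyond the 50-file extraction limit."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # Directories are entries too, but they are not files.
        names = [info.filename for info in zf.infolist() if not info.is_dir()]
    return names[MAX_EXTRACTED:]


def decompress_single(filename: str, data: bytes) -> Tuple[str, bytes]:
    """Mimic single-file handling: "foo.csv.gz" -> ("foo.csv", decompressed bytes)."""
    if not filename.lower().endswith(".gz"):
        raise ValueError("expected a .gz file")
    return filename[:-3], gzip.decompress(data)
```

If `check_zip` returns any names, split the contents across several archives of 50 or fewer files each.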

Images

Formats: jpg, jpeg, png, gif, svg

Images are displayed in-line.

Source files

Formats: ipynb (version 4 and higher), js, r, py, as, apl, bash, bas, bat, c, cpp, cs, css, d, dart, diff, go, ini, java, julia, kt, lua, matlab, nasm, ml, perl, php, ps1, rb, scala, sql, tcl, ts, vim, yaml, xml, asp, jade, tex, less, sass, scss, Dockerfile

Source files are presented with full syntax highlighting where appropriate.

Documents

Formats: txt, html, md, pdf

The above document formats are rendered during preview. Other document types can be uploaded but are not available for preview.

Note that iframe embeds are not rendered in html files. Instead, try adding the embed as an Insight in a Data Project.

---

All other file types can be uploaded and downloaded as long as they are within the supported size limits.

Updating your profile settings

Your data.world settings page allows you to configure a number of options in the following categories:

  • Profile

  • Account

  • Organizations

  • Billing

  • Notifications

  • Advanced

If you're logged into your account, update these settings by clicking your profile image (or placeholder image) in the top right corner of data.world and selecting Settings.
Profile

The entries in this section appear on your profile page and will be visible to other data.world users. Settings in this section include:

  • Full name (required)

  • Company or organization

  • Website

  • Bio

  • Photo - this can be uploaded from your computer, connected via Dropbox, linked via a URL, or taken with your computer's camera via the browser

Account
  • E-mail address. Only one account is allowed for each e-mail address registered on data.world. When changing your e-mail address, we will send you a verification e-mail to the new address. Follow the instructions in that e-mail to verify the new address.

  • Password

  • data.world currently doesn't support changing your username, so if you need it updated or run into any other login issues, please submit a request through a ticket or by emailing help@data.world.

Organizations
  • Create a new organization in which you will be an administrator

  • Leave any organizations that you are a part of

  • If you have the appropriate access level, manage the membership of the organization and modify the subscription level of the organization

Billing
  • Modify the subscription level of your individual account

  • Modify the subscription level of any organizations in which you are an administrator

  • Update credit card information for a subscription

Notifications
  • Toggle e-mail notifications for datasets and projects that you're part of

Advanced
  • Access and reset account-wide API tokens

  • Revoke access to any authorized integrations

  • Enable experimental features