Licensing and data you found

I've found an interesting dataset and want to put it on data.world. Can I do that?

You'll need to check the licensing terms on that dataset to see if you are authorized by the owner to distribute, re-post, re-publish or share it. If those terms allow you to do these things, you'll also need to review and comply with the conditions under which you can do so. We have put together a list of common licenses for datasets with links to the license terms here.

If the dataset is available to the public on the Internet, why do I need to check and comply with the terms?

Even if datasets are publicly available, their owners can continue to have rights in those datasets. Those rights extend to how the data is organized, displayed, described, and visualized, and can include the effort that went into compiling the data. These intellectual property rights need to be respected. To do so, make sure that you read and comply with the license terms on the dataset.

What happens if I don't comply with a dataset's license or terms?

If you don't comply with the license and terms of use on a dataset, you could be found to be in breach of contract, in violation of copyright law, or both. For example, if a court finds that you violated US copyright law, you could have to pay damages set by law, without the copyright owner having to prove any financial loss from your actions.

You could also violate our terms of use if you post a dataset to data.world without having the right to do so, or if you don't specify the appropriate license on a dataset. In that case, you, the dataset, or both could be removed from our platform.

Where can I find a dataset's licensing terms and conditions?

Sometimes finding the license terms on a dataset can be difficult. You can look for them:

  • On the main webpage

  • On the page where the summary or description of the dataset is located

  • On the download page of the dataset

  • In the terms of use or terms of service located in the footer of the webpage

  • Under "legal" in the footer of the webpage

But I can't find those license terms. Now what?

If, after searching the site where you found the dataset, you can't locate any terms or licenses that cover it, you can reach out to the owner to ask for permission to use the dataset or to request that a license be added to the dataset on the site. A dataset that does not have any license terms means the owner retains all rights in the dataset and does not authorize anyone else to use, copy, distribute, share, combine it with other data, or make any changes to it or derivative works from it.

What about fair use?

Fair use is a tricky area. If you use copyrighted materials in a way that complies with the fair use doctrine, you might not be infringing the copyright. However, courts look at the specific circumstances of each use, so even if your use is similar to how others have used copyrighted materials, there is no guarantee that a court will find that you have not violated someone's copyright, since your circumstances may be different.

The US Copyright Office has summarized Section 107 of the US Copyright Act as follows:

Section 107 provides the framework for determining whether something is a fair use and identifies certain types of uses—such as criticism, comment, news reporting, teaching, scholarship, and research—as examples of activities that may qualify as fair use. Section 107 calls for consideration of the following four factors in evaluating a question of fair use:

  • Purpose and character of the use, including whether the use is of a commercial nature or is for nonprofit educational purposes: Courts look at how the party claiming fair use is using the copyrighted work, and are more likely to find that nonprofit educational and noncommercial uses are fair. This does not mean, however, that all nonprofit education and noncommercial uses are fair and all commercial uses are not fair; instead, courts will balance the purpose and character of the use against the other factors below. Additionally, "transformative" uses are more likely to be considered fair. Transformative uses are those that add something new, with a further purpose or different character, and do not substitute for the original use of the work.

  • Nature of the copyrighted work: This factor analyzes the degree to which the work that was used relates to copyright's purpose of encouraging creative expression. Thus, using a more creative or imaginative work (such as a novel, movie, or song) is less likely to support a claim of a fair use than using a factual work (such as a technical article or news item). In addition, use of an unpublished work is less likely to be considered fair.

  • Amount and substantiality of the portion used in relation to the copyrighted work as a whole: Under this factor, courts look at both the quantity and quality of the copyrighted material that was used. If the use includes a large portion of the copyrighted work, fair use is less likely to be found; if the use employs only a small amount of copyrighted material, fair use is more likely. That said, some courts have found use of an entire work to be fair under certain circumstances. And in other contexts, using even a small amount of a copyrighted work was determined not to be fair because the selection was an important part—or the "heart"—of the work.

  • Effect of the use upon the potential market for or value of the copyrighted work: Here, courts review whether, and to what extent, the unlicensed use harms the existing or future market for the copyright owner's original work. In assessing this factor, courts consider whether the use is hurting the current market for the original work (for example, by displacing sales of the original) and/or whether the use could cause substantial harm if it were to become widespread.

In addition to the above, other factors may also be considered by a court in weighing a fair use question, depending upon the circumstances. Courts evaluate fair use claims on a case-by-case basis, and the outcome of any given case depends on a fact-specific inquiry. This means that there is no formula to ensure that a predetermined percentage or amount of a work—or specific number of words, lines, pages, copies—may be used without permission.