Tabular Data Package

Tabular Data Package is a simple format for publishing and sharing tabular data. Data files are CSV, and a single datapackage.json file describes the dataset, including a schema for each data file.

As suggested by the name, Tabular Data Packages extend and specialize the Data Package format for the specific case where the data is tabular.

Read the full RFC-style specification of Tabular Data Package to complement this quick introduction.


There is a growing set of online and offline tools for working with Tabular Data Packages, including tools for creating, viewing and validating them.

Getting Started

Here's an example of a minimal Tabular Data Package dataset:

There are just 2 files: the data file, data.csv, and the descriptor, datapackage.json.


data.csv has a single header row naming 3 fields (columns), followed by 2 rows of data.
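
As a quick check of that shape, the sketch below parses a small CSV with Python's standard csv module and counts its fields and data rows. The header names follow the field names used in this guide's schema; the two data rows are invented for illustration, and `describe_csv` is a hypothetical helper name.

```python
import csv
import io

# Illustrative stand-in for data.csv: the header names follow the field
# names used in this guide; the two data rows are invented values.
SAMPLE_CSV = "var1,var2,var3\nA,1,2.5\nB,3,4.5\n"


def describe_csv(text):
    """Return (field_names, data_rows) for CSV text with a single header row."""
    rows = list(csv.reader(io.StringIO(text, newline="")))
    header, data = rows[0], rows[1:]
    return header, data


fields, data_rows = describe_csv(SAMPLE_CSV)
print(len(fields), "fields:", fields)
print(len(data_rows), "rows of data")
```

Note that every cell comes back as a string; the types only become meaningful once a schema is attached.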

A simple datapackage.json for this data would be:

    {
      "name": "my-dataset",
      # here we list the data files in this dataset
      "resources": [
        {
          "path": "data.csv",
          "schema": {
            "fields": [
              { "name": "var1", "type": "string" },
              { "name": "var2", "type": "integer" },
              { "name": "var3", "type": "number" }
            ]
          }
        }
      ]
    }

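One way to produce such a descriptor is to build it as a plain dictionary and serialize it with Python's json module. This is a minimal sketch rather than an official tool: `build_descriptor` is a hypothetical helper, and the file is written to a temporary directory standing in for the dataset folder. Since JSON has no comment syntax, the explanatory "#" comment from the listing is left out of the generated file.

```python
import json
import tempfile
from pathlib import Path


def build_descriptor(name, csv_path, fields):
    """Build a minimal descriptor; `fields` is a list of (name, type) pairs."""
    return {
        "name": name,
        "resources": [
            {
                "path": csv_path,
                "schema": {
                    "fields": [{"name": n, "type": t} for n, t in fields]
                },
            }
        ],
    }


descriptor = build_descriptor(
    "my-dataset",
    "data.csv",
    [("var1", "string"), ("var2", "integer"), ("var3", "number")],
)

# Write the descriptor where data.csv would live (a temp dir in this sketch).
package_dir = Path(tempfile.mkdtemp())
descriptor_path = package_dir / "datapackage.json"
descriptor_path.write_text(json.dumps(descriptor, indent=2), encoding="utf-8")

# Read it back to confirm the file is valid JSON with the expected fields.
loaded = json.loads(descriptor_path.read_text(encoding="utf-8"))
print([f["name"] for f in loaded["resources"][0]["schema"]["fields"]])
```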

CSVs are plain text files with commas separating each column and each row on one line (normally!). CSVs can be produced and consumed by almost all tools including spreadsheet programs like Excel and databases like MySQL. Read more about CSVs here.

There are a few specific requirements for CSV files in Tabular Data Packages:

  • They must use the UTF-8 character encoding.
  • They must be well-formatted: a single header row at the top of the file, no blank rows before, within, or after the data rows, etc.
  • They must use "," as the field delimiter unless explicitly stated otherwise.
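
These requirements can be checked mechanically. The sketch below is one possible check using only the Python standard library; `check_csv_bytes` is a hypothetical helper, not part of any Tabular Data Package tool, and it covers only the points listed above plus a consistent cell count per row (an extra assumption about what "well-formatted" includes).

```python
import csv
import io


def check_csv_bytes(raw, delimiter=","):
    """Return a list of problems found in raw CSV bytes (empty list = none found)."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return ["not valid UTF-8"]

    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]
    if not header or any(not name.strip() for name in header):
        problems.append("header row is missing or has empty field names")

    # csv.reader yields an empty list for a completely blank line.
    for row_no, row in enumerate(rows[1:], start=2):
        if not row:
            problems.append(f"blank row at row {row_no}")
        elif len(row) != len(header):
            problems.append(
                f"row {row_no} has {len(row)} cells, expected {len(header)}"
            )
    return problems


print(check_csv_bytes(b"var1,var2\nA,1\n\nB,3\n"))
```

Row numbers here count parsed rows, not physical lines, so they can differ from line numbers when a quoted field spans several lines.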

Delimiters other than Comma

CSV files in a Tabular Data Package are not strictly required to use "," as the field delimiter: you can use tab, ";" or any other character.

If you do use a delimiter other than ",", you must declare it in the resource entry using a "dialect" attribute that follows the CSV Dialect Description Format.

Here is an example of how the datapackage.json would look in this case:

    "resources": [
      {
        "dialect": {  # as per CSV Dialect specification
          "delimiter": ";"
        },
        "schema": ...  # as per Table Schema
      }
    ]
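
A consumer can honor this setting by reading the delimiter from the resource entry before parsing. The following is a minimal sketch using Python's csv module; `read_resource_rows` is a hypothetical helper, the sample rows are invented, and only the "delimiter" key of the dialect is considered.

```python
import csv
import io


def read_resource_rows(resource, text):
    """Parse CSV text using the delimiter from the resource's "dialect", if any."""
    dialect = resource.get("dialect", {})
    # Fall back to "," when the resource does not declare a delimiter.
    delimiter = dialect.get("delimiter", ",")
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


resource = {"path": "data.csv", "dialect": {"delimiter": ";"}}
rows = read_resource_rows(resource, "var1;var2;var3\nA;1;2.5\nB;3;4.5\n")
print(rows[0])
```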

Table Schema

The schema for each CSV resource follows Table Schema. It provides crucial additional information about the fields (columns) in the CSV file, most importantly the type of the data in each field (string, number, date, etc.).

The fields attribute is an array in which each entry is a field descriptor. A field descriptor looks like:

    {
      # required
      "name": "name of field/column (should correspond to field name in data)",

      # not strictly required but strongly recommended
      "type": "A string specifying the data type for data in this field",

      # all of these are optional ...
      "title": "A nicer human readable label or title for the field",
      "format": "A string specifying format of data (e.g. date format)",
      "description": "A description for the field"
    }

The Table Schema spec has a full list of the supported data types, including string, number, date, etc.
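
To illustrate what the type information enables, the sketch below casts the string cells of a CSV row according to each field's declared type. It is a simplified illustration, not an implementation of Table Schema: the `CASTERS` mapping and `cast_row` helper are hypothetical, only the three types used in this guide's example are handled, "format" and other options are ignored, and treating a missing or unknown type as a string is an assumption made for this sketch.

```python
from decimal import Decimal

# Hypothetical mapping from type names to Python converters.
# Only the three types used in this guide's example are covered.
CASTERS = {
    "string": str,
    "integer": int,
    "number": Decimal,
}


def cast_row(row, fields):
    """Cast the string cells of one CSV row using each field's declared type."""
    typed = {}
    for cell, field in zip(row, fields):
        # Assumption for this sketch: missing or unknown types stay as strings.
        caster = CASTERS.get(field.get("type", "string"), str)
        typed[field["name"]] = caster(cell)
    return typed


schema_fields = [
    {"name": "var1", "type": "string"},
    {"name": "var2", "type": "integer"},
    {"name": "var3", "type": "number"},
]
typed_row = cast_row(["A", "1", "2.5"], schema_fields)
print(typed_row)
```

Decimal is used for "number" here to avoid binary floating-point rounding; a float would also be a reasonable choice depending on the use case.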


There are many examples of Tabular Data Packages in the Core Data project on DataHub. Specific examples:

World GDP

A Data Package that includes its data locally in the repo, as CSV.

Here's the datapackage.json:

S&P 500 Companies Data

This is an example with more than one resource in the data package.

Here's the datapackage.json:
