CSV Dialect

Author(s) Rufus Pollock
JSON Schema (for spec) /schemas/csv-dialect.json
Version 1.2


The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC 2119.


CSV Dialect defines a simple format to describe the various dialects of CSV files in a language-agnostic manner. It aims to deal with a reasonably large subset of the features which differ between dialects, such as terminator strings, quoting rules, escape rules and so on. The specification has been modeled around the union of the csv modules in Python and Ruby, and the bulk load capabilities of MySQL and PostgreSQL.


CSV Dialect has nothing to do with the names, contents or types of the headers or data within the CSV file, only how it is formatted. However, CSV Dialect does allow the presence or absence of a header to be specified, similarly to RFC4180.

CSV Dialect is also orthogonal to the character encoding used in the CSV file. Note that it is possible for files in CSV format to contain data in more than one encoding.


CSV Dialect is useful for programmes which might have to deal with multiple dialects of CSV file, but which can rely on being told out-of-band which dialect will be used in a given input stream. This reduces the need for heuristic inference of CSV dialects, and simplifies the implementation of CSV readers, which must juggle dialect inference, schema inference, unseekable input streams, character encoding issues, and the lazy reading of very large input streams.

Some related work can be found in this comparison of CSV dialect support, this example of a similar JSON format, and in Python's PEP 305.


A CSV Dialect descriptor, dialect, MUST be a JSON object with the following properties:

  • delimiter - specifies the character sequence which should separate fields (aka columns). Default = ,. Example \t.
  • lineTerminator - specifies the character sequence which should terminate rows. Default = \r\n
  • quoteChar - specifies a one-character string to use as the quoting character. Default = "
  • doubleQuote - controls the handling of quotes inside fields. If true, two consecutive quotes should be interpreted as one. Default = true
  • escapeChar - specifies a one-character string to use for escaping (for example, \), mutually exclusive with quoteChar. Not set by default
  • nullSequence - specifies the null sequence (for example \N). Not set by default
  • skipInitialSpace - specifies how to interpret whitespace which immediately follows a delimiter; if false, it means that whitespace immediately after a delimiter should be treated as part of the following field. Default = true
  • header - indicates whether the file includes a header row. If true the first row in the file is a header row, not data. Default = true
  • commentChar - specifies a one-character string that marks lines whose remainder should be ignored
  • caseSensitiveHeader - indicates that case in the header is meaningful. For example, columns CAT and Cat should not be equated. Default = false
  • csvddfVersion - a number, in n.n format, e.g., 1.2. If not present, consumers should assume the latest schema version.
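
As an illustration of how a consumer might apply the defaults listed above, here is a minimal Python sketch. The function name `resolve_dialect` and the constants are hypothetical and not part of this specification. Two points are assumptions made for the sketch: setting both escapeChar and quoteChar explicitly is rejected (one reading of "mutually exclusive"), and an explicit escapeChar suppresses the default quoteChar.

```python
# Defaults taken from the property list above. Properties with no stated
# default (escapeChar, nullSequence, commentChar, csvddfVersion) are omitted.
DEFAULTS = {
    "delimiter": ",",
    "lineTerminator": "\r\n",
    "quoteChar": '"',
    "doubleQuote": True,
    "skipInitialSpace": True,
    "header": True,
    "caseSensitiveHeader": False,
}

# Properties described as one-character strings.
ONE_CHAR_PROPERTIES = ("quoteChar", "escapeChar", "commentChar")


def resolve_dialect(descriptor):
    """Return a copy of a dialect descriptor with the listed defaults filled in."""
    if not isinstance(descriptor, dict):
        raise TypeError("a dialect descriptor must be a JSON object")
    # Assumption: "mutually exclusive" means the two may not both be set explicitly.
    if "escapeChar" in descriptor and "quoteChar" in descriptor:
        raise ValueError("escapeChar and quoteChar are mutually exclusive")
    resolved = {**DEFAULTS, **descriptor}
    if "escapeChar" in descriptor:
        # Assumption: an explicit escapeChar suppresses the default quoteChar.
        resolved.pop("quoteChar", None)
    for name in ONE_CHAR_PROPERTIES:
        value = resolved.get(name)
        if value is not None and not (isinstance(value, str) and len(value) == 1):
            raise ValueError(f"{name} must be a one-character string")
    return resolved
```

Calling `resolve_dialect({"delimiter": ";"})` returns a dictionary in which only the delimiter differs from the defaults.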


Here's an example:

  "dialect": {
    "csvddfVersion": 1.2,
    "delimiter": ";",
    "doubleQuote": true,
    "lineTerminator": "\r\n",
    "quoteChar": "\"",
    "skipInitialSpace": true,
    "header": true,
    "commentChar": "#"
  }

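To show how a descriptor like the one in the example might drive a reader, the following sketch maps it onto Python's standard `csv` module. It is a partial mapping under stated assumptions, not a reference implementation: `read_rows` and `EXAMPLE_DIALECT` are hypothetical names, `csv.reader` only accepts a one-character delimiter and detects line endings itself (so lineTerminator is not passed on), and comment handling is simplified to dropping physical lines that begin with commentChar.

```python
import csv
import io

# The example descriptor as a Python dict (the value of the "dialect" key).
EXAMPLE_DIALECT = {
    "csvddfVersion": 1.2,
    "delimiter": ";",
    "doubleQuote": True,
    "lineTerminator": "\r\n",
    "quoteChar": '"',
    "skipInitialSpace": True,
    "header": True,
    "commentChar": "#",
}


def read_rows(text, dialect):
    """Parse CSV text with a dialect descriptor; return (header, rows)."""
    comment_char = dialect.get("commentChar")
    # newline="" leaves line endings untouched so the csv module can handle them.
    stream = io.StringIO(text, newline="")
    if comment_char:
        # Simplification: skip physical lines starting with commentChar, even
        # if such a line falls inside a quoted multi-line field.
        source = (line for line in stream if not line.startswith(comment_char))
    else:
        source = stream
    reader = csv.reader(
        source,
        delimiter=dialect.get("delimiter", ","),
        quotechar=dialect.get("quoteChar", '"'),
        doublequote=dialect.get("doubleQuote", True),
        escapechar=dialect.get("escapeChar"),
        skipinitialspace=dialect.get("skipInitialSpace", True),
    )
    rows = list(reader)
    if dialect.get("header", True) and rows:
        # The first row is a header row, not data.
        return rows[0], rows[1:]
    return None, rows


sample = '# exported\r\nid; name\r\n1; "Smith; ""Jo"""\r\n'
header, rows = read_rows(sample, EXAMPLE_DIALECT)
```

With the sample above, the comment line is skipped, the space after each `;` is dropped because skipInitialSpace is true, and the doubled quotes inside the quoted field are read as single quotes.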

See the Changelog for information.


The following implementations are available for csv-dialect:

See the implementation page for further information on writing an implementation of a Frictionless Data specification.
