CSV - Comma Separated Values

This page provides an overview of the CSV (Comma Separated Values) format for data.

CSV is a very old, very simple and very common "standard" for (tabular) data. We say "standard" in quotes because there was never a formal standard for CSV, though in 2005 an informational RFC (RFC 4180) was put together for it.

CSV is supported by a huge number of tools from spreadsheets like Excel, OpenOffice and Google Docs to complex databases to almost all programming languages. As such it is probably the most widely supported structured data format in the world.

The Format

Key points are:

  • CSV is probably the simplest possible structured format for data
  • CSV strikes a delicate balance, remaining readable by both machines & humans
  • CSV is a two dimensional structure consisting of rows of data, each row containing multiple cells. Rows are (usually) separated by line terminators so each row corresponds to one line. Cells within a row are separated by commas (hence the C(omma) part)
    • Note that strictly we're really talking about DSV files in that we can allow 'delimiters' between cells other than a comma. However, many people and many programs still call such data CSV (since comma is so common as the delimiter)
  • CSV is a "text-based" format, i.e. a CSV file is a text file. This makes it amenable to processing with all kinds of text-oriented tools (from text editors to unix tools like sed, grep etc)

What a CSV looks like

If you open up a CSV file in a text editor it would look something like:


Here there are 3 rows each of 3 columns. Notice how the second column in the last line is "quoted" because the content of that value actually contains a "," character. Without the quotes this character would be interpreted as a column separator, so we put quotes around the whole value. (Note that a CSV file does not have to have the same number of columns in each row.)
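As a minimal sketch of the quoting rule just described, the snippet below parses a small file with Python's built-in csv module. The sample text is hypothetical (the values are invented for illustration) and was chosen only to match the description: 3 rows of 3 columns, with a quoted second value in the last row.

```python
import csv
import io

# Hypothetical sample: the values are invented for illustration.
sample = 'A,B,C\n1,2,3\n4,"5,3",6\n'

# csv.reader yields one list of strings per row and strips the quotes.
rows = list(csv.reader(io.StringIO(sample)))

for row in rows:
    print(len(row), row)
```

Each row comes back with three cells; the quoted value keeps its comma as part of the data rather than being split into two cells.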

Dialects of CSVs

As mentioned above, CSV files can have quite a bit of variation in structure. Key options are:

  • Field delimiter: rather than comma , people often use things like \t (tab), ; or |
  • Record terminator / line terminator: \n (unix), \r\n (dos) or something else ...
  • Quoting: how do you quote fields that contain your delimiter?

You can read more in the CSV Dialect Description Format which defines a small JSON-oriented structure for specifying what options a CSV uses.
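The options above map fairly directly onto parameters of Python's built-in csv module. The following is a sketch, not a reference implementation: the input text is hypothetical, and the choice of ";" on input and "|" on output is arbitrary.

```python
import csv
import io

# Hypothetical input: ';' as field delimiter, "\r\n" as line terminator.
raw = 'name;note\r\nAda;"likes; semicolons"\r\nBob;a|b\r\n'

# Read it by stating the dialect options explicitly.
reader = csv.reader(io.StringIO(raw, newline=""), delimiter=";", quotechar='"')
rows = list(reader)

# Write the same rows back out in a different dialect:
# '|' as delimiter, "\n" as line terminator, quoting only where needed.
out = io.StringIO(newline="")
writer = csv.writer(out, delimiter="|", lineterminator="\n",
                    quoting=csv.QUOTE_MINIMAL)
writer.writerows(rows)
converted = out.getvalue()

print(converted)
```

Note how the quoting moves: the value containing ";" no longer needs quotes once the delimiter is "|", while the value containing "|" now does.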

What is Missing in CSV?

  • CSV lacks any way to specify type information: that is, there is no way to distinguish "1" the string from 1 the number. This shortcoming can be addressed by adding some form of simple schema. For example Table Schema provides a very simple way to describe your schema externally whilst Linked CSV is an example of doing this "inline" (that is, in the CSV).
  • No support for relationships between different "tables". This is similar to the previous point and again Table Schema provides a way to address this by providing additional schema information externally.
  • CSV is really only for tabular data -- it is not so good for data with nesting or where structure is not especially tabular (though remember most data can be put into tabular form if you try hard enough!)
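The lack of type information is easy to see in practice: a parser hands back every cell as a string. The sketch below applies a hypothetical external schema (a plain Python dict mapping column names to types, which is an assumption for illustration and not the Table Schema format itself) to recover typed values.

```python
import csv
import io

# Hypothetical data, invented for illustration.
raw = "id,name,price\n1,widget,2.50\n2,gadget,10\n"

# Hypothetical external schema: column name -> Python type.
schema = {"id": int, "name": str, "price": float}

# Without a schema every value is a string.
untyped = list(csv.DictReader(io.StringIO(raw)))

# With the schema each cell is cast to its declared type.
typed = [
    {column: schema[column](value) for column, value in row.items()}
    for row in untyped
]

print(untyped[0])
print(typed[0])
```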

Specifications and overviews:


The great thing about CSV is the huge level of tool support. The following is not intended to be comprehensive but is more at the eclectic end of the spectrum.


All spreadsheet programs, including Excel, OpenOffice and Google Docs Spreadsheets, support opening, editing and saving CSVs.

View a CSV file in your Browser

You can view a CSV file directly in your browser (saving you the hassle of downloading it and opening it). Options include:

Unix Command Line Manipulation


Power Tools

  • OpenRefine is a powerful tool for editing and manipulating data and works very well with CSV
  • Data Explorer supports importing CSVs and manipulating and changing them using javascript in the browser


This is heavily biased towards python!


  • The built-in csv library is good
  • The wonderful csvkit (python)
  • messytables (python) - convert lots of badly structured data into CSV (or other formats)
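For a flavour of the built-in csv library, here is a small round trip using DictWriter and DictReader. The records and column names are hypothetical; the point is only that the library adds and removes quotes for you.

```python
import csv
import io

# Hypothetical records, invented for illustration.
records = [
    {"city": "Paris", "note": "capital"},
    {"city": "Washington, D.C.", "note": "comma in name"},
]

# Write the records, header first; quoting is applied only where needed.
buf = io.StringIO(newline="")
writer = csv.DictWriter(buf, fieldnames=["city", "note"], lineterminator="\n")
writer.writeheader()
writer.writerows(records)
text = buf.getvalue()

# Read them back as dictionaries keyed by the header row.
round_trip = [dict(row) for row in csv.DictReader(io.StringIO(text))]

print(text)
```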


Nothing in standard lib yet and best option seems to be:

Tips and Tricks

CSVs and Git

Get git to handle CSV diffs in a sensible way (very useful if you are using git or another version control system to store data).

Make these changes to config files:

# ~/.config/git/attributes
*.csv diff=csv

# ~/.gitconfig
[diff "csv"]
  wordRegex = [^,\n]+[,\n]|[,]

Then do:

git diff --word-diff
# make it even nicer
git diff --word-diff --color-words
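
To get a rough feel for what the wordRegex above treats as a "word", the same pattern can be tried with Python's re module. This is only an approximation under stated assumptions: git uses its own regex engine rather than Python's, and the sample line is invented for illustration.

```python
import re

# Same pattern as the wordRegex setting: a run of non-comma,
# non-newline characters plus its trailing separator, or a lone comma.
word_regex = re.compile(r"[^,\n]+[,\n]|[,]")

# Hypothetical CSV line with an empty third cell.
line = "id,name,,price\n"

tokens = word_regex.findall(line)
print(tokens)
```

Each cell, together with its trailing separator, becomes one token, so a word diff highlights whole cells rather than whole lines.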

Credit for these fixups to contributors on this question on StackExchange and to James Smith.
