Skip to main content

module lang::csv::IO

rascal-0.34.0

Functions for reading and writing Comma-Separated Values (CSV) files.

Usage

import lang::csv::IO;

Dependencies

import lang::csv::\syntax::Parse;
import lang::csv::ast::CSV;
import lang::csv::ast::Implode;
import Map;
import List;

Description

The http://tools.ietf.org/html/rfc4180[CSV format] is used for exchanging information between spreadsheets and databases. A CSV file has the following structure:

  • An optional header line consisting of field names separated by comma's.
  • One or more lines consisting of values separated by comma's.

The following functions are provided:

Examples

  • CSV file with headers
field_name1,field_name2,field_name3
aaa,bbb,ccc
zzz,yyy,xxx

function readCSV

Read a relation from a CSV (Comma Separated Values) file.

value readCSV(loc location, bool header = true, str separator = ",", str encoding = "UTF8")

value readCSV(loc location, map[str,str] options)

&T readCSV(type[&T] result, loc location, bool header = true, str separator = ",", str encoding = "UTF8")

Read a CSV file and return a value of a required type.

The result argument is the required type of the value that is produced by reading the CSV that is found at location. Optionally, the following arguments can be supplied:

  • header = true specifies that a header is present (default).
  • header = false specifies that no header is present.
  • separator = "," specifies that , is the separator character between fields (default).

The CSV data should conform to the specified type (if any).

If the required type is not specified, it is inferred in three steps:

Step 1: The type of each field occurrence is inferred from its contents using the following rules:

  • An empty value is of type void.
  • A field that contains a string that corresponds to a number is numeric.
  • A field that contains true or false is of type is bool.
  • In all other cases the field is of type str.

Step 2: The type of each field is inferred from the type of all of its occurrences:

  • If all occurrences have a numeric type, then the smallest possible type is used.
  • If the occurrences have a mixed type, i.e., numeric, non-numeric, boolean or string, then the type is str.
  • If the requested type for a field is str and another type would be inferred by the preceeding two rules, its inferred type will be str.

Reading the values in fields is straightforward, except for the case that the text in the field is enclosed between double quotes ("):

  • the text may include line breaks which are represented as \n in the resulting string value of the field.
  • the text may contain escaped double quotes ("") which are represented as \" in the resulting string value.

Examples

Given is the follwing file ex1.csv:

position;artist;title;year
1;Eagles;Hotel California;1977
2;Queen;Bohemian rhapsody;1975
3;Boudewijn de Groot;Avond;1997

We can read it in various ways:

rascal>import lang::csv::IO;
ok
rascal>R1 = readCSV(#rel[int position, str artist, str title, int year], |lib://rascal/org/rascalmpl/library/lang/csv/examples/ex1.csv|, separator = ";");
rel[int position,str artist,str title,int year]: {
<1,"Eagles","Hotel California",1977>,
<2,"Queen","Bohemian rhapsody",1975>,
<3,"Boudewijn de Groot","Avond",1997>
}

Now we can, for instance, select one of the fields of R1:

rascal>R1.artist;
set[str]: {"Queen","Boudewijn de Groot","Eagles"}

It is also possible to infer the type:

rascal>R1 = readCSV(|lib://rascal/org/rascalmpl/library/lang/csv/examples/ex1.csv|, separator = ";");
rel[int position,str artist,str title,int year]: {
<1,"Eagles","Hotel California",1977>,
<2,"Queen","Bohemian rhapsody",1975>,
<3,"Boudewijn de Groot","Avond",1997>
}

function getCSVType

type[value] getCSVType(loc location, bool header = true, str separator = ",", str encoding = "UTF8")

function writeCSV

Write a relation to a CSV (Comma Separated Values) file.

void writeCSV(type[&T] schema, &T relation, loc location, bool header = true, str separator = ",", str encoding = "UTF8")

Write relation to a CSV file at location. The options influence the way the actrual CSV file is written:

  • header: add or omit a header (based on the labels of the relation).
  • separator: defines the separator character between fields (default is ,).

Examples

rascal>import lang::csv::IO;
ok
rascal>rel[int position, str artist, str title, int year] R1 = {
>>>>>>> <1,"Eagles","Hotel California",1977>,
>>>>>>> <2,"Queen","Bohemian rhapsody",1975>,
>>>>>>> <3,"Boudewijn de Groot","Avond",1997>
>>>>>>>};
rel[int position,str artist,str title,int year]: {
<1,"Eagles","Hotel California",1977>,
<2,"Queen","Bohemian rhapsody",1975>,
<3,"Boudewijn de Groot","Avond",1997>
}

we can write the CSV with a header row:

rascal>writeCSV(#rel[int position, str artist, str title, int year], R1, |tmp:///ex1a.csv|);
ok
rascal>

or write it without the header row:

rascal>writeCSV(#rel[int, str, str, int], R1, |tmp:///ex1b.csv|, header = false, separator = ";");
ok

The result of both calls to writeCSV are included below:

ex1a.csv (with a header line and default separator ,):

position,artist,title,year
1,Eagles,Hotel California,1977
2,Queen,Bohemian rhapsody,1975
3,Boudewijn de Groot,Avond,1997

ex1b.csv (without a header line with separator ;):

1;Eagles;Hotel California;1977
2;Queen;Bohemian rhapsody;1975
3;Boudewijn de Groot;Avond;1997

function loadCSV

lang::csv::ast::CSV::Table loadCSV(loc l)

function loadNormalizedCSV

lang::csv::ast::CSV::Table loadNormalizedCSV(loc l)

function generate

Generator for CSV resources

str generate(str moduleName, loc uri)