Manipulating CSV Files

C# .NET language

How manipulating CSV files in the internet era? Comma Separated Files (CSV) are text files that contain multiple records (rows), each with one or more elements (columns) separated by a comma character. Actually, the elements can be separated by any type of delimiter, not only a comma. For instance, another common file format is the Tab Separated Value (TSV) file where every element is separated by a tab character.

So, CSV files present a unique set of opportunities for sharing large quantities of data as they’re dense and contain little of the wasted content that’s commonly found in JSON or XML files. They also compress rather nicely, which lowers bandwidth uses.

The basic example

As stated above, this article will be all about reading and writing movie data formatted in various CSV formats. The following class code represents the data:

public class Movie
{
    public string Name { get; set; } = "";
    public string Director { get; set; } = "";
    public DateTime DateReleased { get; set; }
    public decimal
    BoxOfficeGross { get; set; } = 0.0m;
}

public static List<Movie> GetMovies()
{
    var movies = new List< Movie >();

    movies.Add(new Movie (){
        Name = "American Graffiti",
        Director = "George Lucas",
        DateReleased = new DateTime(1977,5,23),
        BoxOfficeGross = 123456});

    movies.Add(new Movie () { 
        Name = "Star Wars",
        Director = "George Lucas",
        DateReleased = new DateTime(1977, 5, 23),
        BoxOfficeGross = 123456 });

    movies.Add(new Movie () { 
        Name = "Empire Strikes Back",
        Director = "Irving Kirshner",
        DateReleased = new DateTime(1977, 5, 23),
        BoxOfficeGross = 123456 });

    movies.Add(new Movie (){ 
        Name = "Return of the Jedi",
        Director = "Richard Marquand",
        DateReleased = new DateTime(1977, 5, 23),
        BoxOfficeGross = 123456
    });
    return movies;
}

Transform a list in a CSV file

Looking around for a package that manages CSV file, I found CSVHelper. Also, I found another package oriented to Excel that can help you to also generate CSV called ClosedXML and I talked about in my post “How to Export Data to Excel in Blazor“.

To install CSVHelper, you the following command

Install-Package CsvHelper

Writing CSV Files

Once you’ve created your basic project, you can start by outputting a collection of data to a CSV file. The following code demonstrates the simplest mechanism for writing a collection of movie records to a CSV file.

public static void WriteCsvFile(List<Movie> dataToWrite, string outputFile)
{
    var config = new CsvConfiguration(CultureInfo.InvariantCulture);

    using (var writer = new StreamWriter(outputFile))
    using (var csv = new CsvWriter(writer, config))
    {
        csv.WriteRecords(dataToWrite);
    }
}

When you examine this code, take notice of the following items:

  • The code creates a CSVConfiguration object. This object will be used to control the output of your CSV file.
  • The file opens a StreamWriter that controls where your file will be written.
  • The code then creates a CSVWriter object passing in the configuration object. This Writer sends your data to the stream opened by the writer using the passed-in configuration settings.
  • Finally, the call to WriteRecords routine takes an IEnumarable collection and writes to the CSV file.
Movie Data Output as CSV file - Manipulating CSV Files
Movie Data Output as CSV file

Configuring Writer Options

As stated earlier, the CSVWriter accepts a configuration object that’s used to control output options. A few of the key options will be covered next.

No Header Column

You may or may not want to include a header file in your CSV files. By default, CSVHelper adds the name of your class’ properties in a header row. You can turn off the header by setting it with the following code:

config.HasHeaderRecord = false;
A CSV file with no header
Shows the results of this option

Changing Delimiters

One of the more common options is the delimiter used between each data element. By default, CSVHelper delimits data comma characters. The following three examples show how you can change the delimiter to the PIPE, TAB, and a “crazy” delimiter.

  • Changing the delimiter to PIPE:
config.Delimiter = "|";
The CSV file with PIPE delimiter
Shows the PIPE delimiter in action
  • Changing the delimiter to TAB:
config.Delimiter = "\t";
The CSV file with TAB delimiter
Shows the TAB delimiter in action
  • Creating a “Crazy” delimiter (This is just to demonstrate that your delimiter can be anything you desire):
config.Delimiter = "[[YES_IM_A_DELIMETER]]";

Figure 6 .

The CSV file with the CRAZY delimiter
Shows the “Crazy” delimiter doing its thing

Quote Delimiting

I’ve found in many situations that my data needs to have each data element wrapped in quotation marks. This is especially true when your data contains delimiters within their fields, e.g., commas. CSVHelper allows you to quote-delimit your data using the following options.

config.ShouldQuote = args => true;

Figure 7 Shows the CSV with quoted content.

The CSV file with quoted content
Shows the CSV with quoted content

Formatting Output with Map Classes

Another very handy tool is the ability to control the output sent to your file. By default, CSVHelper outputs elements by reflecting on the class they come from and creating columns for each property. There are many situations where you may want to export a limited set of properties, or you wish to change the order of the output files.

This is where mapping classes come in. When exporting data CSVHelper can accept a mapping object derived from the ClassMap class. The following code demonstrates a ClassMap that limits the data exported to two properties.

public class MovieOutputClassMap: ClassMap<Movie>
{
    public MovieOutputClassMap()
    {
        Map(m => m.Name);
        Map(m => m.DateReleased);
    }
}

Once you’ve built your class map, you need to apply it to your writer. This is done using two commands. The first one creates an instance of your class map.

var classMap = new MovieOutputClassMap();

The second registers it with the writer Context property:

csv.Context.RegisterClassMap(classMap);

The full writer code is shown below:

public static void WriteCsvFile(List<Movie> dataToWrite, string outputFile)
{
    var config = new CsvConfiguration(CultureInfo.InvariantCulture);

    //include header
    config.HasHeaderRecord = false;

    //change delimiter
    config.Delimiter = "|";

    //quote delimit
    config.ShouldQuote = args => true;

    //changing the order of fields
    var classMap = new MovieOutputClassMap();

    using (var writer = new StreamWriter(outputFile))

    using (var csv = new CsvWriter(writer, config))
    {
        csv.Context.RegisterClassMap(classMap);
        csv.WriteRecords(dataToWrite);
    }
}
CSV file with only two columns exported
Shows the CSV file with two columns

You can also use a class map to reorder your output

public class MovieOutputClassMap: ClassMap<Movie>
{
    public MovieOutputClassMap()
    {
        Map(m => m.Name);
        Map(m => m.DateReleased);
        Map(m => m.Director);
        Map(m => m.BoxOfficeGross);
    }
}
The CSV file columns reordered
Shows the CSV file with its columns reordered

Along with altering the number of columns exported and changing the ordinal position of them, you can also control the text that’s emitted into the CSV stream. Altering the output (and input) is done using a class that implements the ITypeConverter interface.

The code below demonstrates creating a type converter that alters the output of the DateReleased property removing the time component.

This code receives the property’s value and returns a string using the ConvertToString aspect of the type converter. There’s also a corollary for reading these values from strings via an implementation of the ConvertFromString function.

public class DateOutputConverter : ITypeConverter
{
    public object ConvertFromString(string text,
        IReaderRow row, MemberMapData memberMapData)
    {
        throw new NotImplementedException();
    }
    public string ConvertToString(
        object value,
        IWriterRow row,
        MemberMapData memberMapData)
    {
        var retval = ((DateTime) value).ToString("d");
        return retval;
    }
}

Once you’ve created your converter, you attach it to your column via the mapping class. The following code shows how to attach a converter to a property map.

public class MovieOutputClassMap : ClassMap<Movie>
{
    public MovieOutputClassMap()
    {
        Map(m => m.Name);
        Map(m => m.DateReleased).TypeConverter(new DateOutputConverter());
        Map(m => m.Director);
        Map(m => m.BoxOfficeGross);
    }
}

Figure 10 Shows the CSV file with the date formatting altered.

The CSV file with the date formatting altered
Shows the CSV file with the date formatting altered

Reading CSV Files

Now that you have a basic understanding of writing CSV files, you can turn your sights to reading CSV files. There are two primary mechanisms for reading a file. The first is to open the file and iterate through it one record at a time.

When you examine this set of code for reading files, take notice of the following items:

  • The code creates a CSVConfiguration object. This object is used to control how the reader manipulated your CSV data as it was read.
  • The file opens a StreamReader, which controls where your file will be read from.
  • The code then creates a CSVReader object passing in the configuration object. This reader is used to iterate through your CSV file one record at a time.
  • The code iterates the file using the Read() function, which moves down the file one record at a time. Note that the code does a Read() immediately, to skip the record header.
  • Finally, the code uses various Getter functions to read data from each column.
public static List<Movie> ManualReadCsvFile(string inputFile)
{
    var retval = new List<Movie>();
    var config = new CsvConfiguration(CultureInfo.InvariantCulture);

    using (var reader = new StreamReader(inputFile))
    using (var csv = new CsvReader(reader, config))
    {
        //skip the header
        csv.Read();
        while (csv.Read())
        {
            var movie = new Movie();
            movie.Name = csv.GetField(0);
            movie.Director = csv.GetField(1);
            movie.DateReleased = csv.GetField<DateTime>(2);
            movie.BoxOfficeGross = csv.GetField<decimal>(3);
            retval.Add(movie);
        }
    }
    return retval;
}

Another and much simpler way to read a file is to use CSVHelper’s built-in mechanism for iterating through a file automatically transforming CSV records into to .NET classes.

When you examine this set of code for reading files, take notice of the following items:

  • The code creates a CSVConfiguration object. This object is used to control how the reader manipulated your CSV data as it was read.
  • The file opens a StreamReader, which controls where your file will be read from.
  • The code then creates a CSVReader object passing in the configuration object. This reader is used to iterate through your CSV file one record at a time.
  • The code then reads all the records using the GetRecords<T> method. This function returns an IEnumerable collection.
  • The collection is then added to the functions return value via the AddRange() method.
public static List<Movie> ReadCsvFile(string inputFile)
{
    var retval = new List<Movie>();
    var config = new CsvConfiguration(CultureInfo.InvariantCulture);
    using (var reader =new StreamReader(inputFile))
    using (var csv = new CsvReader(reader, config))
    {
        retval.AddRange(csv.GetRecords<Movie>());
    }
    return retval;
}

As you can see, this style of code is much simpler to deal with.

You can also use class maps to change the order of how CSV elements are read from your CSV file and are applied to the returned object’s properties. The following class map reads content from the CSV created earlier in this article. Notice the column order.

public class MovieInputClassMap : ClassMap<Movie>
{
    public MovieInputClassMap()
    {
        Map(m => m.Name);
        Map(m => m.DateReleased);
        Map(m => m.Director);
        Map(m => m.BoxOfficeGross);
    }
}

The code used to attach a class map is exactly like the writer. You simply create an instance of the class map and apply it to the CSVReader’s Context property:

public static List<Movie> ReadCsvFile(string inputFile)
{
    var retval = new List<Movie>();
    var config = new CsvConfiguration(CultureInfo.InvariantCulture);

    var classMap = new MovieInputClassMap();

    using (var reader = new StreamReader(inputFile))

    using (var csv = new CsvReader(reader, config))
    {
        csv.Context.RegisterClassMap(classMap);
        retval.AddRange(csv.GetRecords<Movie>());
    }
    return retval;
}

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.