PHP CSV Reader and Writer

By Michael Kramer on October 31st 2014 @ 8:55 am

One thing I find myself doing often is reading and writing CSVs using PHP, so I finally sat down and created some helper classes that are lightweight, and allow me to quickly read and write CSVs. I chose to do them as separate classes, however you could easily combine them into one if you so please.

This class has proved highly efficient at parsing through large CSV files, I've personally used it on files with over 2 million records.

csv reader

Probably the most common task is reading from a CSV file, which is done a few ways, either the field_names are in the first row, or you already know the columns and what fields they go to.

Let's start our class with:

class CSV_Reader {
    protected $_handle = null; //file handle
    protected $_init = false; //have we initalized ourselves
    protected $_csv = null; // full path to CSV file
    protected $_layout = array(); //array that defines layout of CSV
    protected $_row = null; //current row of the CSV we are on
};

I setup the reader so you could easily setup the fields, I was doing way to much of $line[5]; //first_name $line[6]; //last_name and then if the structure changed I'd have to go through and re-update all of my code.

So for this, I call it the layout, which is essentially an associative array, that you can either set directly, or read from the first row.

Let's add our getter/setter functions:

    /**
     * Set our CSV Layout, what fields correspond to what
     * @param array $layout
     */
    public function setLayout(array $layout){
        $this->_layout = $layout;
    }
    
    /**
     * Return our CSV Layout
     * @return array
     */
    public function getLayout(){
        return $this->_layout;
    }

And then the function to read from the first row:

    /**
     * Read the layout from the values in the first row, i.e,
     * first_name, last_name,
     */
    public function readLayoutFromFirstRow(){
        $this->_init(); //our initializer function, will go over this next
        $this->_layout = array(); //reset for multiple
        $line = fgetcsv($this->_handle, 4096, ',');
        if(!$line){
            fclose($this->_handle);
            throw new Exception('Invalid File, could not read layout from first row');
        }
        
        foreach($line as $key){
            $this->_layout[] = strtolower($key);
        }
    }

Alright, so now we can choose to read the field_names from the first row of our CSV, or set our layout directly like:

$reader = new CSV_Reader();
$reader->setLayout(array('first_name', 'last_name'));

If you noticed I had a _init function, this opens the file for reading, and sets up some variables that we need.

    /**
     * Initialize CSV, open file and get it ready for reading
     * @throws Exception
     */
    protected function _init(){
        ini_set('auto_detect_line_endings', 1);
        $this->_init = true;
        $this->_handle = fopen($this->_csv, "r");
        if(!$this->_handle){
            throw new Exception('Could not open file: ' . $this->_csv);
        }
    }

In order to read the layout from the first row, it needs to know what file it is reading from, so we add a function called setCsv(); this can be an absolute or relative path to the CSV file.

    /**
     * 
     * @param string $csv
     */
    public function setCsv($csv){
        $this->_csv = $csv;
    }

A simple usage of this would be:

$reader = new CSV_Reader();
$reader->setCsv('/path/to/file.csv');
$reader->readLayoutFromFirstRow();

We are now ready to begin looping through the rows and getting our data, this is done via a "process" command, which will turn the row into an associative array, either based on the layout we explicitly set, or the one generated by the first row.

   /**
     * 
     */
    public function process(){
        if(!$this->_init){
            $this->_init();
        }
        $line = fgetcsv($this->_handle, 4096);
        if(!$line){
            fclose($this->_handle);
            return false;
        }
        $i = 0;
        $row = array();
        foreach($this->_layout as $key){
            if(isset($line[$i])){
                $row[$key] = $line[$i];
            } else {
                $row[$key] = NULL;
            }
            $i++;
        }
        $this->_row = $row;
        return true;
    }

Using the process command we can do:

while($reader->process){
    $row = $reader->getRow();
    //our logic here
}

We can now loop through all the rows using this function, the reason we do this, instead of looping through it ahead of time and saving it to an array, is that this allows us to process extremely large files without worrying about a memory footprint, as we aren't adding to an array continuously.

We now need to add the getRow() function to return the current row, this is pretty simple:

    /**
     * Returns the current row
     * @return array
     */
    public function getRow(){
        return $this->_row;
    }

That's it, we can now use our class and quickly read large (or small) CSV files, and access the data as an associative array.

csv writer

The writer is very similar to the reader, except there are a few other tasks that it must accomplish, mainly writing a row to an output file.

We will start by showing what is the same as the CSV Reader, namely the layout. Let's start our class:

class CSV_Writer{

    protected $_handle = null; //file handle
    
    protected $_init = false; //have we initialized 
    
    protected $_output_csv = null; //path to output csv
    
    protected $_layout = array(); //our csv definition
    
    protected $_filename = ''; //filename of our CSV

};

The getters and setters for the layout are the same, except there is no readLayoutFromFirstRow function anymore. The init function has a small change:

    /**
     * Initialize CSV, open file and get it ready for reading
     * @throws Exception
     */
    protected function _init(){
        ini_set('auto_detect_line_endings', 1);
        $this->_init = true;
        $this->_handle = fopen($this->_output_csv, "w+");
        if(!$this->_handle){
            throw new Exception('Could not open file for write: ' . $this->_output_csv);
        }
    }

The handle is now opening this->_ouput_csv for writing. The "w+" will create the file if it doesn't exist.

We need a setter function for our Ouput CSV:

    /**
     * Set what our outputted CSV should be
     * @param string $filename
     * @param string $working_dir
     */
    public function setOutputCsv($filename, $working_dir){
        return $this->_output_csv = $this->_createWorkingFile($filename, $working_dir);
    }

Here we pass in the filename, such as "export.csv" and the working directory, which is an absolute or relative path where you want to write the file, such as: "/home/user/public_html/export/"

The _createWorkingFile function will actually write to the file, this ensures that we can in fact write to it, and will error and return false if it is unable to write.

    /**
     * Create a working CSV File for us to parse, modify, edit, add to etc
     * 
     * @param type $filename
     * @param type $working_dir
     * @return boolean
     */
    protected function _createWorkingFile($filename, $working_dir){
        $contents = ''; //empty file
        $timestamp = time();
        
        if(!is_dir($working_dir)){
            $result = mkdir($working_dir, 0755, true);
            if(!$result){
                throw new Exception();
            }
        }
        
        $handle = fopen($working_dir . $filename, "w+");
        if($handle){
            $this->_filename = $filename;
            fwrite($handle, $contents);
            fclose($handle);
            return $working_dir . $filename;
        }
        //log the error
        return false;
    }

This leaves us with two last functions, writeRow, and clean. writeRow will append a line on to the end of the CSV, and clean will close the file stream. You simply pass writeRow an associative array, and it will build the CSV in the correct order based on the layout you set.

    /**
     * Write row to CSV
     * @param type $row
     */
    public function writeRow($row = array()){
        if(!$this->_init){
            $this->_init();
        }

        $line = array();
        foreach($this->_layout as $key){
            if(isset($row[$key])){
                $line[$key] = $row[$key];
            } else {
                $line[$key] = NULL;
            }
        }

        $result = fputcsv($this->_handle, $line ,',', '"');
        if($result === FALSE){
            throw new Exception('Failed to write row ('.var_export($row, true).') for CSV '.$this->_filename);
        }
    }

    /**
     * Clean up our file streams, and anything that we don't need
     */
    public function clean(){
        if($this->_handle){
            fclose($this->_handle);
        }
        $this->_init = false;
    }

Usage of this class is as follows:

    $writer = new CSV_Writer();
    $writer->setLayout(array('first_name', 'last_name'));
    $writer->setOuputCsv('export.csv', '/abs/path/to/dir');
    $writer->writeRow(array('first_name' => 'Michael', 'last_name' => 'Kramer'));
    $writer->clean();

This will create a CSV called export.csv containing 1 row "Michael,Kramer"

comments powered by Disqus