PHP: Parse and Retrieve Data from XLSX Files


  • SPYRO KiD
  • admin[~@t~]spyrozone[~d.t~]net
  • Monday, May 9th, 2011
  • CopyLEFT (c) 2011++ www.spyrozone.net All Rights Reserved


Sebelumnya, rekan-rekan saya sudah sangat terbiasa untuk parsing data dari file MS Excel 2003 (.xls) dengan PHP. Seiring semakin banyaknya pengguna MS Office 2007, kini permintaan client juga ikut-ikutan meningkat :D Mereka jadi malas untuk menyimpan dalam format Office 2003, dan meminta rekan-rekan saya untuk upgrade sistem informasi yang kami bangun agar bisa membaca file *.xlsx ^^”

Untungnya ada orang baik hati bernama Sergey Shuchkin dari Rusia membuat sebuah class “simplexlsx.class.php” untuk membantu parsing data dari MS Excel 2007 (.xlsx). Fungsi yang dimiliki memang tidak terlalu lengkap, namun sudah sangat membantu. Saya telah menambahkan fungsi sederhana untuk mengambil nama Worksheet dimana fungsi tersebut tidak dimiliki versi aslinya.

Berikut adalah contoh hasil parsing file *.xlsx dengan class tersebut ;)

File Excel yang ingin dibaca:

 

{image: XLSX File}

XLSX File

Hasil parsing dalam format tabel:

 

{image: Hasil parsing file XLSX dalam tabel}

Hasil parsing file XLSX dalam tabel

Hasil parsing dalam array:

 

{image: Hasil parsing file XLSX dalam array}

Hasil parsing file XLSX dalam array

Source Code

simplexlsx.class.php

<?php
// SimpleXLSX php class v0.4
// MS Excel 2007 workbooks reader
// Example:
//   $xlsx = new SimpleXLSX('book.xlsx');
//   print_r( $xlsx->rows() );
// Example 2:
//   $xlsx = new SimpleXLSX('book.xlsx');
//   print_r( $xlsx->rowsEx() );
// Example 3:
//   $xlsx = new SimpleXLSX('book.xlsx');
//   print_r( $xlsx->rows(2) ); // second worksheet
//
// 0.4 sheets(), sheetsCount(), unixstamp( $excelDateTime ), getWorksheetName() by SPYRO KiD - http://www.spyrozone.net
// 0.3 - fixed empty cells (Gonzo patch)

class SimpleXLSX {
 // Don't remove this string! Created by Sergey Schuchkin from http://www.sibvison.ru - professional php developers team 2010-2011
 private $sheets;
 private $hyperlinks;
 private $package;
 private $sharedstrings;
 // scheme
 const SCHEMA_OFFICEDOCUMENT  =  'http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument';
 const SCHEMA_RELATIONSHIP  =  'http://schemas.openxmlformats.org/package/2006/relationships';
 const SCHEMA_SHAREDSTRINGS =  'http://schemas.openxmlformats.org/officeDocument/2006/relationships/sharedStrings';
 const SCHEMA_WORKSHEETRELATION =  'http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet';

 function __construct( $filename ) {
 $this->_unzip( $filename );
 $this->_parse();
 }
 function sheets() {
 return $this->sheets;
 }
 function sheetsCount() {
 return count($this->sheets);
 }
 function worksheet( $worksheet_id ) {
 if ( isset( $this->sheets[ $worksheet_id ] ) ) {
 $ws = $this->sheets[ $worksheet_id ];

 if (isset($ws->hyperlinks)) {
 $this->hyperlinks = array();
 foreach( $ws->hyperlinks->hyperlink as $hyperlink ) {
 $this->hyperlinks[ (string) $hyperlink['ref'] ] = (string) $hyperlink['display'];
 }
 }

 return $ws;
 } else
 throw new Exception('Worksheet '.$worksheet_id.' not found.');
 }
 function dimension( $worksheet_id = 1 ) {
 $ws = $this->worksheet($worksheet_id);
 $ref = (string) $ws->dimension['ref'];
 $d = explode(':', $ref);
 $index = $this->_columnIndex( $d[1] );       
 return array( $index[0]+1, $index[1]+1);
 }
 // sheets numeration: 1,2,3....
 function rows( $worksheet_id = 1 ) {

 $ws = $this->worksheet( $worksheet_id);

 $rows = array();
 $curR = 0;

 foreach ($ws->sheetData->row as $row) {

 foreach ($row->c as $c) {
 list($curC,) = $this->_columnIndex((string) $c['r']);
 $rows[ $curR ][ $curC ] = $this->value($c);
 }

 $curR++;
 }
 return $rows;
 }
 function rowsEx( $worksheet_id = 1 ) {
 $rows = array();
 $curR = 0;
 if (($ws = $this->worksheet( $worksheet_id)) === false)
 return false;
 foreach ($ws->sheetData->row as $row) {

 foreach ($row->c as $c) {
 list($curC,) = $this->_columnIndex((string) $c['r']);
 $rows[ $curR ][ $curC ] = array(
 'name' => (string) $c['r'],
 'value' => $this->value($c),
 'href' => $this->href( $c ),
 );
 }
 $curR++;
 }
 return $rows;

 }
 // thx Gonzo
 function _columnIndex( $cell = 'A1' ) {

 if (preg_match("/([A-Z]+)(\d+)/", $cell, $matches)) {

 $col = $matches[1];
 $row = $matches[2];

 $colLen = strlen($col);
 $index = 0;

 for ($i = $colLen-1; $i >= 0; $i--)
 $index += (ord($col{$i}) - 64) * pow(26, $colLen-$i-1);

 return array($index-1, $row-1);
 } else
 throw new Exception("Invalid cell index.");
 }
 function value( $cell ) {
 // Determine data type
 $dataType = (string)$cell["t"];
 switch ($dataType) {
 case "s":
 // Value is a shared string
 if ((string)$cell->v != '') {
 $value = $this->sharedstrings[intval($cell->v)];
 } else {
 $value = '';
 }

 break;

 case "b":
 // Value is boolean
 $value = (string)$cell->v;
 if ($value == '0') {
 $value = false;
 } else if ($value == '1') {
 $value = true;
 } else {
 $value = (bool)$cell->v;
 }

 break;

 case "inlineStr":
 // Value is rich text inline
 $value = $this->_parseRichText($cell->is);

 break;

 case "e":
 // Value is an error message
 if ((string)$cell->v != '') {
 $value = (string)$cell->v;
 } else {
 $value = '';
 }

 break;

 default:
 // Value is a string
 $value = (string)$cell->v;

 // Check for numeric values
 if (is_numeric($value) && $dataType != 's') {
 if ($value == (int)$value) $value = (int)$value;
 elseif ($value == (float)$value) $value = (float)$value;
 elseif ($value == (double)$value) $value = (double)$value;
 }
 }
 return $value;
 }
 function href( $cell ) {
 return isset( $this->hyperlinks[ (string) $cell['r'] ] ) ? $this->hyperlinks[ (string) $cell['r'] ] : '';
 }
 function _unzip( $filename ) {
 // Clear current file
 $this->datasec = array();

 // Package information
 $this->package = array(
 'filename' => $filename,
 'mtime' => filemtime( $filename ),
 'size' => filesize( $filename ),
 'comment' => '',
 'entries' => array()
 );
 // Read file
 $oF = fopen($filename, 'rb');
 $vZ = fread($oF, $this->package['size']);
 fclose($oF);
 // Cut end of central directory
 $aE = explode("\x50\x4b\x05\x06", $vZ);

 // Normal way
 $aP = unpack('x16/v1CL', $aE[1]);
 $this->package['comment'] = substr($aE[1], 18, $aP['CL']);

 // Translates end of line from other operating systems
 $this->package['comment'] = strtr($this->package['comment'], array("\r\n" => "\n", "\r" => "\n"));

 // Cut the entries from the central directory
 $aE = explode("\x50\x4b\x01\x02", $vZ);
 // Explode to each part
 $aE = explode("\x50\x4b\x03\x04", $aE[0]);
 // Shift out spanning signature or empty entry
 array_shift($aE);

 // Loop through the entries
 foreach ($aE as $vZ) {
 $aI = array();
 $aI['E']  = 0;
 $aI['EM'] = '';
 // Retrieving local file header information
//            $aP = unpack('v1VN/v1GPF/v1CM/v1FT/v1FD/V1CRC/V1CS/V1UCS/v1FNL', $vZ);
 $aP = unpack('v1VN/v1GPF/v1CM/v1FT/v1FD/V1CRC/V1CS/V1UCS/v1FNL/v1EFL', $vZ);
 // Check if data is encrypted
//            $bE = ($aP['GPF'] && 0x0001) ? TRUE : FALSE;
 $bE = false;
 $nF = $aP['FNL'];
 $mF = $aP['EFL'];

 // Special case : value block after the compressed data
 if ($aP['GPF'] & 0x0008) {
 $aP1 = unpack('V1CRC/V1CS/V1UCS', substr($vZ, -12));

 $aP['CRC'] = $aP1['CRC'];
 $aP['CS']  = $aP1['CS'];
 $aP['UCS'] = $aP1['UCS'];

 $vZ = substr($vZ, 0, -12);
 }

 // Getting stored filename
 $aI['N'] = substr($vZ, 26, $nF);
 if (substr($aI['N'], -1) == '/') {
 // is a directory entry - will be skipped
 continue;
 }

 // Truncate full filename in path and filename
 $aI['P'] = dirname($aI['N']);
 $aI['P'] = $aI['P'] == '.' ? '' : $aI['P'];
 $aI['N'] = basename($aI['N']);

 $vZ = substr($vZ, 26 + $nF + $mF);

 if (strlen($vZ) != $aP['CS']) {
 $aI['E']  = 1;
 $aI['EM'] = 'Compressed size is not equal with the value in header information.';
 } else {
 if ($bE) {
 $aI['E']  = 5;
 $aI['EM'] = 'File is encrypted, which is not supported from this class.';
 } else {
 switch($aP['CM']) {
 case 0: // Stored
 // Here is nothing to do, the file ist flat.
 break;
 case 8: // Deflated
 $vZ = gzinflate($vZ);
 break;
 case 12: // BZIP2
 if (! extension_loaded('bz2')) {
 if (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN') {
 @dl('php_bz2.dll');
 } else {
 @dl('bz2.so');
 }
 }
 if (extension_loaded('bz2')) {
 $vZ = bzdecompress($vZ);
 } else {
 $aI['E']  = 7;
 $aI['EM'] = "PHP BZIP2 extension not available.";
 }
 break;
 default:
 $aI['E']  = 6;
 $aI['EM'] = "De-/Compression method {$aP['CM']} is not supported.";
 }
 if (! $aI['E']) {
 if ($vZ === FALSE) {
 $aI['E']  = 2;
 $aI['EM'] = 'Decompression of data failed.';
 } else {
 if (strlen($vZ) != $aP['UCS']) {
 $aI['E']  = 3;
 $aI['EM'] = 'Uncompressed size is not equal with the value in header information.';
 } else {
 if (crc32($vZ) != $aP['CRC']) {
 $aI['E']  = 4;
 $aI['EM'] = 'CRC32 checksum is not equal with the value in header information.';
 }
 }
 }
 }
 }
 }

 $aI['D'] = $vZ;

 // DOS to UNIX timestamp
 $aI['T'] = mktime(($aP['FT']  & 0xf800) >> 11,
 ($aP['FT']  & 0x07e0) >>  5,
 ($aP['FT']  & 0x001f) <<  1,
 ($aP['FD']  & 0x01e0) >>  5,
 ($aP['FD']  & 0x001f),
 (($aP['FD'] & 0xfe00) >>  9) + 1980);

 //$this->Entries[] = &new SimpleUnzipEntry($aI);
 $this->package['entries'][] = array(
 'data' => $aI['D'],
 'error' => $aI['E'],
 'error_msg' => $aI['EM'],
 'name' => $aI['N'],
 'path' => $aI['P'],
 'time' => $aI['T']
 );

 } // end for each entries
 }
 function getPackage() {
 return $this->package;
 }
 function getEntryData( $name ) {
 $dir = dirname( $name );
 $name = basename( $name );
 foreach( $this->package['entries'] as $entry)
 if ( $entry['path'] == $dir && $entry['name'] == $name)
 return $entry['data'];
 }
 function unixstamp( $excelDateTime ) {
 $d = floor( $excelDateTime ); // seconds since 1900
 $t = $excelDateTime - $d;
 return ($d > 0) ? ( $d - 25569 ) * 86400 + $t * 86400 : $t * 86400;
 }
 function _parse() {
 // Document data holders
 $this->sharedstrings = array();
 $this->sheets = array();

 // Read relations and search for officeDocument

 $relations = simplexml_load_string( $this->getEntryData("_rels/.rels") );
 foreach ($relations->Relationship as $rel) {
 if ($rel["Type"] == SimpleXLSX::SCHEMA_OFFICEDOCUMENT) {
 // Found office document! Read relations for workbook...
 $workbookRelations = simplexml_load_string($this->getEntryData( dirname($rel["Target"]) . "/_rels/" . basename($rel["Target"]) . ".rels") );
 $workbookRelations->registerXPathNamespace("rel", SimpleXLSX::SCHEMA_RELATIONSHIP);

 // Read shared strings
 $sharedStringsPath = $workbookRelations->xpath("rel:Relationship[@Type='" . SimpleXLSX::SCHEMA_SHAREDSTRINGS . "']");
 $sharedStringsPath = (string)$sharedStringsPath[0]['Target'];             
 $xmlStrings = simplexml_load_string($this->getEntryData( dirname($rel["Target"]) . "/" . $sharedStringsPath) );  
 if (isset($xmlStrings) && isset($xmlStrings->si)) {
 foreach ($xmlStrings->si as $val) {
 if (isset($val->t)) {
 $this->sharedstrings[] = (string)$val->t;
 } elseif (isset($val->r)) {
 $this->sharedstrings[] = $this->_parseRichText($val);
 }
 }
 }

 // Loop relations for workbook and extract sheets...
 foreach ($workbookRelations->Relationship as $workbookRelation) {
 if ($workbookRelation["Type"] == SimpleXLSX::SCHEMA_WORKSHEETRELATION) {
 $this->sheets[ str_replace( 'rId', '', (string) $workbookRelation["Id"]) ] =
 simplexml_load_string( $this->getEntryData( dirname($rel["Target"]) . "/" . dirname($workbookRelation["Target"]) . "/" . basename($workbookRelation["Target"])) );
 }
 }

 break;
 }
 }

 // Sort sheets
 ksort($this->sheets);
 }
 private function _parseRichText($is = null) {
 $value = array();

 if (isset($is->t)) {
 $value[] = (string)$is->t;
 } else {
 foreach ($is->r as $run) {
 $value[] = (string)$run->t;
 }
 }

 return implode(' ', $value);
 }
 function getWorksheetName($dimId = 0){
 $worksheetName = array();
 $xmlWorkBook = simplexml_load_string( $this->getEntryData("xl/workbook.xml") );
 if($dimId==0){
 foreach ($xmlWorkBook->sheets->sheet as $sheetName) {
 $worksheetName[] = $sheetName['name'];
 }
 }else{
 $worksheetName[] = $xmlWorkBook->sheets->sheet[$dimId-1]->attributes()->name;
 }
 return $worksheetName;
 }
}
?>

index.php

<html>
<head><title>PHP: Parse and Retrieve Data from XLSX Files | SPYROZONE.NET</title>
<style>
<!---
a{text-decoration:none;color:#0000FF;}
a:hover{text-decoration:underline;color:#0000FF;}
a:visited{text-decoration:none;color:#0000FF;}
a:active{text-decoration:none;position: relative;top: 1px;}
h1{font-family: "Arial","Verdana","Lucida Sans Unicode"; font-size: 16pt; line-height:150%; margin-top:20; margin-bottom:0 ;text-align:center;}
h2{font-family: "Arial","Verdana","Lucida Sans Unicode"; font-size: 12pt; font-weight: bold; text-decoration: underline; line-height:150%; margin-top:40; margin-bottom:10 }
h3{font-family: "Arial","Verdana","Lucida Sans Unicode"; font-size: 11pt; line-height:150%; margin-top:20; margin-bottom:0}
ul{font-family: "Arial","Verdana","Lucida Sans Unicode"; font-size: 11pt;word-spacing: 0; line-height: 150%; margin-top: 0; margin-bottom: 0}
p{font-family: "Arial","Verdana","Lucida Sans Unicode"; font-size: 11pt }
#datacontent{margin-left:20pt}
#footer{ font-family: "Arial","Verdana","Lucida Sans Unicode"; font-size: 10pt; line-height:150%; margin-top:40; margin-bottom:0;text-align:center}
#xlsxTable{font-family: "Arial","Verdana","Lucida Sans Unicode";font-size: 11pt;margin: 15px;text-align: left;border-collapse: collapse;}
#xlsxTable th{padding: 8px;font-weight: normal;font-size: 13px;color: #039;background: #b9c9fe;}
#xlsxTable td{padding: 8px;background: #e8edff;border-top: 1px solid #fff;color: #669;}
#xlsxTable tbody tr:hover td{background: #d0dafd;}
//--->
</style>
</head>
<body>
<h1>PHP: Parse and Retrieve Data from XLSX Files</h1>
<div id="datacontent">
<h2>Upload *.xlsx File:</h1>
<form method="post" enctype="multipart/form-data">
<p>File: <input type="file" name="file"  /><input type="submit" value="Parse" /></p>
</form>
<h2>Example XLSX File:</h1>
<ul><li><a href="example-data.xlsx">example-data.xlsx</a></li></ul>
<h2>Download the source:</h1>
<ul><li><a href="Parse-and-Retrieve-Data-from-XLSX-Files.zip">Parse-and-Retrieve-Data-from-XLSX-Files.zip</a></li></ul>
<h2>About:</h1>
<p>Parse and Retrieve Data from XLSX Files based on "simplexlsx.class.php" by Sergey Shuchkin under <a rel="nofollow" href="http://www.opensource.org/licenses/artistic-license.html">Artistic License</a>. I've added function getWorksheetName() to get worksheet name.</p>
</div>
<?php
if((!empty($_FILES["file"])) && ($_FILES['file']['error'] == 0)) {

 $limitSize    = 15000; //(15 kb) - Maximum size of uploaded file, change it to any size you want
 $fileName    = basename($_FILES['file']['name']);
 $fileSize    = $_FILES["file"]["size"];
 $fileExt    = substr($fileName, strrpos($fileName, '.') + 1);

 if (($fileExt == "xlsx") && ($fileSize < $limitSize)) {

 require_once "simplexlsx.class.php";
 $getWorksheetName = array();
 $xlsx = new SimpleXLSX( $_FILES['file']['tmp_name'] );
 $getWorksheetName = $xlsx->getWorksheetName();
 echo '    <hr>
 <div id="datacontent">
 <h1>Result</h1>
 ';
 echo '<h2>File Info:</h1><ul>';
 echo '<li><b>File Name:</b> '.$fileName.'</li>';
 echo '<li><b>File Size:</b> '.($fileSize/1000).' kb</li></li>';
 echo '</ul>

 <h2>Worksheets:</h1><ul>';
 foreach ($getWorksheetName as $value) {
 echo '<li>'.$value.'</li>';
 }
 echo '</ul>';

 echo '<h2>Display data in table format:</h2>
 <div id="datacontent">';
 for($j=1;$j <= $xlsx->sheetsCount();$j++){
 echo '<h3>Worksheet Name: '.$getWorksheetName[$j-1].'</h1>';
 echo '<table id="xlsxTable">';
 list($cols,) = $xlsx->dimension($j);
 //Prepare table
 foreach( $xlsx->rows($j) as $k => $r) {
 if ($k == 0){
 $trOpen        = '<th';
 $trClose    = '</th>';
 $tbOpen        = '<thead>';
 $tbClose    = '</thead>';
 }else{
 $trOpen        = '<td';
 $trClose    = '</td>';
 $tbOpen        = '<tbody>';
 $tbClose    = '</tbody>';
 }
 echo $tbOpen;
 echo '<tr>';
 for( $i = 0; $i < $cols; $i++)
 //Display data
 echo $trOpen.'>'.( (isset($r[$i])) ? $r[$i] : '&nbsp;' ).$trClose;
 echo '</tr>';
 echo $tbClose;
 }
 echo '</table>';
 }
 echo '</div>

 <h2>Display as Array:</h2>
 <div id="datacontent" style="overflow: auto; height: 400px; width: 550px; border: 1px #008080 solid;">';
 echo '<h3>$xlsx->getWorksheetName()</h1>';
 echo '<pre>';
 print_r($xlsx->getWorksheetName());
 echo '</pre>';
 for($j=1;$j <= $xlsx->sheetsCount();$j++){
 echo '<h3>$xlsx->rows('.$j.')</h1>';
 echo '<pre>';
 print_r( $xlsx->rows($j) );
 echo '</pre>';
 echo '<h3>$xlsx->rowsEx('.$j.')</h1>';
 echo '<pre>';
 print_r( $xlsx->rowsEx($j) );
 echo '</pre>';
 }
 echo '    </div>
 </div>
 <hr>';
 }else{
 echo '<script>alert("Sory, this demo page only allowed .xlsx file under '.($limitSize/1000).' Kb!\nIf you want to try upload larger file, please download the source and try it on your own webserver.")</script>';
 }
}
?>
<div id="footer">&copy; 2011++ SPYRO KiD | http://www.spyrozone.net<br>All Rights Reserved</div>
</body>
</html>

Example Spreadsheet

example-data.xlsx

Demo & Download

Silahkan coba sendiri secara online atau download class, file contoh penggunaan beserta contoh file xlsx-nya pada halaman project:

http://www.spyrozone.net/project/xlsx-parser/


//E.O.F