Archive for the ‘PHP’ Category

JSON_ERROR_SYNTAX while decoding huge json-encoded array

2014/02/21 3 comments

It took me a long time to find out what was throwing the JSON error known as “JSON_ERROR_SYNTAX”. I was dealing with a huge array: more than 100,000 entries and 2 MB in size. The PHP script processing the data always hung when trying to decode the big array into a PHP variable. Using the PHP function json_last_error() I found that the JSON decoder was returning error code 4, meaning there was a syntax error in the JSON string.

http://php.net/function.json_last_error

Finally I found the array entry that makes JSON go crazy:

{...
,"laptops":"26"
,null:"26"
,"cad":"26"
,...}

NULL. Yes. That’s the culprit. JSON object keys must be double-quoted strings, so a bare null key is a syntax error. Bottom line: avoid null array keys at all costs.

Here is the code I used to locate the error:


<?php
	// Return the name of the last JSON error, or '' if there was none.
	function json_error() {
		switch (json_last_error()) {
			case JSON_ERROR_NONE:
				return '';
			case JSON_ERROR_DEPTH:
				return 'JSON_ERROR_DEPTH';
			case JSON_ERROR_STATE_MISMATCH:
				return 'JSON_ERROR_STATE_MISMATCH';
			case JSON_ERROR_CTRL_CHAR:
				return 'JSON_ERROR_CTRL_CHAR';
			case JSON_ERROR_SYNTAX:
				return 'JSON_ERROR_SYNTAX';
			case JSON_ERROR_UTF8:
				return 'JSON_ERROR_UTF8';
		}

		return 'Unknown JSON error ' . json_last_error();
	}

// Minimal test case: the unquoted null key is not valid JSON.
$json='{"laptops":"26"
,null:"26"
,"cad":"26"}';
echo "\nstrlen(json) = " . strlen($json);
$dummy = json_decode($json, true);
if (json_error()) die("\nJson Error: " . json_error() . "\n");
?>
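
On PHP 5.5 or later, json_last_error_msg() returns a readable message, which saves writing the switch by hand. The following is only a sketch of that idea; decode_or_fail() is a hypothetical helper name, not something from the code above.

```php
<?php
// Hypothetical helper: decode a JSON string or throw with the reason.
function decode_or_fail($json) {
    $data = json_decode($json, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        // json_last_error_msg() is available from PHP 5.5 onwards.
        throw new RuntimeException('JSON error: ' . json_last_error_msg());
    }
    return $data;
}

try {
    // Same shape as the failing entry: an unquoted null used as a key.
    decode_or_fail('{"laptops":"26",null:"26","cad":"26"}');
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

Throwing instead of calling die() lets the caller decide whether to skip the bad record or stop the whole run.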

Hope this helps …

PHP time function, ISO 8601

2012/10/09 1 comment

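This PHP function returns the current local time in the ISO 8601 style (yyyy-mm-dd hh:mm:ss). Note that strict ISO 8601 puts a “T” between the date and the time; the space-separated form used here is a common, more readable variant: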

function time_iso8601() {
	// getdate() splits the current timestamp into its calendar fields.
	$timearray = getdate(time());
	return sprintf("%04d-%02d-%02d %02d:%02d:%02d", $timearray['year'], $timearray['mon'], $timearray['mday'], $timearray['hours'], $timearray['minutes'], $timearray['seconds']);
}
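
The same string can probably be produced more briefly with PHP's built-in date() function. A small sketch follows; the function names time_iso8601_short() and time_iso8601_strict() are hypothetical and only serve the illustration.

```php
<?php
// Hypothetical shorter variant: same "yyyy-mm-dd hh:mm:ss" layout via date().
function time_iso8601_short() {
    return date('Y-m-d H:i:s');
}

// The "c" format character gives the full ISO 8601 form,
// with the "T" separator and the UTC offset.
function time_iso8601_strict() {
    return date('c');
}

echo time_iso8601_short(), "\n";
echo time_iso8601_strict(), "\n";
```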

Categories: PHP, Snippets of Code

PHP function to convert seconds into human readable format: months, days, hours, minutes, …

2012/08/25 7 comments

This is a simple PHP function that converts a number of seconds into a string with the following format: M months, d days, h hours, m minutes, s seconds.

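function seconds2human($ss) {
	$s = $ss%60;                      // seconds left over after whole minutes
	$m = floor(($ss%3600)/60);        // minutes left over after whole hours
	$h = floor(($ss%86400)/3600);     // hours left over after whole days
	$d = floor(($ss%2592000)/86400);  // days left over after whole 30-day months
	$M = floor($ss/2592000);          // whole 30-day months

	return "$M months, $d days, $h hours, $m minutes, $s seconds";
}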

The number of months is an approximation that assumes every month is 30 days long.
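
As a worked example of the same arithmetic, the sketch below walks through the units from largest to smallest, taking the whole part and carrying the remainder. The name seconds2human_loop() is hypothetical, and intdiv() assumes PHP 7 or later.

```php
<?php
// Hypothetical loop-based variant of the same conversion (PHP 7+ for intdiv).
function seconds2human_loop($ss) {
    // Unit sizes in seconds; a month is taken as 30 days.
    $units = array(
        'months'  => 2592000,
        'days'    => 86400,
        'hours'   => 3600,
        'minutes' => 60,
        'seconds' => 1,
    );
    $parts = array();
    foreach ($units as $name => $size) {
        $parts[] = intdiv($ss, $size) . " $name"; // whole units of this size
        $ss %= $size;                             // carry the remainder down
    }
    return implode(', ', $parts);
}

// 2,700,000 s = 1 month (2,592,000) + 1 day (86,400) + 6 hours (21,600)
echo seconds2human_loop(2700000), "\n";
// prints "1 months, 1 days, 6 hours, 0 minutes, 0 seconds"
```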

Categories: PHP, Snippets of Code

NOW() function in SQLite

2012/08/14 1 comment

The NOW() function does not exist in SQLite. Use datetime('NOW','localtime') instead, as in this example:

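<?php
// The line that opened the connection was lost from the original post.
// Assumption: $dbsqlite is a PDO handle to an SQLite database; an in-memory
// database is used here only to keep the example runnable.
$dbsqlite = new PDO('sqlite::memory:');
$result = $dbsqlite->prepare("SELECT datetime('NOW','localtime') AS now");
if (!$result) { echo "\nError SQLite::errorInfo(): "; print_r($dbsqlite->errorInfo()); exit; }
$result->execute();
$data = $result->fetchAll();
$now_SQLite = $data[0][0];
$dbsqlite = NULL; // close the connection
echo "\nSQLite Time: " . $now_SQLite;
?>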
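
SQLite also has CURRENT_TIMESTAMP, which returns the time in UTC rather than local time. The sketch below prints both side by side; it assumes the pdo_sqlite extension is installed and uses an in-memory database purely for illustration.

```php
<?php
// Assumption: pdo_sqlite is available; no database file is needed.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// CURRENT_TIMESTAMP is UTC; the 'localtime' modifier converts to local time.
$row = $db->query(
    "SELECT CURRENT_TIMESTAMP AS utc_now, datetime('now','localtime') AS local_now"
)->fetch(PDO::FETCH_ASSOC);

echo "UTC:   {$row['utc_now']}\n";
echo "Local: {$row['local_now']}\n";
```

On a server whose timezone is UTC the two values will be identical, so the difference only shows up elsewhere.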

Categories: PHP, Snippets of Code

How to install PHP 6 on CentOS 6

2011/11/08

How to install PHP 6 on CentOS 6: these are the Linux commands required to install the latest PHP source code on a CentOS 6 LAMP server, along with several useful libraries such as GD, Curl, Tidy, JSON, SQLite, PSpell, …


yum install wget
wget "http://snaps.php.net/php-trunk-201111071530.tar.bz2"
tar -jxvf php-trunk-201111071530.tar.bz2
cd php-trunk-201111071530
yum groupinstall "Development Tools"
yum install libxml2 libxml2-devel
yum install httpd-devel
yum install pcre pcre-devel
yum install bzip2 bzip2-devel
yum install gmp gmp-devel
yum install tidy libtidy libtidy-devel
yum install curl libcurl libcurl-devel
yum install libjpeg libjpeg-devel
yum install libpng libpng-devel
yum install libXpm libXpm-devel
yum install freetype freetype-devel
yum install aspell aspell-devel

./configure --build=x86_64-unknown-linux-gnu --host=x86_64-unknown-linux-gnu --target=x86_64-redhat-linux-gnu --program-prefix= --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --cache-file=../config.cache --with-libdir=lib64 --with-config-file-path=/etc --with-config-file-scan-dir=/etc/php.d --disable-debug --with-pic --disable-rpath --without-pear --with-bz2 --with-exec-dir=/usr/bin --with-freetype-dir=/usr --with-png-dir=/usr --with-xpm-dir=/usr --enable-gd-native-ttf --without-gdbm --with-gettext --with-gmp --with-iconv --with-jpeg-dir=/usr --with-openssl --with-pcre-regex=/usr --with-zlib --with-layout=GNU --enable-exif --enable-ftp --enable-magic-quotes --enable-sockets --enable-sysvsem --enable-sysvshm --enable-sysvmsg --with-kerberos --enable-ucd-snmp-hack --enable-shmop --enable-calendar --with-sqlite3 --with-libxml-dir=/usr --enable-xml --with-system-tzdata --with-apxs2=/usr/sbin/apxs --with-mysql --with-gd --disable-dom --disable-dba --without-unixODBC --disable-pdo --disable-xmlreader --disable-xmlwriter --disable-phar --disable-fileinfo --enable-json --with-pspell --disable-wddx --with-curl --disable-posix --disable-sysvmsg --disable-sysvshm --disable-sysvsem --enable-mbstring --with-mysqli --with-tidy

make
make install
sed -i '/^LoadModule php5_module/ s/^/#/' /etc/httpd/conf/httpd.conf
service httpd restart
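
Once the build is installed, a short PHP script can confirm which of the libraries actually made it in. This is only a sketch: missing_extensions() is a hypothetical helper, and the extension names listed are assumed to match the libraries named above.

```php
<?php
// Hypothetical helper: return the names from $wanted that are not loaded.
function missing_extensions(array $wanted) {
    $missing = array();
    foreach ($wanted as $ext) {
        if (!extension_loaded($ext)) {
            $missing[] = $ext;
        }
    }
    return $missing;
}

// Assumed extension names for GD, Curl, Tidy, JSON, SQLite and PSpell.
$wanted = array('gd', 'curl', 'tidy', 'json', 'sqlite3', 'pspell');

echo 'PHP ', PHP_VERSION, "\n";
$missing = missing_extensions($wanted);
echo $missing ? 'Missing: ' . implode(', ', $missing) : 'All requested extensions are loaded', "\n";
```

Run it from the command line with the newly built binary, or through Apache to check the module that httpd loads.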

A Web Search Engine, SQL or NOSQL?

2011/08/23 1 comment

I am trying to build a Web search engine with modest resources and targets. Using just one dedicated server running a LAMP environment, I would like to index 5 million web pages, limited to one language (Spanish), and deliver search results in less than 0.1 seconds.

Probably the most important decision in this enterprise is how to store the information, since both search speed and insertion times depend on it. I see two main options: an SQL schema and a NOSQL schema.

I am implementing both on MySQL. In the first case the relational structure of MySQL is used, while in the second case I use MySQL simply as a storage engine, capable of saving records and serving them to a client.

In the first case, SQL orthodoxy tells us to create one table for pages, another for keywords, and finally a big one that stores each occurrence of a keyword on a page. We may call that table “keypag”. If we index 5 million web pages containing an average of 100 distinct words each, keypag has a cardinality of 500 million rows.

In the second case, the NOSQL schema, we need just one table called “keywords”, with a row for every possible keyword in the corpus (the set of crawled web pages). A reasonable estimate of the cardinality of this table is around 1 million rows, taking into account that we are dealing with just one language and filtering content through a dictionary before indexing. The keywords table has one big field (around 60 Kbytes long) that contains the ordered matches in web pages for that keyword. Around 100 URLs and snippets can be stored in that field.

In the SQL schema, a search means an SQL query returning between 10 and 100 rows from the table “keypag”, which holds hundreds of millions of rows (500 million by the estimate above). In the NOSQL schema, a search means an SQL query returning just one row from the table “keywords”, which has a cardinality of around 1 million.
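
To make the two lookups concrete, here is a rough sketch of both layouts. It is an illustration under several assumptions: SQLite in memory stands in for MySQL so the code runs on its own, every table and column name other than “keypag” and “keywords” is hypothetical, and storing the matches as a JSON list is just one possible encoding of the big field.

```php
<?php
// Illustration only: SQLite in memory stands in for MySQL.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// SQL layout: one keypag row per (keyword, page) occurrence.
$db->exec('CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT)');
$db->exec('CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT UNIQUE)');
$db->exec('CREATE TABLE keypag (word_id INTEGER, page_id INTEGER, score INTEGER)');

// NOSQL layout: one row per keyword, with the ordered matches in one field.
$db->exec('CREATE TABLE keywords (word TEXT PRIMARY KEY, matches TEXT)');

// Tiny sample data (hypothetical URLs).
$db->exec("INSERT INTO pages VALUES (1, 'http://example.com/a')");
$db->exec("INSERT INTO pages VALUES (2, 'http://example.com/b')");
$db->exec("INSERT INTO words VALUES (1, 'android')");
$db->exec('INSERT INTO keypag VALUES (1, 1, 10)');
$db->exec('INSERT INTO keypag VALUES (1, 2, 20)');
$ins = $db->prepare('INSERT INTO keywords VALUES (?, ?)');
$ins->execute(array('android', json_encode(array('http://example.com/b', 'http://example.com/a'))));

// SQL search: join three tables and sort the matching rows.
function search_sql(PDO $db, $word) {
    $st = $db->prepare(
        'SELECT p.url FROM words w
           JOIN keypag k ON k.word_id = w.id
           JOIN pages p ON p.id = k.page_id
          WHERE w.word = ? ORDER BY k.score DESC LIMIT 100'
    );
    $st->execute(array($word));
    return $st->fetchAll(PDO::FETCH_COLUMN);
}

// NOSQL search: fetch a single row by primary key and unpack it.
function search_nosql(PDO $db, $word) {
    $st = $db->prepare('SELECT matches FROM keywords WHERE word = ?');
    $st->execute(array($word));
    $field = $st->fetchColumn();
    return $field === false ? array() : json_decode($field, true);
}

print_r(search_sql($db, 'android'));
print_r(search_nosql($db, 'android'));
```

The trade-off shows up on the write side: the single-row lookup is cheap because the ordering work was already done when the keywords row was written.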

So, in theory, we could expect better search performance from the NOSQL schema than from the SQL schema. Some benchmarks prove that to be correct. Let’s see the resulting graphs:

* SQL performance (someone patient enough could reach 1 billion rows; I stopped at 11 million):

* NOSQL performance:

Pink dots represent the time it takes MySQL to resolve a search query, in milliseconds. Blue dots represent how many inserts per second MySQL is capable of doing.

Getting snippets as search results in PHP

2011/08/15

The following PHP code is a function that extracts snippets of text from a corpus, matching a given keyword and highlighting it with an HTML bold tag:

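function get_snippet($keyword, $txt) {
  $snippet = '';
  $span = 15; // maximum number of context characters on each side
  $keyword = preg_quote($keyword, '#'); // treat the keyword as literal text
  preg_match_all("#(\W.{0,$span}\W)($keyword)(\W.{0,$span}\W)#i", "  $txt  ", $matches);
  foreach($matches[0] as $match) {
    if (!$match = trim($match)) continue;
    // Prefix the first match with "..." and append "..." after every match.
    if ($snippet !== '') $snippet .= "$match..."; else $snippet = "...$match...";
  }
  $snippet = preg_replace("#($keyword)#i", '<b>$1</b>', $snippet);
  return $snippet;
}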

Applying the function to a text like this:

Since its launch in November 2007, Android has not only dramatically increased consumer choice but also
improved the entire mobile experience for users. Today, more than 150 million Android devices have been
activated worldwide with over 550,000 devices now lit up every day through a network of about 39 manufacturers
and 231 carriers in 123 countries. Given Android’s phenomenal success, we are always looking for new ways to
supercharge the Android ecosystem. That is why I am so excited today to announce that we have agreed to acquire
Motorola.

produces output like this when searching for the keyword “Android”:

...November 2007, Android has not only...150 million Android devices have....
Given Android’s phenomenal...supercharge the Android ecosystem. That...
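
For text with accented letters (Spanish, for instance), the byte-oriented \W may split words in the wrong places. The sketch below is one possible Unicode-aware variant, assuming the input is UTF-8; get_snippet_utf8() is a hypothetical name, and the explicit character class replaces \W so that letters and digits from any script count as word characters.

```php
<?php
// Hypothetical UTF-8 variant of the snippet extractor.
function get_snippet_utf8($keyword, $txt, $span = 15) {
    $kw = preg_quote($keyword, '#');  // treat the keyword as literal text
    $nw = '[^\p{L}\p{N}_]';           // "non-word" character, Unicode-aware
    // The "u" modifier makes PCRE read pattern and subject as UTF-8.
    preg_match_all("#$nw.{0,$span}$nw$kw$nw.{0,$span}$nw#iu", "  $txt  ", $matches);

    $parts = array();
    foreach ($matches[0] as $match) {
        $match = trim($match);
        if ($match !== '') {
            $parts[] = $match;
        }
    }
    if (!$parts) {
        return '';
    }
    // Join the fragments with "..." and highlight every keyword occurrence.
    $snippet = '...' . implode('...', $parts) . '...';
    return preg_replace("#($kw)#iu", '<b>$1</b>', $snippet);
}

echo get_snippet_utf8('android', 'El teléfono Android más vendido en España'), "\n";
```

With the "u" modifier the {0,$span} limit counts characters rather than bytes, so the context window stays the same width for accented text.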