Where can I find the algorithm used to write each PHP "built-in" function?

January 15, 2010

I built a PHP-based application that typically requires several (>10) seconds to parse a target string (>10 seconds because there are many thousands of checks on a string that is typically 100 KB or larger). I am looking for ways to reduce the execution time.

I started to wonder how each of PHP's "built-in" functions is written. For example, if you go to the strpos() reference in the manual, there is a lot of info, but not the algorithm.

Who knows, maybe I can write a function that is faster than the built-in function for my particular application? But I have no way of knowing the algorithm behind, e.g., strpos(). Does the algorithm use a method such as this one:

    function strposhypothetical($haystack, $needle) {
        $haystacklength = strlen($haystack);
        $needlelength   = strlen($needle); // for this question let's assume > 0

        $pos = false;

        for ($i = 0; $i < $haystacklength; $i++) {
            for ($j = 0; $j < $needlelength; $j++) {
                $thissum = $i + $j;
                // Stop on running past the end of the haystack or on a mismatch.
                if (($thissum >= $haystacklength) || ($needle[$j] !== $haystack[$thissum])) break;
            }
            if ($j === $needlelength) {
                $pos = $i;
                break;
            }
        }
        return $pos;
    }

Or does it use a slower method, let's say a combination of substr_count() to count occurrences of the needle and, if occurrences > 0, a for loop, or some other method?

I have profiled the functions and methods in my application and made significant progress that way. Also, note that this post doesn't help much. Where can I find out the algorithm used for each built-in function in PHP, or is this information proprietary?

The built-in PHP string functions, along with many other built-ins, can be found in /ext/standard/ in the PHP source code.

In the case of strpos, you can find the PHP implementation in /ext/standard/string.c. At its core, the function uses php_memnstr, which is an alias of zend_memnstr:

    found = (char*)php_memnstr(ZSTR_VAL(haystack) + offset,
                               Z_STRVAL_P(needle),
                               Z_STRLEN_P(needle),
                               ZSTR_VAL(haystack) + ZSTR_LEN(haystack));

And if we read the source of zend_memnstr, we can find the algorithm used to implement strpos:

    while (p <= end) {
        if ((p = (const char *)memchr(p, *needle, (end-p+1))) && ne == p[needle_len-1]) {
            if (!memcmp(needle, p, needle_len-1)) {
                return p;
            }
        }

        if (p == NULL) {
            return NULL;
        }
        p++;
    }

ne here represents the last character of needle, and p is a pointer that is incremented to scan through haystack.

The function memchr is a C function that should do a simple linear search through a sequence of bytes to find the first occurrence of a given byte/character. memcmp is a C function that compares two byte/character ranges, which can lie within larger strings, by comparing them byte by byte.
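To make the two library calls concrete, here is a minimal sketch in C. The wrapper names first_byte_offset and ranges_equal are hypothetical and exist only for illustration; memchr and memcmp themselves come from the standard <string.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: offset of the first byte equal to c in buf[0..len),
 * or -1 if no such byte exists. */
long first_byte_offset(const char *buf, size_t len, char c)
{
    assert(buf != NULL);
    const char *hit = (const char *)memchr(buf, (unsigned char)c, len);
    return hit ? (long)(hit - buf) : -1;
}

/* Hypothetical helper: 1 if the two ranges of len bytes are identical, else 0. */
int ranges_equal(const char *a, const char *b, size_t len)
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, len) == 0;
}
```

With these helpers, first_byte_offset("haystack", 8, 's') yields 3, and ranges_equal("haystack" + 3, "stack", 5) yields 1, because the five bytes starting at offset 3 spell "stack".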

A pseudo-code version of this function is as follows:

    while (p <= end) {
        find the next occurrence of the first character of needle;
        if (occurrence found) {
            set `p` to point to the new location in the string;
            if ((character at `p` + `length of needle` - 1) == last character of needle) {
                if ((the `length of needle` - 1 characters starting at `p`) == (needle without its last character)) {
                    return p; // Found position `p` of needle in haystack!
                }
            }
        } else {
            return NULL; // needle does not exist in haystack.
        }
        p++;
    }

This is a fairly efficient algorithm for finding the index of a substring in a string. It is pretty much the same algorithm as strposhypothetical, and it should be just as efficient complexity-wise, unless memcmp doesn't return as soon as it sees that the strings differ in one character. And of course, being implemented in C, it is leaner and faster.
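For anyone who wants to experiment with the idea outside of PHP, the loop can be sketched as a self-contained C function. This is not the Zend source: the name find_bytes, its signature, and its handling of an empty needle are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the search loop discussed above (not the Zend code).
 * Returns a pointer to the first occurrence of needle in haystack[0..haystack_len),
 * or NULL if there is none.
 */
const char *find_bytes(const char *haystack, size_t haystack_len,
                       const char *needle, size_t needle_len)
{
    assert(haystack != NULL && needle != NULL);

    if (needle_len == 0) {
        return haystack;        /* assumption: an empty needle matches at offset 0 */
    }
    if (needle_len > haystack_len) {
        return NULL;            /* the needle cannot fit */
    }

    const char last = needle[needle_len - 1];
    const char *p = haystack;
    /* Last position at which a full copy of needle still fits. */
    const char *end = haystack + (haystack_len - needle_len);

    while (p <= end) {
        /* Jump to the next byte equal to the needle's first byte. */
        p = (const char *)memchr(p, (unsigned char)needle[0], (size_t)(end - p) + 1);
        if (p == NULL) {
            return NULL;
        }
        /* Cheap last-byte check first, then compare the remaining bytes. */
        if (p[needle_len - 1] == last && memcmp(p, needle, needle_len - 1) == 0) {
            return p;
        }
        p++;
    }
    return NULL;
}
```

Given const char *h = "hello world", the call find_bytes(h, 11, "world", 5) returns h + 6, and find_bytes(h, 11, "xyz", 3) returns NULL. Checking the last byte before calling memcmp is a cheap filter: candidates whose final byte differs are rejected without a full comparison.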
