Where can I find the algorithm used to write each PHP "built-in" function?
I built a PHP-based application that typically requires several (>10) seconds to parse a target string (>10 seconds because there are many thousands of checks on a string that is typically 100 KB or more). I am looking for ways to reduce the execution time.
I started to wonder how each of PHP's "built-in" functions is written. For example, if you go to the strpos() reference in the manual (this link), there is a lot of information, but not the algorithm.
Who knows, maybe I can write a function that is faster than the built-in function for my particular application? But I have no way of knowing the algorithm behind, e.g., strpos(). Does the algorithm use a method such as this one:

    function strposhypothetical($haystack, $needle) {
        $haystacklength = strlen($haystack);
        $needlelength = strlen($needle); // for this question, let's assume > 0
        $pos = false;
        for ($i = 0; $i < $haystacklength; $i++) {
            for ($j = 0; $j < $needlelength; $j++) {
                $thissum = $i + $j;
                // >= rather than >: offset $haystacklength is already past the end of the string
                if (($thissum >= $haystacklength) || ($needle[$j] !== $haystack[$thissum])) break;
            }
            if ($j === $needlelength) {
                $pos = $i;
                break;
            }
        }
        return $pos;
    }

Or does it use a slower method, let's say a combination of substr_count() to count occurrences of the needle and, if occurrences > 0, a loop, or some other method?
I have profiled the functions and methods in my application and made significant progress that way. Also, note that this post doesn't help much. Where can I find out the algorithm used for each built-in function in PHP, or is this information proprietary?
The built-in PHP functions can be found in /ext/standard/ in the PHP source code.
In the case of strpos(), you can find the PHP implementation in /ext/standard/string.c. At its core, the function uses php_memnstr, which is an alias of zend_memnstr:

    found = (char*)php_memnstr(ZSTR_VAL(haystack) + offset,
                               Z_STRVAL_P(needle),
                               Z_STRLEN_P(needle),
                               ZSTR_VAL(haystack) + ZSTR_LEN(haystack));

And if you read the source of zend_memnstr, you can find the algorithm used to implement strpos():

    while (p <= end) {
        if ((p = (const char *)memchr(p, *needle, (end-p+1))) && ne == p[needle_len-1]) {
            if (!memcmp(needle, p, needle_len-1)) {
                return p;
            }
        }
        if (p == NULL) {
            return NULL;
        }
        p++;
    }

Here `ne` represents the last character of the needle, and `p` is a pointer that is incremented to scan through the haystack.
The function memchr is a C function that should do a simple linear search through a sequence of bytes to find the first occurrence of a given byte / character in a string of bytes. memcmp is a C function that compares two byte / character ranges, which can lie within strings, by comparing them byte-by-byte.
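
To make the two building blocks concrete, here is a minimal C sketch of how memchr and memcmp behave. The helper names first_byte_offset and ranges_equal are made up for this illustration and do not exist in the PHP source; only memchr and memcmp are standard C library calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: offset of the first byte equal to c in buf[0..len),
 * or -1 if no such byte exists. Wraps the standard memchr. */
static long first_byte_offset(const char *buf, size_t len, char c)
{
    assert(buf != NULL);
    const char *hit = memchr(buf, c, len);   /* scans at most len bytes */
    return hit ? (long)(hit - buf) : -1;
}

/* Hypothetical helper: 1 if the two n-byte ranges hold identical bytes,
 * 0 otherwise. Wraps the standard memcmp. */
static int ranges_equal(const char *a, const char *b, size_t n)
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, n) == 0;             /* compares exactly n bytes */
}
```

For example, first_byte_offset("haystack", 8, 's') yields 3, and ranges_equal("needle", "needless", 6) yields 1 because only the first six bytes are compared.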
A pseudo-code version of the function follows:

    while (p <= end) {
        find next occurrence of first character of needle;
        if (occurrence found) {
            set `p` to point to the new location in the string;
            if ((character at `p` + `length of needle` - 1) == last character of needle) {
                if ((first `length of needle` - 1 characters starting at `p`) == (needle without its last character)) {
                    return p; // found position `p` of needle in haystack!
                }
            }
        } else {
            return NULL; // needle does not exist in haystack.
        }
        p++;
    }

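
The same steps can be written as a standalone C function. This is only a sketch modelled on the loop quoted above, not a copy of the PHP source: the names sketch_memnstr and sketch_strpos are invented here, and the length checks before the loop are assumptions about what a caller would need.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* sketch_memnstr (hypothetical name): first occurrence of needle inside
 * [haystack, haystack_end), or NULL if there is none. */
static const char *sketch_memnstr(const char *haystack, const char *needle,
                                  size_t needle_len, const char *haystack_end)
{
    assert(haystack != NULL && needle != NULL && haystack <= haystack_end);

    size_t haystack_len = (size_t)(haystack_end - haystack);
    if (needle_len == 0) return haystack;          /* assumed convention */
    if (needle_len > haystack_len) return NULL;    /* needle cannot fit */

    const char *p = haystack;
    const char *end = haystack_end - needle_len;   /* last valid start position */
    const char ne = needle[needle_len - 1];        /* last character of needle */

    while (p <= end) {
        /* Jump to the next byte that equals the first character of needle. */
        p = memchr(p, *needle, (size_t)(end - p) + 1);
        if (p == NULL) return NULL;
        /* Cheap last-character check first, then compare the remaining bytes. */
        if (ne == p[needle_len - 1] && memcmp(needle, p, needle_len - 1) == 0)
            return p;
        p++;
    }
    return NULL;
}

/* sketch_strpos (hypothetical name): strpos-like wrapper for C strings,
 * returning a byte offset or -1 when the needle is absent. */
static long sketch_strpos(const char *haystack, const char *needle)
{
    const char *hit = sketch_memnstr(haystack, needle, strlen(needle),
                                     haystack + strlen(haystack));
    return hit ? (long)(hit - haystack) : -1;
}
```

With these definitions, sketch_strpos("hello world", "world") gives 6 and sketch_strpos("hello", "xyz") gives -1. Checking the last character before calling memcmp is a cheap filter that skips the full comparison for many false starts.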
This is an efficient algorithm for finding the index of a substring in a string. It is pretty much the same algorithm as strposhypothetical, and should be as efficient complexity-wise, unless memcmp doesn't return as soon as it sees that the strings differ by one character. And of course, being implemented in C, it is leaner and faster.
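
To illustrate the complexity remark, here is a rough C port of the nested-loop idea from the question that also counts byte comparisons. The name naive_strpos and the comparison counter are assumptions added for this sketch. In the worst case both this loop and the memchr/memcmp loop do work proportional to haystack length times needle length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* naive_strpos (hypothetical name): nested-loop substring search.
 * Returns the byte offset of the first match or -1, and stores the number
 * of byte comparisons performed in *comparisons. */
static long naive_strpos(const char *haystack, const char *needle,
                         size_t *comparisons)
{
    assert(haystack != NULL && needle != NULL && comparisons != NULL);

    size_t n = strlen(haystack);
    size_t m = strlen(needle);
    *comparisons = 0;
    if (m == 0 || m > n) return -1;   /* the question assumes a non-empty needle */

    for (size_t i = 0; i + m <= n; i++) {       /* every possible start position */
        size_t j = 0;
        while (j < m) {
            (*comparisons)++;
            if (haystack[i + j] != needle[j]) break;   /* stop at first mismatch */
            j++;
        }
        if (j == m) return (long)i;             /* all m bytes matched */
    }
    return -1;
}
```

Searching "aaaaaaab" for "aab" returns 5 after 18 comparisons, because each of the six start positions costs three comparisons. Searching "hello world" for "world" returns 6 after only 11, because most start positions fail on the first byte.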