excel - On making MATCH function like FIND function -
i'm trying make match function work find function. first of all, generate dummy data use testing. here routine use:
sub data_generator() randomize dim data(1 100000, 1 1) dim p single = 1 100000 p = rnd() if p < 0.4 data(i, 1) = "a" elseif p >= 0.4 , p <= 0.7 data(i, 1) = "b" else data(i, 1) = "c" end if next range("a1:a100000") = data end sub
now, create sub-routine find string a in range data
. there 2 methods use here employ match function. first method reset range of lookup array following code:
sub find_match_1() t0 = timer dim long, j long, k long, data range dim output(1 100000, 1 1) on error goto finish set data = range(cells(j + 1, 1), "a100000") 'reset range of lookup array = worksheetfunction.match("a", data, 0) j = j + output(j, 1) = j 'label position of k = k + 1 'counting number of [a] found loop finish: range("b1:b100000") = output inputbox "the number of [a] found " & k & " in", "process complete", timer - t0 end sub
and second method, assign cell of range a located value vbnullstring
instead of resetting range("a1:a100000")
. idea delete string a after being found , expect match function find next string a in range("a1:a100000")
. here code implement second method:
sub find_match_2() t0 = timer dim n long, long, j long dim data_store() dim output(1 100000, 1 1) data_store = range("a1:a100000") on error goto finish j = worksheetfunction.match("a", range("a1:a100000"), 0) output(j, 1) = j cells(j, 1) = vbnullstring n = n + 1 loop finish: range("a1:a100000") = data_store range("b1:b100000") = output inputbox "the number of [a] found " & n & " in", "process complete", timer - t0 end sub
the goal determine method better @ employing match function in performance. turns out first method completes less 0.4 seconds meanwhile second method completes minute on pc. questions are:
- why second method take time long complete?
- how 1 improve performance of second method?
- can match function used in array?
i agree more of code review question, chose own curiosity, i'll share found.
i think you're hitting classic case of n vs n^2 computational complexity. @ 2 methods, seem remarkably similar, , consider they're doing, keeping in mind match function linear search when use match_type=0 (because data unsorted, whereas other match types binary search on sorted data).
method 1:
- start @ a1
- continue down range until "a" found
- restart @ cell below match
method 2:
- start @ a1
- continue down range until "a" found
- clear "a"
- restart @ a1
it should instantly apparent while 1 method continually shrinking range searches, other starting @ first cell , searching whole range. account of speedup, , boosts method 1 nice lead, it's not full story.
the real key lies in amount of work match has each situation. because range shrinks , moves start further down list, whichever cell method 1's match starts from, has search small number of cells before hits , resumes outer loop. meanwhile, method 2 continually destroying a's, making them less , less dense , forcing search more , more of range before getting hits. end, method 2 looping through 100,000 empty cells/b's/c's before finding next a.
so on average, match method 1 looking through couple of cells each time, while match method 2 taking longer , longer time goes on, until end when forced loop through entire range. on top of that, method 2 doing bunch of writes cell values, slower might think when have tens of thousands of times.
in honesty, best bet loop through cells once, looking a's , handling them go. match brings no advantage table, , method 1 more complicated version of loop described.
i'd write like:
sub find_match_3() t0 = timer dim k long, r range dim output(1 100000, 1 1) each r in range("a1:a100000").cells if r.value = "a" output(r.row, 1) = r.row 'label position of k = k + 1 'counting number of [a] found end if next range("b1:b100000") = output inputbox "the number of [a] found " & k & " in", "process complete", timer - t0 end sub
which 30% faster on machine.
Comments
Post a Comment