Knuth–Morris–Pratt (KMP)

When people first learn substring search, they usually write something like this:

  • Try matching the pattern at index 0
  • On mismatch, shift by one
  • Start over
  • Repeat

That works but in the worst case it’s O(n × m).

KMP is different.

It achieves O(n + m) by ensuring the text index never moves backward.

Let’s unpack how.


The Core Insight

When a mismatch happens, naive search throws away information.

KMP doesn’t.

Instead of restarting from scratch, it asks:

“Given what I already know matched, what is the longest prefix of the pattern that is also a suffix of what I just matched?”

That information is precomputed in something called the:

LPS Array

(Longest Prefix that is also a Suffix)

Also called:

  • prefix function
  • failure function
  • partial match table

What Is LPS?

For every index i in the pattern:

lps[i] = length of the longest proper prefix of pattern[0..i]
         that is also a suffix of pattern[0..i]
  • “Proper” means not the whole substring.
  • It tells us how far we can jump back in the pattern.

Visual Intuition

Let’s build the LPS table for:

Pattern: "ababaca"
Index:     0 1 2 3 4 5 6
Chars:     a b a b a c a

We build it left to right.


Step-by-step LPS Construction

i = 0
Substring: "a"
No proper prefix/suffix match.
lps[0] = 0
i = 1
Substring: "ab"
Prefixes:  a
Suffixes:  b
No match.
lps[1] = 0
i = 2
Substring: "aba"
Prefixes:  a, ab
Suffixes:  a, ba
Match: "a"
lps[2] = 1
i = 3
Substring: "abab"
Prefixes:  a, ab, aba
Suffixes:  b, ab, bab
Match: "ab"
lps[3] = 2
i = 4
Substring: "ababa"
Match: "aba"
lps[4] = 3
i = 5
Substring: "ababac"
No prefix-suffix match.
We fall back using previous LPS values.
lps[5] = 0
i = 6
Substring: "ababaca"
Match: "a"
lps[6] = 1

Final LPS:

Index: 0 1 2 3 4 5 6
LPS:   0 0 1 2 3 0 1

How the Fallback Works (Mermaid Diagram)

Here’s the mental model of fallback transitions:

flowchart LR
    Start[Match k characters]
    Mismatch[Character mismatch]
    Jump["Jump to lps[k-1]"]
    Continue[Continue comparison]
    Reset[If lps=0, advance text index]

    Start --> Mismatch
    Mismatch --> Jump
    Jump --> Continue
    Continue -->|Mismatch again| Jump
    Continue -->|Match| Start
    Jump -->|lps = 0| Reset

Key idea:

We never re-check characters in the text. We only shift inside the pattern.


Full Search Example

Text:

ababcabcabababd

Pattern:

ababd

When mismatch happens after matching "abab":

  • Instead of restarting from scratch,
  • We use lps[3] = 2
  • Resume matching from pattern index 2

That’s the entire optimisation.

No backtracking in the text. Ever.


Complexity Analysis

Let:

  • n = length of text
  • m = length of pattern

LPS construction

Each character is processed once, with occasional fallback. Total: O(m)

Each character in text is visited once. Pattern index may move backward, but never more than total m times overall.

Total: O(n)

Final Complexity

Time:  O(n + m)
Space: O(m)

Compare to naive worst-case:

O(n × m)

Classic pathological example for naive:

Text:    aaaaaaaaaaaaaab
Pattern: aaaaab

Naive keeps restarting. KMP doesn’t.


Implementation

php
python
csharp
swift
<?php
declare(strict_types=1);

function buildLps(string $pattern): array
{
    $m = strlen($pattern);
    $lps = array_fill(0, $m, 0);

    $len = 0; // length of the previous longest prefix suffix
    $i = 1;

    while ($i < $m) {
        if ($pattern[$i] === $pattern[$len]) {
            $len++;
            $lps[$i] = $len;
            $i++;
        } else {
            if ($len !== 0) {
                $len = $lps[$len - 1];
            } else {
                $lps[$i] = 0;
                $i++;
            }
        }
    }

    return $lps;
}

/**
 * @return int[] 0-based match start indices
 */
function kmpSearch(string $text, string $pattern): array
{
    $n = strlen($text);
    $m = strlen($pattern);

    if ($m === 0) {
        // Convention: empty pattern matches at every position (including end)
        $all = [];
        for ($i = 0; $i <= $n; $i++) $all[] = $i;
        return $all;
    }
    if ($n === 0 || $m > $n) return [];

    $lps = buildLps($pattern);
    $matches = [];

    $i = 0; // index for text
    $j = 0; // index for pattern

    while ($i < $n) {
        if ($text[$i] === $pattern[$j]) {
            $i++;
            $j++;

            if ($j === $m) {
                $matches[] = $i - $j;
                $j = $lps[$j - 1];
            }
        } else {
            if ($j !== 0) {
                $j = $lps[$j - 1];
            } else {
                $i++;
            }
        }
    }

    return $matches;
}

// Example:
// print_r(kmpSearch("ababcabcabababd", "ababd")); // [10]
from typing import List

def build_lps(pattern: str) -> List[int]:
    m = len(pattern)
    lps = [0] * m
    length = 0
    i = 1

    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length != 0:
                length = lps[length - 1]
            else:
                lps[i] = 0
                i += 1
    return lps

def kmp_search(text: str, pattern: str) -> List[int]:
    n, m = len(text), len(pattern)

    if m == 0:
        return list(range(n + 1))
    if n == 0 or m > n:
        return []

    lps = build_lps(pattern)
    matches: List[int] = []

    i = 0  # text index
    j = 0  # pattern index

    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == m:
                matches.append(i - j)
                j = lps[j - 1]
        else:
            if j != 0:
                j = lps[j - 1]
            else:
                i += 1

    return matches

# Example:
# print(kmp_search("ababcabcabababd", "ababd"))  # [10]
using System;
using System.Collections.Generic;

public static class Kmp
{
    public static int[] BuildLps(string pattern)
    {
        int m = pattern.Length;
        var lps = new int[m];

        int len = 0;
        int i = 1;

        while (i < m)
        {
            if (pattern[i] == pattern[len])
            {
                len++;
                lps[i] = len;
                i++;
            }
            else
            {
                if (len != 0)
                {
                    len = lps[len - 1];
                }
                else
                {
                    lps[i] = 0;
                    i++;
                }
            }
        }

        return lps;
    }

    // Returns 0-based match start indices
    public static List<int> Search(string text, string pattern)
    {
        int n = text.Length;
        int m = pattern.Length;

        if (m == 0)
        {
            var all = new List<int>(n + 1);
            for (int k = 0; k <= n; k++) all.Add(k);
            return all;
        }
        if (n == 0 || m > n) return new List<int>();

        int[] lps = BuildLps(pattern);
        var matches = new List<int>();

        int i = 0; // text
        int j = 0; // pattern

        while (i < n)
        {
            if (text[i] == pattern[j])
            {
                i++;
                j++;

                if (j == m)
                {
                    matches.Add(i - j);
                    j = lps[j - 1];
                }
            }
            else
            {
                if (j != 0)
                {
                    j = lps[j - 1];
                }
                else
                {
                    i++;
                }
            }
        }

        return matches;
    }
}

// Example:
// var hits = Kmp.Search("ababcabcabababd", "ababd"); // [10]
import Foundation

func buildLps(_ pattern: [Character]) -> [Int] {
    let m = pattern.count
    if m == 0 { return [] }

    var lps = Array(repeating: 0, count: m)
    var length = 0
    var i = 1

    while i < m {
        if pattern[i] == pattern[length] {
            length += 1
            lps[i] = length
            i += 1
        } else {
            if length != 0 {
                length = lps[length - 1]
            } else {
                lps[i] = 0
                i += 1
            }
        }
    }
    return lps
}

// Returns 0-based match start indices
func kmpSearch(text: String, pattern: String) -> [Int] {
    let t = Array(text)
    let p = Array(pattern)

    let n = t.count
    let m = p.count

    if m == 0 {
        return Array(0...n) // empty pattern matches at all positions including end
    }
    if n == 0 || m > n {
        return []
    }

    let lps = buildLps(p)
    var matches: [Int] = []

    var i = 0 // text index
    var j = 0 // pattern index

    while i < n {
        if t[i] == p[j] {
            i += 1
            j += 1

            if j == m {
                matches.append(i - j)
                j = lps[j - 1]
            }
        } else {
            if j != 0 {
                j = lps[j - 1]
            } else {
                i += 1
            }
        }
    }

    return matches
}

// Example:
// print(kmpSearch(text: "ababcabcabababd", pattern: "ababd")) // [10]

Common Pitfalls

1. Forgetting “proper” prefix

You cannot use the entire substring as both prefix and suffix.


2. Off-by-one errors

The fallback is:

j = lps[j - 1]

Not:

j = lps[j]

That one mistake breaks everything.


3. Empty Pattern Handling

Conventionally:

search("", "") → [0]
search("abc", "") → [0,1,2,3]

Decide your convention and document it.


4. Unicode Issues

Languages differ:

  • PHP and C# index bytes unless using Unicode-aware APIs.
  • Swift String indexing is not O(1).
  • Python handles Unicode cleanly.

If doing heavy search in Swift, convert to [UInt8] or [Character].


5. Misunderstanding What LPS Represents

It does NOT mean:

“How many characters matched previously.”

It means:

“If we fail here, what is the largest prefix that we already know matches?”

That’s a subtle but critical difference.


Why KMP Matters

KMP is more than a string search algorithm.

It introduces a pattern you'll see in advanced algorithms:

Precompute structure from the input
So the main algorithm becomes linear.

This same idea appears in:

  • Z-algorithm
  • Aho–Corasick
  • Rabin–Karp
  • Suffix arrays
  • Even dynamic programming patterns

Final Takeaway

KMP eliminates redundant work by remembering what the pattern already proved.

It’s deterministic. It’s linear. It’s elegant.

And once you truly understand LPS, it feels almost obvious.


Published 2026-03-03 >> 2026-03-03