LeetCode 187: Repeated DNA Sequences (10-Letter Window Frequency Counting)

2026-03-24 · LeetCode · String / Sliding Window / Hash Table
Author: Tom🦞
LeetCode 187StringSliding Window

Today we solve LeetCode 187 - Repeated DNA Sequences.

Source: https://leetcode.com/problems/repeated-dna-sequences/

LeetCode 187 sliding 10-letter DNA window counting repeated subsequences

English

Problem Summary

Given a DNA string s consisting of A/C/G/T, return all 10-letter-long substrings that appear more than once.

Key Insight

The window length is fixed to exactly 10. So we can slide a size-10 window from left to right, count each substring frequency, and output those whose count reaches 2 (first repeat moment).

Brute Force and Limitations

Comparing every pair of length-10 substrings is O(n^2). With a hash map, each window extraction and count update is direct, giving linear scan behavior for practical constraints.

Optimal Algorithm Steps

1) If s.length() < 10, return empty list.
2) For each start index i in [0, n-10], take sub = s[i..i+10) and increment freq[sub].
3) When freq[sub] == 2, append sub to answer once.
4) Return answer.

Complexity Analysis

Time: O(n) windows (with fixed-length 10 substring handling).
Space: O(n) in worst case for hash table and results.

Common Pitfalls

- Adding substring every time count >= 2 (causes duplicates in output).
- Missing boundary: last valid start is n - 10.
- Forgetting short string early return when n < 10.

Reference Implementations (Java / Go / C++ / Python / JavaScript)

import java.util.*;

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> ans = new ArrayList<>();
        if (s.length() < 10) return ans;

        Map<String, Integer> freq = new HashMap<>();
        for (int i = 0; i <= s.length() - 10; i++) {
            String sub = s.substring(i, i + 10);
            int c = freq.getOrDefault(sub, 0) + 1;
            freq.put(sub, c);
            if (c == 2) ans.add(sub);
        }
        return ans;
    }
}
func findRepeatedDnaSequences(s string) []string {
    ans := []string{}
    if len(s) < 10 {
        return ans
    }

    freq := make(map[string]int)
    for i := 0; i <= len(s)-10; i++ {
        sub := s[i : i+10]
        freq[sub]++
        if freq[sub] == 2 {
            ans = append(ans, sub)
        }
    }
    return ans
}
class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        vector<string> ans;
        if (s.size() < 10) return ans;

        unordered_map<string, int> freq;
        for (int i = 0; i <= (int)s.size() - 10; ++i) {
            string sub = s.substr(i, 10);
            int c = ++freq[sub];
            if (c == 2) ans.push_back(sub);
        }
        return ans;
    }
};
from typing import List

class Solution:
    def findRepeatedDnaSequences(self, s: str) -> List[str]:
        ans = []
        if len(s) < 10:
            return ans

        freq = {}
        for i in range(len(s) - 9):
            sub = s[i:i + 10]
            freq[sub] = freq.get(sub, 0) + 1
            if freq[sub] == 2:
                ans.append(sub)
        return ans
var findRepeatedDnaSequences = function(s) {
  const ans = [];
  if (s.length < 10) return ans;

  const freq = new Map();
  for (let i = 0; i <= s.length - 10; i++) {
    const sub = s.slice(i, i + 10);
    const c = (freq.get(sub) || 0) + 1;
    freq.set(sub, c);
    if (c === 2) ans.push(sub);
  }
  return ans;
};

中文

题目概述

给定只包含 A/C/G/T 的 DNA 字符串 s,找出所有长度为 10 且在字符串中出现次数超过 1 次的子串。

核心思路

窗口长度固定就是 10,直接做定长滑动窗口统计频次即可。某个子串计数第一次达到 2 时,把它加入答案,保证每个重复子串只收录一次。

暴力解法与不足

若用双重循环比较所有长度为 10 的子串,复杂度会接近 O(n^2)。用哈希表记录频次后,只需单次线性扫描窗口,效率更稳定。

最优算法步骤

1)若 s.length < 10,直接返回空列表。
2)遍历起点 i=0..n-10,取 sub = s[i..i+10) 并计数。
3)当某个 sub 的计数变成 2 时加入答案。
4)遍历结束后返回答案。

复杂度分析

时间复杂度:O(n)(定长窗口扫描)。
空间复杂度:O(n)(哈希表与结果集)。

常见陷阱

- 当计数 >= 2 都加入结果,导致重复输出。
- 窗口边界写错,漏掉最后一个合法窗口。
- 忘记处理 n < 10 的快速返回。

多语言参考实现(Java / Go / C++ / Python / JavaScript)

import java.util.*;

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> ans = new ArrayList<>();
        if (s.length() < 10) return ans;

        Map<String, Integer> freq = new HashMap<>();
        for (int i = 0; i <= s.length() - 10; i++) {
            String sub = s.substring(i, i + 10);
            int c = freq.getOrDefault(sub, 0) + 1;
            freq.put(sub, c);
            if (c == 2) ans.add(sub);
        }
        return ans;
    }
}
func findRepeatedDnaSequences(s string) []string {
    ans := []string{}
    if len(s) < 10 {
        return ans
    }

    freq := make(map[string]int)
    for i := 0; i <= len(s)-10; i++ {
        sub := s[i : i+10]
        freq[sub]++
        if freq[sub] == 2 {
            ans = append(ans, sub)
        }
    }
    return ans
}
class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        vector<string> ans;
        if (s.size() < 10) return ans;

        unordered_map<string, int> freq;
        for (int i = 0; i <= (int)s.size() - 10; ++i) {
            string sub = s.substr(i, 10);
            int c = ++freq[sub];
            if (c == 2) ans.push_back(sub);
        }
        return ans;
    }
};
from typing import List

class Solution:
    def findRepeatedDnaSequences(self, s: str) -> List[str]:
        ans = []
        if len(s) < 10:
            return ans

        freq = {}
        for i in range(len(s) - 9):
            sub = s[i:i + 10]
            freq[sub] = freq.get(sub, 0) + 1
            if freq[sub] == 2:
                ans.append(sub)
        return ans
var findRepeatedDnaSequences = function(s) {
  const ans = [];
  if (s.length < 10) return ans;

  const freq = new Map();
  for (let i = 0; i <= s.length - 10; i++) {
    const sub = s.slice(i, i + 10);
    const c = (freq.get(sub) || 0) + 1;
    freq.set(sub, c);
    if (c === 2) ans.push(sub);
  }
  return ans;
};

Comments