test20230131 Round2_test_string1

前言

在此谢罪：虽然这场考试后我干了一些无耻的事情，我认为主凶是数组大小中将 L 写成了 N（我是个伞兵）。

完善情况：

A	B	C	D	E	F
√	√	√	√	✖	✖

P4391 [BOI2009]Radio Transmission 无线传输

给你一个长度为 \(L\) 的字符串 \(s_1\)，它是由某个字符串 \(s_2\) 不断自我连接形成的。但是字符串 \(s_2\) 是不确定的，现在只想知道它的最短长度是多少。
对于全部的测试点，保证 \(1 < L \le 10^6\)。

显然，答案为 \(L-\operatorname{fail}(L)\)。其中 \(\operatorname{fail}\) 是 \(s_1\) 的失配函数。

给个简单的证明：

设 \(K=|s_2|\)。则 \(\operatorname{fail}(K)=0\)（如果不为 \(0\)，那么一定有更小的）。

然后就有 \(\operatorname{fail}(K+n)=n(0\leq n\leq L-K)\)。

所以 \(\operatorname{fail}(L)=L-K\)。

所以 \(K=L-\operatorname{fail}(L)\)。证毕。

代码：

#include <bits/stdc++.h>
#define int long long
using namespace std;

const int N = 1e6+5;
int fail[N],n;
string s;

signed main(){
    ios::sync_with_stdio(false);cin.tie(0);cout.tie(0);
    cin>>n>>s;
    s=" "+s;
    for(int i=2,j=0;i<=n;i++){
        while(j&&s[i]!=s[j+1]) j=fail[j];
        if(s[i]==s[j+1]) j++;
        fail[i]=j;
    }
    cout<<(n-fail[n]);
    return 0;
}

P4824 [USACO15FEB] Censoring S

给出两个字符串 \(S,T\)。你需要进行以上操作：

删除一个 \(S\) 的，等于 \(T\) 的子串。

重复以上操作，直到无法执行上面的操作。

输出最终的 \(S\)。保证答案不会是空串。
\(1 \leq |T| \lt |S| \leq 10^6\)

这道题可以使用 P3121 [USACO15FEB]Censoring G 的栈 + AC 自动机轻松解决(这道题是考试题的加强版）。不过也可以使用与其思想类似的 KMP。

先对 \(t\) 求一遍 \(\operatorname{fail}\)。然后进行匹配。如果匹配到了，说明需要删去。维护一个栈和匹配到的最大的位置 \(\operatorname{pos}\)。需要删去时弹出栈内 \(|t|\) 个元素，匹配指针 \(j\) 也退回以前的 \(\operatorname{pos}\)。这样就可以实现可删除 KMP 了。

时间复杂度不会算，不过大概是 \(O(|s|+|t|)\) 的样子。

#include <bits/stdc++.h>
#define int long long
using namespace std;

string s,t;
int n,m,fail[1000005],pos[1000005],sta[1000005],top;

signed main(){
    cin>>s>>t;
    n=s.length();m=t.length();s=" "+s;t=" "+t;
    for(int i=2,j=0;i<=m;i++){
        while(j&&t[i]!=t[j+1]) j=fail[j];
        if(t[i]==t[j+1]) j++;
        fail[i]=j;
    }
    for(int i=1,j=0;i<=n;i++){
        while(j&&s[i]!=t[j+1]) j=fail[j];
        if(s[i]==t[j+1]) j++;
        pos[i]=j;sta[++top]=i;
        if(j==m){
            top-=m;
            j=pos[sta[top]];
        }
    }
    for(int i=1;i<=top;i++) cout<<s[sta[i]];
    return 0;
}

P4503 [CTSC2014] 企鹅 QQ

定义：如果两个字符串 \(a,b\) 相似，当且仅当 \(|a|=|b|\)，且有且仅有一个 \(k(1 \leq k \leq |a|)\)，\(a_k\neq b_k\)。
给出 \(n\) 个长度为 \(L\) 的，两两不同字符串 \(s_i\)。求出有多少个无序数对 \((i,j)\)，使得 \(a_i,a_j\) 相似。
\(1 \leq n \leq 3\times10^4,1 \leq L \leq 200\)

首先我们考虑如何高效判断两个字符串相似，暴力是 \(O(L)\) 的。而且几乎无法优化。

我们考虑另一种思路，相似的两个字符串一定只有一个字符不同，我们枚举这个字符，然后就转换为了匹配字符前的部分和字符后的部分。容易看出字符前的部分就是前缀，字符后的部分就是后缀。预处理前缀和后缀 Hash 就可以实现高效匹配了。时间复杂度预处理 \(O(nL)\)，单次 \(O(L)\)。

沿着这个思路继续思考，我们可以枚举不同的位置，把所以字符串中以这个位置分隔的前缀 Hash 和后缀 Hash 拉出来。就变成了求一个序列中有多少对相同的元素。

这是一个经典问题，可以使用排序 + 双指针 + 简单的排列组合解决，时间复杂度 \(O(n\log n)\)。

时间复杂度 \(O(nL\log n)\)。可以通过本题。

#include <bits/stdc++.h>
#define int long long
using namespace std;

typedef unsigned long long ull;
const int N = 30005,L = 205;
const ull BASE = 125;
int n,l,s,ans;
ull pre[N][L],suf[N][L];

struct node{
    ull pre,suf;
    bool operator<(const node &x) const {
        return (pre!=x.pre)?(pre<x.pre):(suf<x.suf);
    }
    bool operator==(const node &x) const {
        return pre==x.pre && suf==x.suf;
    }
} buf[N];

signed main(){
    ios::sync_with_stdio(false);cin.tie(0);cout.tie(0);
    cin>>n>>l>>s;
    for(int i=1;i<=n;i++){
        string str;
        cin>>str;
        str=" "+str;
        for(int j=1;j<=l;j++) pre[i][j]=pre[i][j-1]*BASE+str[j];
        for(int j=l;j;j--) suf[i][j]=suf[i][j+1]*BASE+str[j];
    }
    for(int diff=1;diff<=l;diff++){
        for(int j=1;j<=n;j++){
            buf[j].pre=pre[j][diff-1];buf[j].suf=suf[j][diff+1];
        }
        sort(buf+1,buf+n+1);
        int l=1,r=1;
        while(l<=n){
            while(r<=n&&buf[l]==buf[r]) r++;
            r--;
            ans+=(((r-l+1)*(r-l))>>1);
            l=r+1;r=l;
        }
    }
    cout<<ans;
    return 0;
}

P7469 [NOI Online 2021 提高组] 积木小赛

给出两个长度为 \(n\) 的字符串 \(s,t\)。求出有多少个 \(t\) 的非空子串 \(a\)，使得 \(a\) 是 \(s\) 的子序列。
\(1 \leq n \leq 3\times10^3\)。

首先枚举 \(t\) 的所有非空子串，固定左端点 \(l\)，使右端点 \(r\) 递增。这时候如果存在子序列，那么子序列一定可以是一个前缀扩展形式（指前一个字符串是后一个字符串的前缀）。

这时我们可以维护一个指针 \(c\) 往右扫。贪心地找到最先发现的 \(s_c=t_r\)。贪心选前面的一定比选后面的不劣（选前面的选择性更多）。

最后考虑如何判重。我们可以计算出所有满足条件的字符串哈希值。然后丢到 unordered_set，set，或随便搞一个数组（之后进行 sort 和 unique）里面去。实测只有最后一个方法可以通过。

请使用 C++ 98。否则会喜提 \(90\) 分。（如果开 O2 就随意了）

时间复杂度 \(O(n^2\log n)\)。空间复杂度 \(O(n^2)\)。

#include <bits/stdc++.h>
using namespace std;

typedef unsigned long long ull;
int n;
char s[3005],t[3005];
const ull BASE = 29;
ull sset[(3005*3005)],tot;

signed main(){
    ios::sync_with_stdio(false);cin.tie(0);cout.tie(0);
    cin>>n;
    cin>>(s+1)>>(t+1);
    for(int i=1;i<=n;i++){
        int cur=1;
        ull hashing = 0;
        for(int j=i;j<=n;j++){
            while(cur<=n&&s[cur]!=t[j]) cur++; 
            if(cur>n) break;
            hashing=(hashing*BASE)+(t[j]-'a'+1);
            sset[++tot]=hashing;
            cur++;
        }
    }
    sort(sset+1,sset+tot+1);
    cout<<((unique(sset+1,sset+tot+1))-sset-1)<<'\n';
    return 0;
}

UVA11732 "strcmp()" Anyone?

共 \(T\) 组数据，每组数据给出 \(n\) 个字符串 \(S_i\)。对于每组数据，询问 \(\sum_{i=1}^n\sum_{j=i+1}^n cnt(\operatorname{strcmp}(i,j))\) 的值。其中 \(cnt(\operatorname{strcmp}(i,j))\) 表示对 \(S_i,S_j\) 执行一次 \(\operatorname{strcmp}\) 函数需要的比较次数，\(\operatorname{strcmp}\) 的实现如下：
int strcmp(char *s, char *t)
{
   int i;
   for (i=0; s[i]==t[i]; i++)
       if (s[i]=='\0')
           return 0;
   return s[i] - t[i];
}
\(1\leq T\leq 10\)，\(1\le n\le4\times10^3\)，\(1\le |S|\le10^3\)，\(\Sigma=\{\mathtt{a\sim z},\mathtt{A\sim Z},\mathtt{0\sim 9}\}\)，其中 \(\Sigma\) 表示可能在 \(S\) 中出现的字符集合

UVA1401 Remember the Word

多组数据，每组数据给出一个字符串 \(a\) 和一个长度为 \(s\) 的字典。你需要将 \(a\) 分割成几段，使得每一段都在字典中出现过。求方案数。对 \(20071027\) 取模。
\(1 \leq s \leq 4000,1 \leq |a| \leq 3\times 10^5\)，字典中的单词长度不超过 \(100\)，且字典中的单词互不相同。本题中的字符串仅包含英文小写字母。