A de Bruijn sequence of order $k$ over a finite alphabet is a cyclic sequence that contains every possible $k$-sequence as a substring exactly once. Orthogonal de Bruijn sequences are collections of de Bruijn sequences of the same order $k$ that satisfy the joint constraint that every $(k+1)$-sequence appears as a substring in at most one of the sequences in the collection. Both de Bruijn and orthogonal de Bruijn sequences have found numerous applications in synthetic biology, although the latter remain largely unexplored in the coding theory literature. Here we study three practically relevant generalizations of orthogonal de Bruijn sequences in which we relax either the constraint that every $(k+1)$-sequence appears at most once, or the requirement that the sequences themselves be de Bruijn rather than balanced de Bruijn sequences. We also provide lower and upper bounds on the number of fixed-weight orthogonal de Bruijn sequences. The paper concludes with parallel results for orthogonal nonbinary Kautz sequences, which satisfy constraints similar to those of de Bruijn sequences, except that they are only required to cover all subsequences of length $k$ whose maximum runlength equals one.
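A minimal Python sketch of the two definitions above, assuming symbols are encoded as integers $0, \dots, q-1$. `de_bruijn` is the classical Lyndon-word (FKM) construction of one de Bruijn sequence, not a construction from this paper, and `are_orthogonal` checks the joint constraint that each cyclic $(k+1)$-window occurs in at most one sequence of the collection. All function names are illustrative.

```python
def de_bruijn(q, k):
    """One de Bruijn sequence of order k over {0, ..., q-1} (FKM / Lyndon-word construction)."""
    a = [0] * (q * k)
    seq = []

    def db(t, p):
        if t > k:
            # Append the Lyndon word a[1..p] whenever p divides k.
            if k % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, q):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq


def cyclic_windows(seq, length):
    """All cyclic substrings of the given length, one per starting position."""
    n = len(seq)
    return [tuple(seq[(i + j) % n] for j in range(length)) for i in range(n)]


def is_de_bruijn(seq, q, k):
    """True if every k-sequence over {0, ..., q-1} occurs exactly once cyclically."""
    windows = cyclic_windows(seq, k)
    return (len(seq) == q ** k
            and all(0 <= s < q for s in seq)
            and len(set(windows)) == q ** k)


def are_orthogonal(seqs, k):
    """True if no cyclic (k+1)-window is shared by two different sequences."""
    seen = set()
    for s in seqs:
        windows = set(cyclic_windows(s, k + 1))
        if seen & windows:
            return False
        seen |= windows
    return True


# Order-1 ternary example: 012 and 021 share no cyclic 2-window.
print(are_orthogonal([[0, 1, 2], [0, 2, 1]], 1))  # True
```

Checking orthogonality with a running set of windows keeps the test linear in the total length of the collection.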
We introduce and analyze a three-parameter family of self-referential integer sequences $S(x,y,z)$: starting from $a(1)=x$, each term advances by $y$ when the index $k$ has already appeared as a value and by $z$ otherwise. This simple rule generates a surprising zoo of behaviors, many of which are catalogued (albeit in a rather unstructured fashion) in the OEIS. This family was recently and independently studied by Fokkink and Joshi, who named these sequences "hiccup sequences" and established their general morphic nature. Our work provides a complementary, in-depth analysis of major subfamilies. Whenever $y>z>0$, we prove that the density $a(k)/k$ converges to the positive root of $r^{2}-zr-(y-z)=0$. Two subfamilies, $S(x,Z+1,Z)$ and $S(x,Z,Z+1)$, yield non-homogeneous Beatty sequences, providing explicit formulas for numerous OEIS entries. For $y=0$ and $z \ge 2$, the sequences eventually become periodic and satisfy linear recurrences. Critical cases with zero discriminant reveal geometric patterns on triangular, square, and hexagonal lattices. Finally, via tree-like representations, we uncover a tight link with meta-Fibonacci recurrences. These results position $S(x,
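A minimal sketch of the rule, under the reading (an assumption about the precise convention) that $a(k+1) = a(k) + y$ if $k$ is among $a(1), \dots, a(k)$ and $a(k+1) = a(k) + z$ otherwise. With $(x, y, z) = (1, 2, 1)$, a member of the $S(x, Z+1, Z)$ subfamily, the first terms 1, 3, 4, 6, 8, 9, 11, 12, ... agree with the lower Wythoff sequence $\lfloor n\varphi \rfloor$, and $a(k)/k$ approaches the positive root of $r^2 - r - 1 = 0$, as the density result predicts. The names `hiccup` and `density_limit` are illustrative.

```python
import math


def hiccup(x, y, z, n):
    """First n terms of S(x, y, z) under the reading stated above."""
    a = [x]
    seen = {x}  # values a(1), ..., a(k) produced so far
    for k in range(1, n):
        step = y if k in seen else z
        a.append(a[-1] + step)
        seen.add(a[-1])
    return a


def density_limit(y, z):
    """Positive root of r^2 - z*r - (y - z) = 0, the stated limit of a(k)/k for y > z > 0."""
    return (z + math.sqrt(z * z + 4 * (y - z))) / 2


a = hiccup(1, 2, 1, 100_000)
print(a[:12])                               # [1, 3, 4, 6, 8, 9, 11, 12, 14, 16, 17, 19]
print(a[-1] / len(a), density_limit(2, 1))  # both close to 1.618...
```

Keeping the produced values in a set makes each membership test constant time, so long prefixes are cheap to generate.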
A general construction of a set of time-domain sequences with sparse periodic correlation functions, having multiple segments of consecutive zero values, i.e. multiple zero correlation zones (ZCZs), is presented. All such sequences have a common, block-repetitive structure of the positions of zeros in their Discrete Fourier Transform (DFT) sequences, where the exact positions of zeros in a DFT sequence do not affect the positions and sizes of ZCZs. This property offers a completely new degree of flexibility in designing signals with good correlation properties under various spectral constraints. The non-zero values of the DFT sequences are determined by the corresponding frequency-domain modulation sequences, constructed as the element-by-element product of two component sequences: a "long" one, which is common to the set of time-domain sequences and controls the peak-to-average power ratio (PAPR) properties of the time-domain sequences; and a "short" one, periodically extended to match the length of the "long" component sequence, which controls the non-zero crosscorrelation values of all time-domain sequences. It is shown that 0 dB PAPR of time-domain sequences can be obta
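The role of DFT zeros can be illustrated with a small sketch in plain Python (no FFT library). The periodic crosscorrelation $r_{xy}(\tau) = \sum_n x((n+\tau) \bmod N)\,\overline{y(n)}$ equals the inverse DFT of $X(k)\overline{Y(k)}$, so two sequences whose DFT sequences are never simultaneously non-zero are uncorrelated at every shift. This shows only the underlying identity, not the paper's multiple-ZCZ construction. The function names and the toy spectra are illustrative assumptions. `papr_db` computes the PAPR, which is 0 dB for any constant-modulus sequence such as a Zadoff-Chu sequence.

```python
import cmath
import math


def dft(x):
    """Unnormalized DFT: X(k) = sum_t x(t) exp(-2j*pi*k*t/N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X):
    """Inverse DFT with the 1/N factor, so idft(dft(x)) equals x up to rounding."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def periodic_xcorr(x, y):
    """r(tau) = sum_n x((n + tau) mod N) * conj(y(n)), computed directly."""
    n = len(x)
    return [sum(x[(t + tau) % n] * y[t].conjugate() for t in range(n)) for tau in range(n)]


def xcorr_via_dft(x, y):
    """The same correlation, computed as the inverse DFT of X(k) * conj(Y(k))."""
    X, Y = dft(x), dft(y)
    return idft([a * b.conjugate() for a, b in zip(X, Y)])


def papr_db(x):
    """Peak-to-average power ratio in dB."""
    power = [abs(v) ** 2 for v in x]
    return 10 * math.log10(max(power) / (sum(power) / len(power)))


# Toy spectra (assumed for illustration): x uses only even DFT bins, y only odd ones,
# so X(k) * conj(Y(k)) = 0 for every k and the crosscorrelation vanishes at all shifts.
x = idft([1, 0, 1, 0, 1, 0, 1, 0])
y = idft([0, 1, 0, 1, 0, 1, 0, 1])
print(max(abs(r) for r in periodic_xcorr(x, y)))  # approximately 0

# A Zadoff-Chu sequence (odd length N, root u coprime to N) has constant modulus.
N, u = 7, 1
zc = [cmath.exp(-1j * math.pi * u * n * (n + 1) / N) for n in range(N)]
print(papr_db(zc))  # approximately 0 dB
```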
A sequence is difference algebraic (or D-algebraic) if finitely many shifts of its general term satisfy a polynomial relationship; that is, they are the coordinates of a generic point on an affine hypersurface. The corresponding equations are called algebraic difference equations (ADEs). We propose a formal definition of D-algebraicity for sequences and investigate algorithms for their closure properties. We show that subsequences of D-algebraic sequences, indexed by arithmetic progressions, satisfy ADEs of the same orders as the original sequences. Additionally, we discuss the special difference-algebraic nature of holonomic and $C^2$-finite sequences.
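As an illustration of the subsequence result, consider a standard example that does not come from the paper. The Fibonacci numbers satisfy the first-order ADE $(F_{n+1}^2 - F_{n+1}F_n - F_n^2)^2 - 1 = 0$, an equivalent form of Cassini's identity. The subsequence $u_n = F_{2n}$, taken along the arithmetic progression $2n$, again satisfies a first-order ADE, $u_{n+1}^2 - 3u_{n+1}u_n + u_n^2 - 1 = 0$. The sketch checks both relations on the first terms with exact integer arithmetic. The helper names are illustrative.

```python
def fibonacci(count):
    """F(0), F(1), ..., F(count - 1) as exact integers."""
    f = [0, 1]
    while len(f) < count:
        f.append(f[-1] + f[-2])
    return f[:count]


def ade_fib(u0, u1):
    """Left-hand side of the order-1 ADE satisfied by F(n): (u1^2 - u1*u0 - u0^2)^2 - 1."""
    return (u1 * u1 - u1 * u0 - u0 * u0) ** 2 - 1


def ade_even(u0, u1):
    """Left-hand side of the order-1 ADE satisfied by u(n) = F(2n): u1^2 - 3*u1*u0 + u0^2 - 1."""
    return u1 * u1 - 3 * u1 * u0 + u0 * u0 - 1


F = fibonacci(202)
even = F[0::2]  # F(0), F(2), F(4), ...
print(all(ade_fib(F[n], F[n + 1]) == 0 for n in range(len(F) - 1)))          # True
print(all(ade_even(even[n], even[n + 1]) == 0 for n in range(len(even) - 1)))  # True
```

The second relation holds for every $n$ because the quadratic form $b^2 - 3ab + a^2$ is invariant under $(a, b) \mapsto (b, 3b - a)$, which is the step $F_{2n+2} = 3F_{2n} - F_{2n-2}$.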
This article introduces idMotif, a visual analytics framework designed to aid domain experts in the identification of motifs within protein sequences. Motifs, short sequences of amino acids, are critical for understanding the distinct functions of proteins. Identifying these motifs is pivotal for predicting diseases or infections. idMotif employs a deep learning-based method for the categorization of protein sequences, enabling the discovery of potential motif candidates within protein groups through local explanations of deep learning model decisions. It offers multiple interactive views for the analysis of protein clusters or groups and their sequences. A case study, complemented by expert feedback, illustrates idMotif's utility in facilitating the analysis and identification of protein sequences and motifs.
Alignment-free sequence analysis approaches provide important alternatives to multiple sequence alignment (MSA) in biological sequence analysis because they have low computational complexity and do not depend on a high level of sequence identity. However, most existing alignment-free methods do not use the full information content of sequences and thus cannot accurately reveal similarities and differences among DNA sequences. We present a novel alignment-free computational method for sequence analysis based on the Ramanujan-Fourier transform (RFT), in which the complete information of DNA sequences is retained. We represent DNA sequences as four binary indicator sequences and apply the RFT to the indicator sequences to convert them into the frequency domain. The Euclidean distance between the complete RFT coefficient vectors of DNA sequences is used as the similarity measure. To handle sequences of different lengths, we zero-pad the shorter binary indicator sequences to the length of the longest sequence in the data set being compared. Thus, the DNA sequences are compared in a frequency space of the same dimension without information loss. W
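The pipeline described above (indicator sequences, zero padding, RFT, Euclidean distance) can be sketched as follows. The sketch assumes the normalization $X(q) = \frac{1}{\phi(q) N}\sum_{n=1}^{N} x(n)\, c_q(n)$ for $q = 1, \dots, N$, where $c_q(n)$ is the Ramanujan sum and $\phi$ is Euler's totient. This is one common finite form of the RFT; the paper's exact convention may differ. Function names are illustrative, and the direct evaluation costs $O(N^3)$, so it only suits short sequences.

```python
import math


def ramanujan_sum(q, n):
    """c_q(n): sum of cos(2*pi*a*n/q) over 1 <= a <= q with gcd(a, q) = 1 (always an integer)."""
    total = sum(math.cos(2 * math.pi * a * n / q) for a in range(1, q + 1) if math.gcd(a, q) == 1)
    return round(total)


def totient(q):
    """Euler's totient by direct counting."""
    return sum(1 for a in range(1, q + 1) if math.gcd(a, q) == 1)


def rft(x):
    """RFT coefficients X(1..N) of a real sequence x(1..N) (normalization assumed, see text)."""
    N = len(x)
    return [
        sum(x[n - 1] * ramanujan_sum(q, n) for n in range(1, N + 1)) / (totient(q) * N)
        for q in range(1, N + 1)
    ]


def rft_features(dna, length):
    """Concatenated RFT coefficients of the A, C, G, T indicator sequences, zero-padded to `length`."""
    features = []
    for base in "ACGT":
        indicator = [1 if ch == base else 0 for ch in dna.upper()]
        indicator += [0] * (length - len(indicator))
        features.extend(rft(indicator))
    return features


def rft_distance_matrix(seqs):
    """Pairwise Euclidean distances, padding every sequence to the longest one in `seqs`."""
    length = max(len(s) for s in seqs)
    feats = [rft_features(s, length) for s in seqs]
    return [[math.dist(f, g) for g in feats] for f in feats]


D = rft_distance_matrix(["ACGTAC", "ACGTTC", "GGGCCA", "ACG"])
for row in D:
    print([round(d, 3) for d in row])
```

Padding to the longest sequence in the whole data set, rather than pairwise, keeps every feature vector in the same space, so the full distance matrix stays consistent.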
This paper presents a reinterpretation of a second-order linear recurrence sequence as a sequence of continuants derived from the convergents to a continued fraction. As a result, we are able to derive the generating function and Binet formula for continuants. Using this result, we provide a continuant-based formulation for well-known identities associated with Lucas sequences.
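A small sketch of the objects involved. It uses the standard recurrence $K_n = x_n K_{n-1} + K_{n-2}$ with $K_{-1} = 0$, $K_0 = 1$; these textbook conventions are assumed here rather than taken from the paper. With constant entries, the generalized recurrence $K_n = P K_{n-1} - Q K_{n-2}$ gives $K_n = U_{n+1}(P, Q)$, the Lucas sequence. The Binet formula $U_n = (\alpha^n - \beta^n)/(\alpha - \beta)$ reproduces it, where $\alpha, \beta$ are the roots of $t^2 - Pt + Q = 0$. Function names are illustrative, and the floating-point Binet check assumes $P^2 - 4Q > 0$.

```python
import math


def continuant(xs):
    """K(x1, ..., xn) via K_n = x_n * K_{n-1} + K_{n-2}, with K_{-1} = 0 and K_0 = 1."""
    prev, cur = 0, 1
    for x in xs:
        prev, cur = cur, x * cur + prev
    return cur


def constant_continuant(P, Q, n):
    """K_n for the recurrence K_n = P*K_{n-1} - Q*K_{n-2}, K_{-1} = 0, K_0 = 1."""
    prev, cur = 0, 1
    for _ in range(n):
        prev, cur = cur, P * cur - Q * prev
    return cur


def lucas_u_binet(P, Q, n):
    """Binet formula for the Lucas sequence U_n(P, Q); assumes P^2 - 4Q > 0."""
    s = math.sqrt(P * P - 4 * Q)
    alpha, beta = (P + s) / 2, (P - s) / 2
    return (alpha ** n - beta ** n) / s


# Convergent of sqrt(2) = [1; 2, 2, 2, ...]: numerator K(1,2,2,2), denominator K(2,2,2).
print(continuant([1, 2, 2, 2]), continuant([2, 2, 2]))  # 17 12
```

With $P = 1, Q = -1$ the constant continuants are the Fibonacci numbers, and with $P = 2, Q = -1$ they are the Pell numbers.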
The autocorrelation of a sequence is, among other criteria, a useful measure of resistance to cryptographic attacks. The behavior of the autocorrelations of random Boolean functions (studied by Florian Caullery, Eric Férard and François Rodier [4]) shows that they are concentrated around a point. We show that the same holds for the periodic autocorrelations of random binary sequences.
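A minimal sketch of the quantity studied. It assumes the usual mapping of bits to $\pm 1$ via $b \mapsto (-1)^b$ and computes the periodic autocorrelation $C(\tau) = \sum_{i=0}^{N-1} s_i s_{(i+\tau) \bmod N}$. For comparison, it evaluates a length-7 m-sequence, whose off-peak values are all $-1$, and a few random sequences, whose largest off-peak magnitude can be inspected empirically. Names, the length 256, and the random seed are illustrative.

```python
import random


def periodic_autocorrelation(bits):
    """C(tau) for tau = 0..N-1 of the +/-1 version of a binary sequence."""
    s = [1 - 2 * b for b in bits]  # 0 -> +1, 1 -> -1
    n = len(s)
    return [sum(s[i] * s[(i + tau) % n] for i in range(n)) for tau in range(n)]


def max_offpeak(bits):
    """Largest |C(tau)| over the non-trivial shifts tau = 1..N-1."""
    return max(abs(c) for c in periodic_autocorrelation(bits)[1:])


print(periodic_autocorrelation([1, 1, 1, 0, 0, 1, 0]))  # [7, -1, -1, -1, -1, -1, -1]

rng = random.Random(1)
n = 256
samples = [max_offpeak([rng.randrange(2) for _ in range(n)]) for _ in range(20)]
print(min(samples), max(samples))
```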
Given a set of integers with no three in arithmetic progression, we construct a Stanley sequence by adding integers greedily so that no three-term arithmetic progression is formed. This paper offers two main contributions to the theory of Stanley sequences. First, we characterize well-structured Stanley sequences as solutions to constraints in modular arithmetic, defining the modular Stanley sequences. Second, we introduce the basic Stanley sequences, where elements arise as the sums of subsets of a basis sequence, which in the simplest case is the powers of 3. Applications of our results include the construction of Stanley sequences with arbitrarily large gaps between terms, answering a weak version of a problem by Erdős et al. Finally, we generalize many results about Stanley sequences to $p$-free sequences, where $p$ is any odd prime.
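The greedy rule can be sketched directly. Each new candidate $c$ exceeds every current element, so it can only be the largest term of a forbidden progression $(2b - c, b, c)$. It therefore suffices to test whether $2b - c$ is already present for some element $b$. Starting from $\{0\}$, the output matches the sums of distinct powers of 3, the simplest basic Stanley sequence mentioned above. The function name is illustrative, and the initial set is assumed to contain no three-term arithmetic progression.

```python
def stanley(initial, count):
    """First `count` terms of the Stanley sequence generated greedily from `initial`."""
    seq = sorted(initial)
    members = set(seq)
    c = seq[-1] + 1
    while len(seq) < count:
        # c closes a 3-term AP (a, b, c) exactly when a = 2b - c is already a member.
        if not any(2 * b - c in members for b in seq):
            seq.append(c)
            members.add(c)
        c += 1
    return seq[:count]


print(stanley([0], 10))  # [0, 1, 3, 4, 9, 10, 12, 13, 27, 28]
```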
This paper is devoted to the study of eigen-sequences for some important operators acting on sequences. Using functional equations involving generating functions, we completely solve the problem of characterizing the fixed sequences for the Generalized Binomial operator. We give some applications to integer sequences. In particular, we show how to generate fixed sequences for the Generalized Binomial operator and describe their relation to the Worpitzky transform. We illustrate this fact with some interesting examples and identities related to Fibonacci, Catalan, Motzkin and Euler numbers. Finally, we find the eigen-sequences for the mutual compositions of the operators Interpolated Invert, Generalized Binomial and Revert.
A sequence $a=(a_n)_{n=1}^\infty$ of non-negative integers is called realizable if there is a self-map $T:X\to X$ on a set $X$ such that $a_n$ is equal to the number of periodic points of $T$ in $X$ of (not necessarily exact) period $n$, for all $n\geq1$. The sequence $a$ is called almost realizable if there exists a positive integer $m$ such that $(ma_n)_{n=1}^\infty$ is realizable. In this article, we show that certain wide classes of integer sequences are realizable, which contain many famous combinatorial sequences, such as the sequences of Apéry numbers of both kinds, central Delannoy numbers, Franel numbers, Domb numbers, Zagier numbers, and central trinomial coefficients. We also show that the sequences of Catalan numbers, Motzkin numbers, and large and small Schröder numbers are not almost realizable.
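The standard test behind realizability is Dold's congruences, a classical characterization rather than this paper's method. The number of orbits of exact length $n$, $\frac{1}{n}\sum_{d \mid n} \mu(n/d)\, a_d$, must be a non-negative integer for every $n$. The sketch checks this on a finite prefix, which can refute realizability but never prove it. Lucas numbers and $2^n$ pass, while the Catalan numbers already fail at $n = 2$. Names are illustrative.

```python
from fractions import Fraction


def mobius(n):
    """Moebius function by trial division."""
    result, p, m = 1, 2, n
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0  # square factor
            result = -result
        p += 1
    if m > 1:
        result = -result
    return result


def orbit_counts(a):
    """a[0] = a_1, a[1] = a_2, ...; returns (1/n) * sum_{d | n} mu(n/d) a_d for each n."""
    return [
        Fraction(sum(mobius(n // d) * a[d - 1] for d in range(1, n + 1) if n % d == 0), n)
        for n in range(1, len(a) + 1)
    ]


def prefix_is_realizable(a):
    """True if every orbit count of the prefix is a non-negative integer."""
    return all(c >= 0 and c.denominator == 1 for c in orbit_counts(a))


lucas = [1, 3, 4, 7, 11, 18, 29, 47, 76, 123, 199, 322]
catalan = [1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796]
print(prefix_is_realizable(lucas), prefix_is_realizable(catalan))  # True False
```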
We consider the greatest common divisor (GCD) of all sums of $k$ consecutive terms of a sequence $(S_n)_{n\geq 0}$ whose terms $S_n$ come from exactly one of the following six well-known sequences: Pell $P_n$, associated Pell $Q_n$, balancing $B_n$, Lucas-balancing $C_n$, cobalancing $b_n$, and Lucas-cobalancing $c_n$ numbers. For each of the six GCDs, we provide closed forms dependent on $k$. Moreover, each of these closed forms can be realized as braid sequences of Pell and associated Pell numbers in an intriguing manner. We end with partial results on GCDs of sums of squared terms and open questions.
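These GCDs can be computed numerically with a short sketch. It is offered as a check, not as the paper's closed forms. Pell, associated Pell, balancing and Lucas-balancing numbers all satisfy a homogeneous recurrence $S_{n+2} = pS_{n+1} + qS_n$ with integer $p, q$ (for example $p = 2, q = 1$ for Pell and $p = 6, q = -1$ for balancing numbers). Sums of $k$ consecutive terms satisfy the same recurrence, so the GCD of all such sums equals the GCD of the first two sums. Function names, the indexing $P_0 = 0, P_1 = 1$ and $B_0 = 0, B_1 = 1$, and sums starting at $n = 0$ are assumptions.

```python
from math import gcd


def terms(s0, s1, p, q, count):
    """First `count` terms of S_{n+2} = p*S_{n+1} + q*S_n."""
    s = [s0, s1]
    while len(s) < count:
        s.append(p * s[-1] + q * s[-2])
    return s[:count]


def gcd_of_window_sums(s0, s1, p, q, k):
    """GCD of all sums S_n + ... + S_{n+k-1}, n >= 0 (equals the GCD of the first two sums)."""
    t = terms(s0, s1, p, q, k + 1)
    return gcd(sum(t[:k]), sum(t[1:k + 1]))


print([gcd_of_window_sums(0, 1, 2, 1, k) for k in range(1, 9)])  # [1, 1, 1, 4, 1, 7, 1, 24]
```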
The lonely singles sequence counts the noncrossing partitions of the finite set {1, ..., n} in which no pair of singletons {i} and {j} can be merged into the pair {i, j} while keeping the partition noncrossing. The marriageable singles sequence counts all the other noncrossing partitions; it is the difference between the Catalan numbers and the lonely singles sequence. The first 14 terms of these sequences are given, together with some of their properties. These sequences arise when counting the number of ways to cross certain road intersections simultaneously.
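For small $n$, a brute-force sketch reproduces both sequences under the definitions above. It enumerates set partitions of {1, ..., n} via restricted growth strings and keeps the noncrossing ones. A partition counts as lonely when merging any two of its singletons would create a crossing; partitions with fewer than two singletons are vacuously lonely. This exhaustive approach is only practical for small $n$ and is not the paper's method. Function names are illustrative.

```python
from itertools import combinations


def set_partitions(n):
    """All set partitions of {1, ..., n}, generated from restricted growth strings."""
    def grow(rgs, used):
        if len(rgs) == n:
            blocks = [[] for _ in range(used)]
            for element, block in enumerate(rgs, start=1):
                blocks[block].append(element)
            yield blocks
            return
        for block in range(used + 1):
            yield from grow(rgs + [block], max(used, block + 1))
    yield from grow([], 0)


def crosses(A, B):
    """True if some a < b < c < d has a, c in one block and b, d in the other."""
    for a, c in combinations(sorted(A), 2):
        for b, d in combinations(sorted(B), 2):
            if a < b < c < d or b < a < d < c:
                return True
    return False


def is_noncrossing(blocks):
    return not any(crosses(A, B) for A, B in combinations(blocks, 2))


def is_lonely(blocks):
    """No two singletons {i}, {j} can be merged into {i, j} without creating a crossing."""
    singles = [blk[0] for blk in blocks if len(blk) == 1]
    others = [blk for blk in blocks if len(blk) > 1]
    return all(any(crosses([i, j], blk) for blk in others) for i, j in combinations(singles, 2))


def lonely_and_marriageable(n):
    """(lonely, marriageable) counts among the noncrossing partitions of {1, ..., n}."""
    lonely = marriageable = 0
    for blocks in set_partitions(n):
        if is_noncrossing(blocks):
            if is_lonely(blocks):
                lonely += 1
            else:
                marriageable += 1
    return lonely, marriageable


print([lonely_and_marriageable(n) for n in range(1, 5)])  # [(1, 0), (1, 1), (4, 1), (9, 5)]
```

Each pair of counts sums to a Catalan number, which gives a quick consistency check on the enumeration.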
A cryptarithm (or alphametic) is a mathematical puzzle in which numbers are represented by words in such a way that identical letters stand for equal digits and distinct letters for unequal digits. An alphametic puzzle is usually given in the form of an equation that needs to be solved, such as SEND + MORE = MONEY. Here, instead, we consider cryptarithms constrained not by an equation but by a particular subsequence of natural numbers, for example perfect squares or primes. Such a cryptarithm has a unique solution if exactly one term in the sequence has the corresponding pattern of digits. We call such terms cryptarithmically unique. We estimate the density of such terms in an arbitrary sequence for which the overall density of terms among integers is known. In particular, among all perfect squares below 10^12, slightly less than one half are cryptarithmically unique, and their density increases toward larger numbers. Cryptarithmically unique prime numbers, however, are initially very scarce. Combinatorial estimates suggest that their density should drop below 10^-300 for decimal lengths of approximately 1829 digits, but then it recovers and is asym
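A short sketch makes the notion concrete. A number's digit pattern replaces each digit by the index of its first occurrence, so 1231 becomes (0, 1, 2, 0). A term is cryptarithmically unique when no other term of the sequence shares its pattern. Taking all terms below a power of ten ensures that every digit length is complete. This brute-force count is only feasible for small bounds and is not the paper's estimation method. Names are illustrative.

```python
import math
from collections import Counter


def digit_pattern(n):
    """Pattern of decimal digits: each digit replaced by the order of its first appearance."""
    first_seen = {}
    return tuple(first_seen.setdefault(d, len(first_seen)) for d in str(n))


def cryptarithmically_unique(terms):
    """Terms whose digit pattern occurs exactly once among `terms`."""
    counts = Counter(digit_pattern(t) for t in terms)
    return [t for t in terms if counts[digit_pattern(t)] == 1]


# Fraction of cryptarithmically unique perfect squares below 10^digits.
for digits in range(1, 9):
    squares = [i * i for i in range(1, math.isqrt(10 ** digits - 1) + 1)]
    unique = cryptarithmically_unique(squares)
    print(digits, len(unique) / len(squares))
```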
This is a review on subgaussian sequences of random variables, prepared for the Mediterranean Institute for the Mathematical Sciences (MIMS). We first describe the main examples of such sequences. Then we focus on examples coming from the harmonic analysis of Fourier series and we describe the connection of subgaussian sequences of characters on the unidimensional torus (or any compact Abelian group) with Sidon sets. We explain the main combinatorial open problem concerning such subgaussian sequences. We present the answer to the analogous question for subgaussian bounded mean oscillation (BMO) sequences on the unit circle. Lastly, we describe several very recent results that provide a generalization of the preceding ones when the trigonometric system (or its analogue on a compact Abelian group) is replaced by an arbitrary orthonormal system bounded in $L_\infty$.
We find new representations, in terms of constant terms of powers of Laurent polynomials, for all the 15 sporadic Apéry-like sequences discovered by Zagier, Almkvist-Zudilin and Cooper. The new representations lead to binomial expressions for the sequences, which, as opposed to previous expressions, do not involve powers of 3 or 8. We use these to establish the supercongruence $B_{np^k} \equiv B_{np^{k-1}} \bmod p^{2k}$ for all primes $p \ge 3$ and integers $n,k \ge 1$, where $B_n$ is a sequence discovered by Zagier, known as Sequence $\mathbf{B}$. Additionally, for 14 of the 15 sequences, the Newton polytopes of the Laurent polynomials contain the origin as their only interior integral point. This property allows us to prove that these sequences satisfy a strong form of the Lucas congruences, extending work of Malik and Straub. Moreover, we obtain lower bounds on the $p$-adic valuation of these sequences via recent work of Delaygue.
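The constant-term viewpoint can be checked numerically on a simpler, classical example. This example is chosen for illustration and is not one of the paper's new representations. The Franel numbers $f_n = \sum_k \binom{n}{k}^3$ are the constant terms of $\left((1+x)(1+y)(1+x^{-1}y^{-1})\right)^n$, whose Newton polytope is a hexagon with the origin as its only interior lattice point. The sketch expands the Laurent polynomial power with a dictionary of exponents. It also tests the Lucas congruence $f_{np+m} \equiv f_n f_m \pmod p$ for $0 \le m < p$ on a few primes. Names are illustrative.

```python
from math import comb


def franel(n):
    """Franel numbers f_n = sum_k C(n, k)^3."""
    return sum(comb(n, k) ** 3 for k in range(n + 1))


# (1 + x)(1 + y)(1 + 1/(x*y)) expanded; keys are exponent pairs (i, j) of x^i y^j.
LAURENT = {(0, 0): 2, (1, 0): 1, (0, 1): 1, (1, 1): 1, (-1, -1): 1, (-1, 0): 1, (0, -1): 1}


def constant_term_power(poly, n):
    """Constant term of poly**n for a Laurent polynomial in two variables."""
    result = {(0, 0): 1}
    for _ in range(n):
        product = {}
        for (i1, j1), c1 in result.items():
            for (i2, j2), c2 in poly.items():
                key = (i1 + i2, j1 + j2)
                product[key] = product.get(key, 0) + c1 * c2
        result = product
    return result.get((0, 0), 0)


def lucas_holds(p, n_max):
    """Check f_{np+m} == f_n * f_m (mod p) for 0 <= n < n_max and 0 <= m < p."""
    return all(
        franel(n * p + m) % p == franel(n) * franel(m) % p
        for n in range(n_max)
        for m in range(p)
    )


print([franel(n) for n in range(5)])                                        # [1, 2, 10, 56, 346]
print(all(constant_term_power(LAURENT, n) == franel(n) for n in range(8)))  # True
print(all(lucas_holds(p, 5) for p in (2, 3, 5, 7)))                         # True
```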
The Stern diatomic sequence is closely linked to continued fractions via the Gauss map on the unit interval, which in turn can be understood via systematic subdivisions of the unit interval. Higher dimensional analogues of continued fractions, called multidimensional continued fractions, can be produced through various subdivisions of a triangle. We define triangle partition-Stern sequences (TRIP-Stern sequences for short), higher-dimensional generalizations of the Stern diatomic sequence, from the method of subdividing a triangle via various triangle partition algorithms. We then explore several combinatorial results about TRIP-Stern sequences, which may be used to give rise to certain well-known sequences. We finish by generalizing TRIP-Stern sequences and presenting analogous results for these generalizations.
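For reference, the classical Stern diatomic sequence that TRIP-Stern sequences generalize is defined by $s(0) = 0$, $s(1) = 1$, $s(2n) = s(n)$, $s(2n+1) = s(n) + s(n+1)$. A minimal sketch follows. Consecutive terms are coprime, and the ratios $s(n)/s(n+1)$ for $n \ge 1$ list every positive rational exactly once (the Calkin-Wilf enumeration). The triangle-partition generalizations themselves are not reproduced here, and the function name is illustrative.

```python
from math import gcd


def stern(count):
    """First `count` terms of Stern's diatomic sequence."""
    s = [0, 1]
    for n in range(2, count):
        half = n // 2
        s.append(s[half] if n % 2 == 0 else s[half] + s[half + 1])
    return s[:count]


s = stern(17)
print(s)  # [0, 1, 1, 2, 1, 3, 2, 3, 1, 4, 3, 5, 2, 5, 3, 4, 1]
print([f"{s[n]}/{s[n + 1]}" for n in range(1, 8)])  # ['1/1', '1/2', '2/1', '1/3', '3/2', '2/3', '3/1']
print(all(gcd(s[n], s[n + 1]) == 1 for n in range(1, 16)))  # True
```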
Operators on probability distributions can be expressed as operators on the associated moment sequences, and so correspond to operators on integer sequences. Thus, there is an opportunity to apply each theory to the other. Moreover, probability models can be sources of integer sequences, both classical and new, as we show by considering the classical M/G/1 single-server queueing model. We identify moment sequences that are integer sequences. We establish connections between the M/M/1 busy period distribution and the Catalan and Schroeder numbers.
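This illustration shows one standard way Catalan numbers enter the M/M/1 busy period; it is not the paper's derivation. Let the traffic intensity be $\rho = \lambda/\mu < 1$. The number $N$ of customers served in a busy period then satisfies $P(N = n) = C_{n-1}\, \rho^{n-1} / (1+\rho)^{2n-1}$. The reason is that the embedded queue-length walk must make a Dyck-type excursion of $n - 1$ arrivals and $n$ departures, with each step an arrival with probability $\rho/(1+\rho)$. The sketch counts those excursions by dynamic programming and compares them with $C_{n-1}$. It also checks numerically that the distribution sums to 1 with mean $1/(1-\rho)$. Names are illustrative.

```python
from math import comb


def catalan(m):
    return comb(2 * m, m) // (m + 1)


def busy_period_paths(n):
    """Walks of 2n-2 steps from queue length 1 back to 1 that never hit 0 (then one final departure)."""
    counts = {1: 1}
    for _ in range(2 * n - 2):
        nxt = {}
        for level, c in counts.items():
            for new in (level + 1, level - 1):
                if new >= 1:
                    nxt[new] = nxt.get(new, 0) + c
        counts = nxt
    return counts.get(1, 0)


def served_pmf(rho, n_max):
    """P(N = n) for n = 1..n_max, using C_n / C_{n-1} = 2(2n-1)/(n+1) to avoid huge integers."""
    pmf = []
    term = 1.0 / (1.0 + rho)  # n = 1: C_0 * rho^0 / (1 + rho)^1
    for n in range(1, n_max + 1):
        pmf.append(term)
        term *= 2 * (2 * n - 1) / (n + 1) * rho / (1.0 + rho) ** 2
    return pmf


print([busy_period_paths(n) for n in range(1, 8)])  # [1, 1, 2, 5, 14, 42, 132]
pmf = served_pmf(0.5, 2000)
print(sum(pmf), sum(n * p for n, p in enumerate(pmf, start=1)))  # close to 1 and 1/(1 - 0.5) = 2
```

Building the probabilities from the ratio of consecutive Catalan numbers keeps every intermediate value a modest float, whereas $\binom{2n-2}{n-1}$ itself would overflow a float long before the tail becomes negligible.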
We study the distribution of sequences of the form $(q_ny)_{n=1}^\infty$, where $(q_n)_{n=1}^\infty$ is some increasing sequence of integers. In particular, we study the Lebesgue measure and find bounds on the Hausdorff dimension of the set of points $\gamma \in [0,1)$ which are well approximated by points in the sequence $(q_ny)_{n=1}^\infty$. The bounds on Hausdorff dimension are valid for almost every $y$ in the support of a measure of positive Fourier dimension. When the required rate of approximation is very good or if our sequence is sufficiently rapidly growing, our dimension bounds are sharp. If the measure of positive Fourier dimension is itself Lebesgue measure, our measure bounds are also sharp for a very large class of sequences. We also give an application to inhomogeneous Littlewood type problems.
This paper introduces a novel framework for DNA sequence generation, comprising two key components: DiscDiff, a Latent Diffusion Model (LDM) tailored for generating discrete DNA sequences, and Absorb-Escape, a post-training algorithm designed to refine these sequences. Absorb-Escape enhances the realism of the generated sequences by correcting "round errors" inherent in the conversion between the latent and input spaces. Our approach not only sets new standards in DNA sequence generation but also outperforms existing diffusion models in generating both short and long DNA sequences. Additionally, we introduce EPD-GenDNA, the first comprehensive, multi-species dataset for DNA generation, encompassing 160,000 unique sequences from 15 species. We hope this study will advance the generative modelling of DNA, with potential implications for gene therapy and protein production.