This work focuses on training graph foundation models (GFMs) that have strong generalization ability in graph-level tasks such as graph classification. Effective GFM training requires capturing information consistent across different domains. We discover that graph structures provide more consistent cross-domain information compared to node features and graph labels. However, traditional GFMs primarily focus on transferring node features from various domains into a unified representation space but often lack structural cross-domain generalization. To address this, we introduce GraphProp, which emphasizes structural generalization. The training process of GraphProp consists of two main phases. First, we train a structural GFM by predicting graph invariants. Since graph invariants are properties of graphs that depend only on the abstract structure, not on particular labellings or drawings of the graph, this structural GFM has a strong ability to capture the abstract structural information and provide discriminative graph representations comparable across diverse domains. In the second phase, we use the representations given by the structural GFM as positional encodings to train a com
The notion of a 12-representable graph was introduced by Jones et al. This notion generalizes the notions of the much studied permutation graphs and co-interval graphs. It is known that any 12-representable graph is a comparability graph, and also that a tree is 12-representable if and only if it is a double caterpillar. Moreover, Jones et al.\ initiated the study of 12-representability of induced subgraphs of a grid graph, and asked whether it is possible to characterize such graphs. This question is meant to be about induced subgraphs of a grid graph that consist of squares, which we call square grid graphs. However, an induced subgraph in a grid graph does not have to contain entire squares, and we call such graphs line grid graphs. In this paper we answer the question of Jones et al.\ by providing a complete characterization of $12$-representable square grid graphs in terms of forbidden induced subgraphs. Moreover, we conjecture such a characterization for the line grid graphs and give a number of results towards solving this challenging conjecture. Our results are a major step in the direction of characterization of all 12-representable graphs since beyond our characteriza
Graph representation learning models aim to represent the graph structure and its features as low-dimensional vectors in a latent space, which can benefit various downstream tasks, such as node classification and link prediction. Owing to the strong graph data modelling capabilities of this approach, various graph embedding models and libraries have been proposed to learn embeddings and to make experiments easier for researchers to conduct. In this paper, we introduce Connector, a novel graph representation framework covering various graph embedding models, ranging from shallow to state-of-the-art models. First, we consider graph generation by constructing various types of graphs with different structural relations, including homogeneous, signed, heterogeneous, and knowledge graphs. Second, we introduce various graph representation learning models, ranging from shallow to deep graph embedding models. Finally, we plan to build an efficient open-source framework that can provide deep graph embedding models to represent structural relations in graphs. The framework is available at https://github.com/NSLab-CUK/Connector.
Training on large-scale graphs has achieved remarkable results in graph representation learning, but its cost and storage have raised growing concerns. As one of the most promising directions, graph condensation methods address these issues by employing gradient matching, aiming to condense the full graph into a more concise yet information-rich synthetic set. Though encouraging, these strategies primarily emphasize matching directions of the gradients, which leads to deviations in the training trajectories. Such deviations are further magnified by the differences between the condensation and evaluation phases, culminating in accumulated errors, which detrimentally affect the performance of the condensed graphs. In light of this, we propose a novel graph condensation method named \textbf{C}raf\textbf{T}ing \textbf{R}ationa\textbf{L} trajectory (\textbf{CTRL}), which offers an optimized starting point closer to the original dataset's feature distribution and a more refined strategy for gradient matching. Theoretically, CTRL can effectively neutralize the impact of accumulated errors on the performance of condensed graphs. We provide extensive experiments on various graph datasets an
The recent GRAPH-BERT model introduces a new approach to learning graph representations based solely on the attention mechanism. GRAPH-BERT provides an opportunity for transferring pre-trained models and learned graph representations across different tasks within the same graph dataset. In this paper, we further investigate the graph-to-graph transfer of a universal GRAPH-BERT for graph representation learning across different graph datasets; for simplicity, we refer to our proposed model as G5. Many challenges exist in training G5 to adapt to the distinct input and output configurations of each graph data source, as well as to the differences in their information distributions. G5 introduces a pluggable model architecture: (a) each data source is pre-processed with a unique input representation learning component; (b) each output application task also has a specific functional component; and (c) all such diverse input and output components are joined to a universal GRAPH-BERT core component via an input size unification layer and an output representation fusion layer, respectively. The G5 model removes the last obstacle for cross-graph representation
An orientation of a graph is semi-transitive if it is acyclic, and for any directed path $v_0\rightarrow v_1\rightarrow \cdots\rightarrow v_k$ either there is no arc between $v_0$ and $v_k$, or $v_i\rightarrow v_j$ is an arc for all $0\leq i<j\leq k$. An undirected graph is semi-transitive if it admits a semi-transitive orientation. Semi-transitive graphs generalize several important classes of graphs and they are precisely the class of word-representable graphs studied extensively in the literature. Determining if a triangle-free graph is semi-transitive is an NP-hard problem. The existence of non-semi-transitive triangle-free graphs was established via Erdős' theorem by Halldórsson and the authors in 2011. However, no explicit examples of such graphs were known until recent work of the first author and Saito who have shown computationally that a certain subgraph on 16 vertices of the triangle-free Kneser graph $K(8,3)$ is not semi-transitive, and have raised the question on the existence of smaller triangle-free non-semi-transitive graphs. In this paper we prove that the smallest triangle-free 4-chromatic graph on 11 vertices (the Grötzsch graph) and the smallest triangle-free
Denoising-based models, including diffusion and flow matching, have led to substantial advances in graph generation. Despite this progress, such models remain constrained by two fundamental limitations: a computational cost that scales quadratically with the number of nodes and a large number of function evaluations required during generation. In this work, we introduce a novel hierarchical generative framework that reduces the number of node pairs that must be evaluated and adopts discrete flow matching to significantly decrease the number of denoising iterations. We empirically demonstrate that our approach more effectively captures graph distributions while substantially reducing generation time.
We consider embeddings of 3-regular graphs into 3-dimensional Cartesian coordinates, in such a way that two vertices are adjacent if and only if two of their three coordinates are equal (that is, if they lie on an axis-parallel line) and such that no three points lie on the same axis-parallel line; we call a graph with such an embedding an xyz graph. We describe a correspondence between xyz graphs and face-colored embeddings of the graph onto two-dimensional manifolds, and we relate bipartiteness of the xyz graph to orientability of the underlying topological surface. Using this correspondence, we show that planar graphs are xyz graphs if and only if they are bipartite, cubic, and three-connected, and that it is NP-complete to determine whether an arbitrary graph is an xyz graph. We also describe an algorithm with running time $O(n 2^{n/2})$ for testing whether a given graph is an xyz graph.
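The definition above can be checked mechanically for a given point set. The following is a minimal sketch, not the algorithm from the paper: the function name `xyz_graph` and the choice to return `None` for an invalid point set are assumptions made for illustration. It groups points by axis-parallel line, rejects any line holding three or more points, and requires every point to end up with exactly three neighbours.

```python
from collections import defaultdict
from itertools import product


def xyz_graph(points):
    """Build the graph induced by a 3D point set under the xyz rule.

    Returns an adjacency dict {point: set of neighbours}, or None if the
    points are not distinct, some axis-parallel line holds three or more
    points, or the resulting graph is not 3-regular.
    (Hypothetical helper, written only to illustrate the definition.)
    """
    points = list(points)
    if len(set(points)) != len(points):
        return None
    adj = {p: set() for p in points}
    for axis in range(3):
        # Points on a common line parallel to `axis` agree on the other
        # two coordinates, so those two coordinates identify the line.
        lines = defaultdict(list)
        for p in points:
            key = tuple(c for i, c in enumerate(p) if i != axis)
            lines[key].append(p)
        for members in lines.values():
            if len(members) > 2:
                return None  # three points on one axis-parallel line
            if len(members) == 2:
                a, b = members
                adj[a].add(b)
                adj[b].add(a)
    if any(len(nbrs) != 3 for nbrs in adj.values()):
        return None  # not 3-regular
    return adj


# The eight corners of the unit cube give the cube graph.
cube = list(product((0, 1), repeat=3))
cube_graph = xyz_graph(cube)
```

The cube is a convenient sanity check because it is planar, bipartite, cubic, and three-connected, which is the planar case named in the abstract.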
In order to advance the state of the art in graph learning algorithms, it is necessary to construct large real-world datasets. While there are many benchmark datasets for homogeneous graphs, only a few of them are available for heterogeneous graphs. Furthermore, the latter graphs are small in size, rendering them insufficient to understand how graph learning algorithms perform in terms of classification metrics and computational resource utilization. We introduce PDNS-Net, the largest public heterogeneous graph dataset containing 447K nodes and 897K edges for the malicious domain classification task. Compared to the popular heterogeneous datasets IMDB and DBLP, PDNS-Net is 38 and 17 times bigger respectively. We provide a detailed analysis of PDNS-Net including the data collection methodology, heterogeneous graph construction, descriptive statistics and preliminary graph classification performance. The dataset is publicly available at https://github.com/qcri/PDNS-Net. Our preliminary evaluation of both popular homogeneous and heterogeneous graph neural networks on PDNS-Net reveals that further research is required to improve the performance of these models on large heterogeneous gr
This extended abstract introduces a class of graph learning applicable to cases where the underlying graph has polytopic uncertainty, i.e., the graph is not exactly known, but its parameters or properties vary within a known range. By incorporating this assumption that the graph lies in a polytopic set into two established graph learning frameworks, we find that our approach yields better results with less computation.
For any two graphs $T$ and $F$, the \textit{generalized Turán number} $\mathrm{ex}(n, T, F)$ is the maximum possible number of copies of $T$ in an $F$-free graph on $n$ vertices. For the book graph $B_t$, there is a close connection between $\mathrm{ex}(n,K_3,B_t)$ and the Ruzsa-Szemerédi triangle removal lemma. Motivated by this, in this paper, we study the generalized Turán problem for generalized theta graphs, a natural extension of book graphs. Our main result provides a complete characterization of the magnitude of $\mathrm{ex}(n,K_3,H)$ when $H$ is a generalized theta graph, indicating when it is quadratic, when it is nearly quadratic, and when it is subquadratic. Furthermore, as an application, we obtain the exact value of $\mathrm{ex}(n, K_r, kF)$, where $F$ is an edge-critical generalized theta graph, and $3\le r\le k+1$, extending several recent results.
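For very small $n$, the quantity $\mathrm{ex}(n,K_3,B_t)$ can be computed by exhaustive search, which makes the definition concrete. The sketch below assumes the usual definition of the book graph $B_t$ as $t$ triangles sharing a common edge (the abstract does not spell it out), so a graph is $B_t$-free exactly when every edge lies in fewer than $t$ triangles. The function name `ex_triangles_book_free` is hypothetical, and the search enumerates all $2^{\binom{n}{2}}$ graphs, so it is only feasible for $n \le 5$ or so.

```python
from itertools import combinations


def ex_triangles_book_free(n, t):
    """Brute-force ex(n, K_3, B_t) on the vertex set {0, ..., n-1}.

    Assumes B_t is t triangles sharing one edge, so a graph is B_t-free
    iff no edge lies in t or more triangles. Exponential in n*(n-1)/2.
    """
    all_edges = list(combinations(range(n), 2))
    best = 0
    for mask in range(1 << len(all_edges)):
        edges = {e for i, e in enumerate(all_edges) if (mask >> i) & 1}
        per_edge = dict.fromkeys(edges, 0)  # triangles through each edge
        triangle_total = 0
        for a, b, c in combinations(range(n), 3):
            sides = ((a, b), (a, c), (b, c))
            if all(s in edges for s in sides):
                triangle_total += 1
                for s in sides:
                    per_edge[s] += 1
        if all(count < t for count in per_edge.values()):
            best = max(best, triangle_total)
    return best


small_values = {n: ex_triangles_book_free(n, 2) for n in (3, 4, 5)}
```

With $t=2$ the condition says that triangles must be pairwise edge-disjoint, so the search reduces to packing edge-disjoint triangles into $K_n$.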
Graph representation learning, involving both node features and graph structures, is crucial for real-world applications but often encounters pervasive noise. State-of-the-art methods typically address noise by focusing separately on node features with large language models (LLMs) and on graph structures with graph structure learning models (GSLMs). In this paper, we introduce LangGSL, a robust framework that integrates the complementary strengths of pre-trained language models and GSLMs to jointly enhance both node feature and graph structure learning. In LangGSL, we first leverage LLMs to filter noise in the raw data and extract valuable cleaned information as features, enhancing the synergy of downstream models. During the mutual learning phase in LangGSL, the core idea is to leverage the relatively small language model (LM) to process local attributes and generate reliable pseudo-labels and informative node embeddings, which are then integrated into the GSLM's prediction phase. This approach enriches the global context and enhances overall performance. Meanwhile, GSLM refines the evolving graph structure constructed from the LM's output, offering updated labels back to the LM a
The rapid expansion of e-commerce platforms generates vast amounts of unstructured product data, creating significant challenges for information retrieval, recommendation systems, and data analytics. Knowledge Graphs (KGs) offer a structured, interpretable format to organize such data, yet constructing product-specific KGs remains a complex and manual process. This paper introduces a fully automated, AI agent-driven framework for constructing product knowledge graphs directly from unstructured product descriptions. Leveraging Large Language Models (LLMs), our method operates in three stages using dedicated agents: ontology creation and expansion, ontology refinement, and knowledge graph population. This agent-based approach ensures semantic coherence, scalability, and high-quality output without relying on predefined schemas or handcrafted extraction rules. We evaluate the system on a real-world dataset of air conditioner product descriptions, demonstrating strong performance in both ontology generation and KG population. The framework achieves over 97\% property coverage and minimal redundancy, validating its effectiveness and practical applicability. Our work highlights the poten
In 1960 Nash-Williams proved that an edge-connectivity of $2k$ is sufficient for a finite graph to have a $k$-arc-connected orientation. He then conjectured that the same is true for infinite graphs. In 2016, Thomassen, using his own results on the auxiliary lifting graph, proved that $8k$-edge-connected infinite graphs admit a $k$-arc-connected orientation. Here we improve this result for the class of $1$-ended locally-finite graphs and show that an edge-connectivity of $4k$ is enough in that case. Crucial to this improvement are results on the key concept of the lifting graph, presented in a separate paper by the same author, which extend results by Ok, Richter, and Thomassen.
Graph unlearning emerges as a crucial advancement in the pursuit of responsible AI, providing the means to remove sensitive data traces from trained models, thereby upholding the \textit{right to be forgotten}. It is evident that graph machine learning exhibits sensitivity to data privacy and adversarial attacks, necessitating the application of graph unlearning techniques to address these concerns effectively. In this comprehensive survey paper, we present the first systematic review of graph unlearning approaches, encompassing a diverse array of methodologies and offering a detailed taxonomy and up-to-date literature overview to facilitate the understanding of researchers new to this field. To ensure clarity, we provide lucid explanations of the fundamental concepts and evaluation measures used in graph unlearning, catering to a broader audience with varying levels of expertise. Delving into potential applications, we explore the versatility of graph unlearning across various domains, including but not limited to social networks, adversarial settings, recommender systems, and resource-constrained environments like the Internet of Things, illustrating its potential impact in safeg
Tuza famously conjectured in 1981 that in a graph without k+1 edge-disjoint triangles, it suffices to delete at most 2k edges to obtain a triangle-free graph. The conjecture holds for graphs with small treewidth or small maximum average degree, including planar graphs. However, for dense graphs that are neither cliques nor 4-colorable, only asymptotic results are known. Here, we confirm the conjecture for threshold graphs, i.e. graphs that are both split graphs and cographs, and for co-chain graphs with both sides of the same size divisible by 4.
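The two quantities in the conjecture can be computed by brute force on tiny graphs. The sketch below is illustrative only and is unrelated to the proof techniques of the paper. The helper names `triangles`, `packing_number` and `cover_number` are assumptions, and both searches are exponential. Here `packing_number` is the largest number of pairwise edge-disjoint triangles, and `cover_number` is the fewest edges whose deletion leaves a triangle-free graph.

```python
from itertools import combinations


def triangles(edges):
    """Return each triangle as a frozenset of its three edges."""
    edge_set = {frozenset(e) for e in edges}
    vertices = sorted({v for e in edge_set for v in e})
    found = []
    for a, b, c in combinations(vertices, 3):
        sides = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if all(s in edge_set for s in sides):
            found.append(frozenset(sides))
    return found


def packing_number(tris):
    """Maximum number of pairwise edge-disjoint triangles (backtracking)."""
    best = 0

    def extend(start, used_edges, count):
        nonlocal best
        best = max(best, count)
        for i in range(start, len(tris)):
            if not (tris[i] & used_edges):
                extend(i + 1, used_edges | tris[i], count + 1)

    extend(0, frozenset(), 0)
    return best


def cover_number(edges, tris):
    """Minimum number of edges that meet every triangle (exhaustive)."""
    edge_list = list({frozenset(e) for e in edges})
    for size in range(len(edge_list) + 1):
        for chosen in combinations(edge_list, size):
            chosen = set(chosen)
            if all(t & chosen for t in tris):
                return size


k4 = list(combinations(range(4), 2))
k5 = list(combinations(range(5), 2))
k4_result = (packing_number(triangles(k4)), cover_number(k4, triangles(k4)))
k5_result = (packing_number(triangles(k5)), cover_number(k5, triangles(k5)))
```

Both $K_4$ and $K_5$ meet the bound with equality, the cover number being exactly twice the packing number.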
For a graph $G$ and a list assignment $L$ with $|L(v)|=k$ for all $v$, an $L$-packing consists of $L$-colorings $\varphi_1,\cdots,\varphi_k$ such that $\varphi_i(v)\ne\varphi_j(v)$ for all $v$ and all distinct $i,j\in\{1,\ldots,k\}$. Let $\chi^{\star}_{\ell}(G)$ denote the smallest $k$ such that $G$ has an $L$-packing for every $L$ with $|L(v)|=k$ for all $v$. Let $\mathcal{P}_k$ denote the set of all planar graphs with girth at least $k$. We show that (i) $\chi^{\star}_{\ell}(G)\le 8$ for all $G\in \mathcal{P}_3$ and (ii) $\chi^{\star}_{\ell}(G)\le 5$ for all $G\in \mathcal{P}_4$ and (iii) $\chi^{\star}_{\ell}(G)\le 4$ for all $G\in \mathcal{P}_5$. Part (i) makes progress on a problem of Cambie, Cames van Batenburg, Davies, and Kang. We also construct outerplanar graphs $G$ such that $\chi^{\star}_{\ell}(G)=4$, which matches the known upper bound $\chi^{\star}_{\ell}(G)\le 4$ for all outerplanar graphs. Finally, we consider the analogue of $\chi^{\star}_{\ell}$ for correspondence coloring, $\chi^{\star}_c$. In fact, all bounds stated above for $\chi^{\star}_{\ell}$ also hold for $\chi^{\star}_c$.
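The definition of an $L$-packing translates directly into a verification routine. The following is a minimal sketch under stated assumptions: the function name `is_L_packing` and the data layout (edges as pairs, lists and colorings as dictionaries keyed by vertex) are hypothetical choices for illustration, and every coloring is assumed to assign a color to every vertex.

```python
def is_L_packing(edges, lists, colorings):
    """Check whether `colorings` form an L-packing of the graph.

    edges:     iterable of (u, v) pairs
    lists:     dict vertex -> collection of allowed colors, L(v)
    colorings: list of k dicts vertex -> color
    (Hypothetical helper, written only to illustrate the definition.)
    """
    edges = list(edges)
    k = len(colorings)
    # Every list must have exactly k distinct colors.
    if any(len(set(allowed)) != k for allowed in lists.values()):
        return False
    for phi in colorings:
        # Each phi must be an L-coloring: colors drawn from the lists...
        if any(phi[v] not in lists[v] for v in lists):
            return False
        # ...and proper on every edge.
        if any(phi[u] == phi[v] for u, v in edges):
            return False
    # The k colorings must use k different colors at every vertex.
    return all(len({phi[v] for phi in colorings}) == k for v in lists)


# A single edge with identical 2-element lists admits an L-packing.
edge = [("a", "b")]
L = {"a": {1, 2}, "b": {1, 2}}
phi_1 = {"a": 1, "b": 2}
phi_2 = {"a": 2, "b": 1}
packing_ok = is_L_packing(edge, L, [phi_1, phi_2])
```

Since each list has exactly $k$ colors and the $k$ colorings disagree at every vertex, an $L$-packing uses every color of $L(v)$ exactly once at each vertex $v$.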
The entropy of a graph was first introduced by Rashevsky \cite{Rashevsky} and Trucco \cite{Trucco}, to be interpreted as the structural information content of the graph and to serve as a complexity measure. In this paper, we first state a number of definitions of graph entropy measures and generalized graph entropies. Then we survey the known results about them from the following three aspects: inequalities and extremal properties of graph entropies; relationships between graph structures, graph energies, topological indices and generalized graph entropies; and the complexity of calculating graph entropies. Various applications of graph entropies together with some open problems and conjectures are also presented for further research.
The 3-coloring of hereditary graph classes has been a deeply-researched problem in the last decade. A hereditary graph class is characterized by a (possibly infinite) list of minimal forbidden induced subgraphs $H_1,H_2,\ldots$; the graphs in the class are called $(H_1,H_2,\ldots)$-free. The complexity of 3-coloring is far from being understood, even for classes defined by a few small forbidden induced subgraphs. For $H$-free graphs, the complexity is settled for any $H$ on up to seven vertices. There are only two unsolved cases on eight vertices, namely $2P_4$ and $P_8$. For $P_8$-free graphs, some partial results are known, but to the best of our knowledge, $2P_4$-free graphs have not been explored yet. In this paper, we show that the 3-coloring problem is polynomial-time solvable on $(2P_4,C_5)$-free graphs.
Enterprises often maintain multiple databases for storing critical business data in siloed systems, resulting in inefficiencies and challenges with data interoperability. A key to overcoming these challenges lies in integrating disparate data sources, enabling businesses to unlock the full potential of their data. Our work presents a novel approach for integrating multiple databases using knowledge graphs, focusing on the application of large language models as semantic agents for mapping and connecting structured data across systems by leveraging existing vocabularies. The proposed methodology introduces a semantic layer above tables in relational databases, utilizing a system comprising multiple LLM agents that map tables and columns to Schema.org terms. Our approach achieves a mapping accuracy of over 90% in multiple domains.