20 results found
Machine learning is rapidly making its way across all of the natural sciences, including the physical sciences. The rate at which ML is impacting non-scientific disciplines is incomparable to that in the physical sciences. This is partly due to the uninterpretable nature of deep neural networks. Symbolic machine learning stands as an equal and complementary partner to numerical machine learning in speeding up scientific discovery in physics. This perspective discusses the main differences between the ML and scientific approaches. It stresses the need to develop and apply symbolic machine learning to physics problems equally, in parallel to numerical machine learning, because of the dual nature of physics research.
Symbolic regression (SR) has emerged as a powerful method for uncovering interpretable mathematical relationships from data, offering a novel route to both scientific discovery and efficient empirical modelling. This article introduces the Special Issue on Symbolic Regression for the Physical Sciences, motivated by the Royal Society discussion meeting held in April 2025. The contributions collected here span applications from automated equation discovery and emergent-phenomena modelling to the construction of compact emulators for computationally expensive simulations. The introductory review outlines the conceptual foundations of SR, contrasts it with conventional regression approaches, and surveys its main use cases in the physical sciences, including the derivation of effective theories, empirical functional forms and surrogate models. We summarise methodological considerations such as search-space design, operator selection, complexity control, feature selection, and integration with modern AI approaches. We also highlight ongoing challenges, including scalability, robustness to noise, overfitting and computational complexity. Finally we emphasise emerging directions, particula
In recent decades, the relevance of polarimetry in planetary sciences and astronomy has increased rapidly. Polarization is a fundamental property of light and can be modified by any scattering event. As such, polarization yields additional information that cannot be obtained by only assessing light's scalar properties. For instance, the polarization state of starlight scattered by planetary surfaces can provide useful insights into the composition, size, morphology, and porosity of regolith particles and might even indicate the presence of life. Besides being useful for characterization, polarimetry can also greatly enhance the detection of exoplanets. Here, polarization can be harnessed to enhance the contrast between the bright light of a star, which can be considered to be fully unpolarized, and the very dim but polarized light reflected by an exoplanet. In this paper, we discuss and review the current developments and advances in optical polarimetry and polarimetric instrumentation in Switzerland within the framework of the National Centre of Competence in Research PlanetS. We focus on their implications for the vast range of science cases that polarimetry can address within the r
This chapter examines the policy implications of behavioural sciences insights for the regulation of privacy on the Internet, by focusing in particular on behavioural targeting. This marketing technique involves tracking people's online behaviour to use the collected information to show people individually targeted advertisements. Enforcing data protection law may not be enough to protect privacy in this area. I argue that, if society is better off when certain behavioural targeting practices do not happen, policymakers should consider banning them.
Labeling or classifying time series is a persistent challenge in the physical sciences, where expert annotations are scarce, costly, and often inconsistent. Yet robust labeling is essential to enable machine learning models for understanding, prediction, and forecasting. We present the Clustering and Indexation Pipeline with Human Evaluation for Recognition (CIPHER), a framework designed to accelerate large-scale labeling of complex time series in physics. CIPHER integrates indexable Symbolic Aggregate approXimation (iSAX) for interpretable compression and indexing, density-based clustering (HDBSCAN) to group recurring phenomena, and a human-in-the-loop step for efficient expert validation. Representative samples are labeled by domain scientists, and these annotations are propagated across clusters to yield systematic, scalable classifications. We evaluate CIPHER on the task of classifying solar wind phenomena in OMNI data, a central challenge in space weather research, showing that the framework recovers meaningful phenomena such as coronal mass ejections and stream interaction regions. Beyond this case study, CIPHER highlights a general strategy for combining sy
Artificial intelligence (AI) has recently seen transformative breakthroughs in the life sciences, expanding possibilities for researchers to interpret biological information at an unprecedented capacity, with novel applications and advances being made almost daily. In order to maximise return on the growing investments in AI-based life science research and accelerate this progress, it has become urgent to address the exacerbation of long-standing research challenges arising from the rapid adoption of AI methods. We review the increased erosion of trust in AI research outputs, driven by the issues of poor reusability and reproducibility, and highlight their consequent impact on environmental sustainability. Furthermore, we discuss the fragmented components of the AI ecosystem and lack of guiding pathways to best support Open and Sustainable AI (OSAI) model development. In response, this perspective introduces a practical set of OSAI recommendations directly mapped to over 300 components of the AI ecosystem. Our work connects researchers with relevant AI resources, facilitating the implementation of sustainable, reusable and transparent AI. Built upon life science community consensus
Word embeddings are an essential instrument in many NLP tasks. Most available resources are trained on general language from Web corpora or Wikipedia dumps. However, word embeddings for domain-specific language are rare, in particular for the social science domain. Therefore, in this work, we describe the creation and evaluation of word embedding models based on 37,604 open-access social science research papers. In the evaluation, we compare domain-specific and general language models for (i) language coverage, (ii) diversity, and (iii) semantic relationships. We found that the created domain-specific model, even with a relatively small vocabulary size, covers a large part of social science concepts, and that their neighborhoods are diverse in comparison to more general models. Across all relation types, we found a more extensive coverage of semantic relationships.
This text provides an introduction to the modern approach to artificiality and simulation in the social sciences. It presents the relationship between complexity and artificiality, before introducing the field of artificial societies, which has greatly benefited from the rapid increase in computing power, gifting the social sciences with formalization and experimentation tools previously owned by the "hard" sciences alone. It shows that, as "a new way of doing social sciences", artificial societies should undoubtedly contribute to a renewed approach to the study of sociality and should play a significant part in the elaboration of original theories of social phenomena.
We define deriving semantic class targets as a novel multi-modal task. By doing so, we aim to improve classification schemes in the physical sciences which can be severely abstracted and obfuscating. We address this task for upcoming radio astronomy surveys and present the derived semantic radio galaxy morphology class targets.
Do different fields of knowledge require different research strategies? A numerical model exploring different virtual knowledge landscapes revealed two diverging optimal search strategies. Trend following is maximized when the popularity of new discoveries determines the number of individuals researching them. This strategy works best when many researchers explore few large areas of knowledge. In contrast, individuals or small groups of researchers are better at discovering small bits of information in dispersed knowledge landscapes. Bibliometric data of scientific publications showed a continuous bipolar distribution of these strategies, ranging from natural sciences, with highly cited publications in journals containing a large number of articles, to the social sciences, with rarely cited publications in many journals containing a small number of articles. The natural sciences seem to adapt their research strategies to landscapes with large concentrated knowledge clusters, whereas social sciences seem to have adapted to search in landscapes with many small isolated knowledge clusters. Similar bipolar distributions were obtained when comparing levels of insularity estimated by indic
Various research initiatives try to utilize the operational principles of organisms and brains to develop alternative, biologically inspired computing paradigms and artificial cognitive systems. This paper reviews key features of the standard method applied to complexity in the cognitive and brain sciences, i.e. decompositional analysis or reverse engineering. The indisputable complexity of brain and mind raises the issue of whether they can be understood by applying the standard method. Recent findings in the experimental and theoretical fields question central assumptions and hypotheses made for reverse engineering. Using the modeling relation as analyzed by Robert Rosen, the scientific analysis method itself is made a subject of discussion. It is concluded that the fundamental assumption of cognitive science, i.e. that complex cognitive systems can be analyzed, understood and duplicated by reverse engineering, must be abandoned. Implications for investigations of organisms and behavior as well as for engineering artificial cognitive systems are discussed.
This White Paper summarises potential key science topics to be achieved with the Thai National Radio Telescope (TNRT). The commissioning phase started in mid-2022. The key science topics consist of "Pulsars and Fast Radio Bursts (FRBs)", "Star Forming Regions (SFRs)", "Galaxy and Active Galactic Nuclei (AGNs)", "Evolved Stars", "Radio Emission of Chemically Peculiar (CP) Stars", and "Geodesy", covering a wide range of observing frequencies in L/C/X/Ku/K/Q/W-bands (1-115 GHz). As a single-dish instrument, TNRT is a perfect tool to explore time domain astronomy with its agile observing systems and flexible operation. Due to its ideal geographical location, TNRT will significantly enhance Very Long Baseline Interferometry (VLBI) arrays, such as the East Asian VLBI Network (EAVN), the Australia Long Baseline Array (LBA), and the European VLBI Network (EVN), in particular by providing unique sky coverage that results in more complete "uv" coverage, improving synthesized-beam and imaging quality and reducing side-lobes. This document highlights key science topics achievable with TNRT in single-dish mode and in collaboration with VLBI arrays.
Software testing relates to the process of assessing the functionality of a program against some defined specifications. To ensure conformance, test engineers often generate a set of test cases to validate against the user requirements. Owing to the growing complexity of software and its increasing diffusion into various application domains, it is no longer unusual for a software project to have testing teams in more than one location or even distributed over many continents. Owing to the intertwined dependencies of many software development activities and their geographical and temporal issues, there are potentially many overlapping test cases which can cause unwarranted redundancies across the shared modules (i.e. a test for one requirement may be covered by more than one test). In this paper, we explore the application of our newly developed hyperheuristic, called Fuzzy Inference Selection (FIS), for addressing the test redundancy reduction problem. This paper presents the supplementary results for the paper "An Experimental Study of Hyper-Heuristic Selection and Acceptance Mechanism for Combinatorial t-way Test Suite Generation", published in Information Sciences.
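The redundancy problem described above can be sketched in a few lines. This is not the FIS hyperheuristic from the paper; it is a plain greedy set-cover heuristic, swapped in only to illustrate what "reducing redundancy" means. The function name `reduce_test_suite`, the test identifiers, and the requirement identifiers are all hypothetical.

```python
from typing import Dict, List, Set


def reduce_test_suite(coverage: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover (NOT the paper's FIS method): keep picking the test
    that covers the most still-uncovered requirements until none remain."""
    # Every requirement covered by at least one test in the original suite.
    uncovered: Set[str] = set().union(*coverage.values())
    selected: List[str] = []
    while uncovered:
        # Iterate in sorted order so ties are broken deterministically by name.
        best = max(sorted(coverage), key=lambda t: len(coverage[t] & uncovered))
        selected.append(best)
        uncovered -= coverage[best]
    return selected


# Hypothetical suite: each test maps to the requirements it exercises.
suite = {
    "t1": {"r1", "r2"},
    "t2": {"r2"},          # redundant: r2 is already covered by t1
    "t3": {"r3"},
    "t4": {"r1", "r3"},
}
reduced = reduce_test_suite(suite)
print(reduced)
```

The reduced suite still covers every requirement of the original one; greedy selection is a common baseline but does not guarantee a minimum-size suite.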
Are the sciences not advancing at an ever increasing speed? We contrast this popular perspective with the view that science funding may actually see diminishing returns, at least regarding established fields. In order to stimulate a larger discussion, we investigate two exemplary cases, the linear increase in human life expectancy over the last 170 years and the advances in the reliability of numerical short and medium term weather predictions during the last 50 years. We argue that the outcome of science and technology (S&T) funding in terms of measurable results is a highly sub-linear function of the amount of resources committed. Supporting a range of small to medium size research projects, instead of a few large ones, will be, as a corollary, a more efficient use of resources for science funding agencies.
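The corollary can be illustrated with a worked inequality, under assumptions that are not stated in the abstract: suppose the measurable output of a project funded with resources $R$ follows a power law $f(R) = c R^{\alpha}$ with $0 < \alpha < 1$ (one simple sub-linear form, chosen purely for illustration), and that projects contribute independently. Splitting a budget $R$ evenly across $n$ projects then gives

```latex
n \, f\!\left(\frac{R}{n}\right)
  = n \, c \left(\frac{R}{n}\right)^{\alpha}
  = n^{1-\alpha} \, c R^{\alpha}
  > c R^{\alpha} = f(R)
  \qquad \text{for } n > 1,\ c > 0,\ R > 0 .
```

Because $1 - \alpha > 0$, the factor $n^{1-\alpha}$ exceeds 1, so several smaller projects outperform one large project in this toy model. The sketch ignores fixed per-project costs and minimum viable project sizes, so it shows only the direction of the effect.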
This account of the Matthew effect is another small exercise in the psychosociological analysis of the workings of science as a social institution. The initial problem is transformed by a shift in theoretical perspective. As originally identified, the Matthew effect was construed in terms of enhancement of the position of already eminent scientists who are given disproportionate credit in cases of collaboration or of independent multiple discoveries. Its significance was thus confined to its implications for the reward system of science. By shifting the angle of vision, we note other possible kinds of consequences, this time for the communication system of science. The Matthew effect may serve to heighten the visibility of contributions to science by scientists of acknowledged standing and to reduce the visibility of contributions by authors who are less well known. We examine the psychosocial conditions and mechanisms underlying this effect and find a correlation between the redundancy function of multiple discoveries and the focalizing function of eminent men of science, a function which is reinforced by the great value these men place upon finding basic problems and by their self-assurance. This self-assurance, which is partly inherent, partly the result of experiences and associations in creative scientific environments, and partly a result of later social validation of their position, encourages them to search out risky but important problems and to highlight the results of their inquiry. A macrosocial version of the Matthew principle is apparently involved in those processes of social selection that currently lead to the concentration of scientific resources and talent (50).
This study presents the development of the PsyCogMetrics AI Lab (psycogmetrics.ai), an integrated, cloud-based platform that operationalizes psychometric and cognitive-science methodologies for Large Language Model (LLM) evaluation. Framed as a three-cycle Action Design Science study, the Relevance Cycle identifies key limitations in current evaluation methods and unfulfilled stakeholder needs. The Rigor Cycle draws on kernel theories such as Popperian falsifiability, Classical Test Theory, and Cognitive Load Theory to derive deductive design objectives. The Design Cycle operationalizes these objectives through nested Build-Intervene-Evaluate loops. The study contributes a novel IT artifact, a validated design for LLM evaluation, benefiting research at the intersection of AI, psychology, cognitive science, and the social and behavioral sciences.
This study examines the role of top-tier conference publications in Hungarian computer science research. We show that the national scientometric practice, which is currently journal-oriented, diverges from international norms, creating incentive distortions in researcher evaluation. By linking multiple databases (iCore, DBLP, MTMT, MTA-ATT), we mapped Hungarian-affiliated CORE A* and A conference papers, their temporal and thematic distribution, and author trajectories. Our results indicate that, in theoretical fields, publishing at international conferences became common earlier than in applied fields. At the same time, in applied fields, successful researchers are more likely to continue their careers in foreign institutions or in industry positions. Overall, a substantial share of the already established, internationally most successful researchers are now affiliated with institutions abroad. We recommend recognizing CORE A* papers as equivalent to D1 and CORE A papers as equivalent to Q1 journals in national evaluation systems.
The exploration of planetary bodies in our Solar system and beyond relies on the processing and interpretation of large, spatio-temporally inconsistent, and heterogeneous datasets. Recent advances in machine learning (ML) provide unprecedented opportunities to address many fundamental challenges posed by these heterogeneous and hyper-dimensional datasets. This review chapter highlights innovative ML methodologies that were developed and used by NCCR PlanetS members to address three overarching challenges in (exo)planetary science. The first challenge is sequence modelling, which encompasses the intricate analysis of one-dimensional data such as time series of radial velocities and light curves, among other examples. Secondly, there is pattern recognition, which involves studying correlations, leveraging convolutional neural networks for feature extraction, mapping and cross-correlation, anomaly detection through variational autoencoders, and unsupervised clustering of mass spectrometric data, among other examples. Lastly, there are generative models and emulation-based Bayesian analysis, which encompass the development of predictive models for planetary interior structure, employing
Solar radio emissions offer unique diagnostic insights into the solar corona. However, their dynamic and multiscale nature, along with several orders of magnitude variations in intensity, pose significant observational challenges. To date, at gigahertz frequencies, MeerKAT stands out globally with high potential of producing high-fidelity, spectroscopic snapshot images of the Sun, enabled by its dense core, high sensitivity, and broad frequency coverage. Yet, as a telescope originally designed for observing faint galactic and extragalactic sources, observing the Sun at the boresight of the telescope requires customized observing strategies and calibration methods. This work demonstrates the technical readiness of MeerKAT for solar observations at the boresight of the telescope in the UHF (580-1015 MHz) and L-band (900-1670 MHz) frequency ranges, including optimized modes, a dedicated calibration scheme, and a tailored, entirely automated calibration and imaging pipeline. The quality of solar images is validated through morphological comparisons with the solar images at other wavelengths. Several unique early science results showcase the potential of this new capability of MeerKAT.
The technology industry offers exciting and diverse career opportunities, ranging from traditional software development to emerging fields such as artificial intelligence, cybersecurity, and data science. Career fairs play a crucial role in helping Computer Science (CS) students understand the various career pathways available to them in the industry. However, limited research exists on how CS students experience and benefit from these events. Through a survey of 86 students, we investigate their motivations for attending, preparation strategies, and learning outcomes, including exposure to new career paths and technologies. We envision our findings providing valuable insights for career services professionals, educators, and industry leaders in improving the career development processes of CS students.