Found 20 results
Effective instruction in tutoring requires promptly providing instructional materials that match the needs of each student (e.g., in response to questions). In this study, we introduce an agent that automatically delivers supplementary materials on demand during one-on-one tutoring sessions. Our agent uses a multimodal large language model to analyze spoken dialogue between the instructor and the student, automatically generate search queries, and retrieve relevant Web images. Evaluation experiments demonstrate that our agent reduces the average image retrieval time by 44.4 s compared to cases without support and successfully provides images of acceptable quality in 85.7% of trials. These results indicate that our agent effectively supports instructors during tutoring sessions.
In this paper we build a case for providing job completion time predictions to cloud users, similar to the delivery date of a package or arrival time of a booked ride. Our analysis reveals that providing predictability can come at the expense of performance and fairness. Existing cloud scheduling systems optimize for extreme points in the trade-off space, making them either extremely unpredictable or impractical. To address this challenge, we present PCS, a new scheduling framework that aims to provide predictability while balancing other traditional objectives. The key idea behind PCS is to use Weighted-Fair-Queueing (WFQ) and find a suitable configuration of different WFQ parameters (e.g., class weights) that meets specific goals for predictability. It uses a simulation-aided search strategy to efficiently discover WFQ configurations that lie on the Pareto front of the trade-off space between these objectives. We implement and evaluate PCS in the context of DNN job scheduling on GPUs. Our evaluation, on a small-scale GPU testbed and larger-scale simulations, shows that PCS can provide accurate completion time estimates while marginally compromising on performance and fairness.
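To make the role of WFQ class weights concrete, here is a minimal sketch of weighted fair queueing using per-class finish tags. It is not the PCS implementation: the function name `wfq_order`, the job tuple layout, and the assumption that every job is already queued at time 0 are all illustrative choices that do not come from the abstract.

```python
import heapq

def wfq_order(jobs, weights):
    """Return job ids in WFQ service order.

    jobs: list of (job_id, class_name, size) tuples (hypothetical layout).
    weights: dict mapping class_name to a positive weight.
    Assumes all jobs are queued at time 0, so the virtual start time of a
    job is simply the finish tag of the previous job in the same class.
    """
    last_finish = {cls: 0.0 for cls in weights}
    heap = []
    for seq, (job_id, cls, size) in enumerate(jobs):
        # A larger weight shrinks the finish tag, so the class is served sooner.
        finish = last_finish[cls] + size / weights[cls]
        last_finish[cls] = finish
        # seq breaks ties deterministically in arrival order.
        heapq.heappush(heap, (finish, seq, job_id))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

order = wfq_order(
    [("a1", "A", 4), ("a2", "A", 4), ("b1", "B", 3)],
    {"A": 2, "B": 1},
)
```

Changing the entries of `weights` reorders the schedule, which is the kind of parameter a configuration search would explore.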
As autonomous systems become more complex and integral in our society, the need to accurately model and safely control these systems has increased significantly. In the past decade, there has been tremendous success in using deep learning techniques to model and control systems that are difficult to model using first principles. However, providing safety assurances for such systems remains difficult, partially due to the uncertainty in the learned model. In this work, we aim to provide safety assurances for systems whose dynamics are not readily derived from first principles and, hence, are more advantageous to be learned using deep learning techniques. Given the system of interest and safety constraints, we learn an ensemble model of the system dynamics from data. Leveraging ensemble uncertainty as a measure of uncertainty in the learned dynamics model, we compute a maximal robust control invariant set, starting from which the system is guaranteed to satisfy the safety constraints under the condition that realized model uncertainties are contained in the predefined set of admissible model uncertainty. We demonstrate the effectiveness of our method using a simulated case study with
Despite the growing use of large language models (LLMs) for providing feedback, limited research has explored how to achieve high-quality feedback. This case study introduces an evaluation framework to assess different zero-shot prompt engineering methods. We varied the prompts systematically and analyzed the provided feedback on programming errors in R. The results suggest that prompts suggesting a stepwise procedure increase the precision, while omitting explicit specifications about which provided data to analyze improves error identification.
In their recent paper, Rosen, Takeyama, Tasaka, and Yamamoto constructed recurrent sequences providing a decomposition law of primes in a Galois extension. In this paper, we reconstruct their sequences via representation theory of finite groups and obtain an explicit description of the sequences.
Recent advancements in large language models, such as ChatGPT, have demonstrated significant potential to impact various aspects of human life. However, ChatGPT still faces challenges in providing reliable and accurate answers to user questions. To better understand the model's particular weaknesses in providing truthful answers, we embark on an in-depth exploration of open-domain question answering. Specifically, we undertake a detailed examination of ChatGPT's failures, categorized into comprehension, factuality, specificity, and inference. We further identify factuality as the largest contributor to failures and identify two critical abilities associated with factuality: knowledge memorization and knowledge recall. Through experiments focusing on factuality, we propose several potential enhancement strategies. Our findings suggest that augmenting the model with granular external knowledge and cues for knowledge recall can enhance the model's factuality in answering questions.
An emerging definition of fairness in machine learning requires that models are oblivious to demographic user information, e.g., a user's gender or age should not influence the model. Personalized recommender systems are particularly prone to violating this definition through their explicit user focus and user modelling. Explicit user modelling is also an aspect that makes many recommender systems incapable of providing hitherto unseen users with recommendations. We propose novel approaches for mitigating discrimination in Variational Autoencoder-based recommender systems by limiting the encoding of demographic information. The approaches are capable of, and evaluated on, providing users that are not represented in the training data with fair recommendations.
Smart home technology is part of our everyday lives and is evolving quickly compared to other technologies. In this paper, we gather user feedback through expert interviews on how collecting feedback from smart home devices can help improve the devices. Little is yet known about the feedback systems of smart home devices or how the feedback provided can inform the devices' requirements. We present an analysis of exploratory interviews with a group of students and study their attitudes toward providing feedback. The results suggest that users are willing to give feedback actively to improve their usage, as every user has their own needs to fulfill.
For effective collaboration between humans and intelligent agents that employ machine learning for decision-making, humans must understand what agents can and cannot do to avoid over/under-reliance. A solution to this problem is adjusting human reliance through communication using reliance calibration cues (RCCs) to help humans assess agents' capabilities. Previous studies typically attempted to calibrate reliance by continuously presenting RCCs, and when an agent should provide RCCs remains an open question. To answer this, we propose Pred-RC, a method for selectively providing RCCs. Pred-RC uses a cognitive reliance model to predict whether a human will assign a task to an agent. By comparing the prediction results for both cases with and without an RCC, Pred-RC evaluates the influence of the RCC on human reliance. We tested Pred-RC in a human-AI collaboration task and found that it can successfully calibrate human reliance with a reduced number of RCCs.
Recent research in the social sciences has identified situations in which small changes in the way that information is provided to consumers can have large aggregate effects on behavior. This has been promoted in popular media in areas of public health and wellness, but its application to other areas has not been broadly studied. This paper presents a simple model which expresses the effect of providing commuters with carefully-curated information regarding aggregate traffic "slowdowns" on the various roads in a transportation network. Much of the work on providing information to commuters focuses specifically on travel-time information. However, the model in the present paper allows a system planner to provide slowdown information as well; that is, commuters are additionally told how much slower each route is as compared to its uncongested state. We show that providing this additional information can improve equilibrium routing efficiency when compared to the case when commuters are only given information about travel time, but that these improvements in congestion are not universal. That is, transportation networks exist on which any provision of slowdown information can harm equ
The limit order book mechanism has been the core trading mechanism of the modern financial market. In the cryptocurrency market, centralized exchanges also adopt this limit order book mechanism and a centralized matching engine dynamically connects the traders to the orders of market makers. Recently, decentralized exchanges have been introduced and received considerable attention in the cryptocurrency community. A decentralized exchange typically adopts an automated market maker, which algorithmically arbitrates the trades between liquidity providers and traders through a pool of crypto assets. Meanwhile, the liquidity of the exchange is the most important factor when traders choose an exchange. However, the amount of liquidity provided by the liquidity providers in decentralized exchanges is insufficient when compared to centralized exchanges. This is because the liquidity providers in decentralized exchanges suffer from the risk of divergence loss inherent to the automated market making system. To this end, we introduce a new concept called margin liquidity and leverage this concept to propose a highly profitable margin liquidity-providing position. Then, we extend this margin l
Residential Thermostatically Controlled Loads (TCLs) such as Air Conditioners (ACs), heat pumps, water heaters, and refrigerators have an enormous thermal storage potential for providing regulation reserve to the grid. In this paper, we study the potential resource and economic analysis of TCLs providing frequency regulation service. In particular, we show that the potential resource of TCLs in California is more than enough for both current and predicted near-future regulation requirements for the California power system. Moreover, we estimate the cost and revenue of TCLs, discuss the qualification requirements, recommended policy changes, and participation incentive methods, and compare TCLs with other energy storage technologies. We show that TCLs are potentially more cost-effective than other energy storage technologies such as flywheels, Li-ion, advanced lead acid, and Zinc Bromide batteries.
The goal of this paper is to investigate the importance of providing visual "big pictures" in the teaching of economics. The plurality and variety of concepts, variables, diagrams, and models involved in economics can be a source of confusion for many economics students. However, reviewing the existing literature on the importance of providing visual "big pictures" in the process of learning suggests that furnishing students with a visual "big picture" that illustrates the ways through which those numerous, diverse concepts are connected to each other could be an effective solution to clear up the mentioned mental chaos. As a practical example, this paper introduces a "big picture" that can be used as a good resource in intermediate macroeconomics classes. This figure presents twenty-seven commonly-discussed macroeconomic diagrams in the intermediate macroeconomics course, and gives little detail on some of these diagrams, aiming at helping students to get the whole picture at once on a single piece of paper. This macroeconomics big picture mostly focuses on the routes through which common diagrams in macroeconomics are connected to each other, and finally introduces the general ma
Providing public access to unprotected digital data can pose a threat of unwanted disclosure of restricted information. The problem of protecting such information can be divided into two main subclasses, namely, individual and group data anonymity. By group anonymity we mean protecting important data patterns, distributions, and collective features which cannot be determined through analyzing individual records only. An effective and comparatively simple way of solving the group anonymity problem is to apply the wavelet transform. It is easy to implement, powerful enough, and can produce acceptable results if used properly. In the paper, we present a novel method of using the wavelet transform for providing group anonymity; it is achieved by redistributing wavelet approximation values while simultaneously fixing the data mean value and leaving wavelet details unchanged (or proportionally altering them). Moreover, we provide a comprehensive example to illustrate the method.
Recent enhancements have been proposed to the ATM Unspecified Bit Rate (UBR) service that guarantee a minimum rate at the frame level to the UBR VCs. These enhancements have been called Guaranteed Frame Rate (GFR). In this paper, we discuss the motivation, design and implementation issues for GFR. We present the design of buffer management and policing mechanisms to implement GFR. We study the effects of policing, per-VC buffer allocation, and per-VC queuing on providing GFR to TCP/IP traffic. We conclude that per-VC scheduling is necessary to provide minimum rate guarantees to TCP traffic. We examine the role of frame tagging in the presence of scheduling and buffer management for providing minimum rate guarantees. The use of GFR to support the Internet Controlled Load Service is also discussed.
High-quality computer science education is limited by the difficulty of providing instructor feedback to students at scale. While this feedback could in principle be automated, supervised approaches to predicting the correct feedback are bottlenecked by the intractability of annotating large quantities of student code. In this paper, we instead frame the problem of providing feedback as few-shot classification, where a meta-learner adapts to give feedback to student code on a new programming question from just a few examples annotated by instructors. Because data for meta-training is limited, we propose a number of amendments to the typical few-shot learning framework, including task augmentation to create synthetic tasks, and additional side information to build stronger priors about each task. These additions are combined with a transformer architecture to embed discrete sequences (e.g. code) to a prototypical representation of a feedback class label. On a suite of few-shot natural language processing tasks, we match or outperform state-of-the-art performance. Then, on a collection of student solutions to exam questions from an introductory university course, we show that our app
Given a point query Q in multi-dimensional space, K-Nearest Neighbor (KNN) queries return the K closest answers in the database with respect to Q, according to a given distance metric. In this scenario, it is possible that a majority of the answers may be very similar to each other, especially when the data has clusters. For a variety of applications, such homogeneous result sets may not add value to the user. In this paper, we consider the problem of providing diversity in the results of KNN queries, that is, to produce the closest result set such that each answer is sufficiently different from the rest. We first propose a user-tunable definition of diversity, and then present an algorithm, called MOTLEY, for producing a diverse result set as per this definition. Through a detailed experimental evaluation on real and synthetic data, we show that MOTLEY can produce diverse result sets by reading only a small fraction of the tuples in the database. Further, it imposes no additional overhead on the evaluation of traditional KNN queries, thereby providing a seamless interface between diversity and distance.
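The idea of a diverse KNN result can be illustrated with a simple greedy filter. This is a hedged sketch, not the MOTLEY algorithm: the function name `diverse_knn`, the Euclidean metric, and the `min_div` threshold (a stand-in for the user-tunable diversity definition) are assumptions made for illustration, and the sketch scans all points rather than reading only a fraction of the database.

```python
import math

def diverse_knn(points, query, k, min_div):
    """Greedy diverse nearest neighbours (illustrative only).

    Scans points in order of increasing distance to the query and keeps a
    point only if it lies at least min_div away from every point already
    kept. Stops once k points are selected or the input is exhausted.
    """
    ordered = sorted(points, key=lambda p: math.dist(p, query))
    chosen = []
    for p in ordered:
        if all(math.dist(p, c) >= min_div for c in chosen):
            chosen.append(p)
            if len(chosen) == k:
                break
    return chosen

pts = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.05, 0.0), (3.0, 0.0)]
result = diverse_knn(pts, query=(0.0, 0.0), k=3, min_div=0.5)
```

With `min_div=0` the function reduces to an ordinary KNN query, so the threshold controls how much distance is traded for diversity.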
Dynamic Information Flow Tracking (DIFT) is a technique to track potential security vulnerabilities in software and hardware systems at run time. The last fifteen years have seen a lot of research work on DIFT, including both hardware-based and software-based implementations for different types of processor architectures. This survey briefly reviews some hardware architectures that provide DIFT support. Starting from an introduction to different approaches for hardware-based DIFT, this survey focuses on integrated/in-core architectures. Protection schemes, including the tagging system, tag propagation, and tag checking for each architecture, will be discussed. The survey is organized to illustrate the evolution of integrated DIFT architectures, where each architecture tries to address the generality and versatility weaknesses of previously proposed architectures. However, improving security while providing generality and versatility involves trade-offs. This survey compares the architectures from different aspects to show these trade-offs more clearly.
The demand for a broadband wireless connection is nowadays no longer limited to stationary situations; it is also required while traveling. Therefore, there are combined efforts to provide wireless access on High Speed Trains (HSTs) as well, in order to add to the attractiveness of this means of transportation. Installing an additional relay on the train to facilitate the communication is an approach that has already been extensively discussed in the literature. The possibility of direct communication between the base station and the passenger has been neglected until now, despite its numerous advantages. Therefore, a comparison between these two opposing approaches is presented in this paper, accompanied by a detailed discussion of the related aspects. The focus is set on the feasibility of the direct link approach, including simulation results. Further technical issues are also presented, especially regarding the interdependencies of the different aspects, providing the views of mobile and train operators on the topic.
An LLM is stable if it reaches the same conclusion when asked the identical question multiple times. We find leading LLMs like gpt-4o, claude-3.5, and gemini-1.5 are unstable when providing answers to hard legal questions, even when made as deterministic as possible by setting temperature to 0. We curate and release a novel dataset of 500 legal questions distilled from real cases, involving two parties, with facts, competing legal arguments, and the question of which party should prevail. When provided the exact same question, we observe that LLMs sometimes say one party should win, while other times saying the other party should win. This instability has implications for the increasing numbers of legal AI products, legal processes, and lawyers relying on these LLMs.
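The stability definition above can be expressed as a small check over repeated answers to one question. This is a minimal sketch under assumptions: the function names `is_stable` and `agreement_rate` are hypothetical, the answers are assumed to be already reduced to a label such as the prevailing party, and the agreement rate is an illustrative measure rather than the metric used in the paper.

```python
from collections import Counter

def is_stable(answers):
    """True if every repeated run reached the same conclusion."""
    if not answers:
        raise ValueError("need at least one answer")
    return len(set(answers)) == 1

def agreement_rate(answers):
    """Fraction of runs that agree with the most common conclusion."""
    if not answers:
        raise ValueError("need at least one answer")
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)

# Hypothetical labels from four runs of the identical question.
runs = ["party_1", "party_1", "party_2", "party_1"]
stable = is_stable(runs)
rate = agreement_rate(runs)
```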