We consider a logical puzzle which we call double pouring problem, which was original defined for $k=3$ vessels. We generalize this definition to $k \ge 2 $ as follows. Each of the $k$ vessels contains an integer amount of water, called its value, where the values are $a_i$ for $i=1,2,\dots,k$ and the sum of values is $n$. A pouring step means pouring water from one vessel with value $a_i$ to another vessel with value $a_j$, where $ 1 \le i ot= j \le k $ and $a_i \le a_j $. After this pouring step the first vessel has value $2a_i$ and the second one value $a_j-a_i$. Now the pouring problem is to find as few pourings steps as possible to empty at least one vessel, or to show that such an emptying is not possible (which is possible only in the case $k=2$). For $k=2$ each pouring step is unique. We give a necessary and sufficient condition, when for a given $ (a_1,a_2)$ with $a_1+a_2=n$ the pouring problem is solvable. For $k=3$ we improve the upper bound of the pouring problem for some special cases. For $k \ge 4 $ we extend the known lower bound for $k=3$ and improve the known upper bound $\mathcal{O}((\log n)^2)$ for $k=3$ to $\mathcal{O}(\log n\log\log n)$. Finally, for $k \ge 3$
Transparent liquid manipulation in robotic pouring remains challenging for perception systems: specular/refraction effects and lighting variability degrade visual cues, undermining reliable level estimation. To address this challenge, we introduce RadarEye, a real-time mmWave radar signal processing pipeline for robust liquid level estimation and tracking during the whole pouring process. RadarEye integrates (i) a high-resolution range-angle beamforming module for liquid level sensing and (ii) a physics-informed mid-pour tracker that suppresses multipath to maintain lock on the liquid surface despite stream-induced clutter and source container reflections. The pipeline delivers sub-millisecond latency. In real-robot water-pouring experiments, RadarEye achieves a 0.35 cm median absolute height error at 0.62 ms per update, substantially outperforming vision and ultrasound baselines.
We study the connection between audio-visual observations and the underlying physics of a mundane yet intriguing everyday activity: pouring liquids. Given only the sound of liquid pouring into a container, our objective is to automatically infer physical properties such as the liquid level, the shape and size of the container, the pouring rate and the time to fill. To this end, we: (i) show in theory that these properties can be determined from the fundamental frequency (pitch); (ii) train a pitch detection model with supervision from simulated data and visual data with a physics-inspired objective; (iii) introduce a new large dataset of real pouring videos for a systematic study; (iv) show that the trained model can indeed infer these physical properties for real data; and finally, (v) we demonstrate strong generalization to various container shapes, other datasets, and in-the-wild YouTube videos. Our work presents a keen understanding of a narrow yet rich problem at the intersection of acoustics, physics, and learning. It opens up applications to enhance multisensory perception in robotic pouring.
This paper explores how deliberate mutations of reward function in reinforcement learning can produce diversified skill variations in robotic manipulation tasks, examined with a liquid pouring use case. To this end, we developed a new reward function mutation framework that is based on applying Gaussian noise to the weights of the different terms in the reward function. Inspired by the cost-benefit tradeoff model from human motor control, we designed the reward function with the following key terms: accuracy, time, and effort. The study was performed in a simulation environment created in NVIDIA Isaac Sim, and the setup included Franka Emika Panda robotic arm holding a glass with a liquid that needed to be poured into a container. The reinforcement learning algorithm was based on Proximal Policy Optimization. We systematically explored how different configurations of mutated weights in the rewards function would affect the learned policy. The resulting policies exhibit a wide range of behaviours: from variations in execution of the originally intended pouring task to novel skills useful for unexpected tasks, such as container rim cleaning, liquid mixing, and watering. This approach
In this paper, we focus on the challenging perception problem in robotic pouring. Most of the existing approaches either leverage visual or haptic information. However, these techniques may suffer from poor generalization performances on opaque containers or concerning measuring precision. To tackle these drawbacks, we propose to make use of audio vibration sensing and design a deep neural network PouringNet to predict the liquid height from the audio fragment during the robotic pouring task. PouringNet is trained on our collected real-world pouring dataset with multimodal sensing data, which contains more than 3000 recordings of audio, force feedback, video and trajectory data of the human hand that performs the pouring task. Each record represents a complete pouring procedure. We conduct several evaluations on PouringNet with our dataset and robotic hardware. The results demonstrate that our PouringNet generalizes well across different liquid containers, positions of the audio receiver, initial liquid heights and types of liquid, and facilitates a more robust and accurate audio-based perception for robotic pouring.
In everyday life, we perform tasks (e.g., cooking or cleaning) that involve a large variety of objects and goals. When confronted with an unexpected or unwanted outcome, we take corrective actions and try again until achieving the desired result. The reasoning performed to identify a cause of the observed outcome and to select an appropriate corrective action is a crucial aspect of human reasoning for successful task execution. Central to this reasoning is the assumption that a factor is responsible for producing the observed outcome. In this paper, we investigate the use of probabilistic actual causation to determine whether a factor is the cause of an observed undesired outcome. Furthermore, we show how the actual causation probabilities can be used to find alternative actions to change the outcome. We apply the probabilistic actual causation analysis to a robot pouring task. When spillage occurs, the analysis indicates whether a task parameter is the cause and how it should be changed to avoid spillage. The analysis requires a causal graph of the task and the corresponding conditional probability distributions. To fulfill these requirements, we perform a complete causal modeling
Pouring is one of the most commonly executed tasks in humans' daily lives, whose accuracy is affected by multiple factors, including the type of material to be poured and the geometry of the source and receiving containers. In this work, we propose a self-supervised learning approach that learns the pouring dynamics, pouring motion, and outcomes from unsupervised demonstrations for accurate pouring. The learned pouring model is then generalized by self-supervised practicing to different conditions such as using unaccustomed pouring cups. We have evaluated the proposed approach first with one container from the training set and four new but similar containers. The proposed approach achieved better pouring accuracy than a regular human with a similar pouring speed for all five cups. Both the accuracy and pouring speed outperform state-of-the-art works. We have also evaluated the proposed self-supervised generalization approach using unaccustomed containers that are far different from the ones in the training set. The self-supervised generalization reduces the pouring error of the unaccustomed containers to the desired accuracy level.
With the growing emphasis on the development and integration of service robots within household environments, we will need to endow robots with the ability to reliably pour a variety of liquids. However, liquid handling and pouring is a challenging task due to the complex dynamics and varying properties of different liquids, the exacting precision required to prevent spills and ensure accurate pouring, and the necessity for robots to adapt seamlessly to a multitude of containers in real-world scenarios. In response to these challenges, we propose a novel autonomous robotics pipeline that empowers robots to execute precision pouring tasks, encompassing both carbonated and non-carbonated liquids, as well as opaque and transparent liquids, into a variety of transparent containers. Our proposed approach maximizes the potential of RGB input alone, achieving zero-shot capability by harnessing existing pre-trained vision segmentation models. This eliminates the need for additional data collection, manual image annotations, or extensive training. Furthermore, our work integrates ChatGPT, facilitating seamless interaction between individuals without prior expertise in robotics and our pouri
Liquids and granular media are pervasive throughout human environments, yet remain particularly challenging for robots to sense and manipulate precisely. In this work, we present a systematic approach at integrating capacitive sensing within robotic end effectors to enable robust sensing and precise manipulation of liquids and granular media. We introduce the parallel-jaw RoboCAP Gripper with embedded capacitive sensing arrays that enable a robot to directly sense the materials and dynamics of liquids inside of diverse containers, including some visually opaque. When coupled with model-based control, we demonstrate that the proposed system enables a robotic manipulator to achieve state-of-the-art precision pouring accuracy for a range of substances with varying dynamics properties. Code, designs, and build details are available on the project website.
Liquid perception is critical for robotic pouring tasks. It usually requires the robust visual detection of flowing liquid. However, while recent works have shown promising results in liquid perception, they typically require labeled data for model training, a process that is both time-consuming and reliant on human labor. To this end, this paper proposes a simple yet effective framework PourIt!, to serve as a tool for robotic pouring tasks. We design a simple data collection pipeline that only needs image-level labels to reduce the reliance on tedious pixel-wise annotations. Then, a binary classification model is trained to generate Class Activation Map (CAM) that focuses on the visual difference between these two kinds of collected data, i.e., the existence of liquid drop or not. We also devise a feature contrast strategy to improve the quality of the CAM, thus entirely and tightly covering the actual liquid regions. Then, the container pose is further utilized to facilitate the 3D point cloud recovery of the detected liquid region. Finally, the liquid-to-container distance is calculated for visual closed-loop control of the physical robot. To validate the effectiveness of our pr
As service robots begin to be deployed to assist humans, it is important for them to be able to perform a skill as ubiquitous as pouring. Specifically, we focus on the task of pouring an exact amount of water without any environmental instrumentation, that is, using only the robot's own sensors to perform this task in a general way robustly. In our approach we use a simple PID controller which uses the measured change in weight of the held container to supervise the pour. Unlike previous methods which use specialized force-torque sensors at the robot wrist, we use our robot joint torque sensors and investigate the added benefit of tactile sensors at the fingertips. We train three estimators from data which regress the poured weight out of the source container and show that we can accurately pour within 10 ml of the target on average while being robust enough to pour at novel locations and with different grasps on the source container.
Liquid state estimation is important for robotics tasks such as pouring; however, estimating the state of transparent liquids is a challenging problem. We propose a novel segmentation pipeline that can segment transparent liquids such as water from a static, RGB image without requiring any manual annotations or heating of the liquid for training. Instead, we use a generative model that is capable of translating images of colored liquids into synthetically generated transparent liquid images, trained only on an unpaired dataset of colored and transparent liquid images. Segmentation labels of colored liquids are obtained automatically using background subtraction. Our experiments show that we are able to accurately predict a segmentation mask for transparent liquids without requiring any manual annotations. We demonstrate the utility of transparent liquid segmentation in a robotic pouring task that controls pouring by perceiving the liquid height in a transparent cup. Accompanying video and supplementary materials can be found
Cooking robots have long been desired by the commercial market, while the technical challenge is still significant. A major difficulty comes from the demand of perceiving and handling liquid with different properties. This paper presents a robot system that mixes batter and makes pancakes out of it, where understanding and handling the viscous liquid is an essential component. The system integrates Haptic Sensing and control algorithms to autonomously stir flour and water to achieve the desired batter uniformity, estimate the batter's properties such as the water-flour ratio and liquid level, as well as perform precise manipulations to pour the batter into any specified shape. Experimental results show the system's capability to always produce batter of desired uniformity, estimate water-flour ratio and liquid level precisely, and accurately pour it into complex shapes. This research showcases the potential for robots to assist in kitchens and step towards commercial culinary automation.
Pouring is the second most frequently executed motion in cooking scenarios. In this work, we present our system of accurate pouring that generates the angular velocities of the source container using recurrent neural networks. We collected demonstrations of human pouring water. We made a physical system on which the velocities of the source container were generated at each time step and executed by a motor. We tested our system on pouring water from containers that are not used for training and achieved an error of as low as 4 milliliters. We also used the system to pour oil and syrup. The accuracy achieved with oil is slightly lower than but comparable with that of water.
Robust and accurate estimation of liquid height lies as an essential part of pouring tasks for service robots. However, vision-based methods often fail in occluded conditions while audio-based methods cannot work well in a noisy environment. We instead propose a multimodal pouring network (MP-Net) that is able to robustly predict liquid height by conditioning on both audition and haptics input. MP-Net is trained on a self-collected multimodal pouring dataset. This dataset contains 300 robot pouring recordings with audio and force/torque measurements for three types of target containers. We also augment the audio data by inserting robot noise. We evaluated MP-Net on our collected dataset and a wide variety of robot experiments. Both network training results and robot experiments demonstrate that MP-Net is robust against noise and changes to the task and environment. Moreover, we further combine the predicted height and force data to estimate the shape of the target container.
Human does their daily activity and cooking by teaching and imitating with the help of their vision and understanding of the difference between materials. Teaching a robot to do coking and daily work is difficult because of variation in environment, handling objects at different states etc. Pouring is a simple human daily life activity. In this paper, an approach to get pouring sequences were analyzed for determining the velocity of pouring and weight of the container. Then recurrent neural network (RNN) was used to build a neural network to learn that complex sequence and predict for unseen pouring sequences. Dynamic time warping (DTW) was used to evaluate the prediction performance of the trained model.
To accurately pour drinks into various containers is an essential skill for service robots. However, drink pouring is a dynamic process and difficult to model. Traditional deep imitation learning techniques for implementing autonomous robotic pouring have an inherent black-box effect and require a large amount of demonstration data for model training. To address these issues, an Explainable Hierarchical Imitation Learning (EHIL) method is proposed in this paper such that a robot can learn high-level general knowledge and execute low-level actions across multiple drink pouring scenarios. Moreover, with EHIL, a logical graph can be constructed for task execution, through which the decision-making process for action generation can be made explainable to users and the causes of failure can be traced out. Based on the logical graph, the framework is manipulable to achieve different targets while the adaptability to unseen scenarios can be achieved in an explainable manner. A series of experiments have been conducted to verify the effectiveness of the proposed method. Results indicate that EHIL outperforms the traditional behavior cloning method in terms of success rate, adaptability, ma
One of the most commonly performed manipulation in a human's daily life is pouring. Many factors have an effect on target accuracy, including pouring velocity, rotation angle, geometric of the source, and the receiving containers. This paper presents an approach to increase the repeatability and accuracy of the robotic manipulator by estimating the change in the amount of water of the pouring cup to a sequence of pouring actions using multiple layers of the deep recurrent neural network, especially gated recurrent units (GRU). The proposed GRU model achieved a validation mean squared error as low as 1e-4 (lbf) for the predicted value of weight f(t). This paper contains a comprehensive evaluation and analysis of numerous experiments with various designs of recurrent neural networks and hyperparameters fine-tuning.
Humans have the amazing ability to perform very subtle manipulation task using a closed-loop control system with imprecise mechanics (i.e., our body parts) but rich sensory information (e.g., vision, tactile, etc.). In the closed-loop system, the ability to monitor the state of the task via rich sensory information is important but often less studied. In this work, we take liquid pouring as a concrete example and aim at learning to continuously monitor whether liquid pouring is successful (e.g., no spilling) or not via rich sensory inputs. We mimic humans' rich sensories using synchronized observation from a chest-mounted camera and a wrist-mounted IMU sensor. Given many success and failure demonstrations of liquid pouring, we train a hierarchical LSTM with late fusion for monitoring. To improve the robustness of the system, we propose two auxiliary tasks during training: inferring (1) the initial state of containers and (2) forecasting the one-step future 3D trajectory of the hand with an adversarial training procedure. These tasks encourage our method to learn representation sensitive to container states and how objects are manipulated in 3D. With these novel components, our meth
Empirical studies have demonstrated that humans possess the remarkable capacity to distinguish whether a glass of water is hot or cold solely by the sound of pouring it. However, the underlying physical mechanisms governing the disparities in the acoustic signatures of hot versus cold water remain to be deciphered. In this paper, we conducted a series of experiments to extract the intrinsic features of pouring sounds at contrasting temperatures. The results of our spectral analysis revealed that the sound of pouring hot water exhibited more pronounced low-frequency components and diminished high-frequency components relative to cold water. High-speed photographic evidence elucidated that pouring hot water could generate larger air bubbles in greater abundance. We conjecture that the Minnaert resonance arising from these larger entrained bubbles in hot water produces a lower-frequency acoustic signature, thereby constituting the foundational mechanistic explanation for the auditory distinction between pouring hot and cold water.