Found 20 results
Surgical tool detection is a fundamental task for understanding egocentric open surgery videos. However, detecting surgical tools presents significant challenges due to their highly imbalanced class distribution, similar shapes and similar textures, and heavy occlusion. The lack of a comprehensive large-scale dataset compounds these challenges. In this paper, we introduce EgoSurgery-Tool, an extension of the existing EgoSurgery-Phase dataset, which contains real open surgery videos captured using an egocentric camera attached to the surgeon's head, along with phase annotations. EgoSurgery-Tool has been densely annotated with surgical tools and comprises over 49K surgical tool bounding boxes across 15 categories, constituting a large-scale surgical tool detection dataset. EgoSurgery-Tool also provides annotations for hand detection with over 46K hand-bounding boxes, capturing hand-object interactions that are crucial for understanding activities in egocentric open surgery. EgoSurgery-Tool is superior to existing datasets due to its larger scale, greater variety of surgical tools, more annotations, and denser scenes. We conduct a comprehensive analysis of EgoSurgery-Tool using nine p
Surgical phase recognition has gained significant attention due to its potential to offer solutions to numerous demands of the modern operating room. However, most existing methods concentrate on minimally invasive surgery (MIS), leaving surgical phase recognition for open surgery understudied. This discrepancy is primarily attributed to the scarcity of publicly available open surgery video datasets for surgical phase recognition. To address this issue, we introduce a new egocentric open surgery video dataset for phase recognition, named EgoSurgery-Phase. This dataset comprises 15 hours of real open surgery videos spanning 9 distinct surgical phases all captured using an egocentric camera attached to the surgeon's head. In addition to video, the EgoSurgery-Phase offers eye gaze. As far as we know, it is the first real open surgery video dataset for surgical phase recognition publicly available. Furthermore, inspired by the notable success of masked autoencoders (MAEs) in video understanding tasks (e.g., action recognition), we propose a gaze-guided masked autoencoder (GGMAE). Considering the regions where surgeons' gaze focuses are often critical for surgical phase recognition (e.g
We introduce SurgAtlas, the largest surgical video-language dataset to date, comprising 15,291 videos (2,391 hours) spanning 18 surgical specialties and over 5,000 procedure types, sourced entirely from publicly available YouTube content. SurgAtlas is also the first surgical video-language dataset to include open surgery at scale, with 6,182 open procedure videos alongside over 9,000 minimally invasive recordings, and the first to establish standardized benchmarks for open-surgery video understanding. We additionally provide an expert-validated subset with verified visual question-answer pairs across diverse open and minimally invasive procedures, serving as a clinically grounded benchmark for surgical reasoning. Compared with existing surgical video-language datasets, SurgAtlas provides one of the most diverse annotation schemas, combining segment-level captions, step- and phase-level descriptions, video-level surgical descriptions, and reasoning-oriented question-answer pairs organized within a hierarchical taxonomy. These annotations are constructed through an automated multi-tier pipeline with LLM-based enrichment and a staged VQA generation framework with explicit groundedness
Every year approximately 234 million major surgeries are performed, leading to plentiful, highly diverse data. This is accompanied by a matching number of novel algorithms for the surgical domain. To garner all benefits of surgical data science it is necessary to have an unambiguous, shared understanding of algorithms and data. This includes inputs and outputs of algorithms and thus their function, but also the semantic content, i.e. meaning of data such as patient parameters. We therefore propose the establishment of a new ontology for data and algorithms in surgical data science. Such an ontology can be used to provide common data sets for the community, encouraging sharing of knowledge and comparison of algorithms on common data. We hold that this is a necessary foundation towards new methods for applications such as semantic-based content retrieval and similarity measures and that it is overall vital for the future of surgical data science.
Contribution: This article analyzes the learning effectiveness of a virtual educational escape room for teaching software engineering and compares this activity with traditional teaching through a randomized controlled trial. Background: Educational escape rooms have been used across a wide variety of disciplines at all levels of education and they are becoming increasingly popular among teachers. Nevertheless, there is a clear general need for more robust empirical evidence on the learning effectiveness of these novel activities and, particularly, on their application in software engineering education. Research Questions: Is game-based learning using educational escape rooms more effective than traditional lectures for teaching software engineering? What are the perceptions of software engineering students toward game-based learning using educational escape rooms? Methodology: The study presented in this article is a randomized controlled trial with a pre- and post-test design that was completed by a total of 326 software engineering students. The 164 students belonging to the experimental group learned software modeling by playing an educational escape room whereas the 162 student
Astrobiology is the field of science devoted to searching for life elsewhere in the Universe. It is inherently interdisciplinary, integrating results from multiple fields of science, and in this respect has strong synergies with 'big history'. I argue that big history and astrobiology are both acting to widen human perspectives in intellectually and socially beneficial directions, especially by enhancing public awareness of cosmic and evolutionary worldviews. I will further argue that these perspectives have important implications for the social and political organisation of humanity, including the eventual political unification of our planet. Astrobiology and big history are also concerned with the future of humanity, and I will argue that this future will be culturally and intellectually enriched if it includes the exploration of the universe around us.
Despite the availability of computer-aided simulators and recorded videos of surgical procedures, junior residents still rely heavily on experts to answer their queries. However, expert surgeons are often overloaded with clinical and academic workloads and have limited time to answer. For this purpose, we develop a surgical question-answering system to facilitate robot-assisted surgical scene and activity understanding from recorded videos. Most existing VQA methods require an object detector and a region-based feature extractor to extract visual features and fuse them with the embedded text of the question for answer generation. However, (1) surgical object detection models are scarce due to small datasets and a lack of bounding box annotations; (2) the current fusion strategy for heterogeneous modalities such as text and image is naive; (3) localized answering is missing, which is crucial in complex surgical scenarios. In this paper, we propose Visual Question Localized-Answering in Robotic Surgery (Surgical-VQLA) to localize the specific surgical area during the answer prediction. To deal with the fusion of the heterogeneous modalities, we design gated vision-language embedding
Objective: Integrating EHR data with other resources is essential in rare disease research due to low disease prevalence. Such integration is dependent on the alignment of ontologies used for data annotation. The International Classification of Diseases (ICD) is used to annotate clinical diagnoses; the Human Phenotype Ontology (HPO) to annotate phenotypes. Although these ontologies overlap in biomedical entities described, the extent to which they are interoperable is unknown. We investigate how well aligned these ontologies are and whether such alignments facilitate EHR data integration. Materials and Methods: We conducted an empirical analysis of the coverage of mappings between ICD and HPO. We interpret this mapping coverage as a proxy for how easily clinical data can be integrated with research ontologies such as HPO. We quantify how exhaustively ICD codes are mapped to HPO by analyzing mappings in the UMLS Metathesaurus. We analyze the proportion of ICD codes mapped to HPO within a real-world EHR dataset. Results and Discussion: Our analysis revealed that only 2.2% of ICD codes have direct mappings to HPO in UMLS. Within our EHR dataset, less than 50% of ICD codes have mapping
This paper looks at the increasing popularity of massive open and online courses (MOOCs) and open educational resources (OERs) offered in Singapore. Despite being a relatively new phenomenon, the Singapore government has collaborated with different organizations to improve the quality and accessibility of MOOCs, and many institutions of higher learning (IHLs) are spearheading efforts to improve OERs to facilitate greater public access to educational resources. It will also explore the benefits and potential problems that MOOCs and OERs face. For example, both MOOCs and OERs are able to lower the costs of university-level education and increase public access to such courses. They also provide skills and job training for members of the public as well as encourage lifelong learning. However, both MOOCs and OERs may not be sustainable in the long run, as the financial gains of both may not be able to cover the costs of mounting them. Each system also has its own set of problems. For example, formal structures to guarantee the quality of MOOCs offered remain lacking. MOOCs also tend to have low completion rates and there have been issues regarding plagiarism with the use of MOOCs as lea
Robot-assisted surgery has become increasingly popular due to its clinical advantages. Meanwhile, artificial intelligence and augmented reality in robotic surgery are developing rapidly and receiving considerable attention. However, current methods have not addressed the coherent integration of AI and AR in robotic surgery. In this paper, we develop a novel system that seamlessly merges an artificial intelligence module with augmented reality visualization to automatically generate surgical guidance for robotic surgery education. Specifically, we first leverage reinforcement learning to learn from expert demonstrations and then generate a 3D guidance trajectory, providing prior context information about the surgical procedure. Along with other information such as text hints, the 3D trajectory is then overlaid in the stereo view of the dVRK, where the user can perceive the 3D guidance and learn the procedure. The proposed system is evaluated through a preliminary experiment on the peg-transfer surgical education task, which demonstrates its feasibility and potential as a next-generation solution for robot-assisted surgery education.
Medical visual question answering (VQA) bridges the gap between visual information and clinical decision-making, enabling doctors to extract understanding from clinical images and videos. In particular, surgical VQA can enhance the interpretation of surgical data, aiding in accurate diagnoses, effective education, and clinical interventions. However, the inability of VQA models to visually indicate the regions of interest corresponding to the given questions results in incomplete comprehension of the surgical scene. To tackle this, we propose the surgical visual question localized-answering (VQLA) for precise and context-aware responses to specific queries regarding surgical images. Furthermore, to address the strong demand for safety in surgical scenarios and potential corruptions in image acquisition and transmission, we propose a novel approach called Calibrated Co-Attention Gated Vision-Language (C$^2$G-ViL) embedding to integrate and align multimodal information effectively. Additionally, we leverage the adversarial sample-based contrastive learning strategy to boost our performance and robustness. We also extend our EndoVis-18-VQLA and EndoVis-17-VQLA datasets to broaden the
A previous study of symmetric collisions of massive nuclei has shown that current models of multi-nucleon transfer (MNT) reactions do not adequately describe the transfer product yields. To gain further insight into this problem, we have measured the yields of MNT products in the interaction of 977 MeV (E/A = 4.79 MeV) and 1143 MeV (E/A = 5.60 MeV) $^{204}$Hg with $^{208}$Pb. We find that the yields of multi-nucleon transfer products are similar in these two reactions and are substantially lower than those observed in the reaction of 1257 MeV (E/A = 6.16 MeV) $^{204}$Hg + $^{198}$Pt. We compare our measurements with the predictions of the GRAZING-F, di-nuclear systems (DNS) and improved quantum molecular dynamics (ImQMD) models. For the observed isotopes of the elements Au, Hg, Tl, Pb and Bi, the measured values of the MNT cross sections are orders of magnitude larger than the predicted values. Furthermore, the various models predict the formation of nuclides near the N=126 shell, which are not observed.
Surgical procedures are often not "standardised" (i.e., defined in a unique and unambiguous way), but rather exist as implicit knowledge in the minds of the surgeon and the surgical team. This reliance extends to pre-surgery planning and effective communication during the procedure. We introduce a novel approach for the formal and automated analysis of surgical procedures, which we model as security ceremonies, leveraging well-established techniques developed for the analysis of such ceremonies. Mutations of a procedure are used to model variants and mistakes that members of the surgical team might make. Our approach allows us to automatically identify violations of the intended properties of a surgical procedure.
Phosphorus (P) is considered to be one of the key elements for life, making it an important element to look for in the abundance analysis of spectra of stellar systems. Yet, there exist only a handful of spectroscopic studies that estimate P abundances and investigate their trend across a range of metallicities. We have observed full HK band spectra at a spectral resolving power of R=45,000 with the IGRINS instrument. Abundances are determined using SME in combination with 1D MARCS stellar atmosphere models. The investigated sample of stars has reliable stellar parameters estimated using optical FIES spectra (GILD; Jönsson et al. in prep.). In order to determine the P abundances from the 16482.92 Angstrom P line, we take special care of the CO($ν=7-4$) blend. We determine the C, N, O abundances from atomic carbon and a range of non-blended molecular lines (CO, CN, OH), which are plentiful in the H band region of K giant stars, ensuring appropriate modelling of the blending CO($ν=7-4$) line. We present the [P/Fe] vs [Fe/H] trend for 38 K giant stars in the metallicity range of -1.2 dex $<$ [Fe/H] $<$ 0.4 dex. We find that our trend matches well with the compiled literature sample of
Quaternions, discovered by Sir William Rowan Hamilton in the 19th century, are a significant extension of complex numbers and a profound tool for understanding three-dimensional rotations. This work explores the quaternion's history, algebraic structure, and educational implications. We begin with the historical context of quaternions, highlighting Hamilton's contributions and the development of quaternion theory. This sets the stage for a detailed examination of quaternion algebra, including their representations as complex numbers, matrices, and non-commutative nature. Our research presents some advancements compared to previous educational studies by thoroughly examining quaternion applications in rotations. We differentiate between left and right rotations through detailed numerical examples and propose a general approach to rotations via a theorem, clearly defining the associated morphism. This framework enhances the understanding of the algebraic structure of quaternions. A key innovation is presenting a three-dimensional example illustrating the rotation of a frame with strings, connecting quaternions to the quaternion group, half-integer spin phenomena, and Pauli matrices.
The gap between theory and practice is well-documented in educational research. Physics teachers' willingness to apply research findings in practice may be influenced by a sceptical attitude towards science education research. This study explores physics teachers' perspectives on science education research, with a particular focus on potential scepticism towards the discipline. A two-step mixed-methods approach was employed: (1) Interviews with a purposeful sample of 13 experienced physics teachers for a first exploration of attitudes towards physics education research, and (2) a quantitative survey of 174 physics teachers to examine, among other aspects, the previously observed attitudes in a larger sample and to identify teacher profiles using latent profile analysis. The interview study revealed both sceptical and non-sceptical attitudes towards physics education research, including some that fundamentally questioned its practical value. Based on the survey data and latent profile analysis, four distinct teacher profiles differing in their level of scepticism towards science education research were identified. While one profile is highly sceptical, the other three exhibit a mix
Cybersecurity professionals need hands-on training to prepare for managing the current advanced cyber threats. To practice cybersecurity skills, training participants use numerous software tools in computer-supported interactive learning environments to perform offensive or defensive actions. The interaction involves typing commands, communicating over the network, and engaging with the training environment. The training artifacts (data resulting from this interaction) can be highly beneficial in educational research. For example, in cybersecurity education, they provide insights into the trainees' learning processes and support effective learning interventions. However, this research area is not yet well-understood. Therefore, this paper surveys publications that enhance cybersecurity education by leveraging trainee-generated data from interactive learning environments. We identified and examined 3021 papers, ultimately selecting 35 articles for a detailed review. First, we investigated which data are employed in which areas of cybersecurity training, how, and why. Second, we examined the applications and impact of research in this area, and third, we explored the community of res
Background Analyzing kinematic and video data can help identify potentially erroneous motions that lead to sub-optimal surgeon performance and safety-critical events in robot-assisted surgery. Methods We develop a rubric for identifying task and gesture-specific Executional and Procedural errors and evaluate dry-lab demonstrations of Suturing and Needle Passing tasks from the JIGSAWS dataset. We characterize erroneous parts of demonstrations by labeling video data, and use distribution similarity analysis and trajectory averaging on kinematic data to identify parameters that distinguish erroneous gestures. Results Executional error frequency varies by task and gesture, and correlates with skill level. Some predominant error modes in each gesture are distinguishable by analyzing error-specific kinematic parameters. Procedural errors could lead to lower performance scores and increased demonstration times but also depend on surgical style. Conclusions This study provides insights into context-dependent errors that can be used to design automated error detection mechanisms and improve training and skill assessment.
Educational technology has attained significant importance as a mechanism for supporting experiential learning of science concepts. However, the growth of this mechanism is limited by the significant time and technical expertise needed to develop such products, particularly in specialized fields of science. We sought to test whether interactive, educational, online software modules can be developed effectively by students as a curriculum component of an advanced science course. We discuss a set of fifteen such modules developed by Harvard University graduate students to demonstrate various concepts related to astronomy and physics. Their successful development of these modules demonstrates that online software tools for education and outreach on specialized topics can be produced while simultaneously fulfilling project-based learning objectives. We describe a set of technologies suitable for module development and present in detail four examples of modules developed by the students. We offer recommendations for incorporating educational software development within a graduate curriculum and conclude by discussing the relevance of this novel approach to new online learning environmen
We propose a novel multi-modal and multi-task architecture for simultaneous low-level gesture and surgical task classification in Robot Assisted Surgery (RAS) videos. Our end-to-end architecture is based on the principles of a long short-term memory network (LSTM) that jointly learns temporal dynamics on rich representations of visual and motion features, while simultaneously classifying activities of low-level gestures and surgical tasks. Our experimental results show that our approach is superior to an architecture that classifies the gestures and surgical tasks separately on visual cues and motion cues respectively. We train our model on a fixed random set of 1200 gesture video segments and use the remaining 422 for testing. This results in around 42,000 gesture frames sampled for training and 14,500 for testing. For a 6-split experiment, while the conventional approach reaches an Average Precision (AP) of only 29% (29.13%), our architecture reaches an AP of 51% (50.83%) for 3 tasks and 14 possible gesture labels, an improvement of 22 percentage points (21.7). Our architecture learns temporal dynamics on rich representations of visual and motion features that complement e