The overwhelming majority of homeless individuals are jobless, despite many expressing a willingness to work. While this strong individual-level link between homelessness and unemployment is well-documented, the broader impact of labor market dynamics on homelessness remains largely unexplored. To fill this gap, this paper investigates the impact of local labor market conditions on the duration of homelessness, using individuals' homeless shelter usage records as a proxy for measuring their homelessness duration. Specifically, drawing on Canada's National Homelessness Information System data from 2014 to 2017, we analyze how local employment growth and changes in the local employment rate affect shelter usage duration. Our findings reveal that a 1% increase in local employment is associated with a 0.11-quarter (approximately 0.33-month) reduction in the average duration of shelter usage, while a 1% rise in the local employment rate leads to a 0.34-quarter (approximately 1.02-month) reduction. These changes correspond to decreases of 2.9% and 8.9%, respectively, in the average duration of shelter stays. The findings underscore the critical role of employment opportunities in reducin
Homelessness in American cities is becoming an ever more prominent issue, but its causes remain contested, ranging from mental health and substance abuse to housing affordability and local labor markets. To shed light on this issue, I construct a novel MSA-level national panel of homelessness counts using data from the U.S. Department of Housing and Urban Development. Using a long-differencing regression specification with the changes in rent entered in piecewise linear form, I find that rent increases predict large increases in homelessness rates, but decreases have little to no effect. The same conclusions are reached when I use a quasi-differencing moment condition, assuming a multiplicative mean specification. Then, I propose a theoretical model of the low-end housing market that explains the asymmetry I find in the data. Finally, I outline an IV strategy that instruments rent changes with a Bartik instrument of predicted employment growth interacted with local housing-supply elasticities. My findings suggest that homelessness is a housing problem; however, because the response is sticky downward, effective policy must complement housing-market interventions with measures that
Within the field of media framing, homelessness has been a historically under-researched topic. Framing theory states that the media's method of presenting information plays a pivotal role in controlling public sentiment toward a topic. The sentiment held towards homeless individuals influences their ability to access jobs, housing, and resources as a result of discrimination. This study analyzes the topic and sentiment trends in related media articles to validate framing theory within the scope of homelessness. It correlates these shifts in media reporting with public sentiment. We examine state-level trends in California, Florida, Washington, Oregon, and New York from 2015 to 2023. We utilize the GDELT 2.0 Global Knowledge Graph (GKG) database to gather article data and use X to measure public sentiment towards homeless individuals. Additionally, to identify if there is a correlation between media reporting and public policy, we examine the media's impact on state-level legislation. Our research uses Granger-causality tests and vector autoregressive (VAR) models to establish a correlation between media framing and public sentiment. We also use latent Dirichlet allocation (LDA) an
When governments mandate collaboration, shared data systems can serve both as tools for coordination and instruments of control. This study examines U.S. homelessness service networks, where Continuums of Care (CoCs) coordinate service providers through the federally mandated Homeless Management Information System (HMIS). With client consent, providers enter data into HMIS and access cross-provider service histories to support coordinated care. At the same time, HMIS embeds standards and governance rules that shape who can collect, access, interpret, and act on data, and thus who holds decision authority. Using qualitative interviews with six experts, we show that standardization can facilitate collaboration and shared learning. However, unequal resources, analytic capacity, and authority limit equitable participation and often shift some participants toward compliance-focused roles. We contribute to public-interest design research on civic data infrastructures by illustrating how mandated data sharing can simultaneously enable coordination and accountability while reproducing power asymmetries in data interpretation and decision-making.
Homelessness systems in North America adopt coordinated data-driven approaches to efficiently match support services to clients based on their assessed needs and available resources. AI tools are increasingly being implemented to allocate resources, reduce costs and predict risks in this space. In this study, we conducted an ethnographic case study on the City of Toronto's homelessness system's data practices across different critical points. We show how the City's data practices offer standardized processes for client care but frontline workers also engage in heuristic decision-making in their work to navigate uncertainties, client resistance to sharing information, and resource constraints. From these findings, we show the temporality of client data which constrain the validity of predictive AI models. Additionally, we highlight how the City adopts an iterative and holistic client assessment approach which contrasts to commonly used risk assessment tools in homelessness, providing future directions to design holistic decision-making tools for homelessness.
People experiencing homelessness (PEH) face substantial barriers to accessing timely, accurate information about community services. DreamKG addresses this through a knowledge graph-augmented conversational system that grounds responses in verified, up-to-date data about Philadelphia organizations, services, locations, and hours. Unlike standard large language models (LLMs) prone to hallucinations, DreamKG combines Neo4j knowledge graphs with structured query understanding to handle location-aware and time-sensitive queries reliably. The system performs spatial reasoning for distance-based recommendations and temporal filtering for operating hours. Preliminary evaluation shows 59% superiority over Google Search AI on relevant queries and 84% rejection of irrelevant queries. This demonstration highlights the potential of hybrid architectures that combines LLM flexibility with knowledge graph reliability to improve service accessibility for vulnerable populations effectively.
The relationship between housing costs and homelessness has important implications for the way that city and county governments respond to increasing homeless populations. Though many analyses in the public policy literature have examined inter-community variation in homelessness rates to identify causal mechanisms of homelessness (Byrne et al., 2013; Lee et al., 2003; Fargo et al., 2013), few studies have examined time-varying homeless counts within the same community (McCandless et al., 2016). To examine trends in homeless population counts in the 25 largest U.S. metropolitan areas, we develop a dynamic Bayesian hierarchical model for time-varying homeless count data. Particular care is given to modeling uncertainty in the homeless count generating and measurement processes, and a critical distinction is made between the counted number of homeless and the true size of the homeless population. For each metro under study, we investigate the relationship between increases in the Zillow Rent Index and increases in the homeless population. Sensitivity of inference to potential improvements in the accuracy of point-in-time counts is explored, and evidence is presented that the inferred
The global rise in homelessness calls for urgent and alternative policy solutions. Non-profits and governmental organizations alert about the many challenges faced by people experiencing homelessness (PEH), which include not only the lack of shelter but also the lack of opportunities for personal development. In this context, the capability approach (CA), which underpins the United Nations Sustainable Development Goals (SDGs), provides a comprehensive framework to assess inequity in terms of real opportunities. This paper explores how the CA can be combined with agent-based modelling and reinforcement learning. The goals are: (1) implementing the CA as a Markov Decision Process (MDP), (2) building on such MDP to develop a rich decision-making model that accounts for more complex motivators of behaviour, such as values and needs, and (3) developing an agent-based simulation framework that allows to assess alternative policies aiming to expand or restore people's capabilities. The framework is developed in a real case study of health inequity and homelessness, working in collaboration with stakeholders, non-profits and domain experts. The ultimate goal of the project is to develop a
Recent studies suggest social media activity can function as a proxy for measures of state-level public health, detectable through natural language processing. We present results of our efforts to apply this approach to estimate homelessness at the state level throughout the US during the period 2010-2019 and 2022 using a dataset of roughly 1 million geotagged tweets containing the substring ``homeless.'' Correlations between homelessness-related tweet counts and ranked per capita homelessness volume, but not general-population densities, suggest a relationship between the likelihood of Twitter users to personally encounter or observe homelessness in their everyday lives and their likelihood to communicate about it online. An increase to the log-odds of ``homeless'' appearing in an English-language tweet, as well as an acceleration in the increase in average tweet sentiment, suggest that tweets about homelessness are also affected by trends at the nation-scale. Additionally, changes to the lexical content of tweets over time suggest that reversals to the polarity of national or state-level trends may be detectable through an increase in political or service-sector language over the
The growing homelessness crisis in the U.S. presents complex social, economic, and public health challenges, straining shelters, healthcare, and social services while limiting effective interventions. Traditional assessment methods struggle to capture its dynamic, dispersed nature, highlighting the need for scalable, data-driven detection. This survey explores computational approaches across four domains: (1) computer vision and deep learning to identify encampments and urban indicators of homelessness, (2) air quality sensing via fixed, mobile, and crowdsourced deployments to assess environmental risks, (3) IoT and edge computing for real-time urban monitoring, and (4) pedestrian behavior analysis to understand mobility patterns and interactions. Despite advancements, challenges persist in computational constraints, data privacy, accurate environmental measurement, and adaptability. This survey synthesizes recent research, identifies key gaps, and highlights opportunities to enhance homelessness detection, optimize resource allocation, and improve urban planning and social support systems for equitable aid distribution and better neighborhood conditions.
Homelessness in the United States has surged to levels unseen since the Great Depression. However, existing methods for monitoring it, such as point-in-time (PIT) counts, have limitations in terms of frequency, consistency, and spatial detail. This study proposes a new approach using publicly available, crowdsourced data, specifically 311 Service Calls and street-level imagery, to track and forecast homeless tent trends in San Francisco. Our predictive model captures fine-grained daily and neighborhood-level variations, uncovering patterns that traditional counts often overlook, such as rapid fluctuations during the COVID-19 pandemic and spatial shifts in tent locations over time. By providing more timely, localized, and cost-effective information, this approach serves as a valuable tool for guiding policy responses and evaluating interventions aimed at reducing unsheltered homelessness.
This paper uses privacy preserving methods to link over 235,000 records in the housing and homelessness system of care (HHSC) of a major North American city. Several machine learning pairwise linkage and two clustering algorithms are evaluated for merging the profiles for latent individuals in the data. Importantly, these methods are evaluated using both traditional machine learning metrics and HHSC system use metrics generated using the linked data. The results demonstrate that privacy preserving linkage methods are an effective and practical method for understanding how a single person interacts with multiple agencies across an HHSC. They also show that performance differences between linkage techniques are amplified when evaluated using HHSC domain specific metrics like number of emergency homeless shelter stays, length of time interacting with an HHSC and number of emergency shelters visited per person.
Scopus is increasingly regarded as a high-quality and reliable data source for research and evaluation of scientific and scholarly activity. However, a puzzling phenomenon has been discovered occasionally: millions of records with author affiliation information collected in Scopus are oddly labeled as "country-undefined" by Scopus which is rarely to be detected in its counterpart Web of Science. This huge number of "homeless" records in Scopus is unacceptable for a widely used high-quality bibliographic database. By using data from the past 124 years, this brief communication tries to probe these affiliated but country-undefined records in Scopus. Our analysis identifies four primary causes for these "homeless" records: incomplete author affiliation addresses, Scopus' inability to recognize different variants of country/territory names, misspelled country/territory names in author affiliation addresses, and Scopus' insufficiency in correctly split and identify the clean affiliation addresses. To address this pressing issue, we put forward several recommendations to relevant stakeholders, with the aim of resettling millions of "homeless" records in Scopus and reducing its potential
This paper presents a large anonymized dataset of homelessness alongside insights into the data-driven rehabilitation of homeless people. The dataset was gathered by a large nonprofit organization working on rehabilitating the homeless for twenty years. This is the first dataset that we know of that contains rich information on thousands of homeless individuals seeking rehabilitation. We show how data analysis can help to make the rehabilitation of homeless people more effective and successful. Thus, we hope this paper alerts the data science community to the problem of homelessness.
Homelessness among US veterans remains a critical public health challenge, yet risk prediction offers a pathway for proactive intervention. In this retrospective prognostic study, we analyzed electronic health record (EHR) data from 4,276,403 Veterans Affairs patients during a 2016 observation period to predict first-episode homelessness occurring 3-12 months later in 2017 (prevalence: 0.32-1.19%). We constructed static and time-varying EHR representations, utilizing clinician-informed logic to model the persistence of clinical conditions and social risks over time. We then compared the performance of classical machine learning, transformer-based masked language models, and fine-tuned large language models (LLMs). We demonstrate that incorporating social and behavioral factors into longitudinal models improved precision-recall area under the curve (PR-AUC) by 15-30%. In the top 1% risk tier, models yielded positive predictive values ranging from 3.93-4.72% at 3 months, 7.39-8.30% at 6 months, 9.84-11.41% at 9 months, and 11.65-13.80% at 12 months across model architectures. Large language models underperformed encoder-based models on discrimination but showed smaller performance di
Homelessness is a social and health problem with great repercussions in Europe. Many non-governmental organisations help homeless people by collecting and analysing large amounts of information about them. However, these tasks are not always easy to perform, and hinder other of the organisations duties. The SINTECH project was created to tackle this issue proposing two different tools: a mobile application to quickly and easily collect data; and a software based on artificial intelligence which obtains interesting information from the collected data. The first one has been distributed to some Spanish organisations which are using it to conduct surveys of homeless people. The second tool implements different feature selection and association rules mining methods. These artificial intelligence techniques have allowed us to identify the most relevant features and some interesting association rules from previously collected homeless data.
Artificial intelligence researchers have proposed various data-driven algorithms to improve the processes that match individuals experiencing homelessness to scarce housing resources. It remains unclear whether and how these algorithms are received or adopted by practitioners and what their corresponding consequences are. Through semi-structured interviews with 13 policymakers in homeless services in Los Angeles, we investigate whether such change-makers are open to the idea of integrating AI into the housing resource matching process, identifying where they see potential gains and drawbacks from such a system in issues of efficiency, fairness, and transparency. Our qualitative analysis indicates that, even when aware of various complicating factors, policymakers welcome the idea of an AI matching tool if thoughtfully designed and used in tandem with human decision-makers. Though there is no consensus as to the exact design of such an AI system, insights from policymakers raise open questions and design considerations that can be enlightening for future researchers and practitioners who aim to build responsible algorithmic systems to support decision-making in low-resource scenario
Rental assistance programs provide individuals with financial assistance to prevent housing instabilities caused by evictions and avert homelessness. Since these programs operate under resource constraints, they must decide who to prioritize. Typically, funding is distributed by a reactive or first-come-first serve allocation process that does not systematically consider risk of future homelessness. We partnered with Allegheny County, PA to explore a proactive allocation approach that prioritizes individuals facing eviction based on their risk of future homelessness. Our ML system that uses state and county administrative data to accurately identify individuals in need of support outperforms simpler prioritization approaches by at least 20% while being fair and equitable across race and gender. Furthermore, our approach would identify 28% of individuals who are overlooked by the current process and end up homeless. Beyond improvements to the rental assistance program in Allegheny County, this study can inform the development of evidence-based decision support tools in similar contexts, including lessons about data needs, model design, evaluation, and field validation.
The social networks of people experiencing homelessness are an understudied but vital aspect of their lives, offering access to information, support, and safety. In 2023, the U.S. Department of Housing and Urban Development reported 653,100 people experiencing homelessness on any given night -- a 23% rise since 2022, though likely an undercount. This paper examines a unique three-year dataset (2022-2024) of survey responses from over 3,000 unhoused individuals in King County, WA, collected via network-based sampling methods to estimate the unsheltered population. Our study analyzes the networks of the unsheltered population, focusing on acquaintance, close friendship, kinship, and peer referral networks. Findings reveal a decline in social connectivity over time. The average number of acquaintances dropped from 80 in 2023 to 40 in 2024. Close friendship levels remained stable at 2.5, but given the growth in the homeless population, this suggests decreased network connectivity. Kinship networks expanded, indicating that more family members of unhoused individuals are also experiencing homelessness. These trends suggest increasing social disconnection, possibly driven by displacement
Warning: Contents of this paper may be upsetting. Public attitudes towards key societal issues, expressed on online media, are of immense value in policy and reform efforts, yet challenging to understand at scale. We study one such social issue: homelessness in the U.S., by leveraging the remarkable capabilities of large language models to assist social work experts in analyzing millions of posts from Twitter. We introduce a framing typology: Online Attitudes Towards Homelessness (OATH) Frames: nine hierarchical frames capturing critiques, responses and perceptions. We release annotations with varying degrees of assistance from language models, with immense benefits in scaling: 6.5x speedup in annotation time while only incurring a 3 point F1 reduction in performance with respect to the domain experts. Our experiments demonstrate the value of modeling OATH-Frames over existing sentiment and toxicity classifiers. Our large-scale analysis with predicted OATH-Frames on 2.4M posts on homelessness reveal key trends in attitudes across states, time periods and vulnerable populations, enabling new insights on the issue. Our work provides a general framework to understand nuanced public at