Study finds big data alone won't solve public health problems, but pairing street view images with expertise holds promise
Big data and artificial intelligence are transforming how we think about health, from detecting diseases and spotting patterns to predicting outcomes and speeding up response times.
In a new study analyzing two million Google Street View images from New York City streets, a team of New York University researchers evaluated the utility of this digital data in informing public health decision-making. Their findings, published in the Proceedings of the National Academy of Sciences (PNAS), show that relying on street view images alone may lead to inaccuracies and misguided interventions, but that combining them with other knowledge increases their potential.
"There's a lot of excitement around leveraging new data sources to gain a holistic view of health, including bringing in machine learning and data science methods to extract new insights," said Rumi Chunara, associate professor of biostatistics at NYU School of Global Public Health, associate professor of computer science at NYU Tandon School of Engineering, and the study's senior author.
"Our study highlights the potential of digital data sources such as street view images in enhancing public health research, while also pointing out the limitations of data and the complex dynamics between the environment, individual behavior, and health outcomes," said Miao Zhang, a PhD student at NYU Tandon School of Engineering and the study's first author.
A street-level view of health
In recent years, researchers have begun using street view images to connect an area's environment and infrastructure with outcomes such as mental health, infectious diseases, or obesity, a task that would be challenging to carry out by hand.
"We know that a city's built environment can shape our health, whether it's the availability of sidewalks and greenspaces for walking, or grocery stores carrying healthy foods," said Chunara. "Some studies show that the availability of sidewalks correlates with lower obesity rates, but is that the whole story?"
"Our motivation for this study was to dive deeper into these associations to see if there are potential factors driving them," said Zhang.
Chunara, Zhang, and their colleagues analyzed more than two million Google Street View images of every New York City street, using artificial intelligence to assess the availability of sidewalks and crosswalks in the images. They then compared this information with localized data on obesity, diabetes, and physical activity from the Centers for Disease Control and Prevention to see if the built environment predicted health outcomes.
The researchers found that neighborhoods with more crosswalks had lower rates of obesity and diabetes. However, no significant link was found between sidewalks and health outcomes, in contrast to earlier research.
"This may be because a lot of the sidewalks in New York City are in places that people don't use (along a highway, on a bridge, or in a tunnel), so sidewalk density may not reflect neighborhood walkability as accurately as crosswalks," said Zhang.
They also surfaced issues with the accuracy of the AI-generated labels for the street view images, cautioning that the labels may not match the "ground truth" or be a reliable measure on their own. When comparing existing data on New York City's sidewalk availability with the labeled street view images, they found that many images were incorrectly labeled as having or not having sidewalks, which may have been due to cars or shade obscuring the sidewalks in photos.
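The kind of check described above, comparing AI-generated labels against a reference dataset, can be illustrated with a minimal sketch. This is not the researchers' code: the function name `label_agreement` and the toy labels are hypothetical and made up purely for illustration, and the study's actual evaluation may have used different metrics.

```python
from collections import Counter


def label_agreement(predicted, reference):
    """Compare predicted sidewalk labels with reference ("ground truth") labels.

    Both arguments are equal-length sequences of booleans, where True means
    "sidewalk present". Hypothetical helper, for illustration only.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    # Count each (predicted, reference) combination.
    counts = Counter(zip(predicted, reference))
    tp = counts[(True, True)]    # sidewalk correctly detected
    fp = counts[(True, False)]   # sidewalk labeled where none exists
    fn = counts[(False, True)]   # sidewalk missed (e.g. hidden by a car or shade)
    tn = counts[(False, False)]  # absence correctly detected
    total = tp + fp + fn + tn

    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positives": fp,
        "false_negatives": fn,
    }


# Made-up toy labels, not data from the study.
predicted = [True, True, False, False, True, False]
reference = [True, False, True, False, True, False]
result = label_agreement(predicted, reference)
print(result)
```

Reporting false positives and false negatives separately, rather than a single accuracy figure, makes it easier to see whether a labeling model tends to miss sidewalks or to invent them.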
If you build it, will they come?
Crosswalks were linked to lower rates of obesity and diabetes, so the researchers applied a public health lens to determine what could explain this association. Their analyses of the CDC data revealed that physical activity, not just crosswalks as measured in street view images, accounted for the decrease in obesity and diabetes. In one test, they found that increasing physical activity could result in a four times larger decrease in obesity and a 17 times larger decrease in diabetes than could be achieved through installing more crosswalks.
"We saw that physical activity delivers the benefits of crosswalks, so it's important to take such mechanisms into account, especially when they act on different levels like the built environment versus individuals," said Zhang.
Based on their findings, the researchers conclude that public health decision-making shouldn't rely on new data sources alone, but must also consider domain knowledge. When analyzing street view images, it is critical to incorporate both computer science knowledge (for instance, how image processing techniques can improve accuracy or how to correct for bias in algorithms) and public health knowledge (what drives the associations between built environment and health outcomes). Layering this expertise over big data can inform how programs are designed and implemented to improve public health.
In this case, building more sidewalks and crosswalks would be less effective at improving health outcomes than the same increase in physical activity, such as through local exercise classes for the community.
"While growing amounts of digital data can be useful in informing decision-making, our results show that simply using associations from new data sources may not lead to the most useful interventions or best allocation of resources," added Chunara. "A more nuanced approach using big data in conjunction with expertise is needed to make the best use of this new data."
Salman Rahman and Vishwali Mhasawade of NYU Tandon were also study authors. The research was supported by the National Science Foundation (award 1845487).