Network Analysis on Open Data and City Agency Operation


In 2012, under the Bloomberg administration, New York City enacted Local Law 11, which went on to spawn the NYC Open Data program. Lawmakers framed the open data platform as a program that “will make the operation of city government more transparent, effective and accountable to the public.” The law had two goals: to make city data available online using open access standards, increasing transparency and accountability to the public, and to improve intra-agency and inter-agency communication. The law has accomplished the first goal by creating an environment of transparency and improved public accountability, but it remains unclear whether it has improved information sharing within and between agencies.

In light of this potential shortcoming in the implementation of Local Law 11, this project used network analysis to answer a different question: can operational relationships between city agencies be identified from the types of datasets they post, and can those relationships point to groups of agencies that should establish or strengthen operational connections with one another based on data overlap?




Using open data to enable engagement and improve operational efficiency in government is not a new approach. Several research papers argue that information sharing between government agencies and citizens can result in profound improvements in the public’s opinion of government. A previous study by Barbosa (2014) explores the overall landscape of open urban data, and its potential for integration, in terms of volume, type, and format across cities in the U.S. The study indicates that NYC open datasets are the most robust with respect to volume and variation, acting as the standard bearer for other governments seeking to create similar open data platforms.

Organizationally, the city government has a hierarchical structure, with the mayor’s office serving as the central agency branching out to various city departments, each with its own subdivisions. However, documentation of this formal structure does not fully represent the operational network or the organizational structure within agencies. In this case, the NYC Open Data website is the platform that serves as the centralized data warehouse stipulated in the law. Alongside the hundreds of other datasets on this website is a dataset called the NYC Open Data Catalog, which lists all datasets that have been posted, organized by attributes such as name, category, agency and update frequency. This catalog indicates the data contribution, activity, and responsibility of each agency, which may reveal the more nuanced operational network of NYC on the basis of shared information across various city agencies. This study therefore aims to explore the operational network of NYC departments based on NYC Open Data and to reveal the hidden hierarchies among agencies.
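One way to turn a catalog like the one described above into an agency network is to link two agencies whenever they post datasets in the same category. The following is a minimal sketch of that idea, not the method used in this study: the rows, the agency and category names, and the function agency_network are all hypothetical placeholders, and only the three attributes (name, category, agency) come from the description above.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical catalog rows: (dataset name, category, agency).
# These are placeholder values, not real NYC Open Data Catalog entries.
catalog = [
    ("Dataset 1", "Health", "Agency A"),
    ("Dataset 2", "Health", "Agency B"),
    ("Dataset 3", "Health", "Agency C"),
    ("Dataset 4", "Housing", "Agency C"),
    ("Dataset 5", "Housing", "Agency D"),
    ("Dataset 6", "Environment", "Agency E"),
]


def agency_network(rows):
    """Link agencies that post datasets in the same category.

    Returns a dict mapping (agency_a, agency_b), sorted by name,
    to the number of categories the two agencies share.
    """
    # Step 1: group agencies under each category they publish in.
    agencies_by_category = defaultdict(set)
    for _name, category, agency in rows:
        agencies_by_category[category].add(agency)

    # Step 2: every pair of agencies within a category gets one unit of weight.
    weights = defaultdict(int)
    for agencies in agencies_by_category.values():
        for a, b in combinations(sorted(agencies), 2):
            weights[(a, b)] += 1
    return dict(weights)


edges = agency_network(catalog)
for (a, b), weight in sorted(edges.items()):
    print(a, "-", b, "shared categories:", weight)
```

In this toy example, an agency that shares no category with any other agency simply has no edges, which is the kind of isolation the analysis is meant to surface. Weighting by shared categories is only one possible choice; other catalog attributes could define the links instead.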


(Organizationally, the city government has a hierarchical structure.)


(The landscape of NYC Open Data.)






As impressive as it is, the NYC Open Data repository highlights some of the problems that we see in big data initiatives. For example, urban planning has not yet incorporated health care data in a way that fully exploits its potential to inform public health policies. As shown in the figure above, health data are directly shared only between the Health and Hospitals Corporation, the Department of City Planning, and the Department of Health and Mental Hygiene. Noticeably missing are direct data-sharing connections between these organizations and the Department of Sanitation, the Department of Homeless Services, the Department for the Aging, and the Office of Emergency Management. The association map illustrates that organizations that would benefit from direct multidisciplinary collaboration appear to be operating in informational silos, which is a recurring theme among big data projects.

(Visualization of NYC Open Data; figure courtesy of Yuan Lai. Visualization created with Gephi; raw data from the NYC Open Data Portal. Published in the Journal of Medical Internet Research, in the paper titled “Beyond Open Big Data: Addressing Unreliable Research” by Edward T Moseley, BS, Douglas J Hsu, MD, David J Stone, MD, and Leo Anthony Celi, MPH, MS, MD. Available from )
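The silo observation above can be restated as a simple check: given the direct links that exist, list the agency pairs that have none. The sketch below encodes only what the paragraph describes, under the assumption that the three sharing agencies are all linked to one another. The edge set is transcribed from the text rather than computed from the catalog, and the function missing_links is a hypothetical helper.

```python
from itertools import combinations, product

# Agencies described above as sharing health data directly.
core = [
    "Health and Hospitals Corporation",
    "Department of City Planning",
    "Department of Health and Mental Hygiene",
]

# Agencies described above as lacking direct connections to the core group.
others = [
    "Department of Sanitation",
    "Department of Homeless Services",
    "Department for the Aging",
    "Office of Emergency Management",
]

# Assumption: every pair within the core group is directly linked.
observed = {frozenset(pair) for pair in combinations(core, 2)}


def missing_links(group_a, group_b, observed_edges):
    """Return cross-group pairs with no direct edge, each pair sorted by name."""
    return sorted(
        tuple(sorted((a, b)))
        for a, b in product(group_a, group_b)
        if frozenset((a, b)) not in observed_edges
    )


gaps = missing_links(core, others, observed)
print(len(gaps), "candidate connections with no direct data sharing")
for a, b in gaps:
    print(a, "<->", b)
```

With a real edge list in place of the transcribed one, the same check would produce a short list of agency pairs worth reviewing for possible collaboration.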
