As an ARIA recipient this summer, I worked with Professor Fabian Lange from the Department of Economics and one of his PhD students, Hai Ma, on the project “Building a Database on Special Economic Zones (SEZ) in Vietnam since 2000”. I wanted to participate in this project because I am interested in economics research, especially research on my home country, Vietnam. I also wanted to apply what I have learned in both of my majors (Economics and Computer Science) to a hands-on project.
My main responsibilities in this internship were to search various online sources for data on each SEZ's location, official government documents, and specialized industries, and then to format these data into a database. Before starting the project, I therefore hoped to improve my research skills, such as validating the credibility of the websites from which I collect data and systematically documenting the data sources. I also wanted to apply several information technology tools that I learned in Computer Science to complete data-related tasks quickly. Now that the internship is coming to an end, I can confirm that these objectives have been met.
For example, I gained a lot of practice in verifying data sources when I had to scrape websites for information on about 850 SEZs. Gathering data from many different places required me to think critically about whether a particular website was credible. I did this by carefully considering whether the website's owner was a private enterprise or a public institution, as well as the owner's intention in creating the website. While government websites are usually credible because they reflect information from official documents, private data sources are often less trustworthy. For instance, the website of an industrial land broker might exaggerate the progress of infrastructure development in an SEZ to attract buyers, which would make the data published by that broker unreliable. I therefore used government websites as much as possible. However, the Vietnamese government does not have a systematic information portal for the majority of SEZs, so I had to rely on a small number of private sources. I confirmed their trustworthiness by comparing their published information with that from the government for several randomly chosen SEZs. A handful of private websites passed this verification, and they were a great addition to the data sources available for my research.
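The random comparison described above can be sketched roughly in Python. This is only an illustration under stated assumptions and not the procedure I actually ran: the function name spot_check, the SEZ names, the field names, and all values are hypothetical placeholders.

```python
import random


def spot_check(private, official, sample_size, seed=0):
    """Share of randomly sampled SEZs on which a private source agrees
    with the official record. Only SEZs present in both sources are sampled."""
    shared = sorted(set(private) & set(official))
    rng = random.Random(seed)  # fixed seed so the check can be repeated
    sample = rng.sample(shared, min(sample_size, len(shared)))
    if not sample:
        return 0.0
    matches = sum(private[name] == official[name] for name in sample)
    return matches / len(sample)


# Hypothetical records: none of these values come from the project data.
official_records = {
    "Zone A": {"province": "Province X", "area_ha": 100},
    "Zone B": {"province": "Province Y", "area_ha": 250},
    "Zone C": {"province": "Province Z", "area_ha": 80},
}
reliable_site = {name: dict(fields) for name, fields in official_records.items()}
unreliable_site = {
    name: {**fields, "area_ha": fields["area_ha"] * 2}  # inflated areas
    for name, fields in official_records.items()
}

print(spot_check(reliable_site, official_records, sample_size=2))
print(spot_check(unreliable_site, official_records, sample_size=3))
```

A source would be kept only if its agreement rate on the sample is high; where to set that threshold is a judgment call that this sketch does not settle.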
Another difficulty arose when I analyzed the data after collecting them. This task was primarily about annotating text data on SEZs' characteristics to determine which industries each SEZ specializes in. To accomplish this, I used Doccano, a data labeling application. Setting up this tool requires multiple technical steps, such as using Docker (an important tool in the tech industry with which I had little experience) and the command line to install Doccano, setting up cloud storage, and getting the input data into the right format (.txt and .json files). As a Computer Science student, I am no stranger to figuring out how to use new tools, but every new technology, Docker included, comes with a learning curve. I overcame my inexperience with this tool by reading its online documentation and working by trial and error. In the end, I was able to get Doccano running on Docker. After annotating the data, I used another set of tools to summarize the annotations and merge them into the existing database. I did this work in the programming language Python with pandas, one of its most popular libraries for data wrangling. Thanks to this ARIA project, I am now familiar with many important tools in information technology.
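The summarize-and-merge step can be illustrated with a minimal pandas sketch. It is not the project's actual code: the column names sez_id, label, name, and industries, and every value below, are assumptions made for illustration, and the real annotation export may be structured differently.

```python
import pandas as pd

# Hypothetical annotations: one row per (SEZ, industry label) pair.
annotations = pd.DataFrame(
    [
        {"sez_id": 1, "label": "textiles"},
        {"sez_id": 1, "label": "electronics"},
        {"sez_id": 1, "label": "textiles"},  # duplicate label for the same SEZ
        {"sez_id": 2, "label": "food processing"},
    ]
)

# Hypothetical existing database: one row per SEZ.
database = pd.DataFrame(
    [
        {"sez_id": 1, "name": "Zone A"},
        {"sez_id": 2, "name": "Zone B"},
        {"sez_id": 3, "name": "Zone C"},  # no annotations yet
    ]
)

# Summarize: collapse the labels of each SEZ into one de-duplicated string.
industries = (
    annotations.groupby("sez_id")["label"]
    .agg(lambda labels: "; ".join(sorted(set(labels))))
    .rename("industries")
    .reset_index()
)

# Merge: a left join keeps every SEZ, including those without annotations.
merged = database.merge(industries, on="sez_id", how="left")
print(merged)
```

The left join is a deliberate choice in this sketch: an SEZ without annotations stays in the database with a missing value in the industries column instead of being dropped.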
This ARIA project has given me important skills for working in both economics and the tech industry. For the former, I am better equipped with research skills, and for the latter, I learned an important new tool and strengthened my general programming abilities. My aspiration after graduation is to contribute to economics research, work on international development projects, or start as a programmer. This ARIA internship is therefore very useful for the start of my career after my studies at McGill.
To conclude this report, I would like to express my deepest gratitude to Mr. Mark W. Gallop for his generosity in funding my project. Without his incredible support, my ARIA internship would not have been as successful as it has been.