Anuva Ghosalkar's Paper

Tracking the Trends of the Most Popular Programming Languages Over the Past Decade Using Github's API
Anuva Ghosalkar


Published: 09 November 2022


Abstract

This research uses experimental data to identify the most popular programming languages and to offer insight into the rise and fall of these languages. While programming is a relatively new field, the number of available languages has grown rapidly over the decades to meet new needs in various fields. The versatility of some languages helps popularize them internationally, and as their usability becomes more established, their learner and user counts grow.

To determine the most used programming language, the GitHub API was used in conjunction with PowerShell. Together, these returned two types of data for the 200,000+ files found: the last updated date and the file type. With this data, the number of files found for each language was counted per year, and the growth of each language was tracked. These trends indicated when a language was most popular and whether it is likely to grow or decline.

Results show that while new, quickly growing languages were hypothesized to be the most popular, older languages that remain versatile today in fact have far more users than new languages. An example is the difference between Ruby, which was released in 1995, and a more recent language such as Swift, which was released in 2014. Ruby was the top-ranked language in popularity, with 4,501 uses found in the last decade, while Swift does not even make these charts. This implies that while programming languages can quickly be picked up and used for various kinds of development, the most popular languages are those that have long been established. Even though learning new programming languages generally becomes easier with skill, languages still need time to establish themselves in a global, industrial sphere.


Keywords: Computer science, Programming Languages, Github, Popularity, Ruby

Introduction

As programming languages rapidly fall in and out of favor, choosing the most versatile language that suits a project’s needs is an essential part of every programmer’s journey. Every programming language has different strengths, but assumptions about a language’s functionality can be made based on its popularity.

In this paper, the popularity trends of the top programming languages, according to data taken from GitHub, are summarized and graphed, and inferences are drawn from these findings. I hypothesize that languages such as Java, JavaScript, and Python will be at the top of the list; however, according to Columbia Engineering[1], lesser-known languages like Swift, Pascal, and Kotlin may not have a large user base but could be on the rise in current programming spheres.

Materials and Method

To draw my conclusions, I used the GitHub API (Application Programming Interface) to collect information about files in a selection of repositories. A PowerShell script went through the repositories and gathered the file information, which then had to be post-processed. Microsoft Excel was later used to process this data and to create the final charts and graphs.
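The collection step described above could be sketched as follows. This is a minimal illustration in Python rather than the PowerShell script used in the study, which is not reproduced here. It assumes that the GitHub REST API's Git trees endpoint is used to list the files in a repository; the helper names (`tree_url`, `fetch_json`, `blob_extensions`) and the choice of endpoint are assumptions made for illustration. Per-file last-updated dates would need an additional request per file (for example, to the commits endpoint) and are omitted here.

```python
import json
import os
import urllib.request

API_ROOT = "https://api.github.com"


def tree_url(owner, repo, ref):
    """Build the URL that lists every file in a repository at a given ref."""
    return f"{API_ROOT}/repos/{owner}/{repo}/git/trees/{ref}?recursive=1"


def fetch_json(url, token=None):
    """Fetch a URL and decode the JSON body (hypothetical helper).

    A personal access token raises the rate limit but does not remove it.
    """
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def blob_extensions(tree_payload):
    """Return the lowercased file extension of every file in a tree response.

    Directories (type "tree") are skipped; files without an extension
    yield an empty string.
    """
    extensions = []
    for entry in tree_payload.get("tree", []):
        if entry.get("type") == "blob":
            extensions.append(os.path.splitext(entry["path"])[1].lower())
    return extensions
```

Calling `blob_extensions(fetch_json(tree_url(owner, repo, ref)))` for each repository would yield one extension per file, which is the raw material for the counts analyzed in the Results section.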

Results

By using a PowerShell script and the GitHub API, I was able to get the last updated date and extension type for 234,502 different files from various repositories and users. From there, I analyzed two variables: (i) the programming language(s) in the repository and (ii) the last date that the files were edited. The number of files of each type was quantified by totaling all of the files that had a specific file extension, which indicates the language that the file was written in.

Post-processing primarily consisted of formatting all of the data as a CSV (Comma-Separated Values) file, which I then opened in Microsoft Excel and saved as an .xlsx file to reduce storage space on my hard drive. Each file was an individual row in a table that was later converted to a Pivot Table that summed up the number of files found. The results were then filtered so that only languages with over 800 uses from 2012 to 2022 were shown, eliminating languages that had very few appearances in this decade. This filter is mainly meant to remove the large number of languages that had between 1 and 10 uses; 800 was chosen as the cutoff because of a break in the data that formed a natural separation. File types that were not programming languages (e.g., markup languages) were also filtered out (see Figure 3).

The number of remaining files of each language was quantified by year, and the results were plotted in a clustered column chart using Microsoft Excel. Through this experiment, the most widely used programming language was identified, and its growth could be tracked and assessed.
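The counting and filtering described above can be sketched in a few lines. This is an illustrative Python equivalent of the Pivot Table step, not the Excel workflow itself; the function names, the ISO 8601 timestamp format, and the parameter names are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime


def count_by_extension_and_year(records):
    """Count files per (extension, year).

    `records` is an iterable of (extension, last_updated) pairs, where
    last_updated is assumed to look like "2022-08-13T12:00:00Z".
    """
    counts = Counter()
    for extension, last_updated in records:
        year = datetime.strptime(last_updated, "%Y-%m-%dT%H:%M:%SZ").year
        counts[(extension.lower(), year)] += 1
    return counts


def filter_popular(counts, first_year, last_year, cutoff, excluded=frozenset()):
    """Keep extensions whose total within the year range exceeds the cutoff.

    `excluded` holds extensions that are not programming languages
    (for example markup or image files) and are dropped outright.
    """
    totals = Counter()
    for (extension, year), n in counts.items():
        if first_year <= year <= last_year and extension not in excluded:
            totals[extension] += n
    return {ext: total for ext, total in totals.items() if total > cutoff}
```

With the study's settings, the call would be `filter_popular(counts, 2012, 2022, cutoff=800, excluded={".html", ".xml", ".png", ".jpg"})`, where the excluded set shown here is only a partial, illustrative version of the list in Figure 3.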

These statistics provide a basis for analysis. The most used programming language was Ruby[4] (.rb), which was first released in 1995. Despite its age, the data indicate that the popularity of this language has increased significantly and that it is still a popular choice to code in. Other languages, such as Go, C#, C, and JavaScript, follow a similar yet less pronounced pattern. Languages that have not been popular for as long, or that have a smaller range of uses, display much less stable growth (e.g., Python).

Discussion

The results were not as expected, but they merit discussion. Ruby being at the top of this list suggests that, despite its age, it has managed to keep up with current demand for programming features. This raises the question of what makes Ruby so popular. Possible answers include its established popularity over time, as well as its power and ease of use.

However, other languages hypothesized to top this list, such as Java, appear far below Ruby despite their popularity. Additionally, quickly rising languages such as Python have shown some growth, but not enough to be comparable with that of Ruby and Go. This could imply that while programming languages rapidly fall in and out of favor, they still need time to grow before they gain widespread popularity. It also raises the question of how quickly programming languages become popular. Despite the ease of learning new programming languages in the digital age, languages still need time to acquire a following that lets them be recognized for the functionality they offer.

Languages such as JavaScript, C#, C, Go, and PHP, however, are commonly held to be some of the most popular languages, and the results of this study corroborate that view. Northeastern University Graduate Programs[2] lists all five of these on its web page of the most popular programming languages, and Berkeley Extensions[5] also refers to a few of them in a similar article. While the presence of Ruby is surprising given such sources, the languages that reappear share similarities, if not in the languages themselves, then in their applications, usability, and versatility. This may imply that for a programming language to be popular (which is not necessarily “useful” or “good” in all situations), it must address certain factors that these languages have in common. Not all languages that work like these will succeed, but a starting point for the requirements of a “good” programming language can be found in all of them.

Then comes the importance of analyzing trends. For instance, even though Ruby is far ahead of any other language, its usage has been steadily declining over the past five years. However, languages like Go, C#, and JavaScript have shown a general increase, or at least a less drastic decrease. The obvious outliers are 2020, which could be accounted for by the impacts of the COVID-19 pandemic, and 2022, which had not yet ended at the time of writing, so the statistics for these languages may increase in the coming months. Additional research would be required to confirm this explanation. Regardless of these outliers, the success of these languages is reflected in their growing popularity, which also explains why, even today, there is such an emphasis among computer scientists on mastering them.

While these results may also be a fluke of the random repositories selected, the number of files indicates a margin of error of only 0.2%. Another factor that could have affected these results is that all of a user’s repositories and files were analyzed before the program moved to the next repository and/or user, which means that multiple large repositories containing only one file type because of the user’s preference would have heavily skewed the data. Thus, being able to access all files on GitHub, which is likely the most accessible source of programming metadata, would provide much sounder findings. GitHub, however, also represents only a fraction of the international programming community; learners and hobbyists may not have been represented in this data because GitHub is used industrially as one option for a Version Control System. Processing more files in an attempt to produce more accurate results would require substantial time and computing power that I did not have access to, as the script I ran had limited API calls even with authentication, and the computer the program was executed on did not have the power to go through all users’ files.
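A margin of error of this size is consistent with the standard worst-case formula for a simple random sample, z * sqrt(p * (1 - p) / n) with p = 0.5. The sketch below assumes a 95% confidence level (z = 1.96) and treats each file as an independent sample, which, as discussed in this section, is only approximately true because files were collected user by user. The function name and default values are illustrative.

```python
import math


def margin_of_error(sample_size, z_score=1.96, proportion=0.5):
    """Worst-case margin of error for a proportion from a simple random sample.

    z_score=1.96 corresponds to a 95% confidence level (an assumption);
    proportion=0.5 maximizes p * (1 - p) and so gives the widest margin.
    """
    return z_score * math.sqrt(proportion * (1 - proportion) / sample_size)


# 234,502 is the number of files collected in this study.
print(f"{margin_of_error(234_502):.2%}")  # prints 0.20%
```

Because the sampled files are clustered by user and repository, the effective sample size is likely smaller than 234,502, so the true uncertainty may be larger than this figure suggests.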

Conclusion

To find the most widely used programming language and track its popularity over the past decade, I used the GitHub API to iterate through repositories on GitHub and record the file extensions and updated dates in a CSV file. This data was later processed in Microsoft Excel, where only languages with over 800 total uses were analyzed, limiting the number of languages processed after this point. Next, file types that were not programming languages were filtered out so as not to detract from the goal of the experiment, and this determined the top 14 languages, which were later graphed. While it could have been a result of the repositories used, older languages such as Ruby and C# were seen to be far more popular than newer languages, implying that the time languages need to gain popularity is significantly longer than the time it takes for users to learn them. Even in a rapidly expanding field such as Computer Science, time will tell which languages are suited for success based on their versatility and usability.

Acknowledgment

I thank Harvard’s Learn with Leaders for the guidance they provided throughout the course of this publication.

Figure 1: Processed data of the top programming languages and their use over time

Figure 2: Data from Figure 1 in a Clustered Bar Chart

Figure 3: A list of file types that were removed. These include extensions that were erroneously entered, file types that are not for programming (e.g., .png, .jpg), markup files (.xml, .html), file types that are extensions of other, more well-known programming languages (.h), etc.

Figure 4: KellyAnn Fitzpatrick’s 2021 data taken from RedMonk. This is consistent with my findings regarding highly popular programming languages.

References

[1] Columbia Engineering. (2021, December 3). 11 new programming languages to learn in 2022. Columbia Engineering Boot Camps. Retrieved August 13, 2022, from https://bootcamp.cvn.columbia.edu/blog/new-programming-languages/

[2] Eastwood, B. (2022, January 5). The 10 most popular programming languages to learn in 2021. Northeastern University Graduate Programs. Retrieved August 13, 2022, from https://www.northeastern.edu/graduate/blog/most-popular-programming-languages/

[3] Fitzpatrick, K. A. (2021, March 9). Redmonk Top 20 languages over time: January 2021. RedMonk. Retrieved August 13, 2022, from https://redmonk.com/kfitzpatrick/2021/03/02/redmonk-top-20-languages-over-time-january-2021/

[4] Ruby. Ruby Programming Language. (n.d.). Retrieved August 13, 2022, from https://www.ruby-lang.org/en/

[5] Trilogy Education Services. (2021, December 22). 11 most in-demand programming languages in 2022. Berkeley Boot Camps. Retrieved August 13, 2022, from https://bootcamp.berkeley.edu/blog/most-in-demand-programming-languages/