GitHub, the world’s most widely used platform for open-source software development, is facing an escalating problem: the misuse of its star system. Designed to signal popularity and quality, these stars are now being exploited to artificially inflate the reputation of repositories, many of which harbor malware or engage in other malicious activities.
Researchers from Carnegie Mellon University, Socket, and North Carolina State University conducted a study exposing the scale and implications of this fraudulent behavior. (via BleepingComputer)
They identified over 4.5 million suspected fake stars associated with 15,835 repositories between 2019 and 2024, a trend that undermines trust in the platform and puts the open-source ecosystem at risk.
Implications for Developers and Organizations
The misuse of GitHub stars has significant implications for developers, organizations, and the broader software supply chain. Stars are often used as a quick heuristic for evaluating the quality of a repository, particularly by developers looking for open-source components to integrate into their projects.
However, as the study revealed, 15.8% of repositories receiving 50 or more stars in July 2024 were linked to fake star campaigns. This distortion undermines the credibility of GitHub’s star system and highlights the risks of relying on single metrics for decision-making.
The researchers emphasized the importance of a more holistic approach to evaluating repositories. They stated, “Star count is an unreliable signal of quality and should not be used for high-stakes decisions, at least not by itself. It is vital to evaluate other signals to avoid overestimating popularity or reputation, which may lead to security risks.”
They encourage developers and organizations to look beyond star counts and assess additional factors, such as documentation, pull requests, and the activity of reputable contributors, to make informed decisions.
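As a rough illustration of looking beyond star counts, the sketch below checks a repository against the other signals the researchers mention. The `RepoSignals` fields, the `review_flags` function, and the `min_prs` threshold are all hypothetical and chosen only for this example; they are not part of the study or of any GitHub API.

```python
from dataclasses import dataclass


@dataclass
class RepoSignals:
    """Hypothetical snapshot of a repository; field names are illustrative."""
    stars: int
    has_documentation: bool
    merged_pull_requests: int
    reputable_contributors: int


def review_flags(repo: RepoSignals, min_prs: int = 5) -> list[str]:
    """Return reasons for extra scrutiny, deliberately ignoring the star count."""
    flags = []
    if not repo.has_documentation:
        flags.append("no documentation")
    if repo.merged_pull_requests < min_prs:  # assumed threshold
        flags.append("few merged pull requests")
    if repo.reputable_contributors == 0:
        flags.append("no reputable contributors")
    return flags


# A repository with many stars but nothing else to back them up.
popular_but_thin = RepoSignals(
    stars=4200,
    has_documentation=False,
    merged_pull_requests=0,
    reputable_contributors=0,
)
print(review_flags(popular_but_thin))
```

A high star count does nothing to clear the flags here, which is the point: the other signals have to hold up on their own.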
The Security Risks of Fake Stars
One of the most concerning aspects of fake star campaigns is their connection to malware distribution. Many flagged repositories were short-lived projects masquerading as pirated software, game cheats, or cryptocurrency bots.
These repositories often contained hidden malware designed to steal sensitive data or cryptocurrencies from unsuspecting users. The researchers explained, “These campaigns frequently promote short-lived phishing malware repositories that disguise themselves as pirated software or other appealing tools to lure unsuspecting users.”
The findings highlight vulnerabilities in GitHub’s metrics and moderation systems. While GitHub has acted to remove many flagged repositories, the platform faces significant challenges in linking malicious accounts to their activities.
The researchers suggested that GitHub implement weighted metrics that consider user reputation and activity patterns, reducing the impact of fraudulent interactions. They also recommended greater transparency and collaboration with the open-source community to develop tools and guidelines for identifying fraudulent activities.
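One way to picture such a weighted metric is sketched below. The weighting formula, the `Stargazer` fields, and the thresholds are assumptions made for illustration; the researchers do not prescribe a specific formula, and this is not how GitHub computes anything today.

```python
from dataclasses import dataclass


@dataclass
class Stargazer:
    """Hypothetical view of an account that starred a repository."""
    account_age_days: int
    non_star_events: int  # commits, issues, pull requests, and so on


def star_weight(user: Stargazer, full_age_days: int = 365,
                full_activity: int = 10) -> float:
    """Weight a star between 0 and 1 using assumed reputation proxies."""
    age_factor = min(1.0, user.account_age_days / full_age_days)
    activity_factor = min(1.0, user.non_star_events / full_activity)
    return age_factor * activity_factor


def weighted_star_count(stargazers: list[Stargazer]) -> float:
    """Sum the per-account weights instead of counting every star as 1."""
    return sum(star_weight(user) for user in stargazers)


established = Stargazer(account_age_days=730, non_star_events=50)
throwaway = Stargazer(account_age_days=3, non_star_events=0)

# 100 raw stars, but only two come from accounts with any track record.
stargazers = [established] * 2 + [throwaway] * 98
print(len(stargazers), weighted_star_count(stargazers))
```

Under these assumed weights, a repository boosted by brand-new, inactive accounts keeps its raw count of 100 but scores only 2.0 on the weighted metric.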
StarScout: A Tool for Identifying Fake Stars
To address this growing threat, the research team developed and released StarScout, a detection tool that scans GitHub activity at scale to uncover suspicious stars.
StarScout is written in Python, requires Python 3.12, and has been tested on Ubuntu 22.04. It employs two primary detection heuristics: the low-activity heuristic and the clustering heuristic.
These techniques identify patterns of fraudulent activity, such as accounts that engage minimally with GitHub beyond starring repositories or coordinated groups of accounts acting in concert to inflate metrics.
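The idea behind the low-activity heuristic can be sketched in a few lines. This is a toy version under stated assumptions, not StarScout's actual code: the event tuples, the `max_events` threshold, and the function names are invented for the example.

```python
from collections import Counter, defaultdict

# Each event is (user, event_type, repository); the format is hypothetical.
events = [
    ("alice", "star", "org/lib"),
    ("alice", "push", "alice/app"),
    ("alice", "issue", "org/lib"),
    ("bot1", "star", "x/cheat"),
    ("bot2", "star", "x/cheat"),
    ("bot3", "star", "x/cheat"),
    ("bot3", "star", "org/lib"),
]


def low_activity_accounts(events, max_events=2):
    """Flag accounts whose entire history is a handful of stars."""
    by_user = defaultdict(list)
    for user, event_type, _repo in events:
        by_user[user].append(event_type)
    return {
        user
        for user, types in by_user.items()
        if len(types) <= max_events and all(t == "star" for t in types)
    }


def suspicious_star_counts(events, max_events=2):
    """Count, per repository, the stars that come from flagged accounts."""
    flagged = low_activity_accounts(events, max_events)
    return Counter(
        repo
        for user, event_type, repo in events
        if event_type == "star" and user in flagged
    )


print(sorted(low_activity_accounts(events)))
print(suspicious_star_counts(events))
```

Note that a legitimate repository such as `org/lib` can still pick up a stray suspicious star, so a real detector would need a minimum count per repository before flagging it.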
Setting up StarScout involves creating a Python environment and configuring various credentials, including MongoDB, Google Cloud, and GitHub API tokens. The tool is designed for researchers and analysts familiar with large-scale data processing, as running the detection scripts involves reading over 20 terabytes of data.
As described by the researchers, “the BigQuery queries won’t take more than a few minutes, but the script will also fetch GitHub API to collect certain information. Expect it to be slower and output a lot of error messages (because many of the fake star repositories have been deleted).”
Detecting Fake Star Campaigns: The Process
StarScout’s workflow begins with running the low-activity heuristic, which analyzes GitHub data from specified timeframes and identifies anomalies indicative of fake stars. The results are stored in MongoDB and exported to local CSV files.
This step is followed by the clustering heuristic, which uses the CopyCatch algorithm to detect coordinated activities over six-month intervals. Due to the complexity of these operations, the clustering heuristic can take up to a week to process data, consuming over 40 terabytes of storage. Once complete, the results are exported and aggregated into a dataset of suspected fake stars.
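The sketch below conveys the intuition of detecting coordinated starring, but it is a much-simplified pairwise stand-in and not the CopyCatch algorithm itself. It flags pairs of accounts that starred the same repositories within a short time of each other. The data format, the `window` and `min_shared` parameters, and the function name are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import combinations

# Each record is (user, repository, time of star); the format is hypothetical.
stars = [
    ("bot_a", "x/cheat", datetime(2024, 7, 1, 10, 0)),
    ("bot_b", "x/cheat", datetime(2024, 7, 1, 11, 0)),
    ("bot_a", "y/crack", datetime(2024, 7, 3, 9, 0)),
    ("bot_b", "y/crack", datetime(2024, 7, 3, 9, 30)),
    ("carol", "x/cheat", datetime(2024, 3, 1, 8, 0)),
    ("carol", "y/crack", datetime(2024, 7, 3, 10, 0)),
]


def lockstep_pairs(stars, window=timedelta(days=1), min_shared=2):
    """Find account pairs that starred several repositories close in time."""
    by_repo = defaultdict(list)
    for user, repo, starred_at in stars:
        by_repo[repo].append((user, starred_at))

    shared = defaultdict(set)
    for repo, entries in by_repo.items():
        # Pairwise comparison is quadratic per repository: fine for a toy
        # example, far too slow for data at GitHub's scale.
        for (u1, t1), (u2, t2) in combinations(entries, 2):
            if u1 != u2 and abs(t1 - t2) <= window:
                shared[tuple(sorted((u1, u2)))].add(repo)

    return {pair: repos for pair, repos in shared.items()
            if len(repos) >= min_shared}


print(lockstep_pairs(stars))
```

Here `carol` overlaps with the two bots on only one repository inside the time window, so only the `bot_a` and `bot_b` pair is reported. The cost of comparing accounts against each other also hints at why the real clustering step is so expensive on the full dataset.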
The dataset is updated quarterly, reflecting the most recent findings of the research team. Notably, the researchers caution that the dataset contains suspected cases and may include false positives.
They explained, “The individual repositories and users in our dataset may be false positives. The main purpose of our dataset is for statistical analyses (which tolerates noises reasonably well), not for publicly shaming individual repositories.” Ethical considerations are a critical component of this work, as the research aims to highlight broader trends rather than target specific projects or developers.
The Role of StarScout in Shaping the Future
The development of StarScout represents a significant advancement in the fight against fraudulent activities on GitHub. By leveraging data-driven techniques, the tool provides a scalable solution for identifying fake star campaigns.
The researchers explained, “StarScout demonstrates how data-driven tools can be used to identify and mitigate fraudulent activities on online platforms. Our findings underscore the importance of developing scalable solutions to protect users and maintain trust in the software ecosystem.” As GitHub continues to grow, tools like StarScout will be essential in addressing emerging threats and ensuring the platform’s sustainability.
A Call to Strengthen Open-Source Integrity
The findings of this study highlight the urgent need for systemic change within the open-source community. As reliance on open-source components continues to grow, ensuring their security and reliability is paramount. By prioritizing transparency, accountability, and robust metrics, the open-source community can build a more resilient ecosystem that benefits developers, businesses, and users alike.
While the challenges posed by fake star campaigns are significant, they also present an opportunity to strengthen the foundation of open-source development. By working together, platform providers, developers, and organizations can address these threats and ensure that GitHub remains a trusted resource for innovation and collaboration.