GitHub reconnaissance is an important aspect of attack surface management, particularly for organizations and individuals who rely heavily on software development and open-source code.
Here’s why it is crucial:
Discovery of Sensitive Information
Developers sometimes inadvertently push sensitive information, e.g. hardcoded credentials, API keys, access tokens, and configuration files containing secrets.
Identification of Security Vulnerabilities
GitHub repos may contain outdated libraries or dependencies with known vulnerabilities, and analysis of the source code can lead to the identification of critical vulnerabilities, e.g. injection flaws.
Understanding the Development Practices
Analyzing repositories can provide insights into the organization’s coding practices and patch management process.
Third-party Component Tracking
Supply chain attacks have become common, and GitHub repos can be used to identify third-party dependencies and their respective vulnerabilities.
Historical Analysis
Commit History: Past commits can reveal changes in security practices and expose periods where the codebase was vulnerable.
Information for Targeted Attacks
GitHub repos can leak employee information, e.g. emails, user IDs, and password patterns.
Compliance and Legal Risks
License Violations: Ensuring compliance with open-source licenses is crucial to avoid legal issues.
Data Exposure: Unintentional exposure of personal data can lead to compliance issues with regulations like GDPR or HIPAA.
Open-source Intelligence (OSINT)
GitHub repos can also provide competitive intelligence about an organization’s projects and focus areas.
Understanding GitHub Recon
GitHub Recon is a multifaceted process that involves discovering and analyzing information related to individuals, organizations, and projects on the GitHub platform. It typically involves:
- Profile Analysis: Examining user profiles for personal and professional information, such as email addresses, linked social media accounts, and the repositories they contribute to. This information can be valuable for building a target’s profile or identifying potential attack vectors.
- Repository Scanning: Searching for repositories containing sensitive data, configuration files, and potential vulnerabilities. This can help identify security weaknesses, misconfigurations, or hidden backdoors.
- Codebase Assessment: Analyzing the code within repositories to identify coding standards, software vulnerabilities, and possible exploits. This is particularly useful for security researchers and penetration testers.
- Insights Gathering: Extracting trends, technologies, or software libraries frequently used by a target organization, which can provide valuable insights for competitive analysis and threat intelligence.
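As a rough illustration of the repository scanning step, the sketch below checks a block of text (for example, the contents of one file) against a few regular expressions. The rule names and patterns are illustrative assumptions, not a complete rule set; real scanners ship far larger and better-tuned collections.

```python
import re

# Illustrative patterns only (assumptions for this sketch, not a complete rule set).
SECRET_PATTERNS = {
    # AWS access key IDs commonly start with "AKIA" followed by 16 characters.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |DSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    # A quoted value assigned to a suspicious-looking name.
    "generic_assignment": re.compile(
        r"(?i)(?:password|passwd|api_key|secret|token)\s*[:=]\s*['\"][^'\"]{6,}['\"]"
    ),
}


def scan_text(text):
    """Return (line_number, rule_name, matched_text) for every hit in text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            match = pattern.search(line)
            if match:
                findings.append((lineno, name, match.group(0)))
    return findings
```

Calling `scan_text(open(path).read())` on each file of a cloned repository would give a first list of candidates to review by hand.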
Why GitHub Recon Matters
GitHub Recon is relevant for various use cases and professions:
- Cybersecurity: Security professionals can use GitHub Recon to identify potential security weaknesses and vulnerabilities within code repositories, allowing for proactive threat mitigation.
- Red Teaming: Red teamers can leverage GitHub Recon to gather intelligence about their target organization’s technology stack, coding practices, and key individuals to better plan their attack strategies.
- Competitive Analysis: Businesses can gain insights into their competitors’ technology choices, software libraries, and open-source contributions to make informed decisions and gain a competitive edge.
- Research and Development: Developers and researchers can benefit from GitHub Recon by finding valuable open-source projects, libraries, or code snippets to enhance their work or leverage open-source solutions.
We will cover two approaches to GitHub Recon:
- Manual (Code Search OR GitHub Dorking)
- Automated (Using Tools)
Manual – Code Search || GitHub Dorking
Code search is simply the application of targeted keywords to find sensitive information such as passwords, API keys, and database files.
GitHub lets you search code globally, or within a specific organization or repository. You must be logged in to a GitHub account to search for code across all public repositories. GitHub provides “rich code searching” that scans public GitHub repositories.
How to do recon on GitHub?
1. When looking for a specific company, you can use basic search terms like facebook.com, google.com, etc.
2. We can also use multi-word strings like “Authorization: Bearer”. We then have to open each matching repository and look for any sensitive data, such as a password or authorization token.
3. We can search for specific file names like “filename:vim_settings.xml”.
4. We can search for specific languages like “language:PHP”.
These are the fundamentals of GitHub dorking. You can also combine queries, e.g. “facebook.com filename:vim_settings.xml”, to obtain all of the vim_settings.xml files that mention facebook.com. Other qualifiers can be combined in the same manner.
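Combining a target term with qualifiers is plain string assembly, so it can be scripted. The helper below is a minimal sketch; the name `build_dork` is hypothetical, and qualifier names are passed through unchanged, so it does not check whether a given search interface accepts them.

```python
def build_dork(*terms, **qualifiers):
    """Join free-text terms and key:value qualifiers into one search query.

    Multi-word terms are wrapped in double quotes so they are searched
    as a phrase rather than as separate words.
    """
    parts = []
    for term in terms:
        parts.append(f'"{term}"' if " " in term else term)
    for key, value in qualifiers.items():
        parts.append(f"{key}:{value}")
    return " ".join(parts)


print(build_dork("facebook.com", filename="vim_settings.xml"))
print(build_dork("Authorization: Bearer", language="PHP"))
```

The first call reproduces the combined query from the text; the second combines the multi-word string from step 2 with the language qualifier from step 4.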
GitHub Dorking lessens the effort of manually searching for sensitive information on GitHub, which otherwise takes a lot of time because every repository belonging to a specific company has to be checked.
In addition to repositories, you can search for users, wikis, code, commits, issues, discussions, packages, marketplace listings, and topics.
Apart from using GitHub dorks, we can go directly to the source. To do that, find your target company’s GitHub page; from there you can find its developers and monitor their accounts.
Once you find your target company’s GitHub page, check the list of people associated with it. This can be done by clicking on the “People” tab.
You will then need to go through each account manually and look for exposures, which takes a long time. Look for URLs, API keys, usernames, passwords, etc.; someone may have uploaded something sensitive.
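The list of people can also be collected programmatically. The sketch below assumes the GitHub REST API endpoint `GET /orgs/{org}/members`, which returns only publicly visible members when called without authentication. The function names and the `example-org` value are hypothetical, and pagination beyond the first page is left out for brevity.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def members_url(org, page=1, per_page=100):
    """Build the URL that lists members of an organization."""
    return f"{API_ROOT}/orgs/{org}/members?per_page={per_page}&page={page}"


def parse_logins(payload):
    """Extract account names from the JSON array returned by the API."""
    return [user["login"] for user in json.loads(payload)]


def fetch_public_members(org):
    """Fetch the first page of members (network call, subject to rate limits)."""
    request = urllib.request.Request(
        members_url(org), headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return parse_logins(response.read().decode("utf-8"))


if __name__ == "__main__":
    # "example-org" is a placeholder; replace it with an organization in scope.
    for login in fetch_public_members("example-org"):
        print(f"https://github.com/{login}?tab=repositories")
```

Each printed URL points at one member’s repositories, which still have to be reviewed for exposures.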
GitHub Dork List :
# | GitHub Dorks for Finding Files | # | GitHub Dorks for Finding Languages |
1 | filename:manifest.xml | 44 | language:python username |
2 | filename:travis.yml | 45 | language:php username |
3 | filename:vim_settings.xml | 46 | language:sql username |
4 | filename:database | 47 | language:html password |
5 | filename:prod.exs NOT prod.secret.exs | 48 | language:perl password |
6 | filename:prod.secret.exs | 49 | language:shell username |
7 | filename:.npmrc _auth | 50 | language:java api |
8 | filename:.dockercfg auth | 51 | HOMEBREW_GITHUB_API_TOKEN language:shell |
9 | filename:WebServers.xml | # | GitHub Dorks for Finding API Keys, Tokens and Passwords |
10 | filename:.bash_history <Domain name> | 52 | api_key |
11 | filename:sftp-config.json | 53 | “api keys” |
12 | filename:sftp.json path:.vscode | 54 | authorization_bearer: |
13 | filename:secrets.yml password | 55 | oauth |
14 | filename:.esmtprc password | 56 | auth |
15 | filename:passwd path:etc | 57 | authentication |
16 | filename:dbeaver-data-sources.xml | 58 | client_secret |
17 | path:sites databases password | 59 | api_token: |
18 | filename:config.php dbpasswd | 60 | “api token” |
19 | filename:prod.secret.exs | 61 | client_id |
20 | filename:configuration.php JConfig password | 62 | password |
21 | filename:.sh_history | 63 | user_password |
22 | shodan_api_key language:python | 64 | user_pass |
23 | filename:shadow path:etc | 65 | passcode |
24 | JEKYLL_GITHUB_TOKEN | 66 | client_secret |
25 | filename:proftpdpasswd | 67 | secret |
26 | filename:.pgpass | 68 | password hash |
27 | filename:idea14.key | 69 | OTP |
28 | filename:hub oauth_token | 70 | user auth |
29 | HEROKU_API_KEY language:json | # | GitHub Dorks for Finding Usernames |
30 | HEROKU_API_KEY language:shell | 71 | user:name (user:admin) |
31 | SF_USERNAME salesforce | 72 | org:name (org:google type:users) |
32 | filename:.bash_profile aws | 73 | in:login (<username> in:login) |
33 | extension:json api.forecast.io | 74 | in:name (<username> in:name) |
34 | filename:.env MAIL_HOST=smtp.gmail.com | 75 | fullname:firstname lastname (fullname:<name> <surname>) |
35 | filename:wp-config.php | 76 | in:email (data in:email) |
36 | extension:sql mysql dump | # | GitHub Dorks for Finding Information using Dates |
37 | filename:credentials aws_access_key_id | 77 | created:<2012-04-05 |
38 | filename:id_rsa OR filename:id_dsa | 78 | created:>=2011-06-12 |
# | GitHub Dorks for Finding Information using Extension | # | GitHub Dorks for Finding Information using Extension (continued) |
39 | extension:pem private | 79 | extension:json mongolab.com |
40 | extension:ppk private | 80 | extension:yaml mongolab.com |
41 | extension:sql mysql dump | 81 | [WFClient] Password= extension:ica |
42 | extension:sql mysql dump password | 82 | extension:avastlic “support.avast.com” |
43 | extension:json api.forecast.io | 83 | extension:json googleusercontent client_secret |
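A dork list like this one can be turned into ready-to-open search links for a given target. The sketch below assumes the `https://github.com/search?q=...&type=code` URL layout of the GitHub web search; the function names and the `example.com` target are hypothetical.

```python
from urllib.parse import quote_plus


def search_url(query, search_type="code"):
    """Return a GitHub web search URL for the given query string."""
    return f"https://github.com/search?q={quote_plus(query)}&type={search_type}"


def urls_for_target(target, dorks):
    """Prefix every dork with the target term and build one URL per dork."""
    return [search_url(f"{target} {dork}") for dork in dorks]


if __name__ == "__main__":
    dorks = ["filename:.npmrc _auth", "filename:wp-config.php", "extension:pem private"]
    for url in urls_for_target("example.com", dorks):
        print(url)
```

The links still have to be opened while logged in, and each result page reviewed manually.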
That covers the manual techniques for finding sensitive information on GitHub; let’s move on to automated techniques.
Automated Technique – Using Tools
Automation makes the process easier and faster, but it has the drawback of false-positive results. Not every result is a false positive, but some will be.
TruffleHog
It is easy to use. It searches through git repositories for secrets, digging deep into commit history and branches, which makes it effective at finding secrets that were accidentally committed.
How to use it?
- Go to https://github.com/dxa4481/truffleHog and clone it (download it).
- Use the command given below to search for sensitive information.
Command : python3 trufflehog.py --regex --entropy=False https://github.com/<yourTargetRepo>
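The `--entropy` option refers to entropy-based detection, i.e. flagging strings that look random. The sketch below illustrates the general idea on the added lines of a patch (such as the output of `git log -p`); it is not the tool’s own code, and the token pattern and the 4.5 threshold are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Candidate tokens: runs of at least 20 key-like characters (assumed pattern).
TOKEN = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def shannon_entropy(value):
    """Shannon entropy of a string, in bits per character."""
    if not value:
        return 0.0
    counts = Counter(value)
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def high_entropy_strings(patch_text, threshold=4.5):
    """Return random-looking tokens found on lines added by a patch."""
    hits = []
    for line in patch_text.splitlines():
        # Keep added lines only; skip the "+++ b/file" header lines.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for token in TOKEN.findall(line[1:]):
            if shannon_entropy(token) >= threshold:
                hits.append(token)
    return hits
```

Scanning patches rather than the current files is what surfaces secrets that were committed once and later removed.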
Pros:
- Trufflehog is a free and open-source tool.
- It is easy to use and can be run with a simple command.
- Trufflehog can detect a wide variety of different data types.
Cons:
- Trufflehog is a passive tool and cannot detect data that is not publicly accessible.
- Trufflehog can generate a large number of false positives, which can be time-consuming to filter through.
- Trufflehog is not always accurate and can sometimes identify data that is not sensitive.
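Part of the false-positive filtering can be scripted before manual review. The sketch below drops findings that look like documentation placeholders; the marker list and function names are assumptions for illustration and would need tuning against real scanner output.

```python
# Substrings that often indicate a placeholder rather than a live secret
# (an assumed, deliberately short list).
PLACEHOLDER_MARKERS = (
    "example", "changeme", "your_", "xxxx", "dummy", "placeholder", "<", "${",
)


def looks_like_placeholder(value):
    """Heuristic check for values that are unlikely to be real secrets."""
    lowered = value.lower()
    # Strings made of one or two distinct characters, e.g. "000000" or "xxxxxx".
    if len(set(lowered)) <= 2:
        return True
    return any(marker in lowered for marker in PLACEHOLDER_MARKERS)


def triage(findings):
    """Keep only the findings that do not look like placeholders."""
    return [value for value in findings if not looks_like_placeholder(value)]
```

A filter like this can also hide a real secret that happens to contain one of the markers, so the discarded values are worth a quick skim.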
Github-Dorks
It is a simple Python tool that can search through your repository or your organization/user repositories.
How to use it?
- Go to https://github.com/techgaun/github-dorks and clone it (download it).
- Install all the given requirements.
- Use the command given below to search all the repositories of a single user.
Command : python github-dork.py -u <username>
Pros:
- github-dorks is a free and open-source tool.
- It is easy to use and can be run with a simple command.
- It can be used to search all repos of a user or organization.
Cons:
- It can be slow because it waits for the API rate limit to reset.
- The output formatting is not as good as TruffleHog’s.
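The waiting behaviour comes from the API’s rate limiting. As a sketch of how a script might decide how long to pause, the helper below reads the `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers that the GitHub REST API returns (the reset value is a Unix timestamp). The function name is hypothetical, and this is not the tool’s actual implementation.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to sleep before the next API call is allowed.

    `headers` is a mapping of response header names to string values.
    """
    now = time.time() if now is None else now
    # Header names are case-insensitive in HTTP, so normalize the keys.
    lowered = {key.lower(): value for key, value in headers.items()}
    remaining = int(lowered.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = int(lowered.get("x-ratelimit-reset", "0"))
    return max(0.0, reset_at - now)
```

A caller would run `time.sleep(seconds_until_reset(response_headers))` before issuing the next search request.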
Nightfall
Nightfall is an AI-powered scanner that detects API keys, secrets, and other sensitive information. The Nightfall Radar API lets you integrate with public or private GitHub repositories, AWS, GitLab, Twilio, etc. The scan results are available on a web interface or as CLI output. You can read more about it here: https://radar.nightfall.ai/docs#get-results. Essentially, it is a web application that helps you scan GitHub repositories.
How to use it?
1. Go to https://radar.nightfall.ai/ and log in with your GitHub account.
2. Add your target’s GitHub URL in the top-left section to start a scan.
3. After the scan is completed, click on Results to view the information; you’ll be redirected to a results page.
4. Click on GitHub to see the leaked information on GitHub.
Pros:
- Comprehensive: Nightfall Radar can uncover a wide range of exposed assets, including subdomains, AWS buckets, Git repositories, and more.
- Easy to use: Nightfall Radar has a user-friendly interface and can be operated with simple commands.
- Regular updates: Nightfall Radar is actively maintained with frequent updates to address new vulnerabilities and enhance its capabilities.
Cons:
- Commercial tool: Nightfall Radar is a paid tool that requires a license.
- Potential false positives: Nightfall Radar’s scans may generate false positives, which can be time-consuming to verify.
Conclusion
In conclusion, GitHub reconnaissance plays a pivotal role in attack surface management by providing a comprehensive view of potential vulnerabilities and exposure points within an organization’s public and private code repositories. It’s a critical step in identifying security weaknesses that could be exploited by attackers, as well as in safeguarding sensitive information that might be inadvertently exposed.
Through the meticulous examination of repositories, commit histories, code snippets, configuration files, and developer discussions, organizations can uncover hidden risks ranging from hard-coded secrets to outdated dependencies with known vulnerabilities. This process not only helps in preemptively addressing security loopholes but also enhances the overall security posture by informing better coding practices and tighter access controls.
Furthermore, GitHub reconnaissance extends beyond mere vulnerability identification. It offers insights into the development culture and practices of an organization, paving the way for more informed and strategic security decision-making. By understanding how and why certain security flaws are introduced, organizations can implement more effective security training and awareness programs for their developers.
The FireCompass CART Platform utilizes AI-powered engines to run active probing on GitHub and continuously discover common misconfigurations, code leaks, hardcoded credentials, and much more.
Some other automated tools for scanning GitHub repositories:
https://github.com/BishopFox/GitGot
https://github.com/Talkaboutcybersecurity/GitMonitor
https://github.com/michenriksen/gitrob
https://github.com/tillson/git-hound
https://github.com/kootenpv/gittyleaks
https://github.com/awslabs/git-secrets
https://git-secret.io/
Author: Vishal Vishwakarma
Guide: Sanket Kakde
About FireCompass:
FireCompass is a SaaS platform for Continuous Automated Pen Testing, Red Teaming and External Attack Surface Management (EASM). FireCompass continuously indexes and monitors the deep, dark and surface webs using nation-state grade reconnaissance techniques. The platform automatically discovers an organization’s digital attack surface and launches multi-stage safe attacks, mimicking a real attacker, to help identify breach and attack paths that are otherwise missed out by conventional tools.
Feel free to get in touch with us to get a better view of your attack surface.