So we can see that increasing a malware sample's size can help it evade detection in some cases (the file normally needs to be over 100MB for this to work), but that is not the reason this sample is so large.
We continue our initial analysis by inspecting the imports of the sample. Surely such a large file has an immense amount of functionality and imports just about every Windows DLL available, right? But we see it imports only 3 DLLs, and they are fairly standard. The only notable one is WS2_32.dll, which is used for networking.
So we have a large malware sample that seems to have limited functionality. At this stage we would assume the malware would be packed and contain embedded executables/libraries that it would drop to the system once the program is run. This is partially correct. Further inspection of the strings in the file reveals evidence of Python error messages and DLLs. This leads us to the conclusion that this is most likely a Python script that has been converted into an EXE.
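The string inspection described above can be approximated with a few lines of Python. This is a minimal sketch, not the tooling used in the original analysis: the function names and the marker pattern (Python DLL names plus a handful of interpreter and bootloader messages) are assumptions chosen for illustration and should be extended for real triage.

```python
import re

# Illustrative markers only (an assumption, not an exhaustive list):
# Python DLL names plus a few interpreter/bootloader strings.
PYTHON_HINTS = re.compile(
    rb"python\d*\.dll|Py_Initialize|PyInstaller|Python DLL|"
    rb"Traceback \(most recent call last\)",
    re.IGNORECASE,
)


def printable_strings(data: bytes, min_len: int = 6):
    """Yield runs of printable ASCII, similar to the `strings` utility."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group()


def python_indicators(data: bytes):
    """Return the extracted strings that hint at an embedded Python runtime."""
    return [
        s.decode("ascii")
        for s in printable_strings(data)
        if PYTHON_HINTS.search(s)
    ]
```

Calling `python_indicators(open(path, "rb").read())` on a sample returns the suspicious strings. An empty list does not prove the file is not Python-based, since the markers may be compressed or obfuscated.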
A Brief History
Python seems like a logical choice for rapidly developing malware due to its ease of use, vast community support, and extensive availability of 3rd party modules/libraries. However, there are a few issues that have prevented interpreted languages like Python from being used for malware development in the past.
The biggest problem is that if the language is not installed on the system the program won't run. This has become less of an issue as programs were developed that allowed these scripts to be converted into standalone EXEs. Problem solved...right? Well, the issue with converting the scripts into standalone EXEs is that the resulting executable has to embed many Python libraries which significantly increases the size of the program.
For example, a C program compiled to an EXE may be 300KB, but a Python script with similar functionality that is converted into an EXE may be over 10MB. That is an enormous difference in size, and the converted program also tends to run slower. However, with advances in technology, speed and memory constraints aren't much of a hindrance to malware campaigns these days, and authors can get away with running larger, slower code on users' faster machines.
So, Where's The Python?
There are two main programs used to convert Python scripts to EXEs: PyInstaller and Py2Exe (there are others, but these are the main ones). As with most conversions, there are tools to reverse the process: pyinstxtractor.py and python-exe-unpacker. Finally, once we have extracted the compiled ".pyc" Python files, we can decompile them into actual Python code with a tool called Uncompyle6. The process for extraction is depicted below:
PyInstaller -> pyinstxtractor.py -> Uncompyle6
Py2Exe -> python-exe-unpacker -> Uncompyle6
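To decide which of the two chains applies, a quick byte-level heuristic can help. The sketch below is an assumption-laden illustration: it looks for the archive magic that pyinstxtractor.py searches for in PyInstaller executables, and for a `PYTHONSCRIPT` resource name, which is assumed here to be a usable marker for Py2Exe output. The function name `guess_packer` is hypothetical.

```python
# Archive magic that pyinstxtractor.py searches for in PyInstaller builds.
PYINSTALLER_COOKIE = b"MEI\x0c\x0b\x0a\x0b\x0e"

# Assumption: Py2Exe builds carry a resource named PYTHONSCRIPT.
# PE resource names are stored as UTF-16LE, so check both encodings.
PY2EXE_MARKERS = (b"PYTHONSCRIPT", "PYTHONSCRIPT".encode("utf-16-le"))


def guess_packer(data: bytes) -> str:
    """Suggest an extraction chain based on simple byte markers."""
    if PYINSTALLER_COOKIE in data:
        return "PyInstaller -> pyinstxtractor.py -> Uncompyle6"
    if any(marker in data for marker in PY2EXE_MARKERS):
        return "Py2Exe -> python-exe-unpacker -> Uncompyle6"
    return "unknown"
```

A result of "unknown" only means neither marker was found. The sample may use a different converter or a modified bootloader.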
We can see from the terminal output that it gives us possible suggestions for which of the output binary Python files could be the program entry point. This hint is useful because there are numerous files that are output as part of the Python standard library and don't necessarily have anything to do with the actual malware code.
One file that looks of particular interest is "token_grabber.pyc". If we run this file through our Uncompyle6 program we can convert the binary Python file back to the original Python source code...kind of.
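Before decompiling, it is worth checking which Python version produced the ".pyc" file, because decompilers depend on a valid header. The following sketch reads the magic number from the first four bytes of the file. The lookup table is deliberately tiny and the function name `pyc_version` is hypothetical. Extracted files sometimes lack a header entirely, in which case the function returns `None`.

```python
import struct

# Small illustrative subset of CPython magic numbers (not a complete table).
KNOWN_MAGIC = {62211: "2.7", 3394: "3.7", 3413: "3.8"}


def pyc_version(header: bytes):
    """Map the first four bytes of a .pyc file to a Python version string.

    Returns None when the bytes do not look like a .pyc header at all,
    for example when an extractor wrote the raw marshalled code object.
    """
    if len(header) < 4 or header[2:4] != b"\r\n":
        return None
    (number,) = struct.unpack("<H", header[:2])
    return KNOWN_MAGIC.get(number, f"unknown (magic {number})")
```

For example, `pyc_version(open("token_grabber.pyc", "rb").read(4))` reports the version to install or select before running the decompiler.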
Malware authors know this tool is used, and there are numerous ways to sabotage the decompilation process, so you may not always get nice and pretty Python code, but you will get a decent representation of what the source was. For our sample, we get a mixture of legitimate Python source and intermediate (IR) code that couldn't be fully decompiled back to source:
We Have Source. Now What?
Now you can analyze the code as you would normal malware. This article is mainly about how to triage and analyze Python-based malware, so it will not go into a full analysis of this sample. However, there are some tidbits about this sample that we would like to point out.
What Is It?
This sample is part of an open source malware family known as AnarchyGrabber and some of its source code can be found on GitHub. Its purpose is to steal Discord credentials and billing information if available. It also attempts to spread itself through your Discord contact list. It will then send any information it found to the malware author's private Discord server via a webhook API. We happened to find the author's webhook in the decompiled code:
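Hunting for webhooks in decompiled output can be automated with a regular expression. This is a minimal sketch that assumes the usual Discord webhook URL shape (`/api/webhooks/<id>/<token>`). The pattern and the function name `find_webhooks` are illustrative, and the URL in the usage note is a made-up placeholder, not the one from this sample.

```python
import re

# Assumed URL shape: https://discordapp.com/api/webhooks/<numeric id>/<token>
# The newer discord.com host and an optional API version are also accepted.
WEBHOOK_RE = re.compile(
    r"https://(?:(?:ptb|canary)\.)?discord(?:app)?\.com"
    r"/api/(?:v\d+/)?webhooks/\d+/[\w-]+"
)


def find_webhooks(source: str):
    """Return the unique Discord webhook URLs found in decompiled source."""
    return sorted({match.group(0) for match in WEBHOOK_RE.finditer(source)})
```

Running `find_webhooks` over every decompiled file gives a quick list of indicators to report or block, such as `https://discordapp.com/api/webhooks/123456789012345678/EXAMPLE-token`.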
We were unsure whether a webhook could be used for reading data, but found that it can only be used for sending data to the server. If it could be used for reading, we were going to set up a bot to poll the server and read back what the malware author was receiving. However, even if we can't read data, we can send data. That means, if one were so inclined, they could set up a bot, or a series of bots, to spam the malware author's server with false records...but of course we wouldn't do that...
Anti-Analysis
Anyway, the next part of this malware we would like to point out is that it employs a couple of anti-analysis techniques, mainly in the form of DNS lookups. It reaches out to 3 legitimate websites for different reasons:
pastebin.com
api.ipify.org
discordapp.com
Each one of these lookups is repeatedly compared to loopback, localhost, and the network name of the computer. If any of the websites resolves to one of these, the malware exits. This is an anti-analysis technique intended to avoid network monitoring tools that redirect traffic to the local machine:
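The logic of such a check can be sketched as follows. This is a reconstruction under stated assumptions, not the sample's actual code: the function name `resolves_to_local` and its parameters are hypothetical, and the resolver is passed in as an argument so that the check can be exercised without touching the network.

```python
import socket


def resolves_to_local(domain, resolve=socket.gethostbyname, hostname=None):
    """Return True if `domain` resolves to this machine.

    Mirrors the idea of the check: compare the resolved address with
    loopback, with localhost, and with the computer's own network name.
    """
    hostname = hostname or socket.gethostname()
    local_addresses = {"127.0.0.1"}
    for name in ("localhost", hostname):
        try:
            local_addresses.add(resolve(name))
        except OSError:
            pass  # the name could not be resolved; skip it

    try:
        address = resolve(domain)
    except OSError:
        return False  # no answer at all is not treated as a redirect here
    return address in local_addresses or address.startswith("127.")
```

A sandbox that answers every DNS query with its own address would make `resolves_to_local("pastebin.com")` return True, which is the condition under which the malware is described as exiting. Knowing this, an analyst can configure the fake DNS service to return a non-local address instead.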
Conclusion
Python malware is on the rise. Gone are the days when malware authors tried to develop small, stealthy programs that could be downloaded over limited-bandwidth connections and run on a toaster. The name of the game now is development speed, and Python's easy development framework makes pumping out programs easier than ever. Because it adds extra extraction and decompilation steps, the conversion of Python to EXE also makes for slightly tougher malware to analyze, and it surely won't be going away any time soon. Happy hunting.