Evaluating Crypto Misuses in Python: Insights from GitHub Projects and MicroPython

6 May 2024

Authors:

(1) Anna-Katharina Wickert, Technische Universität Darmstadt, Darmstadt, Germany (wickert@cs.tu-darmstadt.de);

(2) Lars Baumgärtner, Technische Universität Darmstadt, Darmstadt, Germany (baumgaertner@cs.tu-darmstadt.de);

(3) Florian Breitfelder, Technische Universität Darmstadt, Darmstadt, Germany (florian.breitfelder@tu-darmstadt.de);

(4) Mira Mezini, Technische Universität Darmstadt, Darmstadt, Germany (mezini@cs.tu-darmstadt.de).

Abstract and 1 Introduction

2 Background

3 Design and Implementation of Licma and 3.1 Design

3.2 Implementation

4 Methodology and 4.1 Searching and Downloading Python Apps

4.2 Comparison with Previous Studies

5 Evaluation and 5.1 GitHub Python Projects

5.2 MicroPython

6 Comparison with previous studies

7 Threats to Validity

8 Related Work

9 Conclusion, Acknowledgments, and References

5 EVALUATION

In this section, we present our evaluation of crypto misuses in real-world Python applications from GitHub and in MicroPython projects.

5.1 GitHub Python Projects

Overall, LICMA identified 1,501 possible misuses in our data set of Python applications attributed to 81 repositories. Thus, 52.26 % of the 155 analyzed Python applications that contain crypto usages have at least one misuse.

As discussed in Section 3.1, we distinguish between potential and definite misuses. While a potential misuse requires a manual inspection to decide whether it is harmful, a definite misuse indicates that the analysis was able to resolve the respective crypto parameter. Thus, we know that a rule of LICMA is definitely violated by the respective API call. We found 85 definite misuses, each of which could be resolved within a single class file and is thus local. The remaining 1,416 misuses are potential misuses.
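The distinction between definite and potential misuses can be illustrated with a minimal sketch. It is not LICMA's implementation: the function name `classify_first_argument`, the API name `encrypt`, and the resolution strategy (only literals and module-level literal assignments count as resolved) are assumptions made for illustration.

```python
import ast


def classify_first_argument(source, func_name="encrypt"):
    """Return sorted (line, label) pairs for calls to a hypothetical API.

    'definite'  : the first argument resolves to a constant within this file.
    'potential' : the argument could not be resolved and needs manual review.
    """
    tree = ast.parse(source)

    # Collect names bound to a plain literal at module level.
    constant_names = set()
    for node in tree.body:
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    constant_names.add(target.id)

    findings = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call) or not node.args:
            continue
        callee = node.func
        if isinstance(callee, ast.Name):
            name = callee.id
        elif isinstance(callee, ast.Attribute):
            name = callee.attr
        else:
            continue
        if name != func_name:
            continue
        arg = node.args[0]
        resolved = isinstance(arg, ast.Constant) or (
            isinstance(arg, ast.Name) and arg.id in constant_names
        )
        findings.append((node.lineno, "definite" if resolved else "potential"))
    return sorted(findings)


# Hypothetical input: one call with a resolvable constant key, one without.
SAMPLE = '''KEY = b"0000000000000000"

def send(data, key):
    encrypt(KEY, data)
    encrypt(key, data)
'''

print(classify_first_argument(SAMPLE))
```

In this sketch the call on line 4 of the sample is labeled definite because `KEY` resolves to a literal in the same file, while the call on line 5 is labeled potential because the value of the parameter `key` is unknown without further analysis.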

5.1.1 Dependencies. Of the 1,501 misuses, only 7.00 % are within the application code and not in dependencies. These misuses are within 14.81 % of the applications with at least one misuse. The remaining misuses are found in dependencies and can be reduced to 290 unique misuses. Thus, developers introduce most of their misuses by using dependencies rather than by using the respective crypto library directly. In total, only 12 projects are affected by misuses in the application code itself.
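As a consistency check, the reported percentages can be recomputed from the counts given in the text. The variable names are illustrative, and the absolute number of application-code misuses (about 105) is an assumption derived here from the reported 7.00 %; it is not stated in the text.

```python
# Counts reported in Section 5.1.
total_misuses = 1501
apps_with_crypto = 155
apps_with_misuse = 81
apps_with_own_misuse = 12
definite, potential = 85, 1416

# Share of analyzed applications with at least one misuse.
share_apps_with_misuse = round(100 * apps_with_misuse / apps_with_crypto, 2)

# Share of affected applications with misuses in their own code.
share_own_code = round(100 * apps_with_own_misuse / apps_with_misuse, 2)

# Assumption: absolute count inferred from the reported 7.00 %.
app_code_misuses = round(0.07 * total_misuses)
dependency_misuses = total_misuses - app_code_misuses

print(share_apps_with_misuse, share_own_code)
print(app_code_misuses, dependency_misuses)
print(definite + potential == total_misuses)
```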

To understand the influence of dependencies within the applications with the most misuses, we inspected the 10 Python projects with more than 30 misuses. Figure 2 confirms the previous observation that most of the misuses are in dependencies and only a few projects use a crypto library directly. The Scapy repository[6] is an exception as all misuses are in its code. Our investigation reveals that this repository is often used as a dependency by other projects. Thus, these findings can be attributed to dependencies as well.

While the previous results focused on the projects, we also inspected the dependencies causing most of the misuses. In total, 5 of the observed dependencies are responsible for a misuse in more than 10 different projects. For 34 projects, we observe a misuse within the repository Scapy, which was the only analyzed repository in Figure 2 without any misuse in its dependencies. This confirms our previous observation about this project.

Figure 2: Python projects with 30 or more misuses.

5.1.2 Rules and Python Cryptographic Libraries. To better understand the underlying reasons for the misuses, we evaluated how often a misuse occurs per rule and library (Fig. 3). Our analysis reveals that most of the misuses are related to the use of different block modes, §1 and §2, of the M2Crypto library, and constant encryption keys, §3, for the cryptography library. We assume that the small number of misuses of §1 and §2 for cryptography is due to the design of the library. The library recommends using a high-level symmetric encryption class, called Fernet, instead of the low-level symmetric encryption classes which would enable the respective misuses. Most of the misuses due to insecure PBE configurations, §4 and §5, are by developers using the library PyCrypto. While none of the 3 previously mentioned libraries makes it impossible to produce a misuse of one of our 5 rules, the library PyNaCl completely prevents misuses for §1, §2, and §5. In our study, we found only 2 instances of a misuse due to a constant encryption key for PyNaCl.

Figure 3: The number of misuses found per rule for the Python libraries cryptography, M2Crypto, PyCrypto, and PyNaCl. The last of these, PyNaCl, avoids misuses for §1, §2, and §5 by design.

5.1.3 Reasons for Definite Misuses. Among the definite misuses, there is at least one misuse for all previously discussed rules. We identified 13 definite misuses in 5 different projects which use the ECB encryption mode (§1). In all cases, the mode is passed explicitly as a parameter and not implicitly as in Java [4]. For 8 misuses, we observed that a static IV, e.g., zero bytes, is used, resulting in insecure encryption with the block mode CBC (§2). Furthermore, we identified that the Scapy project, which is also commonly used as a dependency, uses a constant encryption key, resulting in 14 misuses (§3). For example, we found a zero byte-array as key.
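To make the static IV and constant key patterns concrete, the following sketch contrasts them with randomly generated values. It uses only the Python standard library and does not perform any encryption; the constants and the helper names `fresh_key` and `fresh_iv` are assumptions for illustration and are not taken from any analyzed project.

```python
import secrets

BLOCK_SIZE = 16  # AES block size in bytes, also the CBC IV length
KEY_SIZE = 32    # assumed key length (AES-256)

# Patterns of the kind flagged by the rules: values fixed in the source code.
STATIC_IV = bytes(BLOCK_SIZE)   # all-zero IV (§2)
STATIC_KEY = bytes(KEY_SIZE)    # zero byte-array as key (§3)


def fresh_key():
    """Generate a random key; generate once and store it securely."""
    return secrets.token_bytes(KEY_SIZE)


def fresh_iv():
    """Generate a random IV; call this again for every encrypted message."""
    return secrets.token_bytes(BLOCK_SIZE)


key = fresh_key()
iv_first, iv_second = fresh_iv(), fresh_iv()
print(len(key), len(iv_first), iv_first != iv_second)
```

A static IV makes CBC deterministic: encrypting the same plaintext twice under the same key yields the same ciphertext, which is what a fresh IV per message avoids.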

For password-based encryption, we identified 18 misuses within 14 projects which pass a static salt instead of a randomly generated one (§4). In total, we identified 32 misuses which are due to requesting only 1 iteration instead of a value greater than 1,000 as recommended (§5). Thus, deriving a key from the password is faster but very insecure, e.g., against dictionary attacks.
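A configuration that satisfies both §4 and §5 could look like the following sketch, which uses PBKDF2 from the Python standard library. The helper name `derive_key`, the hash function, the salt length, and the iteration count are illustrative assumptions; the text only states that the count should be greater than 1,000.

```python
import hashlib
import secrets

SALT_BYTES = 16
ITERATIONS = 100_000  # illustrative assumption, well above the 1,000 floor


def derive_key(password, salt=None, iterations=ITERATIONS):
    """Derive a 32-byte key with PBKDF2-HMAC-SHA256.

    Returns (salt, key) so that the salt can be stored next to the ciphertext.
    """
    if salt is None:
        salt = secrets.token_bytes(SALT_BYTES)  # random salt instead of a static one (§4)
    key = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )
    return salt, key


salt, key = derive_key("correct horse battery staple")
# Re-deriving with the stored salt reproduces the same key.
_, same_key = derive_key("correct horse battery staple", salt)
print(len(key), key == same_key)
```

With a random salt, the same password yields a different key for each user or message, and the higher iteration count raises the cost of each guess in a dictionary attack.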

This paper is available on arXiv under a CC BY 4.0 DEED license.


[6] https://github.com/secdev/scapy