Authors:
(1) Anna-Katharina Wickert, Technische Universität Darmstadt, Darmstadt, Germany (wickert@cs.tu-darmstadt.de);
(2) Lars Baumgärtner, Technische Universität Darmstadt, Darmstadt, Germany (baumgaertner@cs.tu-darmstadt.de);
(3) Florian Breitfelder, Technische Universität Darmstadt, Darmstadt, Germany (florian.breitfelder@tu-darmstadt.de);
(4) Mira Mezini, Technische Universität Darmstadt, Darmstadt, Germany (mezini@cs.tu-darmstadt.de).
Table of Links
3 Design and Implementation of Licma and 3.1 Design
4 Methodology and 4.1 Searching and Downloading Python Apps
4.2 Comparison with Previous Studies
5 Evaluation and 5.1 GitHub Python Projects
6 Comparison with previous studies
9 Conclusion, Acknowledgments, and References
ABSTRACT
Background: Previous studies have shown that up to 99.59 % of the Java apps using crypto APIs misuse the API at least once. However, these studies focused on Java and C, and empirical studies for other languages are missing. Meanwhile, a controlled user study with crypto tasks in Python has shown that 68.5 % of the professional developers wrote a secure solution for a crypto task.
Aims: To understand whether this observation holds for real-world code, we conducted a study of crypto misuses in Python.
Method: We developed a static analysis tool that covers common misuses of 5 different Python crypto APIs. Using this tool, we analyzed 895 popular Python projects from GitHub and 51 MicroPython projects for embedded devices. Further, we compared our results with the findings of previous studies.
Results: Our analysis reveals that 52.26 % of the Python projects have at least one misuse. Further, the API design of some Python crypto libraries helps developers avoid crypto misuses that were much more common in studies conducted with Java and C code.
Conclusion: We conclude that good API design has a positive impact, reducing crypto misuses in Python applications. Further, our analysis of MicroPython projects reveals the importance of hybrid analyses.
1 INTRODUCTION
Cryptography, hereafter crypto, is widely used to protect our data and ensure its confidentiality. For example, without crypto, we could not securely use online banking or shop online. Unfortunately, previous research shows that crypto is often used in an insecure way [3, 4, 7, 9, 11]. One such problem is the choice of an insecure parameter, such as an insecure block mode, for a crypto primitive such as encryption. Many static analysis tools exist to identify these misuses, such as CryptoREX [13], CryptoLint [4], CogniCryptSAST [8], and Cryptoguard [12].
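To make "insecure parameter" concrete, the following is a minimal sketch of a syntactic rule check in the spirit of the tools listed above; it is not the implementation of LICMA or of any of those tools. It flags two parameter choices that correspond to CryptoLint rules [4]: selecting ECB mode, and using fewer than 1,000 iterations for password-based key derivation. The matched API names (`modes.ECB` from the `cryptography` library, `AES.MODE_ECB` from PyCryptodome, and the standard library's `hashlib.pbkdf2_hmac`) are real, but the matching is deliberately naive: it only inspects call and attribute names, ignores import aliases, and performs no data-flow tracking, so it misses a mode or iteration count passed through a variable. The analyzed code is only parsed, never executed, so neither library needs to be installed.

```python
from __future__ import annotations

import ast

MIN_PBKDF2_ITERATIONS = 1000  # threshold from CryptoLint's PBE iteration rule


def _callee_name(call: ast.Call) -> str | None:
    """Return the last component of the called name, e.g. 'ECB' for modes.ECB()."""
    func = call.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return None


def find_misuses(source: str) -> list[tuple[int, str]]:
    """Flag two insecure parameter choices using a purely syntactic check."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = _callee_name(node)
            # cryptography: Cipher(algorithms.AES(key), modes.ECB())
            if name == "ECB":
                findings.append((node.lineno, "ECB mode"))
            # stdlib: hashlib.pbkdf2_hmac(hash_name, password, salt, iterations)
            elif name == "pbkdf2_hmac":
                if len(node.args) > 3:
                    iters = node.args[3]
                else:
                    iters = next((kw.value for kw in node.keywords
                                  if kw.arg == "iterations"), None)
                # Only literal integers can be judged without data-flow analysis.
                if (isinstance(iters, ast.Constant) and isinstance(iters.value, int)
                        and iters.value < MIN_PBKDF2_ITERATIONS):
                    findings.append((node.lineno, "too few PBKDF2 iterations"))
        # PyCryptodome: AES.new(key, AES.MODE_ECB)
        elif isinstance(node, ast.Attribute) and node.attr == "MODE_ECB":
            findings.append((node.lineno, "ECB mode"))
    return sorted(findings)


SAMPLE = """\
import hashlib
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from Crypto.Cipher import AES

c1 = Cipher(algorithms.AES(key), modes.ECB())
c2 = AES.new(key, AES.MODE_ECB)
c3 = Cipher(algorithms.AES(key), modes.GCM(iv))
dk = hashlib.pbkdf2_hmac("sha256", pw, salt, 100)
"""

print(find_misuses(SAMPLE))
# [(5, 'ECB mode'), (6, 'ECB mode'), (8, 'too few PBKDF2 iterations')]
```

The GCM cipher on line 7 is not reported. A production analysis would additionally resolve imports and track how keys, modes, and counts flow into each call, which is what distinguishes the tools cited above from a name-matching sketch like this one.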
While these tools and the respective in-the-wild studies concentrate on Java and C, user studies suggest that the existing Python APIs reduce the number of crypto misuses. Acar et al. [2] conducted an experiment with 307 GitHub users who had to solve 3 crypto-related development tasks. They observed that 68.5 % of the professional developers wrote a secure solution in Python for the given task. In a controlled experiment with 256 Python developers who tried to solve simple crypto tasks, Acar et al. [1] found that a simple API design, like that of the Python library cryptography, supports developers in writing secure code. However, no empirical in-the-wild study has yet confirmed that crypto misuses in Python occur less frequently than in Java or C.
To empirically evaluate crypto misuses in Python, we introduce LICMA, a multi-language analysis framework with support for 5 different Python crypto APIs and Java’s JCA API. We provide 5 rules [4] for all Python APIs and 6 rules [4] for JCA to detect the most common crypto misuses. With LICMA, we analyzed 895 popular Python apps from GitHub and 51 MicroPython projects to gain insights into misuses in Python. We found that 52.26 % of the Python GitHub apps with crypto usages have at least one misuse, amounting to 1,501 misuses in total. Only 7 % of these misuses are within the application code itself, while the remaining misuses are introduced by dependencies. Further, our study of MicroPython projects reveals that developers in the embedded domain tend to use crypto via C code. This highlights the importance of hybrid static analyses, which can track program information, e.g., a call graph, across multiple languages [5, 10].
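Why a single-language analysis falls short for such projects can be illustrated with a toy example. The sketch below is neither part of LICMA nor the approach of [5, 10]; it is a minimal illustration that assumes hand-written call graphs with made-up node names. It merges a Python call graph and a C call graph, joined by a cross-language edge, and checks whether a crypto sink is reachable from the Python entry point.

```python
from collections import deque

# Hypothetical call graphs with made-up node names, for illustration only.
# The "c:" prefix marks functions implemented in C; a real hybrid analysis
# would extract both graphs (and the bridge edge) from the actual code.
PY_CALLS = {
    "app.main": ["app.send_reading"],
    "app.send_reading": ["c:encrypt_block"],  # Python calls into a native C module
}
C_CALLS = {
    "c:encrypt_block": ["c:aes_ecb_encrypt"],
}
CRYPTO_SINKS = {"c:aes_ecb_encrypt"}  # hypothetical sink that uses ECB mode


def reachable_sinks(entry, *graphs, sinks):
    """Breadth-first search over the union of the given call graphs."""
    merged = {}
    for graph in graphs:
        for caller, callees in graph.items():
            merged.setdefault(caller, []).extend(callees)

    seen, queue, found = {entry}, deque([entry]), set()
    while queue:
        node = queue.popleft()
        if node in sinks:
            found.add(node)
        for callee in merged.get(node, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return found


print(reachable_sinks("app.main", PY_CALLS, sinks=CRYPTO_SINKS))           # set()
print(reachable_sinks("app.main", PY_CALLS, C_CALLS, sinks=CRYPTO_SINKS))  # {'c:aes_ecb_encrypt'}
```

With only the Python graph, the search stops at the C boundary and reports nothing; with both graphs merged, the hypothetical ECB sink becomes reachable. A real hybrid analysis must derive such bridge edges from how the runtime exposes C functions to Python code, which is exactly the part this sketch assumes away.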
To further improve our understanding of whether Python APIs are less prone to crypto misuses, we make the following contributions:
• A novel, multi-language analysis tool to detect crypto misuses in Python and Java. For Python, we cover crypto misuses for 5 common Python crypto APIs, and for Java, the standard API JCA.
• An empirical study of crypto misuses in the 895 most popular Python applications on GitHub revealing 1,501 misuses.
• A comparison of our findings in Python applications with previous studies about crypto misuses in-the-wild for Android apps and firmware images in C. We observed that most Python applications are more secure and that the distribution across the concrete types of misuses differs considerably.
• An empirical study of crypto misuses in MicroPython projects which reveals the importance of hybrid static analyses.
• A replication package including both data sets used for our study, the results of our analysis, and the code of LICMA[1].
This paper is available on arXiv under the CC BY 4.0 DEED license.
[1] dx.doi.org/10.6084/m9.figshare.16499085