PyCG: Practical Call Graph Generation in Python | IEEE Conference Publication | IEEE Xplore

Abstract:

Call graphs play an important role in different contexts, such as profiling and vulnerability propagation analysis. Generating call graphs efficiently can be challenging for high-level languages that are modular and incorporate dynamic features and higher-order functions. Despite Python's popularity, very few tools aim to generate call graphs for Python programs. Worse, these tools suffer from several effectiveness issues that limit their practicality in realistic programs. We propose a pragmatic, static approach for call graph generation in Python. We compute all assignment relations between program identifiers of functions, variables, classes, and modules through an inter-procedural analysis. Based on these assignment relations, we produce the resulting call graph by resolving all calls to potentially invoked functions. Notably, the underlying analysis is designed to be efficient and scalable, handling several Python features, such as modules, generators, function closures, and multiple inheritance. We have evaluated our prototype implementation, which we call PyCG, using two benchmarks: a micro-benchmark suite containing small Python programs and a set of macro-benchmarks with several popular real-world Python packages. Our results indicate that PyCG can efficiently handle thousands of lines of code in less than a second (0.38 seconds for 1k LoC on average). Further, it outperforms the state-of-the-art for Python in both precision and recall: PyCG achieves high precision (~99.2%) and adequate recall (~69.9%). Finally, we demonstrate how PyCG can aid dependency impact analysis by showcasing a potential enhancement to GitHub's "security advisory" notification service using a real-world example.
Date of Conference: 22-30 May 2021
Date Added to IEEE Xplore: 07 May 2021
Print ISBN: 978-1-6654-0296-5
Print ISSN: 1558-1225
Conference Location: Madrid, ES

I. Introduction

A call graph depicts calling relationships between subroutines in a computer program. Call graphs can be employed to perform a variety of tasks, such as profiling [1], vulnerability propagation [2], and tool-supported refactoring [3].
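
To make the notion of a call graph concrete, the following is a minimal sketch built on Python's standard `ast` module. It is not PyCG's analysis: it only looks at top-level functions in a single module, resolves calls by plain name, and follows one level of `name = other_name` assignment inside each function, ignoring control flow. The sample program in `SOURCE` and the helper `build_call_graph` are hypothetical names introduced purely for illustration.

```python
import ast
from collections import defaultdict

# Hypothetical sample program used only for this illustration.
SOURCE = '''
def helper():
    return 1

def compute():
    alias = helper      # assignment relation: alias -> helper
    return alias()

def main():
    compute()
    print("done")
'''

def build_call_graph(source):
    """Return {caller: set(callees)} for the top-level functions in `source`."""
    tree = ast.parse(source)
    defined = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
    graph = defaultdict(set)
    for func in tree.body:
        if not isinstance(func, ast.FunctionDef):
            continue
        graph[func.name]  # make sure every function appears as a node
        # Collect simple `name = other_name` assignments in this function.
        aliases = {}
        for node in ast.walk(func):
            if (isinstance(node, ast.Assign)
                    and len(node.targets) == 1
                    and isinstance(node.targets[0], ast.Name)
                    and isinstance(node.value, ast.Name)):
                aliases[node.targets[0].id] = node.value.id
        # Resolve each call by name, going through at most one alias.
        for node in ast.walk(func):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                callee = aliases.get(node.func.id, node.func.id)
                if callee in defined:  # skip builtins and unknown names
                    graph[func.name].add(callee)
    return dict(graph)

call_graph = build_call_graph(SOURCE)
for caller in sorted(call_graph):
    print(caller, "->", sorted(call_graph[caller]))
```

Even this toy version shows why assignment relations matter: without the alias table, the call `alias()` inside `compute` could not be attributed to `helper`. Features such as classes, imports across modules, closures, and higher-order functions are outside the scope of this sketch.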

References

1. Valgrind, “Callgrind: a call-graph generating cache and branch prediction profiler,” 2020. [Online]. Available: http://valgrind.org/docs/manual/cl-manual.html
2. H. Shahriar and M. Zulkernine, “Mitigating program security vulnerabilities: Approaches and challenges,” ACM Comput. Surv., vol. 44, no. 3, Jun. 2012.
3. A. Feldthaus, T. Millstein, A. Møller, M. Schäfer, and F. Tip, “Tool-supported refactoring for JavaScript,” in Proceedings of the 2011 ACM International Conference on Object Oriented Programming Systems Languages and Applications, ser. OOPSLA ’11. New York, NY, USA: Association for Computing Machinery, 2011, pp. 119–138.
4. J. Hejderup, A. van Deursen, and G. Gousios, “Software ecosystem call graph for dependency management,” in Proceedings of the 40th International Conference on Software Engineering: New Ideas and Emerging Results, ser. ICSE-NIER ’18. New York, NY, USA: ACM, 2018, pp. 101–104.
5. R. Kikas, G. Gousios, M. Dumas, and D. Pfahl, “Structure and evolution of package dependency networks,” in Proceedings of the 14th International Conference on Mining Software Repositories, ser. MSR ’17. IEEE Press, 2017, pp. 102–112.
6. (2016) The npm blog: changes to npm’s unpublish policy. [Online; accessed 26-July-2020]. Available: https://blog.npmjs.org/post/141905368000/changes-to-npms-unpublish-policy
7. (2020) npm(1) - a JavaScript package manager. [Online; accessed 26-July-2020]. Available: https://github.com/npm/cli
8. (2020) pip 20.0.2: The PyPA recommended tool for installing Python packages. [Online; accessed 26-July-2020]. Available: https://pypi.org/project/pip/
9. S. H. Jensen, A. Møller, and P. Thiemann, “Type analysis for JavaScript,” in International Static Analysis Symposium. Springer, 2009, pp. 238–255.
10. H. Lee, S. Won, J. Jin, J. Cho, and S. Ryu, “SAFE: Formal specification and implementation of a scalable analysis framework for ECMAScript,” in FOOL 2012: 19th International Workshop on Foundations of Object-Oriented Languages. Citeseer, 2012, p. 96.
11. V. Kashyap, K. Dewey, E. A. Kuefner, J. Wagner, K. Gibbons, J. Sarracino, B. Wiedermann, and B. Hardekopf, “JSAI: A static analysis platform for JavaScript,” in Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE 2014. New York, NY, USA: Association for Computing Machinery, 2014, pp. 121–132.
12. Y. Ko, H. Lee, J. Dolby, and S. Ryu, “Practically tunable static analysis framework for large-scale JavaScript applications,” in Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering, ser. ASE ’15. IEEE Press, 2015, pp. 541–551.
13. M. Madsen, B. Livshits, and M. Fanning, “Practical static analysis of JavaScript applications in the presence of frameworks and libraries,” in Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ser. ESEC/FSE 2013. New York, NY, USA: Association for Computing Machinery, 2013, pp. 499–509.
14. A. Feldthaus, M. Schäfer, M. Sridharan, J. Dolby, and F. Tip, “Efficient construction of approximate call graphs for JavaScript IDE services,” in Proceedings of the 2013 International Conference on Software Engineering, ser. ICSE ’13. IEEE Press, 2013, pp. 752–761.
15. T. Sotiropoulos and B. Livshits, “Static analysis for asynchronous JavaScript programs,” in 33rd European Conference on Object-Oriented Programming (ECOOP 2019), ser. Leibniz International Proceedings in Informatics (LIPIcs), A. F. Donaldson, Ed., vol. 134. Dagstuhl, Germany: Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2019, pp. 8:1–8:30. [Online]. Available: http://drops.dagstuhl.de/opus/volltexte/2019/10800
16. M. Madsen, F. Tip, and O. Lhoták, “Static analysis of event-driven node.js JavaScript applications,” SIGPLAN Not., vol. 50, no. 10, pp. 505–519, Oct. 2015.
17. GitHub, “The state of the octoverse,” https://octoverse.github.com/, 2019, [Online; accessed 09-January-2020].
18. D. Fraser, E. Horner, J. Jeronen, and P. Massot, “Pyan3: Offline call graph generator for Python 3,” https://github.com/davidfraser/pyan, 2018, [Online; accessed 09-January-2020].
19. G. Gharibi, R. Tripathi, and Y. Lee, “Code2graph: Automatic generation of static call graphs for Python source code,” in Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ser. ASE 2018. New York, NY, USA: Association for Computing Machinery, 2018, pp. 880–883.
20. G. Gharibi, R. Alanazi, and Y. Lee, “Automatic hierarchical clustering of static call graphs for program comprehension,” in IEEE International Conference on Big Data, Big Data 2018, Seattle, WA, USA, December 10-13, 2018. IEEE, 2018, pp. 4016–4025.
21. G. Zhang and J. Wuxia, “Depends is a fast, comprehensive code dependency analysis tool,” https://github.com/multilang-depends/depends, 2018, [Online; accessed 04-August-2020].
22. N. Milojkovic, M. Ghafari, and O. Nierstrasz, “It’s duck (typing) season!” in 2017 IEEE/ACM 25th International Conference on Program Comprehension (ICPC), May 2017, pp. 312–315.
23. M. Felleisen, R. B. Findler, and M. Flatt, Semantics engineering with PLT Redex. MIT Press, 2009.
24. M. Madsen, O. Lhoták, and F. Tip, “A model for reasoning about JavaScript promises,” Proc. ACM Program. Lang., vol. 1, no. OOPSLA, Oct. 2017. [Online]. Available: https://doi.org/10.1145/3133910
25. S. Guarnieri and B. Livshits, “GATEKEEPER: Mostly static enforcement of security and reliability policies for JavaScript code,” in Proceedings of the 18th Conference on USENIX Security Symposium, ser. SSYM’09. USA: USENIX Association, 2009, pp. 151–168.
26. C.-A. Staicu, M. Pradel, and B. Livshits, “SYNODE: Understanding and automatically preventing injection attacks on Node.js,” in NDSS, 2018.
27. (2020) symtable. [Online; accessed 20-July-2020]. Available: https://docs.python.org/3/library/symtable.html
28. (2020) AST in Python. [Online; accessed 20-July-2020]. Available: https://docs.python.org/3/library/ast.html
29. M. Reif, F. Kübler, M. Eichberg, D. Helm, and M. Mezini, “Judge: Identifying, understanding, and evaluating sources of unsoundness in call graphs,” in Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2019. New York, NY, USA: Association for Computing Machinery, 2019, pp. 251–261.
30. A. Rahman, C. Parnin, and L. Williams, “The seven sins: Security smells in infrastructure as code scripts,” in Proceedings of the 41st International Conference on Software Engineering, ser. ICSE ’19. IEEE Press, 2019, pp. 164–175. [Online]. Available: https://doi.org/10.1109/ICSE.2019.00033