NaturalCC is a sequence modeling toolkit that allows researchers and developers to train custom models for many software engineering tasks, e.g., code summarization, code generation, code retrieval, and code clone detection. Our vision is to bridge the gap between programming languages and natural language through machine learning techniques.
In this paper, we conduct a thorough structural analysis aiming to interpret pre-trained language models for source code (e.g., CodeBERT and GraphCodeBERT) from three distinct perspectives: (1) attention analysis, (2) probing of the word embeddings, and (3) syntax tree induction.
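As a minimal illustration of the attention-analysis perspective, the sketch below loads CodeBERT through the HuggingFace `transformers` API and inspects its per-layer attention weights. The aggregation shown (mean attention paid to the `<s>` token) is an illustrative choice for this sketch, not necessarily the exact metric used in the paper.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Any short snippet works; this toy function is just for illustration.
code = "def add(a, b): return a + b"

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base", output_attentions=True)

inputs = tokenizer(code, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one (batch, heads, seq, seq) tensor per layer.
for layer, attn in enumerate(outputs.attentions):
    cls_share = attn[0, :, :, 0].mean().item()  # attention flowing to <s>
    print(f"layer {layer:2d}: mean attention to <s> = {cls_share:.3f}")
```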
In this paper, we incorporate the abstract syntax tree (AST) structure, as well as the sequential content of code snippets, into a deep reinforcement learning framework (an actor-critic network) for the task of source code summarization.
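The sketch below shows one actor-critic update step in PyTorch under simplifying assumptions: the hybrid encoder (sequence plus AST) is stubbed as a random state vector, a random scalar stands in for the BLEU-style reward, and the vocabulary and hidden sizes are arbitrary. The actor samples the next summary token while the critic serves as a learned baseline for the policy gradient.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN = 1000, 256  # illustrative sizes

class Actor(nn.Module):   # proposes the next summary token
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(HIDDEN, VOCAB)
    def forward(self, state):
        return F.log_softmax(self.proj(state), dim=-1)

class Critic(nn.Module):  # estimates the expected reward (e.g., BLEU)
    def __init__(self):
        super().__init__()
        self.value = nn.Linear(HIDDEN, 1)
    def forward(self, state):
        return self.value(state).squeeze(-1)

actor, critic = Actor(), Critic()
opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=1e-4)

state = torch.randn(8, HIDDEN)      # stub for the sequence+AST encoder output
log_probs = actor(state)
action = torch.multinomial(log_probs.exp(), 1).squeeze(-1)  # sample tokens
reward = torch.rand(8)              # stand-in for a BLEU-based reward
value = critic(state)

advantage = reward - value.detach()  # critic acts as the baseline
actor_loss = -(log_probs.gather(1, action.unsqueeze(1)).squeeze(1) * advantage).mean()
critic_loss = F.mse_loss(value, reward)

opt.zero_grad()
(actor_loss + critic_loss).backward()
opt.step()
```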
We propose a novel multi-modal attention network for semantic source code retrieval. A comprehensive multi-modal representation is developed to capture both the unstructured and structured features of source code, with an LSTM for the sequential tokens of the code, a Tree-LSTM for its AST, and a gated graph neural network (GGNN) for its control-flow graph (CFG).
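A minimal sketch of the fusion step, assuming the three encoders have each already produced one vector per code snippet (replaced here by random tensors); the single-layer attention scorer and the hidden size are illustrative choices rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

HIDDEN = 128  # illustrative size

class ModalityFusion(nn.Module):
    """Attention over per-modality code representations: a simplified
    stand-in for multi-modal attention over token, AST, and CFG encodings."""
    def __init__(self, hidden=HIDDEN):
        super().__init__()
        self.score = nn.Linear(hidden, 1)

    def forward(self, token_repr, ast_repr, cfg_repr):
        # Stack to (batch, 3, hidden): token, AST, and CFG representations.
        modalities = torch.stack([token_repr, ast_repr, cfg_repr], dim=1)
        weights = F.softmax(self.score(modalities), dim=1)  # (batch, 3, 1)
        return (weights * modalities).sum(dim=1)            # fused code vector

# Dummy encoder outputs; in the full model these come from an LSTM over
# tokens, a Tree-LSTM over the AST, and a GGNN over the CFG.
tok, ast, cfg = (torch.randn(4, HIDDEN) for _ in range(3))
fused = ModalityFusion()(tok, ast, cfg)
print(fused.shape)  # torch.Size([4, 128])
```

For retrieval, the fused code vector would then be matched against a query (description) embedding, e.g., by cosine similarity.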