Leaderboard



The leaderboard shows the results of different models on four tasks: Code Retrieval, Code Summarization, Code Completion, and Type Inference.

If you would like to report your results here, please submit an issue to the CodeMind GitHub repository. All results will be added once they pass our checks.





MRR of baseline methods for the task of code retrieval on the CodeSearchNet dataset (higher is better).

Rank  Year  Model          Go     Java   JS     PHP    Python  Ruby
1     2022  cpt-code M     97.5   94.4   86.5   97.2   99.9    85.5
2     2023  CodeT5+ 770M   92.7   76.2   71.3   70.1   75.8    78.0
3     2023  CodeT5+ 220M   92.4   76.1   70.8   69.8   75.6    77.7
4     2020  GraphCodeBERT  84.1   75.7   71.1   72.5   87.9    73.2
5     2020  CodeBERT       69.3   86.8   74.8   70.6   84.0    70.6
6     2021  SelfAttn       78.45  66.55  50.38  65.78  79.09   47.96
7     2021  Conv1D         70.87  60.49  38.81  61.92  67.29   36.53
8     2021  NBOW           66.59  59.92  47.15  54.75  63.33   42.86
9     2021  BiRNN          65.80  48.60  23.23  51.36  48.28   19.35
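For readers reproducing these numbers: MRR (mean reciprocal rank) averages, over all queries, the reciprocal of the rank at which the first correct code snippet is retrieved. A minimal sketch, with illustrative names not taken from the CodeMind codebase:

```python
def mean_reciprocal_rank(ranked_results):
    """MRR: average over queries of 1 / rank of the first relevant hit.

    `ranked_results` holds, per query, a list of booleans ordered by model
    score (True = the retrieved snippet matches the query).
    """
    total = 0.0
    for hits in ranked_results:
        for rank, hit in enumerate(hits, start=1):
            if hit:
                total += 1.0 / rank
                break  # only the first correct hit counts
    return total / len(ranked_results)

# One query ranks the correct snippet 1st, another ranks it 2nd:
# mean_reciprocal_rank([[True, False], [False, True]]) -> (1 + 0.5) / 2 = 0.75
```

The scores in the table appear to be MRR scaled to 0-100, so a perfect retriever would score 100.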

Performance of baseline methods for the task of code summarization on the Python-Doc dataset.

Rank  Date        Model              BLEU-4  METEOR  ROUGE-L
1     07/01/2021  PLBART             32.71   18.13   46.05
2     07/01/2021  Transformer + BPE  31.57   17.74   45.18
3     07/01/2021  Transformer        30.64   17.65   44.59
4     07/01/2021  Seq2Seq + Attn     25.57   14.40   39.41
5     07/01/2021  Tree2Seq + Attn    23.35   12.59   36.49
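Of the three summarization metrics, ROUGE-L is the easiest to sketch: it is an F-measure over the longest common subsequence (LCS) between the generated and reference summaries. A minimal token-level sketch, assuming whitespace tokenization and equal precision/recall weighting (illustrative, not the exact scorer used for this table):

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(candidate, reference):
    """ROUGE-L as the harmonic mean of LCS-based precision and recall."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

BLEU-4 and METEOR additionally account for n-gram precision with a brevity penalty, and stemming/synonym matching, respectively.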

MRR of baseline methods for the task of code completion on the Py150 dataset. (Attr: attribute; Num: numeric constant; Name: variable or module; Param: function parameter name; Token: all tokens.)

Rank  Year  Model      Attr   Num    Name   Param  Token
1     2021  TravTrans  72.08  68.55  76.33  71.08  83.17
2     2021  GPT-2      70.37  62.20  63.84  73.54  82.17
3     2022  PyCoder    —      —      —      —      76.93
4     2021  LSTM       51.67  47.45  46.52  66.06  73.73
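The completion table breaks scores down by the type of token being predicted. A sketch of such a per-type breakdown for an exact-match metric (function name and type labels are illustrative; the table itself reports a rank-based score):

```python
from collections import defaultdict

def per_type_accuracy(predictions, targets, token_types):
    """Exact-match next-token accuracy, overall and per token type.

    `token_types` labels each prediction slot, e.g. "attr", "num",
    "name", "param"; the synthetic "token" key aggregates all slots.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, gold, ttype in zip(predictions, targets, token_types):
        for key in (ttype, "token"):
            total[key] += 1
            correct[key] += int(pred == gold)
    return {key: 100.0 * correct[key] / total[key] for key in total}
```

Breaking results down this way shows, for example, that all baselines find numeric constants harder to predict than attribute names.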

Accuracy of baseline methods for the task of type inference on the Py150 dataset.

Rank  Date        Model        All types       Any types
                               Acc@1   Acc@5   Acc@1   Acc@5
1     07/01/2021  DeepTyper    0.52    0.67    0.43    0.67
2     07/01/2021  Transformer  0.34    0.64    0.37    0.75
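Acc@k counts a type slot as correct if the ground-truth type appears among the model's top-k predictions, so Acc@5 is always at least as high as Acc@1. A minimal sketch with illustrative names:

```python
def acc_at_k(top_predictions, gold_types, k):
    """Fraction of slots whose gold type appears in the top-k prediction list.

    `top_predictions` holds, per slot, the model's candidate types ordered
    from most to least likely; `gold_types` holds the ground-truth type.
    """
    hits = sum(1 for top, gold in zip(top_predictions, gold_types)
               if gold in top[:k])
    return hits / len(gold_types)

# Two slots, gold type "int" ranked 1st and then 2nd:
# acc_at_k([["int", "str"], ["str", "int"]], ["int", "int"], 1) -> 0.5
# acc_at_k([["int", "str"], ["str", "int"]], ["int", "int"], 5) -> 1.0
```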