Exploring Code Analysis: Zero‑Shot Insights on Syntax and Semantics with LLMs


Wei Ma1 Zhihao Lin2 Shangqing Liu3 Qiang Hu4 Ye Liu1 Wenhan Wang5 Cen Zhang6 Liming Nie6 Li Li2 Yang Liu6 Lingxiao Jiang1
equal contribution
1 Singapore Management University, Singapore 2 Beihang University, China 3 State Key Laboratory of Novel Software Technology, Nanjing University, China 4 Tianjin University, China 5 University of Alberta, Canada 6 Nanyang Technological University, Singapore

arXiv preprint (2023)


Overview

LLM Code Analysis is a lightweight, reproducible evaluation framework that measures how well LLMs perform program‑analysis tasks.

  • Standardized datasets, prompts, parsing, and scoring
  • Unified metrics and JSON artifacts for auditability
  • Consistent diagnosis rules and error categorization
  • Compact review UI to browse model‑ and task‑wise results
  • Coverage: Syntax (AST, Expression), Semantic/Static (CFG/CG, DP/Taint, Pointer), Dynamic (Mutant, Flaky)

Use this site to compare models, inspect aggregate numbers, and regenerate figures/tables with the provided scripts.
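As a quick illustration, the aggregated JSON artifact can be inspected directly. The sketch below assumes a simple mapping from model names to per-task metric dictionaries; the actual schema in results/aggregated_summary.json may differ.

# minimal sketch (Python); field names are illustrative, not the repository's exact schema
import json

with open("results/aggregated_summary.json") as f:
    summary = json.load(f)

for model, tasks in summary.items():
    # tasks might look like {"AST": {"passes": 38, "cases": 50}, ...} (illustrative)
    print(model)
    for task, metrics in tasks.items():
        print(f"  {task}: {metrics}")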


Figure: Software Engineering (SE) task overview (from the paper, figures/se_tasks_refined.pdf).

Results (Sortable)

Click headers to sort. The first header row indicates task categories: Syntax, Semantic/Static, Dynamic.


Tasks & Metrics


Nine tasks across Syntax, Semantic/Static, and Dynamic dimensions with unified metrics and diagnosis rules; the metric definitions are sketched in code after the list.

AST (Syntax Tree)
  • Pass rate: AST_passes / AST_cases
  • Shown in Results as AST
Expression
  • Hit@5 / Hit@10 / Hit@20
  • Shown in Results as Expr@k
CFG (Control‑Flow Graph)
  • Pass rate: CFG_passes / CFG_cases
  • Shown in Results as CFG
CG (Call Graph)
  • Pass rate: CG_passes / CG_cases
  • Shown in Results as CG
DP (Data‑flow)
  • F1 score
  • Shown in Results as DP F1
Taint
  • F1 score
  • Shown in Results as Taint F1
Pointer
  • Accuracy
  • Shown in Results as Pointer
Mutant
  • Few‑shot success rate
  • Zero‑shot success rate
Flaky
  • Summary accuracy
  • Concept accuracy
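For reference, the metric definitions above can be written compactly. The functions below are an illustrative sketch, not the repository's API.

# minimal sketch of the metric formulas (illustrative function names)
def pass_rate(passes: int, cases: int) -> float:
    # AST/CFG/CG pass rate: passes / cases
    return passes / cases if cases else 0.0

def hit_at_k(ranked: list, reference: str, k: int) -> bool:
    # Expression Hit@k: the reference expression appears in the top-k predictions
    return reference in ranked[:k]

def f1_score(tp: int, fp: int, fn: int) -> float:
    # DP/Taint F1: harmonic mean of precision and recall
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0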

Artifacts & Plots


Snapshot numbers are available via the JSON artifact and the Results section. The repository includes scripts to regenerate figures and tables; an illustrative plotting sketch follows the figure list below.

Figure: AST pass/fail bars (latest outputs).
Figure: CFG pass/fail bars (latest outputs).
Figure: CG pass/fail bars (latest outputs).
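Such bars can be regenerated from the JSON artifact. The sketch below uses placeholder model names and counts and is not the repository's evaluation/render_graphs.py.

# minimal matplotlib sketch with placeholder counts (not the repository script)
import matplotlib.pyplot as plt

counts = {"model-a": (38, 12), "model-b": (27, 23)}  # (passes, fails), illustrative
models = list(counts)
passes = [counts[m][0] for m in models]
fails = [counts[m][1] for m in models]

plt.bar(models, passes, label="pass")
plt.bar(models, fails, bottom=passes, label="fail")
plt.ylabel("cases")
plt.title("AST pass/fail by model (placeholder data)")
plt.legend()
plt.savefig("ast_pass_fail.png")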

Get Started


Minimal steps to run and reproduce metrics:

# setup
bash scripts/setup_venv.sh -e -x
source .venv/bin/activate
cp .env.example .env  # fill in provider API keys

# evaluate and aggregate
python evaluation/evaluate_multi_models.py --out results/aggregated_summary.json

# render figures (optional)
python evaluation/render_graphs.py

See README and README_zh for datasets, prompts, and script options.

Supplementary Material


See the Results section and the aggregated JSON artifact for detailed notes and metrics.


Citation

@article{ma2023exploring,
  title={Exploring Code Analysis: Zero-Shot Insights on Syntax and Semantics with LLMs},
  author={Ma, Wei and Lin, Zhihao and Liu, Shangqing and Hu, Qiang and Liu, Ye and Wang, Wenhan and Zhang, Cen and Nie, Liming and Li, Li and Liu, Yang and Jiang, Lingxiao},
  journal={arXiv preprint arXiv:2305.12138},
  year={2023}
}


Acknowledgements

We thank Ximing Xing for providing the webpage template.