A new study, Competition-Level Code Generation with AlphaCode, shows promising results for goal-oriented code synthesis using deep sequence-to-sequence models. It builds on previous networks (e.g. Codex, GPT-Neo) and releases a new dataset, CodeContests, as a benchmark for future research.
Deep transformer-based sequence processing has established a solid footing in industry and academia, with applications ranging from language tasks to molecular biology research. Thanks to its high transfer-learning capacity, the pretraining recipe powers search engines, translation services, and chatbots. AlphaCode aims to provide a proof of concept for its application to competitive programming. The work is part of a growing research effort to apply sequence models to task-based program generation (e.g. JuPyT5, a numerical data science problem solver).
AlphaCode comprises several transformer architectures of varying sizes (from 300 million to 41 billion parameters) with multi-query attention modules. Each architecture is an asymmetric encoder-decoder pair accepting 1536 input tokens at the encoder and 768 at the decoder. The networks are pretrained on selected open-source GitHub repositories (715 GB) with a cross-entropy loss on the decoder and a masked language modeling loss on the encoder; training tokens are produced by a SentencePiece tokenizer. The final fine-tuning is carried out on the proposed CodeContests dataset. To compare the models' performance with that of real programmers, several Codeforces challenges are used. The results indicate that AlphaCode reached an average ranking in the top 54.3% across 10 different contests.
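Multi-query attention reduces memory traffic at decoding time by letting all query heads share a single key and value head, instead of projecting separate keys and values per head as in standard multi-head attention. The sketch below illustrates the idea in NumPy; the function and weight names are hypothetical, not from the AlphaCode codebase.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_query_attention(x, Wq, Wk, Wv, num_heads):
    """Multi-query attention: per-head queries, one shared key/value head.

    x:      (seq_len, d_model) input activations
    Wq:     (d_model, num_heads * d_head) query projection, split per head
    Wk, Wv: (d_model, d_head) single key/value projections shared by all heads
    """
    seq_len, _ = x.shape
    d_head = Wk.shape[1]
    q = (x @ Wq).reshape(seq_len, num_heads, d_head)  # per-head queries
    k = x @ Wk                                        # one shared key head
    v = x @ Wv                                        # one shared value head
    # Scaled dot-product scores per head: (num_heads, seq_len, seq_len).
    scores = np.einsum("shd,td->hst", q, k) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    # Weighted sum of the shared values, back to (seq_len, heads * d_head).
    out = np.einsum("hst,td->shd", weights, v)
    return out.reshape(seq_len, num_heads * d_head)
```

The shared key/value head shrinks the per-token cache that an autoregressive decoder must keep, which matters when sampling the large number of candidate programs AlphaCode relies on.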
Beyond the language comprehension that transformers can model, competitive programming adds complexity through challenge-specific constraints such as input/output parsing and computational efficiency. Unlike general library/framework repositories, competitive programming code repositories are relatively scarce, limiting the data sources available for the fine-tuning step. To improve the predictions, AlphaCode outputs are sampled, filtered, and clustered to select the best possible candidates per task.
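The sample-filter-cluster pipeline can be sketched as follows: keep only samples that pass the example tests from the problem statement, group the survivors by their behavior on extra inputs (the paper generates these with a separate model), and submit one representative per cluster. This is a minimal illustration; the function names, the `run` callback, and the ranking of clusters by size are assumptions for the sketch, not the paper's exact procedure.

```python
from collections import defaultdict

def select_candidates(programs, run, example_tests, clustering_inputs, k=10):
    """Pick up to k diverse candidate programs from a large sampled pool.

    programs:          list of candidate programs (opaque handles)
    run(prog, inp):    executes a program on an input, returns its output
                       (None on crash/timeout) -- supplied by the caller
    example_tests:     (input, expected_output) pairs from the statement
    clustering_inputs: extra inputs used only to compare program behavior
    """
    # 1. Filtering: discard any sample that fails an example test.
    survivors = [p for p in programs
                 if all(run(p, i) == o for i, o in example_tests)]
    # 2. Clustering: programs with identical outputs on the extra inputs
    #    are treated as behaviorally equivalent and share a cluster.
    clusters = defaultdict(list)
    for p in survivors:
        signature = tuple(run(p, i) for i in clustering_inputs)
        clusters[signature].append(p)
    # 3. Take one representative per cluster, largest clusters first,
    #    so the k submissions cover distinct behaviors.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [cluster[0] for cluster in ranked[:k]]
```

Filtering removes the bulk of faulty samples cheaply, while clustering spends the limited submission budget on behaviorally distinct solutions rather than near-duplicates.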
The study shows the initial promise of program synthesis with deep networks, but such systems are still far from practical use, and larger datasets may be needed.
Examples of AlphaCode generated submissions can be viewed on its official website. You may also find the related DeepMind blog post interesting.