ruby_parser 1.0: a Ruby Parser written in Ruby

Ryan Davis announced ruby_parser, a parser for Ruby source code written in Ruby. The parser was written using Ruby yACC (RACC), a parser generator that is bundled with the Ruby standard library:

ruby_parser (RP) is a ruby parser written in pure ruby (utilizing racc--which does by default use a C extension). RP's output is the same as ParseTree's output: s-expressions using ruby's arrays and base types.

The library is easy enough to use:

RubyParser.new.parse "1+1"

which returns

s(:call, s(:lit, 1), :+, s(:array, s(:lit, 1)))
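Because the output is built from nested arrays and base types, it can be walked with ordinary Ruby code. The following is only a minimal sketch: it does not load ruby_parser, the `s` helper is a stand-in that builds plain arrays, and `literals` is a hypothetical helper name chosen for illustration.

```ruby
# Stand-in for the s(...) notation shown above: builds a plain nested Array.
# Assumption: the real library's result can be treated like nested arrays.
def s(*args)
  args
end

# Hypothetical helper: recursively collect the values of all :lit nodes.
def literals(sexp)
  return [] unless sexp.is_a?(Array)
  head, *rest = sexp
  found = head == :lit ? [rest.first] : []
  found + rest.flat_map { |child| literals(child) }
end

# The tree from the example above, written out by hand.
tree = s(:call, s(:lit, 1), :+, s(:array, s(:lit, 1)))
p literals(tree)
```

With the real library, `tree` would presumably come from `RubyParser.new.parse "1+1"` instead of being written out by hand.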

A Ruby parser written in pure Ruby has long been missing in the Ruby world. Just to clarify the term "Pure Ruby" in this context: this means the parser's code

consists solely of Ruby source files
does not add any native extensions or other C code (e.g. with RubyInline) that would require a C compiler to be present on the user's system

These properties are crucial to ensure that the code runs across all Ruby runtimes. An implementation that required a C-based native extension would be unusable on Ruby implementations that don't support such extensions, e.g. JRuby, XRuby, or the .NET-based runtimes IronRuby and Ruby.NET. Even if they supported native extensions (something under consideration for JRuby), native extensions cause deployment problems, because a shared library or DLL of the extension would have to be shipped for every OS/CPU combination that is to be supported (otherwise some users would not be able to use it). RubyInline, also a project by Ryan Davis, helps with this a bit by automatically compiling the inline C code, but this still requires a C compiler to be present on the target system - something that's not guaranteed, particularly on Windows systems.

The lack of a pure Ruby parser was not a pressing problem for much of Ruby's history, since the Abstract Syntax Tree (AST) of some Ruby code could be obtained with utilities such as ParseTree. However, since the explosion of alternative Ruby runtimes, Ruby parsers have been reimplemented several times - twice in Java (JRuby and XRuby) and once in C# (Ruby.NET wrote the parser also used by IronRuby). All of these provide different ASTs and different ways of getting at them.
This has caused some issues for Ruby source tools. E.g. the Ruby Refactoring tools, now part of the Eclipse-based Aptana/RDT, are tied to both Java and the JRuby AST and are not usable from other Ruby implementations. With similar tools now being (re)written for other Java-based Ruby IDEs, this means that a large number of code quality and code manipulation tools are now locked into Java and JRuby. Not just that: the logic of these tools is written in Java and not Ruby, which makes them less approachable for Ruby developers.

The pure Ruby parser now offers a chance to change that - a Ruby IDE or other tool can now get at a Ruby AST without getting locked in. E.g. a Java-based IDE can keep a JRuby instance around and run ruby_parser in it. To that end, the current version still needs to add correct source locations to the generated output - i.e. every AST node needs to know the offsets where its source code representation starts and ends. This is crucial for source tools - pure structural information is useful, but if the tool doesn't know where a node is, it can't modify it in the file.
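To illustrate why offsets matter, here is a minimal sketch of a source edit driven by node positions. Everything in it is an assumption made for illustration: ruby_parser 1.0 does not provide source locations, and the `Node` struct, its `start_offset`/`end_offset` fields and the `replace_node` helper are hypothetical names, not part of any library.

```ruby
# Hypothetical node type carrying character offsets into the source text.
# (Assumption: start is inclusive, end is exclusive.)
Node = Struct.new(:type, :start_offset, :end_offset)

# Hypothetical helper: replace the text covered by a node with new text,
# leaving everything before and after it untouched.
def replace_node(source, node, replacement)
  source[0...node.start_offset] + replacement + source[node.end_offset..-1]
end

source = "total = 1+1"
rhs = Node.new(:call, 8, 11) # covers the "1+1" expression
puts replace_node(source, rhs, "2")
```

Without the two offsets, a tool would know that the right-hand side is a `:call` node but would have no reliable way to find and rewrite it in the file.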

Another client for ruby_parser is Rubinius, a Ruby VM written in (mostly) Ruby, which takes its Ruby parser from MRI. Using ruby_parser will allow it to remove yet another piece of C code. To avoid the chicken-and-egg question "How can a Ruby VM work if its parser is Ruby code?": during the build process of the Rubinius VM, the Ruby source code of ruby_parser can be compiled into Rubinius bytecode. When Rubinius starts, it loads the ruby_parser bytecode file - something that doesn't need the parser - and now has a running Ruby parser.

There's still a lot of work left for ruby_parser, as can be seen by a few of the issues mentioned in the release notes:

Known Issue: Speed sucks currently. 5500 tests currently run in 21 min.

Known Issue: Code is waaay ugly. Port of a port. Not my fault. Will fix RSN.

Known Issue: I don't currently support newline nodes.

Known Issue: Totally awesome.

Known Issue: dasgn_curr decls can be out of order from ParseTree's.

TODO: Add comment nodes.


This content is in the Dynamic Languages topic
