Skip to content

Allow using Prism for parsing#1662

Draft
kddnewton wants to merge 1 commit into
lsegal:mainfrom
kddnewton:prism
Draft

Allow using Prism for parsing#1662
kddnewton wants to merge 1 commit into
lsegal:mainfrom
kddnewton:prism

Conversation

@kddnewton
Copy link
Copy Markdown

Hi there!

I wanted to open this PR to discuss if you're interested in using the Prism parser instead of Ripper. This is a WIP branch that introduces the ability to do that via an ENV var. It also includes some benchmarks. Locally I'm seeing about a 300% performance improvement through this work.

It's not done, and some edge case tests are still failing, and I'm sure there's lots to work through still. But I wanted to pause here because this is becoming quite a lot of code to review. So I'd love to know if you're interested in this direction before I continue down the path.

I also have been purposefully avoiding sharing code between the processors to make this diff smaller (which is in vain at this point...), but if this were to truly go through, I think it would make more sense to start factoring stuff out into shared classes.

Let me know what you think.

@kddnewton kddnewton mentioned this pull request Mar 21, 2026
18 tasks
@lsegal
Copy link
Copy Markdown
Owner

lsegal commented Mar 21, 2026

I really only skimmed the code, but I don't see a handler architecture involved in this implementation. Seems like you're currently directly hardcoding all behavior into the visitors. This kind of fundamentally goes against YARD's extensibility architecture, and is likely why you're seeing a performance boost vs ripper which objectively "does a lot more" than what this does. It might not account for the full 300%, but this implementation isn't an apples to apples comparison and therefore isn't really fair to make any sort of "faster" claims.

In order for this PR to be taken a bit more seriously, you'd need to be interested in a full "feature complete" porting, the same way we did for the legacy handlers to Ripper. tldr: if we were to accept a Prism upgrade, it would have to at the very least follow the same architectural goals for the project. Extensibility via handlers is key.

The second half would be backward compatibility support, specifically for sexp formatting. This is obviously more contentious, but I don't really have enough knowledge of Prism to know if it shares the same ast node naming conventions that ripper does, or at least if there's a mapping. I would expect that this is more possible, and if so, it would be beneficial to try to build a compatibility layer. The sexp layer is important for the developer ergonomics of YARD handlers, as it enables the ease of use in the extensibility layer. I'm open to different approaches here, but like I said, I don't know enough about what Prism offers in this respect.

I do like the way Prism manages source locations and some clarity around node management. I wouldn't be opposed, but it can't violate YARD's core architectural principles.

@kddnewton
Copy link
Copy Markdown
Author

Thanks for the quick feedback! If I understand you correctly, are you saying you'd like to keep the existing AstNode functionality unmodified?

The only reason I ask is that the Prism AST is very rich as it is, it has way more information than ripper provides. So there isn't really a need to build your own syntax tree since it is already there.

If you are willing to migrate, then I think we could get quite a lot more performance out of it, and still design a nice extensible architecture (and handle older ones). If you are not willing to migrate, then effectively this PR would need to change to be a direct Prism AST -> YARD AST conversion.

Which direction are you thinking?

@lsegal
Copy link
Copy Markdown
Owner

lsegal commented Mar 21, 2026

If I understand you correctly, are you saying you'd like to keep the existing AstNode functionality unmodified?

Not quite. There are two separate issues here:

  1. We need to retain the existing handler architecture, specifically the ability to register Hander::Base subclasses that process nodes in the same way that YARD currently does. This is non-negotiable.
  2. It would be a bonus to have some form of equivalence between YARD's AstNode and prism. This doesn't mean using the exact same AstNode, but it might mean implementing AstNode#=== such that s(:var_ref, s(:kw, "self")) == SelfNode.new or whatever the node is. That would avoid having to deal with backward compatibility problems that come with changing the AST format. I'll put it this way: the Prism parser will never ever become a default parser for YARD (even if the option exists) given that we have over a decade of handlers, plugins, and extensions built on top of the ripper architecture, much of which is colocated inside of specific Ruby gem release archives and have almost no chance of being updated, so finding a way to maintain compatibility-- or switch into compatibility mode-- is very important.

@lsegal
Copy link
Copy Markdown
Owner

lsegal commented Apr 16, 2026

Curious if you have plans to take these changes up. Adding a new parser library is not a simple slot-in, YARD has an extensibility and plugin layer that requires quite a bit of care to work with. If the plan is to completely migrate in a backward incompatible way, it would need to be well documented with migration paths for consumers. To make this a bit more complex, we already have a "legacy" parser stack, so that would need to be renamed, ripper moved, etc.

@kddnewton
Copy link
Copy Markdown
Author

Hey @lsegal — yes it's on my list, but honestly it's going to be a while. This was definitely the largest project I have attempted to migrate, and clearly I was missing some context :). I will do my best to come up with a design that works for everyone, be it completely backward compatible or with a very clear migration process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants