Allow using Prism for parsing#1662
Conversation
|
I really only skimmed the code, but I don't see a handler architecture involved in this implementation. Seems like you're currently directly hardcoding all behavior into the visitors. This kind of fundamentally goes against YARD's extensibility architecture, and is likely why you're seeing a performance boost vs ripper which objectively "does a lot more" than what this does. It might not account for the full 300%, but this implementation isn't an apples to apples comparison and therefore isn't really fair to make any sort of "faster" claims. In order for this PR to be taken a bit more seriously, you'd need to be interested in a full "feature complete" porting, the same way we did for the legacy handlers to Ripper. tldr: if we were to accept a Prism upgrade, it would have to at the very least follow the same architectural goals for the project. Extensibility via handlers is key. The second half would be backward compatibility support, specifically for sexp formatting. This is obviously more contentious, but I don't really have enough knowledge of Prism to know if it shares the same ast node naming conventions that ripper does, or at least if there's a mapping. I would expect that this is more possible, and if so, it would be beneficial to try to build a compatibility layer. The sexp layer is important for the developer ergonomics of YARD handlers, as it enables the ease of use in the extensibility layer. I'm open to different approaches here, but like I said, I don't know enough about what Prism offers in this respect. I do like the way Prism manages source locations and some clarity around node management. I wouldn't be opposed, but it can't violate YARD's core architectural principles. |
|
Thanks for the quick feedback! If I understand you correctly, are you saying you'd like to keep the existing AstNode functionality unmodified? The only reason I ask is that the Prism AST is very rich as it is, it has way more information than ripper provides. So there isn't really a need to build your own syntax tree since it is already there. If you are willing to migrate, then I think we could get quite a lot more performance out of it, and still design a nice extensible architecture (and handle older ones). If you are not willing to migrate, then effectively this PR would need to change to be a direct Prism AST -> YARD AST conversion. Which direction are you thinking? |
Not quite. There are two separate issues here:
|
|
Curious if you have plans to take these changes up. Adding a new parser library is not a simple slot-in, YARD has an extensibility and plugin layer that requires quite a bit of care to work with. If the plan is to completely migrate in a backward incompatible way, it would need to be well documented with migration paths for consumers. To make this a bit more complex, we already have a "legacy" parser stack, so that would need to be renamed, ripper moved, etc. |
|
Hey @lsegal — yes it's on my list, but honestly it's going to be a while. This was definitely the largest project I have attempted to migrate, and clearly I was missing some context :). I will do my best to come up with a design that works for everyone, be it completely backward compatible or with a very clear migration process. |
Hi there!
I wanted to open this PR to discuss if you're interested in using the Prism parser instead of Ripper. This is a WIP branch that introduces the ability to do that via an ENV var. It also includes some benchmarks. Locally I'm seeing about a 300% performance improvement through this work.
It's not done, and some edge case tests are still failing, and I'm sure there's lots to work through still. But I wanted to pause here because this is becoming quite a lot of code to review. So I'd love to know if you're interested in this direction before I continue down the path.
I also have been purposefully avoiding sharing code between the processors to make this diff smaller (which is in vain at this point...), but if this were to truly go through, I think it would make more sense to start factoring stuff out into shared classes.
Let me know what you think.