Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
I Don't Want to Teach My Garbage DSL, Either (github.com/cf020031308)
83 points by cf020031308 on May 31, 2019 | hide | past | favorite | 53 comments


The article this is in response to says:

> What’s worse than data silos? Data silos that invent their own query language.

and:

> I just want my SQL back. It’s a language everyone understands, it’s been around since the seventies, and it’s reasonably standardized.

Huh? SQL is just as DSL-y as anything else. It's so non-standardized that I can't take any program written for any RDBMS and run it against any other RDBMS, unless the author specifically extracted all the DSL into an interface layer and ported it to that other RDBMS already. Even something as common as "CREATE INDEX" has different syntax on every database I've ever used.

It's even worse than HTML/JS/CSS 10 or 20 years ago, where at least it had a chance of minimally working. And you can't seriously tell me that it's harder to make SQL implementations (where you control the database) portable, compared to, say, C++ compilers (where you don't control the hardware architecture).

Yeah, data silos with their own query languages are bad. But RDBMSs are some of the worst offenders, because they pretend this doesn't apply to them. It's easy for newcomers to justify creating their own SQL-like languages, because every existing database is already merely an SQL-like language.

All these database vendors need to get together and make some "SQL5" that finally works consistently.


Huh? SQL is just as DSL-y as anything else. It's so non-standardized that I can't take any program written for any RDBMS and run it against any other RDBMS, unless the author specifically extracted all the DSL into an interface layer and ported it to that other RDBMS already.

There are lots of weird parallels between SQL and Smalltalk. For example, both SQL and Smalltalk are constructed with Douglas Hofstader's "strange loops." All Smalltalk object instances have a class, and the classes themselves are object instances. SQL tables are defined using metadata stored in SQL tables. Another parallel, is that the language variants are very similar, but ultimately incompatible to the point where translation is non-trivial. In Smalltalk, this is due to a toothless language standard, resulting from political maneuvering by various language vendors. Not sure what it is in the case of SQL.

There is a problematic relationship between DSLs and libraries in languages of sufficient power. In programming languages with a certain level of expressive power, it's easy to write a DSL on top of a library, or on top of another DSL.

https://xkcd.com/927/

I used to joke about "lady/gentleman computer scientists." What's the difference? A computer scientist knows how to implement another computer language. A lady/gentleman computer scientist knows when they should know better.

A DSL should be used to super-charge your project's very idiosyncratic special context. The trick is knowing better when you shouldn't. Accessing a database isn't your project's very idiosyncratic special context.


If you’re developing a DSL which is just a query language, you are reinventing the wheel, and you should ask yourself if any benefit of your language over SQL is worth the effort of all your users to learn your new query language. It may be worth it of your data can not be usefully be modeled by tables; e.g. document query languages like XQuery and even simple XPath are useful and can not be easily replaced with SQL.


Well are there any data on how much more fuzz there is to learn a domain specific language compared to a domain specific library?

Arduino "whatever it is called but really c++"-language or Warcraft 3 scenario language are kinda nice for what they are.


Neither of those are query languages.


As a broader point, the general message of this post is not said often enough and it should be.

People get attached to their creations and especially if the thing starts as a hobby project, the author can get carried away pretty far. When they eventually want to share their creation with the world they might get a really cold shower.

IMO us the programmers started becoming less practical and more tinker-y. Which is not a bad thing; it's healthy for the psyche. But when it comes to pieces of tech that can end up getting used on vast scales, we should be more responsible.

In this lane of thought, DSLs should be reserved to niche business domains and not as the first tool you reach for when you can't quite describe a problem with your programming language of choice.


At the end of the day, most reports/dashboard UI's are a DSL over SQL. It may not be language in the traditional sense - more of a Visual or Interactive language. However, it is a language where the primitives (drop down, text boxes, etc) are understood by many, even more so than the language it is abstracting (SQL).


Unrelated to the article, just found it interesting how he’s using github to literally auto generate his blog via their md-to-html thingy, and issues for comments.


It's not a bad idea, aside from putting your blog entirely on a third party like github. Utterances is the alternative to disqus that uses github issues.[0]

Personally I think the best solution is using a static site generator like Hugo or Zola to control how you generate your content, and then host that using Caddy + Docker. A database is overkill.

You can also use caddy's git plugin to kick off automatic builds build from any git repo, not just a github one. It just needs to support webhooks, I believe.[1]

[0]: https://utteranc.es/

[1]: https://caddyserver.com/docs/http.git


raganwald.com uses github’s md-to-html thingy, which is jekyl.

But there is nearly zero walled garden downside. I own the domain, I don’t use github.io, so I don’t fear link rot if I move.

Because it’s git, I always have a copy of everything locally, I don’t depend on github for storage.

Because it’s jekyll, I can generate my blog on my own system and upload it somewhere else whenever I want.

I don’t support comments at all, but that’s a personal choice. I’m not in the community business, I outsource comments to Hacker News. Which is also how most of my readers want to discuss my writing.


Yeah, you're always going to be reliant on third parties in a complex world. What is important is how beholden or locked in you are in any given arrangement. If you are free to move from one contractor to another because they are all interoperable, then you're independent in a more important sense.


I suppose that works, I guess I just like hosting my own services over trusting an external service to stay the same and/or play nice in the future. That kicks the bucket to the VPS hosting providers, but with docker it's trivially easy to set up a new server.

With the setup I mentioned above, I just build, commit, then git push, which triggers a pull on both of my servers due to webhooks.


If you use the static-generator approach, how do you do comments? If you just allow posts you open yourself up to many fun attacks.


I don't have comments. Not every website needs comments.

Getting linked to reddit or hn and being discussed is fine enough.


I like that... subcontract to a better or audience-specific comment platform. If it does get on those, and there is value, one could edit and put a 'Follow discussions on [HN link], [Reddit link]'.


The author may not have intended for the blog posts to be read in that way. Its also hosted at https://cf020031308.github.io/blog/i-dont-want-to-teach-my-g...


That doesn't change the point. The github.io page is still markdown rendered as html, and the comment button still links to a new issue.


I don't think it's unrelated to the article, actually. Static site generators like Jekyll are a nice success story for DSLs. The takeaway would be: be more like Markdown, less like a poorly thought out SQL wrapper.


I don't understand why writing an ORM requires creating a new query language. The point of an ORM, as I understood it, was to think about objects, not query languages. When I wrote my garbage ORM (https://hrorm.org) I just ignored everything that's hard (you can write your own SQL for that) but the easy things (basic CRUD) don't require a query language at all. They are just provided by an object-based interface.


The trouble with SQL is that its foundation is old and hasn't been given much love with regards to modern language theory, and any attempts to talk about how the language could be improved are shot down on the basis of confusing the language with the application of the language.

The trouble with SQL is that it doesn't easily allow for basic building blocks that ORMs benefit from, like composability. This leads many ORM authors to build their own query language, which support features that SQL lacks or does poorly, that compile to SQL in order to simplify the rest of the development of the ORM.

This comes as a result of SQL not being a very good language (not to be confused with the application of declarative querying of relational data, which is beneficial and could benefit greatly from a good query language) by modern standards.


SQL is a great language. It's for querying records from a database. Trying to wrap stateful objects around SQL is where the industry went wrong, IMO. It does make some code more portable across datastores, but for me the price is too high. When you know how to write efficient SQL statements, ORM feels like having a hand tied behind your back.


> It's for querying records from a database.

The application of the language does not make the language itself great. As we have seen with DSLs that often come bundled with ORMs, there are other languages to query databases with (even if they ultimately compile to SQL), and I would argue that some of them do a lot better job than SQL does for providing a comfortable and cohesive environment for developers to write queries in.

In the imperative language space, we have one hundred and one different languages all trying to make things slightly more comfortable to developers. C, Go, and Rust can all be used to write the same kind of application, more or less, but that does not mean all of those languages are equally great. The same is true of declarative queries. Just because SQL is popular does not mean it is great.

> When you know how to write efficient SQL statements, ORM feels like having a hand tied behind your back.

ORMs and SQL are orthogonal concepts, really. There is no reason an ORM couldn't require you to hand-roll every single SQL statement. An ORM's concern is simply mapping the results of that query into the application's objects. That some ORM implementations also include functionality to build queries for you, often on top of the aforementioned DSLs, to make that mapping require less effort on the developer is, I would argue, largely a result of SQL being a bad language.


Working with relational databases might be difficult for those not acquainted with them. However, I think the SQL language as an interface primarily for relation databases is a great language.

I think, invariably when you use ORM, you end up having query-specific object structures. For example [1].

You also might be passed objects with deferred fields [2]. This will be completely opaque to someone consuming the resulting object. You'll eventually run into this problem [3]. Solving the lazy load problem requires an understanding of how SQL works in the first place. And if you look at the solution in that example, it's an ORM-wrapped series of joins.

From 3: > For good measure, we add a raiseload to throw an exception if we try to load anything that we didn’t load here.

Who wants to live in this world?

1: https://stackoverflow.com/a/45905714/5573538 2: https://docs.sqlalchemy.org/en/13/orm/loading_columns.html#c... 3: https://engineering.shopspring.com/speed-up-with-eager-loadi...


> However, I think the SQL language as an interface primarily for relation databases is a great language.

Which language(s) are you using as a point of comparison and why is SQL better than those other languages? SQL is no doubt better than nothing, but that is not in the spirit of our discussion.


I agree that SQL as the lingua franca for talking to relational data stores is not ideal, and that in various ways it could be improved. I personally would start out with some of the issues of cosmetic syntax orthogonality, though I realize there are bigger fish to fry.

But two things come to mind on reading your comment. 1. For the problem of Object-Relational-Mapping (ORM) which query language is used is not that important. If one accepts that there is one general purpose language for application development and another for managing persistent data, you will be left with a situation where there is a bunch of redundant code to write in whatever query language and application language you have available, and it would be nice to not have to write all that by hand. 2. Over the years, I have encountered many projects (languages) that attempt to address various deficiencies in SQL. I feel like a lot of these projects make life too hard on themselves. Rather than building extensions to existing database engines, they want you to adopt not just their new language, but their whole persistence stack. If I was smarter, and I had a design for a "better SQL", I would try to integrate it into Postgres or some other existing database engine. I guess that's what some of the ORM authors are trying to do, but by compiling their language to SQL supporting all databases at once. But returning to point 1, that seems pretty separate from the issue of Object-Relational-Mapping.


I agree to an extent. But, I disagree that learning ORMs are a waste of time. For example, I'm on Phoenix, using Ecto. I find `Repo.all` much more convenient that "SELECT * FROM .." etc.

I don't know if it's better or worse - the original blog post asking governments to regulate data formats/code!


Well, Ecto is not an ORM. It's a Data Mapper[0]. Rails' ActiveRecord is ORM.

I agree that some DSLs make SQL better and easier to reason about -- and that Ecto is one of them.

[0] https://en.wikipedia.org/wiki/Data_mapper_pattern


Aren't data mappers generally referred to as ORMs?


Not to my knowledge. Data Mappers should be the "less batteries included" piece of tech compared to the ORMs.

As a single non-representative example, ActiveRecord has `before_insert` hooks you can simply add as methods to your model class. Ecto doesn't have those.


That seems silly. They are a technology for mapping from a relational database to a system based on objects. That's what an ORM is.

> Data Mapper: A layer of Mappers (473) that moves data between objects and a database...

(https://martinfowler.com/eaaCatalog/dataMapper.html)

Sounds like an ORM to me.

Some people try to distinguish between the "data mapper" pattern and the "active record" pattern (it's not just the name of the Rails library, it's a pattern... which the Rails library may or may not implement very well). Both are ORMs, because both are ways of mapping from an rdbms to an object system.

(Neither of which actually has to do with query DSL. We could imagine just taking the part of ActiveRecord that produces queries, but having it return simple hash/string literals. It wouldn't really be an "ORM" (except in the most technical sense that even hashes are objects in ruby), but it would still have the parts you don't like. The nature of query building is actually not related to 'data mapper' vs 'active record' -- you could have an instance of either in which you wrote raw SQL queries, or an instance of either which used the same non-SQL DSL)).

But even distinguishing between "data mapper" and "active record", in actual practice, I don't think there are two completely separate, distinct, and unified camps. I don't think these categories actually serve well to deliniate the ORMs we've got. Instead, there are a just a whole bunch of approaches, some more light weight than others, some more mature/reliable than others, some 'leakier' than others, differing on all sorts of additional dimensions. I agree that some ORMs are better than others -- and some may disagree on which these are -- I don't think saying "data mapper" is actually useful for understanding which these are.


As long as the ORM lets you pass raw SQL when needed I'm fine with some nice convenience methods for the majority of the work. When it comes to really complex stuff that you are just ending up fighting the ORM and its choice of SQL is when I get frustrated and go to plain ol SQL


A DSL can be immensely useful in terms of useability. But it is not trivial to be done right. You need to define a consistent grammar, make sure it's intuitive to your users and free of all sorts of fun bugs that happen when odd constructs are put together.


I think you're describing what Fowler[1] calls an "external" dsl ie separate lexer, parser, and code generator. The alternative being "internal" dsl, something that uses a languages own constructs to create stuff; ie like how Ruby is is used in several well known frameworks.

[1] https://www.goodreads.com/book/show/8082269-domain-specific-...


I think this is still applicable advice for embedded DSLs too, since the grammar of the DSL will be made up of the ways in which you can compose the operations you provide, whether they are parsed by the host language's parser or your own.

Overall, I agree that it's important to think about how to make sure that operations compose in an unintuitive, expected way. I find this hard to figure out without thinking of it in terms of a grammar and semantics, whether it's embedded or not.


Also if you don’t write a debugger and report line numbers on errors, you have not fully understood your problem domain.


I inadvertently created a scripting language recently. It started out as simple commands like "thing param" (kinda like TCL, I guess), but it slowly evolved (needed conditional commands, state, etc) and after a while I wanted proper expression parsing because my ad-hoc command parser was way too brittle. So I wrote an EBNF grammar and used Clojures Instaparse to parse it into an AST and then I added functions, loops and other more traditional programming constructs because why not. I've cleaned up the grammar a few times and now its a simple but relatively full featured scripting language (with some domain-specific features). There are basically three parts: the text parser (EBNF grammar and instaparse), AST transformation that takes the AST that the grammar created and cleans it up (removing or simplifying intermediary nodes, turning string numbers into numbers, extracting function definitions, etc) and the evaluator (which, I'm quite proud of: its a pure function of the scripts inputs + mutable state and it generates a collection of "actions" and new mutable state).

Now onto why I mention this:

> free of all sorts of fun bugs that happen when odd constructs are put together

I actually had an interesting experience with this, but in a good way. As I said above, I hoist the code for functions out of the AST (the final AST is basically a map of function to function body AST, where one of these is the main script code) and the declarations are replaced by the transformation step with a local variable binding that gets set to some data describing how to call the function (key into the map, list of parameters, other stuff), so when you do 'f()', it looks up 'f' in local state, gets this data and figures out how to call the function from that. What this meant was that I could pass functions around. And so I got higher order functions almost for free, without having planned for it, just because I thought "hmm, what if I do this?" (the current version did add some special logic to capture state, for closures, but I did that after when I decided why not support it for real).

As an aside, my little language is a synchronous language[1], which acts as if its runtime is instantaneous (ie, the world does not change until it completes), which makes a lot of things easier (both for me and for users) and makes sense in the domain, since its basically commands that get triggered when certain conditions are met. This almost makes it worth writing a custom language over using an existing one, but if I were to start again, I'd probably just use Lua or Javascript and save some time, even if the final language isn't so easy for my target audience.

[1] https://en.wikipedia.org/wiki/Synchronous_programming_langua...


I don’t agree. Yes, of course everyone should consider if they really even need their DSL. But sometimes as the author rightly notes, sometimes the alternative to a DSL is a lower or significantly more complex existing interface style (such as SQL).


What is the difference between a "DSL" and an "API"?

We can't write software without making APIs...


Then again, REST/RPC are query languages in their own right.

How would one go about securing this thing anyways?

(I agree with the basic sentiment, but find it somewhat unrealistic)


Do anyone here expose real sql querying capabilities to your users?


In a dark past my company did: because these users were paying CMS users, they would not try to hack things, but rather because of the power of SQL and their lack of knowledge, they broke a lot which required backups to restore. We are talking mid 90s and Perl CGI scripts.


That sounds kind of terrifying.


Read only access to a replica databases seems quite a sensible choice for me.

If along you expose not the the real tables but views I honestly don't see what could go wrong.


The main thing I can think of is that migrations would become much more painful, because your API is the entire capability surface of the relational database you're using -- no more and no less. Hyrum's Law suggests that your users will come to depend on every facet of it.

Then again, no doubt there are some cases where this is the best solution. But it's worth being cautious before adopting an approach like this.


The parent poster's grammar is kind of bad but I think their idea is to expose only views, not actual tables, so there is some degree of dependency immunization. Views are a pretty good poor person's substitute for an API right & proper.


Could you not have given them read-only access?


传统的sql迟早被淘汰,我的见解就是这些了!没了


If your API is so complicated that it becomes a DSL, then perhaps there should be better communication between frontend and backend teams. Setting up a lean API contract makes more sense than creating an internal DSL or exposing internal data stores directly to users.


Some things are irreducibly complicated. You can't set up a lean API contract for a complicated thing without giving up power. The example of SQL is pertinent. Any attempt to set up a lean API contract for a relational database is doomed to be weak and painful to use. Without knowing the specific problem domain the blogger was dealing with, you can't know whether that was the situation. But if we assume they are competent, then we can assume a lean API was not possible.


complexity != complicated


It literally is though.

com·plex·i·ty n. the state or quality of being intricate or complicated.


They exist as distinct words for a reason. The difference is important. Search "complexity vs complicated", I even found a few articles about it right here on hacker news.


Do you extend this criticism to GraphQL?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: