
Bridge technologies: Groovy and Grails

Groovy and Grails are the kind of technology that I like to call “bridge technologies”. That is their real value for me, and in this post I’ll try to explain my definition of a “bridge”.

What is a bridge?

It’s an architectural construct that allows us to go from point A to point B easily. You don’t have to climb a huge mountain, fly, run like crazy or take a big jump: you just use what you already know (walking) to get to the other side.

Some technologies are bridges. Looking back at my education I can detect some of those. The first was probably Visual Basic. When I wanted to start my life as a developer the desktop was the main medium: the web was starting, but it was not so common (at least not here in Brazil). So the first language I tried was C++, which was a frightening experience. All those parameters and functions to create a single window! For someone who was just starting and didn’t have access to as many resources as we have today, that was a huge challenge. A challenge that I almost lost.

So, actually by accident, I discovered Visual Basic, and using only the mouse and a few lines of code I could start to develop my first desktop application. Of course it was a rough one, ugly and badly written, but it was my first desktop application. Little by little I started to experiment with the Visual Basic syntax and constructs and began to like it, because the more I discovered about the language, the more confident I became.

Later Java emerged and I really wanted to learn object orientation and all that stuff, but my mind was too “procedural” (maybe Dijkstra was right). It was really hard to understand at the beginning because Visual Basic was object based, not object oriented, at that time before .NET. But I also learned Delphi in parallel, and since I was getting used to it and Delphi was object oriented, I could start little by little to play with object-oriented concepts in my projects. One day I felt confident and started coding in Java, which I still do today, and I also got back to C++ and didn’t feel any fear at all!

Again, little by little, I crossed another bridge. Can you see it? There were two bridges here. The first one was Visual Basic. I started with what I already had: mouse skills and a little coding experience (almost nothing). Later, as I got confident with my coding skills, I found another bridge, which was Delphi. With those coding skills, a step at a time, I learned object orientation and finally crossed another bridge.

Bridge technologies are this: they allow me to learn something using what I already have. When I think about languages like Erlang, at least for me, they don’t seem like bridges but huge jumps I must take to learn and get to the other side. The learning curve looks like a wall. Now I can talk about Groovy and Grails.

The Groovy bridge


When I first met Groovy I was basically a Java programmer. I could not understand a lot of things like dynamic languages and typing, testing, embedded languages, AOP and so many other things that were on the other side of the Groovy bridge! But the language was familiar to me because it really looked like Java. And again, I saw my code starting to change. From things like

public class HelloWorld {

    public static void main(String args[]) {
        System.out.println("Hello World!");
    }

}

To something like

System.out.println("Hello World!");

to finally

println "Hello world!"

It was a smooth transition. I had the chance to keep experimenting with the language, and as I got more confident with it I took the next step. So I started to play with optional typing, got to understand the advantages of having a dynamic language, and then started to learn other languages, like Python, Ruby and so many others, which became easier for me to understand. Groovy definitely was one of those languages that made me a better programmer.

And as a Groovy consultant I realized another amazing thing. Many people who wanted to use the Java ecosystem but did not feel confident about it started to learn it through Groovy, because the language was really close to many 4GLs (I wrote about it some time ago).

Now, with the release of version 2.4, I see Groovy as a bridge to Android: people will learn to develop for the Android platform thanks to Groovy and the way it presents its concepts to the developer.

The Grails bridge

When I crossed the Groovy bridge I found the Grails one. And it was another great experience, learning several things that were on the other side, like convention over configuration, DRY, an agile way of developing web applications and so many others.

I used what I had: Groovy. And again, step after step, I got to know the details of technologies that I already used in the Java world but did not really know at the time, like Hibernate, Spring, SiteMesh and basically all the libraries used by the framework.

Later, as a consultant, I noticed the same thing that happened with Groovy happening with Grails: people starting to use and leverage the Java EE platform through Grails. People starting to use things like JMS and JNDI thanks to the simple way Grails presented these technologies to them. And many other people gaining a deep knowledge of Spring too and becoming better developers. Another awesome bridge!

Crossing the bridges

The nice thing about these bridge technologies is that you almost always realize you have crossed them by accident, on one of those days when you look at yourself and notice that you no longer feel uncomfortable with what you are using.

This is the real value of Groovy and Grails for me: they empower developers using what they already know.

 

The growing irrelevance of MongoDB

My relationship with MongoDB can be divided into three phases. I’m in the third now, in which the product seems irrelevant to me. In this post you’ll see why.

First phase: fascination

I was really lucky in my first contact with MongoDB: it was used in a polyglot persistence architecture to deal with parts of the system for which the relational model was not a good fit. It worked like a charm: fast, easy to set up and use, with really nice performance. It was perfect for that case.

I was dazzled by it: the fact that my main query language was JavaScript was an amazing experience. I had never thought that something like that could work so well. This was a period of discoveries in which I learned a lot about the product and how to handle the document model it presented to me.

Second phase: reality

Maybe a better name for this phase would be maturity. At this point I knew where and, even more important, where NOT to use MongoDB. At this age you know that MongoDB is a nice product, but one that you should use with caution. You know that the document model is great and solves a lot of problems for you, but not all of them: actually, just some of them.

The wake-up call for me was my own failures with it and, most of all, the failures of others. Mostly from people that I saw getting really excited about MEAN and trying to turn the world into a nail so that MongoDB could be the perfect hammer for it. It’s the moment when some facts get in your face and shout at you:

  • The relational model is not as bad as it is made out to be. Actually, there’s a good reason why this is the most popular model and will be for a long time: it works. And contrary to what happens in the NoSQL world, we have a large knowledge base of good and bad practices to apply to this model.
  • ACID matters. MongoDB has a little annoyance: you can’t create a transaction that spans multiple documents. The problem is that the number of situations in which you must do that is huge and cannot be ignored.
  • All that talk about the huge performance you’ll get with MongoDB doesn’t matter much if you don’t know in advance WHAT performance your system needs.

In this phase all the excitement is gone, which is the best thing that could happen to anyone. This is how you learn what the tool can and cannot do for you. It’s the best phase.

Phase three: irrelevance

Today MongoDB is irrelevant for me. Not the document model, but the product. One day I woke up and realized I didn’t need MongoDB anymore because the alternatives were far more attractive for my projects. It came in waves.

First wave: TokuMX

TokuMX is a fork of MongoDB that I like to call “the charming twin of MongoDB”. It uses the same communication protocols, has basically the same commands and is 100% compatible with MongoDB. But it has some great advantages:

  • I can have full ACID transactions among multiple documents.
  • Way faster than MongoDB (50 times faster, according to Tokutek).
  • Consumes 90% less storage than MongoDB.
  • 100% MongoDB compatible. All you have to do is replace your MongoDB instance with TokuMX, migrate your data (which is fairly easy) and you’re done.

And yes, it’s open source like MongoDB and has a free version which works extremely well. Of course it’s not a perfect solution. It still has two limitations:

  • There’s no Windows distribution.
  • The Java libraries today don’t have native support for TokuMX’s ACID implementation. You can still use it, but it requires some boilerplate code.

TokuMX was the first thing that made MongoDB seem irrelevant to me. Of course, this may be temporary: MongoDB can still beat TokuMX in a future release. But only in a future release. Today it can’t.

Second wave: PostgreSQL

If TokuMX made MongoDB irrelevant for me, PostgreSQL reinforced this impression. Since version 9.2 PostgreSQL supports the JSON data type, and JSONB support was added in the 9.4 release. It’s an interesting solution because with it I get the good parts of the relational model together with the flexibility of documents. All of it in the same product. Nice!

OK, but MongoDB was still faster than PostgreSQL. I say “was” because since version 9.4 of PostgreSQL that became history: recent benchmarks show that PostgreSQL is now way faster than MongoDB when dealing with JSON data. I never compared PostgreSQL with TokuMX, but given that both are now faster than MongoDB, I think you get my point.

Conclusions

Comparing MongoDB with TokuMX and PostgreSQL makes choosing MongoDB a hard decision. It is still a nice product, and of course it will evolve to compete with these alternatives, but for now it takes third place at best. Let’s wait and see what 2015 will bring for MongoDB.

The social value of Groovy and Grails in Brazil

As a Groovy and Grails consultant I can say that one of the biggest gains these technologies brought is the large number of people they introduced to the Java EE platform. This post is about these people and how I see it happening here in Brazil.

Groovy and Grails’ greatest achievement, in my opinion

Here in Brazil, when you talk to young developers who usually attend conferences, they can give the impression that software development is limited to two platforms (Java and .NET) and the latest trends like Ruby (and Rails), NoSQL, Python, Node.js and all the new stuff. Add to this the publications, which usually only tell you about the newest technologies, and you run the risk of actually believing it.

This is not the case: much of the software development in IT departments is still desktop based, and a large (REALLY LARGE) chunk of it is based on older technologies like Delphi 7 (and earlier versions), Visual Basic (classic), PowerBuilder, COBOL, CGI, Clipper, FoxPro, VBA, Microsoft Access (97) or many fourth-generation programming languages. And this is not bad software at all: many of these are simply AMAZING projects. The problem is that most of the technologies on which they are based are no longer supported, which creates the need to change the programming environment.

One of the most alluring software development environments for those companies is Java EE. The problem is that for those who are not used to the Java world the learning process can be a traumatic experience. The first contact is really scary when you face all those acronyms like JPA, EJB, JNDI, CMT, JMS and many others. Even with all the recent developments of the platform, it still frightens and alienates many developers who are trying to learn it.

This is a big problem in a market lacking qualified workers, and this is where Grails saves the day, introducing the Java EE platform to this crowd without scaring the novices with all the acronyms that I mentioned above. One of the biggest myths about Java web development is that it is cumbersome. Depending on what you use it may be true, but this resistance falls apart when you show how easy it is to create real projects using Grails scaffolding and GORM, and how simple it is to apply the MVC pattern using Grails.

Groovy also plays an important role in bringing these developers to the Java EE platform, because its syntax is very similar to the languages these professionals are used to. Simple details like mandatory semicolons and parentheses, which seem silly to many of us, can be a source of resistance to the platform, and Groovy makes them optional. If you can reduce the resistance with these simple things, why not? (I wrote about this in a recent blog post)

Why Java EE?

Here is a small list of the reasons my clients tell me to justify their switch to Java EE:

  • The fact that it’s an open platform and they won’t be caught again in a vendor lock-in. (there’s a huge Visual Basic 6 trauma in Brazil)
  • The huge collection of third party libraries, frameworks and components available, most of them being open source.
  • The low licensing costs compared to other closed platforms.
  • It’s multiplatform.
  • There’s a lot of material written about it.
  • Active community.

So now using Grails many IT departments can take advantage of these facts to empower their infrastructure.

A short story

Some years ago I used to help a developer who lived in a remote town here in Brazil. This experience helped me to understand a reality vastly different from mine: one in which Internet access was a luxury, and books and magazines even more so. We exchanged a lot of e-mails for some years and suddenly this guy disappeared.

This year I received an e-mail from him thanking me, because that system actually worked and now he has a company with 40 employees. Whoa! If a simple exchange of e-mails about Delphi 5 could achieve this, just imagine what the popularization of Java EE can do!

PS: I’m the founder of Grails Brasil: one of the largest Groovy and Grails user groups in the world. Often I receive e-mails like the one I mentioned above.

Things they don’t tell you about MongoDB

MongoDB is by far the most popular NoSQL database in Brazil (at least based on the number of blog posts and articles written about it here that I read). It’s really an amazing solution, but what really bothers me is that very few people know about its limitations. So I see the same story repeating itself: people unhappy with it, treating its limitations as if they were bugs.

This post is about some of its limitations that really caught me by surprise, so that if you are thinking of adopting it you’ll at least be warned about them and can avoid these headaches.

Hungry for bytes

This was my first surprise: MongoDB consumes too much disk space. This is related to the way it avoids disk fragmentation issues by pre-allocating space in large files. Here is how it works: when you create a database, a file named [db name].0 is created with (by default) 64 MB. When more than half of this file is used, another one named [db name].1 is created with 128 MB. This happens again and again, so that files with 256, 512, 1024 and finally 2048 MB are written to your disk. All subsequent files have 2048 MB.
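
To get a feeling for how quickly this adds up, here is a minimal Java sketch of the allocation rule described above. It only reproduces the arithmetic from this post (64 MB doubling up to a 2048 MB cap); the class and method names are hypothetical, and the numbers are not checked against MongoDB’s source code.

```java
// Hypothetical helper, not part of any MongoDB API: it only models the
// doubling rule described in the text (64, 128, ..., capped at 2048 MB).
class PreallocationSketch {

    // Size in MB of the data file with the given index ([db name].0, .1, ...).
    static long fileSizeMb(int index) {
        // 64 << 5 == 2048, so every file from index 5 onwards stays at 2048 MB.
        return 64L << Math.min(index, 5);
    }

    // Total MB reserved on disk once the given number of files exists.
    static long totalAllocatedMb(int fileCount) {
        long total = 0;
        for (int i = 0; i < fileCount; i++) {
            total += fileSizeMb(i);
        }
        return total;
    }

    public static void main(String[] args) {
        for (int files = 1; files <= 8; files++) {
            System.out.println(files + " file(s): " + totalAllocatedMb(files) + " MB reserved");
        }
    }
}
```

Under these assumptions, by the time the sixth file exists the database already reserves 4032 MB on disk, regardless of how much of the last file is actually in use.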

If storage space is a restriction in your project you MUST take this into consideration. There’s a commercial solution for this problem: it’s named TokuMX. I haven’t tried it, but if it really works, storage consumption will decrease by 90%. The repairDatabase and compact commands can also help you in the long run.

Data replication with the replica-set strategy is amazing, but has its limitations

The replica-set strategy for data replication in MongoDB is amazing. It is easy to configure and works really well. But if your cluster has more than 12 nodes you have a problem: a replica set in MongoDB is limited to 12 nodes. There’s an open issue to eliminate this restriction, so you can track it to see if this problem will soon be only a sad memory.

Master-slave replication will not ensure high availability

Despite being considered deprecated, there’s another replication strategy that MongoDB implements: master-slave. It solves the problem of the 12-node limit but brings you another one: if you need to change the master node of your cluster, you must do it manually. Doubt me? Here is the link.

Avoid the 32 bit version

The 32-bit version of MongoDB is also considered deprecated because it can only handle 2 GB of data. Remember the first limitation in this post? You’ll hit it very quickly with this version. Here is a blog post from MongoDB about this limitation.

Consulting prices are VERY expensive (at least for Brazilian developers and companies)

I don’t know how this is seen outside Brazil, but at least here the consulting prices of MongoDB are infeasible. The “Lightning Consult” plan costs US$ 450.00 per hour, and you have to buy at least 2 hours, so it costs any company at least US$ 900.00. We are used to hiring companies like Red Hat and Oracle for much better prices.

Bad administration tools

This is still a huge problem for beginners. The administration tools are in terrible shape. The best one I know is called Robomongo, which is really handy for those who are taking their first steps with the tool. Unfortunately it’s an exception.

Know the official limitations

It amazes me that so few people research the limitations of the tools they want to adopt. Fortunately the MongoDB staff have published a post with all those limitations, so that you can know them in advance and avoid many unfortunate surprises on your adoption path. I hope that you are not caught by surprise by the limitations described in this post. :)

Why Groovy?

Why should a team already used to Java pay attention to Groovy? Which problems does it solve for you? In this post I hope to show you some of these reasons, at least the ones that matter most to me.

First the pseudo reason: runs on the JVM

We all know the JVM is great but this can’t be considered THE main reason why you should pay attention to Groovy. After all, Java also runs on the JVM. In my opinion this must be seen as a great advantage, but not the main one.

It’s great to know that all your Java code can be executed by your Groovy code without problems and that you’ll have access to the whole Java ecosystem. But this only helps the adoption of the language: it does not fully justify it.

(but I must admit that my first contact with Groovy was strongly influenced by this aspect)

Can be executed as a script using all your legacy Java code and your favorite libraries

This is a good reason. Groovy is an awesome language for writing maintenance code. I love to write Groovy scripts to execute small maintenance tasks on my systems, reusing all my business logic without having to access my database directly.

I also like to write infrastructure maintenance scripts on my servers, so that I can take advantage of great Java libraries like Apache Commons. For people like me, who are not system administrators and have not mastered bash yet but know Java, this makes Groovy a great option.

The script format is also very inviting for beginners. There are those cases in which you want your program to do simple things, like dealing with a bunch of files in a folder. Which code do you think is easier to understand in these cases? The Java version below?

[code language="java"]

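import java.io.File;

public class ReadFolders {
    public static void main(String[] args) {
        File folder = new File("/somewhere");
        // listFiles() returns the entries of the folder (null if it is not a directory)
        for (File file : folder.listFiles()) {
            // do something
        }
    }
}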

[/code]

Or the Groovy version?

[code]

File folder = new File("/somewhere");
// listFiles() returns the entries of the folder (null if it is not a directory)
for (file in folder.listFiles()) {
    // do something
}

[/code]

Why wrap my kickstart code in a class when it’s simple?

It has some features that are not in Java yet

If you already know Groovy maybe you feel the same thing: that feeling that Java is becoming more like Groovy with every new release. I must admit that Java 8 does not seem like something new to me, because I have actually had access to all those features since 2007, when I learned Groovy. Here is a small list of Groovy features that I really miss in Java:

  • Closures: I know that in Java 8 closures will be a main feature, but Groovy has had them since its first version.
  • Builders: Groovy allows you to deal with tree-like data structures in a really nice and simple way. It’s one of those features that you will only understand when dealing with real problems. So I recommend that you learn more about SwingBuilder and MarkupBuilder.
  • The way we deal with numbers is also great. Dealing with BigDecimal in Java is a struggle (I know better days await with Java 8, but today I still have to use 7). Just compare: a.multiply(b) or a * b? If you must deal with BigDecimal numbers, Java was never a good language to start with.
  • GStrings: Groovy has its own version of strings. Remember all that concatenation you have to do in Java? In Groovy all I have to do is interpolate.
  • The fact that Groovy is also a dynamic language helps a lot. Being able to change the behavior of your code at runtime, without design patterns that are often difficult to implement, is a real time saver in several cases, such as writing mock objects for your tests.
  • Groovy also has special constructs in its syntax to deal with the Collections API. Much better than the way we are used to working in Java.
    Instead of
    List<String> words = new ArrayList<String>();
    just use
    def words = []
    And the support for maps also makes Groovy an excellent option for writing mock objects. Great for testing!
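
To make the BigDecimal point from the list above concrete, here is a small Java sketch of a typical money calculation. The class name, method name and values are hypothetical, chosen only for illustration.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical example: price * quantity minus a percentage discount.
class BigDecimalSketch {

    static BigDecimal total(BigDecimal price, BigDecimal quantity, BigDecimal discountRate) {
        // Every arithmetic step is a method call in Java.
        BigDecimal gross = price.multiply(quantity);
        BigDecimal discount = gross.multiply(discountRate);
        // Round the final amount to two decimal places.
        return gross.subtract(discount).setScale(2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        BigDecimal result = total(new BigDecimal("19.90"), new BigDecimal("3"), new BigDecimal("0.10"));
        System.out.println(result);
    }
}
```

In Groovy the same calculation can be written with the ordinary *, - and + operators, because they are mapped to multiply, minus and plus, and decimal literals are BigDecimal by default.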

A nice gateway to the Java platform

At least in Brazil (I’m Brazilian) I observe a huge mass of programmers coming from languages like Visual Basic, Delphi, PHP, PowerBuilder and many others who want to code for the JVM but think Java is a difficult programming language to learn. It seems strange to many, but it’s a reality that I deal with every day.

Groovy syntax is very attractive to this crowd. Having optional semicolons and parentheses may seem like bullshit to many of you, but to a really big audience this is a good reason to try the language. It’s closer to what they are used to working with, and as time goes by, people start to learn more and more about the Java ecosystem in a much less traumatic way.

Groovy Console

Not exactly a feature of the language, but it’s a nice tool that I use a lot. Groovy comes with a full REPL interface that, in my opinion, is the best place to experiment with the features of the language, learn APIs and even execute some scripts once in a while.

Writing a DSL in Groovy is easy (really easy)

I know this is not part of everyday life for most people, but since it is part of mine, I feel compelled to mention it. Optional parentheses and semicolons, closures, AST transformations and many other features make writing a DSL in Groovy much easier than in Java.

There’s a really great book just about this, called Groovy for Domain Specific Languages.

Grails

Most people I know who learned Groovy learned it because of Grails, which is the “killer application” of this language. Grails simply saves Java EE from boredom. I wrote about it four years ago and still think this way. Grails is web development for the Java platform as it always should have been. Simple as that. And as the founder of Grails Brasil, one of the largest user groups in the world, I really know what I’m talking about, because I have already seen many amazing uses of this framework.

And you?

Now I would like to know what you think about Groovy. Have you already used it? Did you love it or hate it?

Why developers need secret weapons

It’s a common situation: while presenting a development platform which is not as mainstream as JSF or EJB, questions like “where will I find people who know how to use it?” or “nobody uses it, so why should I?” emerge. You know what? I think these are great examples of situations in which good strategies are simply thrown away. Let me tell you why.

“Competitive edge”

The main argument that I see against alternative development platforms – such as Grails, Play and many others – is that since they are not so popular (at least not here in Brazil) or are not “industry standards”, they don’t contribute to the team’s competitiveness. Well: let’s think for a while about the meaning of the word “competitive”.

If you compete with someone the main goal is to be ahead of your competitor, not behind nor beside: just ahead. Some people may say that if you use the same weapons you achieve competitiveness, but you know what? If you always use the same tools as your opponents you will at most reach the same level as them.

Fighting fear 

When you bring something new to the table one of the first reactions will be rejection. People tend to prefer what they already know. It’s human nature. Here are some common arguments.

(this is an awesome video of Carly Fiorina about it: http://www.youtube.com/watch?v=w3IbKbDhfKw)

“Where are we going to find skilled programmers for that?”

If the tool doesn’t have good documentation, this may be a valid statement, but what if it does? Well, in this case, at least in my opinion, the main asset of any team member is basically being thrown away: people can learn, and usually like to.

And you know what? I think this kind of situation is perfect to test your teammates: you recognize a good developer when they face a new challenge.

“This is not an industry standard”

An industry standard obviously has a lot of advantages, like higher compatibility and so on, but does strictly adhering to an industry standard mean using exactly the same tools as your competitors? Take the JVM as an example: we can say today that it is an industry standard, but does that mean that I should only code in Java? Not at all! Compatibility does not mean being equal, but being able to interact with it.

“Who will pay me to work with this?”

Well: for me this is the saddest argument. It’s OK to be a specialist in a given technology, but again this line of thought forgets one of the greatest values of human beings, which is their capacity to learn. And let’s be honest: unless it is an ultra-secret technology created internally, at least SOMEONE in the world has already tried it and will probably want to share their experience with it.

Secret weapons


 

If you are a developer and want to differentiate yourself from the competition you need secret weapons. By secret weapons I mean all those technologies and bodies of knowledge that are ignored by most of your competition because they are not “industry standards” or “widely used”. In a market dominated by the development of CRUD systems, for example, in which most companies use something like JSF, alternatives like Grails, Ruby on Rails, Play and many others are excellent weapons because of the productivity they provide.

Another great example of a secret weapon (now not so secret) is NoSQL databases. As I mentioned in another post, they bring “new” notations to a world dominated by relational thinking, which help us achieve better results and, in doing so, differentiate our work from what is being done by our competitors.

Secret weapons are not just for market share, but also for personal growth. Let me give you an example: Lisp. Before all this recent hype about functional programming I started to learn this language just for fun, and it was one of the best investments I have ever made. I learned A LOT from it, and when new languages like Scala and Clojure started to show up everything was much easier for me. And you know what is really interesting? I never used Lisp on any job.

When you start to learn and use secret weapons you gain a competitive edge against your peers because your cognition is improved: there are more dots to connect now, so better solutions usually will emerge from it.

Conclusions

Secret weapons are not a luxury, but a necessity. The argument that you should use only what is most widely used is meaningless because it totally ignores the fact that people can learn new stuff.

Of course, you know your secret weapon is a valid one only if you can show enough arguments to your audience so that they can overcome the fear of change and opt for it.

Some great NoSQL resources

Recently, as you can see on this blog, I’ve been writing a lot about NoSQL. In the process I’ve found some really cool stuff that I now want to share with you. Here are the links:
“The Architecture of Open Source Applications: The NoSQL Ecosystem” – Adam Marcus – My favorite chapter from the online book “The Architecture of Open Source Applications”. This article presents the main architectural principles behind some NoSQL database systems. The part about scalability is really great and helped me to understand how those solutions deal with the problem.
“Bigtable: A Distributed Storage System for Structured Data” – Google – This is now a classic article, in which Bigtable is presented for the first time. Excellent read!
“Dynamo: Amazon’s Highly Available Key-value Store” – Amazon – Like the Bigtable article, this is another classic text, presenting Dynamo, created by the Amazon development team. The description of how sharding is done is a high point for me.
“Redis: Under the Hood” – Paul Smith – Ever wonder how the source code of a database system works? This article is just that, and with it we can understand very clearly how some aspects of ACID are implemented by Redis. Excellent read!
“A Relational Model of Data for Large Shared Data Banks” – E. F. Codd – How can we talk about NoSQL without knowing at least what the relational model is? This is the article which started it all. A must read.
“Limitations of Record-Based Information Models” – William Kent – Many people nowadays love to criticize the relational model (and most criticisms are actually really poor). Well: this article, written in 1979, is an excellent argument against the use of record-based data structures (aka tables) in several circumstances. It radically changed how I saw and used tables. Fascinating.
“Consistency Models in Non-Relational Databases” – Guy Harrison – As relational database users we simply forget about how consistency is obtained. This nice article shows some models of consistency adopted by non-relational systems. It’s a great introduction to the concepts of strong and weak consistency.
“CODASYL at Wikipedia” – Wikipedia – In my research I always stumble upon CODASYL. On Wikipedia there’s a nice description of it, which, we can say, is the grandfather of all “nosql” systems. :)

The CAP theorem

A lot is being said and written recently about the CAP theorem. If you don’t know what it is, here is a short description: it basically says that, when dealing with data consistency, a distributed system can only guarantee two of three properties at the same time:

  • Consistency: all nodes see the same information at the same time.
  • Availability: every request to the system receives a response.
  • Partition tolerance: the system still works even if part of its nodes are unreachable.

Well, here are some nice texts about it:

“Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services” – Seth Gilbert and Nancy Lynch – A really nice description of the CAP theorem.

Brewer’s CAP Theorem: The Cool Aid Amazon and eBay have been drinking” – Julian Browne – Another nice description of the theorem (easier in my opinion).

More cool stuff

“The Little MongoDB Book” – Karl Seguin – An excellent introduction to MongoDB.

“The Little Redis Book” – Karl Seguin – From the same author as “The Little MongoDB Book”, guess what? Another excellent introduction.

Some myths about NoSQL

Like every buzzword, NoSQL generates a lot of hype and, as a consequence, a lot of nonsense is written and said about it. In this post I hope to debunk some of the myths created around it.

Myth #1: the novelty of NoSQL

The main attribute of all NoSQL database systems is the fact that they are not based on the relational model. At least, this is the most popular characteristic (in a recent post I actually showed how meaningless it is). Codd’s article, in which the relational model is first described, was published in 1970.

Well: does it mean that no other database system existed before 1970? Of course not. They did exist, like the CODASYL navigational systems for example, and they were obviously not relational. So, based on this main attribute, NoSQL actually predates the relational model. Myth debunked.

Myth #2: forget about the schema

This is probably the myth that causes the most pain in the long run. The fact that many NoSQL solutions don’t force you to adopt a rigid schema for your data does not mean that you should ignore it. Actually, it’s quite the contrary, especially as time goes by and you need to remember why you used those attributes.

In some cases it can even be dangerous: take MongoDB for example. As a good practice, many experienced users will tell you to define the attributes of your documents with a preallocated size, to avoid a full copy of the document when its size changes. And this is just one case: you still need to know your schema to optimize your queries with indexes and so on. So, myth debunked.

Myth #3: NoSQL scalability is always superior because, well: it’s NoSQL!

High scalability is one of the main selling points of all NoSQL database systems, but opting for a NoSQL solution does not guarantee that all your scalability problems are over. What really solves your problem is a great architecture: simple as that.

I’ve already seen cases in which the adoption of a NoSQL database brought huge gains in scalability. But in these cases I wonder if the victory was not the result of using the right data structure instead of the wrong one for that particular case. By the way, I already wrote something about it on this blog when I mentioned that the NoSQL problem is actually notational.

The funny thing about this myth is that it simply forgets the countless other cases where relational databases achieved great scalability too.

So yes, it’s easier to scale systems based on NoSQL solutions, but the adoption of a NoSQL database is not the only reason you’ll achieve that.

Myth #4: unfair benchmarks

How can you fairly compare completely distinct persistence paradigms like, for example, key-value, relational, document based, and so on? I see some benchmarks that announce huge performance gains with NoSQL using completely unfair comparisons.

It’s obvious that a key-value database will be faster than a relational one when your query involves only a key. A fair benchmark would involve only systems which belong to the same paradigm. Only then can we say something that makes a minimum of sense.

Comparisons like MySQL vs MongoDB simply do not make any sense, since they are made for different uses. If you get this huge gain, there’s a high probability that you were using the wrong data structures in the first place.
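
To illustrate why such comparisons are apples to oranges, here is a small Python sketch that puts the two workloads side by side. It is only an illustration under my own assumptions: a plain dict stands in for a key-value store, an in-memory SQLite database stands in for a relational one, and the table names and sample data are invented. It measures nothing; it just shows that "fetch by key" and "join and aggregate" are different questions.

```python
import sqlite3

# Stand-in for a key-value store: the only access path is the key.
users_kv = {1: "ana", 2: "bruno"}

# Stand-in for a relational database (in-memory SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)"
)
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ana"), (2, "bruno")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5)],
)


def name_by_key(user_id):
    """The workload a key-value store is built for: one key, one value."""
    return users_kv[user_id]


def total_spent_per_user():
    """The workload a relational database is built for: join and aggregate."""
    query = (
        "SELECT u.name, SUM(o.total) "
        "FROM users u JOIN orders o ON o.user_id = u.id "
        "GROUP BY u.name ORDER BY u.name"
    )
    return conn.execute(query).fetchall()


print(name_by_key(1))
print(total_spent_per_user())
```

Timing only name_by_key against its SQL equivalent would favor the key-value side; timing only total_spent_per_user would favor the relational side. Neither result says much about which system is "faster".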

Myth #5: NoSQL provides higher productivity

“It’s as if you had countless arms with NoSQL”. Ha!

Only in your dreams, or if you were using the wrong tool before. Since we lived for years using only relational solutions, the moment you start to use a NoSQL database you and your team will pay a high price for the adaptation.

Very few people talk about this, but the main challenge when you choose the NoSQL way is cultural, not technical. People fear change and leaving their comfort zone, even if the solution you present is better. It’s not uncommon to see teams using MongoDB, for example, as if it were a relational solution, or cursing Redis because they can only use the key as their main query.

Productivity comes with practice, not with magic.

Conclusions

The NoSQL movement is by far one of the best things that could have happened to developers, since it opened our minds to alternatives to the relational systems we were used to, but we must always remember that each gain comes with a price.

Deconstructing NoSQL: the quest for better definitions

My main concern is language: which words should we use to denote things? It really annoys me when I find words that don’t achieve this goal. NoSQL is one of these terms. In this post I’ll propose something to you: avoid the term “NoSQL”, because it actually does not describe anything.

Martin Fowler’s definition of NoSQL

In 2012 Martin Fowler wrote a post on his blog in which he tries to define what NoSQL is. Since then this has been the definition that (at least for me) really caught on, so I’ll use it as the foundation of this post, in which I’ll deconstruct the term.

Curiously I’m not alone: Fowler doesn’t like the term either, but he shows us five main characteristics that these database systems have in common. As I’ll show you, they actually don’t define anything. Let’s get started.

First attribute: not using the relational model (nor the SQL language)

Protagoras laughed at me. :)

Let me start with a quick sophism. If I point to a database system and say that its main characteristic is the fact that it’s a relational one, I’m already distinguishing it from all the other options. So the moment someone characterized the first relational system, the term “NoSQL” was automatically created by exclusion. Sorry: I warned you that I’d use a sophism. :)

What I mean by this is that simply saying that your database does not follow the relational paradigm doesn’t mean anything, since there are many other options, like document oriented, graph oriented, key-value, navigational and so on. It sounds like someone saying you are not from Brazil (like me): you may be from any other country!

Now let’s get to the second aspect of this attribute: not using the SQL language. Again, this does not mean anything useful, because there are database systems characterized by some as “NoSQL” that are now adopting the SQL language. If you blindly follow this characterization, then those database systems can’t be called NoSQL anymore.

(about the so called problems of SQL, I suggest reading this excellent article)

Second attribute: open source

There’s not really much to say about it, because today there are many relational databases that ARE open source. Even worse: there are NoSQL solutions which are not open source, like FatDB for example. So, again, it does not characterize anything, and actually makes things worse, since it puts relational and non-relational databases in the same bucket.

Third attribute: designed to run on large clusters

One of the biggest selling points of NoSQL is the fact that it brings to the table another alternative to solve the problem of scalability, by popularizing horizontal scalability. So we see the emergence of concepts like the CAP theorem and the loosening of ACID to achieve this goal, among many other things.

That’s fine: many NoSQL database systems were created aiming specifically at this problem, but there are also relational databases which deal (and really well) with the same problem. For years relational solutions like Oracle, PostgreSQL and JustOneDB have achieved the same results. Again, no significant differentiation.

Fourth attribute: based on the needs of 21st century web properties

Which are those properties? Giant clusters? I already talked about them in the preceding paragraphs. Low cost? Firebird, SQLite, MySQL and others are low cost solutions too (and relational!). Need constant changes in the schema? Then you are using the wrong persistence model, my friend: you should be using something else. :)

Again we see a broad characteristic that any seller can use when showing you their persistence solution.

Fifth attribute: no schema

Which brings us back to the first attribute of this post and brings us another problem: the fact that this characteristic makes the key-value, graph based, document oriented and navigational systems all look the same. So it actually makes things worse!

What is the solution?

Simple: let’s avoid the term NoSQL, since it is so broad that it does not define anything with minimum accuracy. Maybe it’s more interesting to use the paradigm name instead. Using MongoDB? Say something like “I’m using a document oriented database system called MongoDB” instead of “I’m using a NoSQL database called MongoDB”. Did you see how much more precise your words were?

Of course I’ll continue to use the term “NoSQL” for a while because basically everybody uses it, but I think that if we slowly stop using it many misconceptions can be avoided in the long run.

What about the definition of NoSQL as “not only SQL”?

Even worse! Now we are putting relational and non-relational database systems in the same set without any differentiation! :) Much more interesting is the emergence of terms like “polyglot persistence”, as Fowler describes on his blog.

NoSQL: it’s a notational problem!

There are those things that look complex and modern at first sight but, when we look a little closer, are actually pretty simple, like NoSQL for example. You know, I read a lot of articles and blog posts about this topic and I’m always amazed by the capacity of some authors to hide the basic fact about it: we are actually dealing with a notational problem.

Greeks, Romans and Indians

Think about ancient Greek figures like Archimedes, Thales of Miletus or Euclid: those were the guys who basically laid the foundation of what today we call mathematics (at least in Western culture).

What really amazes me is the fact that in this period we see a huge development of geometry but very little in algebra. This is the moment when I ask myself why. There are two reasons for this. The first is a pragmatic one: agriculture was one of the main economic forces of that time, and land disputes were a very common issue, so geometry was a great tool to solve these problems.

The other reason is what really interests us in this post: they didn’t have a good notation to represent numbers and quantities, only shapes. When we look at the Roman empire, which succeeded the Greeks, we don’t see advances as huge as those of the Greeks. Actually, there’s one: the Roman numerals, which were just a little more efficient than the Greek ones, but not enough to start a mathematical revolution.

But in the India of the 6th-7th century something happened that changed the mathematical landscape forever: the number system we now call Arabic, which maybe for the first time allowed us to represent bigger amounts based on the position of each digit. It’s amazing how a “simple” new notation can fire things up: we can say that algebra really started from that point in time.

Well: enough about Indians, Greeks and Romans. Back to NoSQL.
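
A tiny Python sketch can show what “based on the position of each digit” buys us. The function names positional_value and to_roman are mine, and the Roman conversion uses the modern subtractive convention (IV, IX and so on), which is an assumption about notation and not a historical claim. In positional notation one short rule handles a number of any size; in Roman notation every new order of magnitude needs new symbols.

```python
def positional_value(digits, base=10):
    """Value of a digit sequence: each step shifts one position to the left."""
    total = 0
    for digit in digits:
        total = total * base + digit
    return total


# Symbol table for Roman numerals (modern subtractive convention).
ROMAN_SYMBOLS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(number):
    """Write a positive integer in Roman numerals by greedy subtraction."""
    parts = []
    for value, symbol in ROMAN_SYMBOLS:
        while number >= value:
            parts.append(symbol)
            number -= value
    return "".join(parts)


print(positional_value([1, 9, 7, 0]))  # 1*1000 + 9*100 + 7*10 + 0
print(to_roman(1970))
print(to_roman(1888))
```

Notice that positional_value needs no table at all, while to_roman cannot go beyond what its symbol table was prepared for without stacking more and more M's.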

Database notations

Give a hammer to a kid and suddenly the whole world will look like a nail

Does all information really need to be represented as a set of rows and columns? Initially things were not like that: CODASYL, for example, had a navigational database system much like the graph databases we see today. Curiously, with the success of the relational model it was as if all database systems started to be based on tables.

We can’t deny the huge gains that the relational model brought us, but I think that maybe this success came with an intellectual loss too. The impression I have is that since 1979, when Oracle launched the first commercially successful relational database system, all data got flattened to only two dimensions.

When we “tablelize” the world we must represent, we end up with solutions that are way more expensive than necessary. It’s easy to see how: you just have to remember that table you created some time ago with a few fields that today has a couple dozen, or that magic you had to do just to represent more complex relationships between the entities of your system. Maybe a graph or even a key-value representation could be a better fit for the problem.

What I mean with all this is that the NoSQL movement was actually a wake up call for all of us who for some time saw data structures as only a bunch of tables on a relational database. Now it’s easier for us to see other options like graphs, documents, key-values, etc. With those “new” data structures we now have new notations to represent our world, the same way our Indian friends revolutionized algebra with their notational system.
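
Here is a small Python sketch of that last point, using invented sample data and hypothetical function names. The same friendships are first stored as rows (the way a link table would hold them) and then as a graph. With rows, “who is within two hops of ana?” needs one self-join per hop; with the graph notation it is a plain traversal.

```python
from collections import deque

# The relationships as rows, the way a link table would store them.
friendship_rows = [("ana", "bruno"), ("bruno", "carla"), ("carla", "davi")]


def build_graph(rows):
    """Turn the rows into an adjacency map (the graph notation)."""
    graph = {}
    for left, right in rows:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)
    return graph


def within_hops(graph, start, max_hops):
    """Everyone reachable from start in at most max_hops steps (BFS)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        if distance == max_hops:
            continue  # do not expand past the hop limit
        for friend in graph.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, distance + 1))
    seen.discard(start)
    return seen


graph = build_graph(friendship_rows)
print(sorted(within_hops(graph, "ana", 2)))
```

The data is exactly the same in both forms; what changes is how naturally the question can be asked, which is the notational point of this post.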

Concluding…

NoSQL (I actually don’t like this name) is more than a set of database systems that don’t follow the relational model: these systems bring to the table new notations that we can use to represent the world in which our software will work. What NoSQL brought us was way more than a solution for scalability problems: it’s a wake up call about all those data structures that we simply forgot because of our marriage with the relational model.

I believe this “notational discovery” will be a huge leap for all of us developers: now it’s easier to build better solutions based on better notations. If your reality can’t be well represented in a relational model, just look for another one: now there are affordable solutions that may fit your needs.

Yeah: I can see a huge win for all of us coming real soon.
By the way: we ARE already winning with all of this.