NestedText Implementation in Ruby


With this post I’m declaring my latest project nestedtext-ruby more or less done (for now)! It’s a library for an exciting new data format designed to address issues with currently popular formats such as JSON, YAML and TOML. The reference implementation is written in Python, and I decided this was a good opportunity to make one for the Ruby community.

But first, what is …

NestedText - The Human Friendly Data Format

Citing from the official introduction page:

NestedText is a file format for holding structured data to be entered, edited, or viewed by people. It organizes the data into a nested collection of dictionaries, lists, and strings without the need for quoting or escaping. A unique feature of this file format is that it only supports one scalar type: strings. While the decision to eschew integer, real, date, etc. types may seem counter intuitive, it leads to simpler data files and applications that are more robust.

NestedText is convenient for configuration files, address books, account information, and the like. Because there is no need for quoting or escaping, it is particularly nice for holding code fragments.

“Why do we need another data format?” is the right question to ask. The answer is that the current popular formats (JSON, YAML, TOML, INI etc.) all have shortcomings which NestedText addresses.

Example

Here’s a full-fledged example of an address book (from the official docs):

# Contact information for our officers

president:
    name: Katheryn McDaniel
    address:
        > 138 Almond Street
        > Topeka, Kansas 20697
    phone:
        cell: 1-210-555-5297
        home: 1-210-555-8470
            # Katheryn prefers that we always call her on her cell phone.
    email: KateMcD@aol.com
    additional roles:
        - board member

vice president:
    name: Margaret Hodge
    ...

See the language introduction for more details.

My Ruby Implementation

The implementation that I’ve been working on is available on my GitHub profile at erikw/nestedtext-ruby or at rubygems.org.

Dynamic GitHub repo image with stats

Provided is support for decoding a NestedText file or string to Ruby data structures, as well as encoding Ruby objects to a NestedText file or string. Furthermore, there is support for serialization and deserialization of custom classes. The supported language version of the data format can be seen in the badge above. This implementation passes all the official tests.


This library is inspired by the Ruby stdlib modules JSON and YAML as well as the Python reference implementation of NestedText. Parsing is done with an LL(1) recursive descent parser, and dumping with a recursive depth-first search (DFS) traversal of the object references.
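To give a feel for what recursive descent with one line of lookahead means here, below is a toy sketch that handles only a small subset of the format: list items, dictionary items with single-line keys, and `>` multi-line strings. The `TinyNT` module and everything in it are made up for this post and are not the code in nestedtext-ruby; error handling for invalid input is left out entirely.

```ruby
# Toy recursive descent parser for a SUBSET of NestedText.
# Hypothetical illustration only, not the nestedtext-ruby implementation.
module TinyNT
  Line = Struct.new(:indent, :text)

  def self.parse(source)
    Parser.new(source).parse
  end

  class Parser
    def initialize(source)
      # Drop blank lines and comments, remember each line's indentation.
      @lines = source.each_line.map(&:chomp)
                     .reject { |l| l.strip.empty? || l.lstrip.start_with?('#') }
                     .map { |l| Line.new(l[/\A */].size, l.lstrip) }
      @pos = 0
    end

    def parse
      @lines.empty? ? nil : parse_value(@lines.first.indent)
    end

    private

    # One line of lookahead: the next unconsumed line, if it sits at this indent.
    def peek(indent)
      line = @lines[@pos]
      line if line && line.indent == indent
    end

    # The start of the line decides which rule to descend into.
    def parse_value(indent)
      case peek(indent).text
      when /\A-( |\z)/ then parse_list(indent)
      when /\A>( |\z)/ then parse_string(indent)
      else parse_dict(indent)
      end
    end

    def parse_list(indent)
      items = []
      while (line = peek(indent)) && line.text.match?(/\A-( |\z)/)
        @pos += 1
        items << inline_or_nested(line.text[2..], indent)
      end
      items
    end

    def parse_dict(indent)
      dict = {}
      while (line = peek(indent)) && (m = line.text.match(/\A(.*?):(?: (.*))?\z/))
        @pos += 1
        dict[m[1]] = inline_or_nested(m[2], indent)
      end
      dict
    end

    def parse_string(indent)
      parts = []
      while (line = peek(indent)) && line.text.match?(/\A>( |\z)/)
        @pos += 1
        parts << (line.text[2..] || '')
      end
      parts.join("\n")
    end

    # A value is either inline on the same line or a deeper block that follows.
    def inline_or_nested(inline, indent)
      return inline unless inline.nil? || inline.empty?

      following = @lines[@pos]
      following && following.indent > indent ? parse_value(following.indent) : ''
    end
  end
end

TinyNT.parse("- objitem1\n- list item 2")
```

Each `parse_*` method corresponds to one kind of value, and nesting falls out of the methods calling each other with a deeper indentation level. The real parser has to deal with a lot more than this, which is where the edge cases come from.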

As is usually the case with software projects, I underestimated the effort needed for this implementation. Looking at the language specification, it seems rather small and simple. However, it turns out that implementing a parser and dumper for it still involves a lot of tricky edge cases. I worked in a test-driven development (TDD) fashion by specifying the input to be parsed and the expected result (and the opposite for dumping), and then implemented the parser step by step. In the end this resulted in some 228 unit tests with 100% code coverage.

A very nice thing about NestedText is that the language authors have provided an official test suite that library implementers can use to check that all edge cases have been covered. In nestedtext-ruby, I dynamically generate test cases from the provided test files, which adds a total of (at the time of writing) 137 test scenarios.
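The general idea of generating test cases from data can be sketched roughly like this. Everything here is a stand-in: `CASES` is a hard-coded hash instead of files read from the official test suite, `DECODE` is a trivial lambda instead of the library, and `GeneratedCases` is a plain class. In a Minitest-based suite the class would inherit from `Minitest::Test`, which picks up methods whose names start with `test_`.

```ruby
# Sketch: generate one test method per entry in a data set.
# CASES and DECODE are made-up stand-ins for the official test files
# and for the library's decoder.
CASES = {
  'flat_list' => { input: "- a\n- b", expected: %w[a b] },
  'single_item' => { input: '- only', expected: ['only'] }
}.freeze

# Stand-in decoder that only understands flat lists of single-line items.
DECODE = ->(text) { text.lines(chomp: true).map { |line| line.delete_prefix('- ') } }

class GeneratedCases
  CASES.each do |name, test_case|
    # define_method creates a real instance method for every case.
    define_method("test_#{name}") do
      actual = DECODE.call(test_case[:input])
      unless actual == test_case[:expected]
        raise "#{name}: expected #{test_case[:expected].inspect}, got #{actual.inspect}"
      end

      true
    end
  end
end

# Call every generated method, similar to how a test runner discovers them.
suite = GeneratedCases.new
results = GeneratedCases.instance_methods(false).sort.to_h { |m| [m, suite.public_send(m)] }
```

The benefit of this approach is that new cases added upstream show up as new tests without touching the test code.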

Usage

The full API documentation can be found at rubydoc.info. A minimal, fully working example of a project using this library can be found at erikw/nestedtext-ruby-test.

Decoding (reading NT)

This is how you can decode NestedText from a string or directly from a file (*.nt) to Ruby object instances:

Any Top Level Type

require 'nestedtext'

ntstr = "- objitem1\n- list item 2"
obj1 = NestedText::load(ntstr)

obj2 = NestedText::load_file("path/to/data.nt")

The type of the returned object depends on the top level type in the NestedText data and will be the corresponding native Ruby type. In the example above, obj1 will be an Array, and obj2 will be a Hash if data.nt looks like this, for example:

key1: value1
key2: value2

Thus you must either know what you’re parsing or check the type of what you decoded afterwards.
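Checking the decoded type afterwards could look something like the sketch below. `summarize` is a made-up helper, and its argument stands in for whatever the load functions returned.

```ruby
# Sketch: branch on the top level type that came back from decoding.
# `decoded` stands in for the return value of the load functions.
def summarize(decoded)
  case decoded
  when Hash   then "dictionary with keys: #{decoded.keys.join(', ')}"
  when Array  then "list with #{decoded.size} items"
  when String then "string of #{decoded.length} characters"
  else raise TypeError, "unexpected top level type #{decoded.class}"
  end
end

summarize(['objitem1', 'list item 2'])
summarize({ 'key1' => 'value1', 'key2' => 'value2' })
```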

Explicit Top Level Type

If you already know what you expect to have, you can guarantee that this is what you will get by telling either function what the expected top level type is. If the decoded data has a different top level type, an error will be raised.

require 'nestedtext'

ntstr = "- objitem1\n- list item 2"
array = NestedText::load(ntstr, top_class: Array)

hash = NestedText::load_file("path/to/data.nt", top_class: Hash)

# will raise NestedText::Error as we specify top level String but it will be Array.
NestedText::load(ntstr, top_class: String)

Encoding (writing NT)

This is how you can encode Ruby objects to a NestedText string or file:

require 'nestedtext'

data = ["i1", "i2"]

ntstr = NestedText::dump(data)

NestedText::dump_file(data, "path/to/data.nt")

#to_nt Convenience

To make it more convenient, Ruby’s core classes are extended with a #to_nt method on the supported types that will dump the data structure to a String. Here’s an IRB session showing how it works:

irb> require 'nestedtext'
irb> puts "a\nstring".to_nt
> a
> string
irb> puts ["i1", "i2", "i3"].to_nt
- i1
- i2
- i3
irb> hash = {"k1" => "v1",
            "multiline\nkey" => "v2",
            "k3" => ["a", "list"]}
irb> puts hash.to_nt
k1: v1
: multiline
: key
    > v2
k3:
    - a
    - list
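The shape of the output above follows quite naturally from a depth-first walk over the data. Here is a toy sketch of that idea for String, Array and Hash only. `TinyDump` is a made-up name and this is not the dumper in nestedtext-ruby: it ignores empty containers, keys that need special treatment for reasons other than newlines, and everything related to custom classes.

```ruby
# Toy depth-first dumper for String, Array and Hash.
# Hypothetical illustration only, not the nestedtext-ruby implementation.
module TinyDump
  INDENT = 4

  def self.dump(obj, depth = 0)
    pad = ' ' * (INDENT * depth)
    case obj
    when String
      obj.lines(chomp: true).map { |line| "#{pad}> #{line}" }.join("\n")
    when Array
      obj.map { |item| entry("#{pad}-", item, depth) }.join("\n")
    when Hash
      obj.map do |key, value|
        if key.include?("\n")
          # Multi-line keys get one ": " line per key line; the value always nests.
          head = key.lines(chomp: true).map { |line| "#{pad}: #{line}" }.join("\n")
          entry(head, value, depth, inline_ok: false)
        else
          entry("#{pad}#{key}:", value, depth)
        end
      end.join("\n")
    else
      raise ArgumentError, "unsupported type: #{obj.class}"
    end
  end

  # Single-line strings go on the same line as the prefix;
  # everything else is dumped one level deeper (the depth-first step).
  def self.entry(prefix, value, depth, inline_ok: true)
    if inline_ok && value.is_a?(String) && !value.include?("\n")
      value.empty? ? prefix : "#{prefix} #{value}"
    else
      "#{prefix}\n#{dump(value, depth + 1)}"
    end
  end
end

puts TinyDump.dump({ 'k1' => 'v1', "multiline\nkey" => 'v2', 'k3' => %w[a list] })
```

For the hash from the IRB session, this sketch prints the same text as shown above.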

Types

Ruby classes map to NestedText types like this:

Ruby   | NestedText
String | String
Array  | List
Hash   | Dictionary

Strict Mode

The strict mode determines how classes other than the basic types String, Array and Hash are handled during encoding and decoding. By default strict mode is turned off.

With strict: true

Ruby        | NestedText | Comment
nil         | empty      | (1.)
Symbol      | String     | Raises NestedText::Error
Other Class |            | Raises NestedText::Error

With strict: false

Ruby         | NestedText            | Comment
nil          | Custom Class Encoding | (1.)
Symbol       | String                |
Custom Class | Custom Class Encoding | If the Custom Class implements #encode_nt_with
Other Class  | String                | #to_s will be called if there is no #encode_nt_with
  • (1.) How empty strings and nil are handled depends on where it is used. This library follows how the official implementation does it.

Custom Classes Serialization

This library has support for serialization/deserialization of custom classes as well. This is done by letting the objects tell NestedText what data should be used to represent the object instance with the #encode_nt_with method (inspired by YAML’s #encode_with method). All objects being recursively referenced from a root object being serialized must either implement this method or be one of the core supported NestedText data types from the table above.

A class implementing #encode_nt_with is referred to as Custom Class in this document.

class Apple
  def initialize(type, weight)
    @type = type
    @weight = weight
  end

  def encode_nt_with
    [@type, @weight]
  end
end

When an Apple instance is serialized, e.g. by apple.to_nt, NestedText will call Apple#encode_nt_with if it exists and encode the returned data to represent the instance.

To be able to get this instance back when deserializing the NestedText, there must be a class method Class.nt_create(data). When deserializing NestedText and the class Apple is detected, and the method .nt_create exists on the class, it will be called with the decoded data belonging to it. This method should create and return a new instance of the class. In the simplest case it just translates to a call to .new.

In full, the Apple class should look like:

class Apple
  def self.nt_create(data)
    new(*data)
  end

  def initialize(type, weight)
    @type = type
    @weight = weight
  end

  def encode_nt_with
    [@type, @weight]
  end
end

An instance of this class would be encoded like this:

irb> puts NestedText::dump(Apple.new("granny smith", 12))
__nestedtext_class__: Apple
data:
    - granny smith
    - 12
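To make the mechanism a bit more concrete, here is a sketch of the wrap and unwrap steps in plain Ruby, without going through NestedText at all. `wrap` and `unwrap` are made-up helpers and not part of the library’s API, and `CLASS_KEY` simply copies the key from the output above (which, as noted below, may change). The `to_s` call simulates the fact that everything ends up as a String once written to NestedText.

```ruby
# Sketch of the idea behind custom class serialization.
# wrap/unwrap are hypothetical helpers, not the nestedtext-ruby API.
CLASS_KEY = '__nestedtext_class__'

class Apple
  attr_reader :type, :weight

  def self.nt_create(data)
    new(*data)
  end

  def initialize(type, weight)
    @type = type
    @weight = weight
  end

  def encode_nt_with
    [@type, @weight]
  end
end

# Turn an instance into plain Hash/Array/String data that could be dumped.
# Assumes encode_nt_with returns an Array, as Apple's does.
def wrap(obj)
  { CLASS_KEY => obj.class.name, 'data' => obj.encode_nt_with.map(&:to_s) }
end

# Rebuild the instance from decoded data; the class must already be loaded.
def unwrap(hash)
  klass = Object.const_get(hash.fetch(CLASS_KEY))
  klass.nt_create(hash.fetch('data'))
end

apple = unwrap(wrap(Apple.new('granny smith', 12)))
apple.weight # a String in this sketch, since only strings survive the round trip
```

Note that the weight comes back as a String here. If you need it as an Integer again, .nt_create is a natural place to do that conversion.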

If you want to add some more superpowers to your custom class, you can add the #to_nt shortcut by including the ToNTMixin:

class Apple
  include NestedText::ToNTMixin
  ...
end

Apple.new("granny smith", 12).to_nt

Important notes:

  • The special key to denote the class name is subject to change in future versions and you must not rely on it.
  • Custom Classes cannot be a key in a Hash. Trying to do this will raise an Error.
  • When deserializing a custom class, this custom class must be available when calling the NestedText::load* methods, e.g.
    require 'nestedtext'
    require_relative 'apple'  # This is needed if Apple is defined in apple.rb and not in this scope already.
    
    NestedText::load_file('path/to/apple_dump.nt')
    

See encode_custom_classes_test.rb for more real working examples.

Schema

The point of NestedText is to stay out of the business of supporting ambiguous types. That’s why all values are simple strings. Having only simple strings is not that useful in practice, though. This is why NestedText is intended to be paired with a schema validator!

A schema validator can:

  • assert that the parsed values match what is expected
  • automatically convert them to Ruby class instances like Integer, Float, etc.

The reference implementation in Python lists a few examples of Python validators. Below is an example of how this Ruby implementation of NestedText can be paired with RSchema.

Example with RSchema

The full and working example can be found at erikw/nestedtext-ruby-test.

Let’s say that you have a program that should connect to a few servers. The list of servers should be stored in a configuration file. With NestedText, this file could look like:

-
  name: global-service
  ip: 192.167.1.1
  port: 8080
-
  name: aux-service
  ip: 17.245.14.2
  port: 67
  # Unstable server, don't use this
  stable: false

After parsing this file with this NestedText library, the values for all keys will be strings. But to make practical use of this, we would of course like the values for port to be Integer, and stable should have a value of either true or false. RSchema can do this conversion for us!

require 'nestedtext'
require 'rschema'

# Define schema for our list of servers
schema = RSchema.define do
  array(
    hash(
      'name' => _String,
      'ip' => _String,
      'port' => _Integer,
      optional('stable') => boolean
    )
  )
end

# The coercer will automatically convert types
coercer = RSchema::CoercionWrapper::RACK_PARAMS.wrap(schema)

# Parse config file with NestedText
data = NestedText.load_file('conf.nt')

# Validate
result = coercer.validate(data)

if result.valid?
  servers = result.value
  # Now 'servers' is known to be valid and have the types specified in the schema.
  # Thus, we can use it now!
  stable_servers = servers.select { |server| server['stable'] }
  # Not a meaningful sum - just demonstrating that 'port's are integers and not strings anymore!
  port_sum = servers.map { |server| server['port'] }.sum
else
  puts result.error
end
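For comparison, this is roughly what the same validation and conversion would look like written by hand, without any schema library. `coerce_server` is a made-up helper, and the `data` array is typed in by hand to mimic what decoding the server list would give (every value is a String).

```ruby
# Sketch: hand-written validation and coercion for the server list.
# coerce_server is a hypothetical helper; `data` mimics the decoded file.
def coerce_server(raw)
  server = {
    'name' => String(raw.fetch('name')),       # fetch raises KeyError if missing
    'ip' => String(raw.fetch('ip')),
    'port' => Integer(raw.fetch('port'), 10)   # raises ArgumentError on "abc"
  }
  if raw.key?('stable')
    server['stable'] =
      case raw['stable']
      when 'true' then true
      when 'false' then false
      else raise ArgumentError, "stable must be true or false, got #{raw['stable'].inspect}"
      end
  end
  server
end

data = [
  { 'name' => 'global-service', 'ip' => '192.167.1.1', 'port' => '8080' },
  { 'name' => 'aux-service', 'ip' => '17.245.14.2', 'port' => '67', 'stable' => 'false' }
]

servers = data.map { |raw| coerce_server(raw) }
```

This works, but the rules are buried in code and grow quickly with every new key, which is the argument for declaring them in a schema instead.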

Installation

  1. Add this gem to your Ruby project’s Gemfile
    • Simply with $ bundle add nestedtext when standing inside your project
    • Or manually by adding to Gemfile
      gem 'nestedtext'
      

      and then running $ bundle install.

  2. Require the library and start using it!
      require 'nestedtext'
    
      NestedText::load(...)
      NestedText::dump(...)
      obj.to_nt
    

Closing

So there we go - now using NestedText in Ruby projects is a breeze!

While I underestimated the implementation effort, hacking away at the parser and dumper was a sheer joy!

If you find this library useful or have suggestions for improvements, please let me know in the comments below or create an issue in the GitHub project!
