What is an empty XML element?

Following a very enthusiastic discussion today at work, here’s the proper definition of an empty XML element, according to the W3C Recommendation of 26 November 2008:

[Definition: An element with no content is said to be empty.] The representation of an empty element is either a start-tag immediately followed by an end-tag, or an empty-element tag. — Extensible Markup Language (XML) 1.0 (Fifth Edition) § “Content of Elements”
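To see the two representations side by side, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The helper name `is_empty` is made up for this illustration, and the check is only an approximation of the spec's definition: the default parser drops comments and processing instructions, so an element containing only a comment would also look empty here.

```python
import xml.etree.ElementTree as ET


def is_empty(element):
    """Return True when the parsed element has no text and no child elements.

    Hypothetical helper for illustration only; attributes do not count as content.
    """
    return element.text is None and len(element) == 0


# The two representations allowed by the definition above.
start_end_pair = ET.fromstring("<item></item>")
empty_element_tag = ET.fromstring("<item/>")

# A single space between the tags is character data, so this one has content.
with_whitespace = ET.fromstring("<item> </item>")

print(is_empty(start_end_pair), is_empty(empty_element_tag), is_empty(with_whitespace))
```

Both `<item></item>` and `<item/>` parse to the same thing, which is the point of the definition: they are two spellings of one empty element.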

[it’s curious that this discussion came up one day after XML’s 15th birthday]

Why refactoring?

Interesting thoughts about refactoring, starting with a comprehensive post by Jim Bird, stating what is not refactoring:

Fixing any bugs that you find along the way is not refactoring. Optimization is not refactoring. Tightening up error handling and adding defensive code is not refactoring. Making the code more testable is not refactoring – although this may happen as the result of refactoring. All of these are good things to do. But they aren’t refactoring. — in What Refactoring is, and what it isn’t

Then, there’s the obvious list of Refactoring Patterns, by Martin Fowler:

Refactoring is a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior. Its heart is a series of small behavior preserving transformations. Each transformation (called a ‘refactoring’) does little, but a sequence of transformations can produce a significant restructuring. — in Refactoring Home Page

Finally, some good advice from Joel Spolsky:

If you are writing code experimentally, you may want to rip up the function you wrote last week when you think of a better algorithm. That’s fine. You may want to refactor a class to make it easier to use. That’s fine, too. — in Things You Should Never Do, Part I

RethinkDB looks promising

RethinkDB is out – an open-source distributed database

RethinkDB is built to store JSON documents, and scale to multiple machines with very little effort. It has a pleasant query language that supports really useful queries like table joins and group by, and is easy to set up and learn.

An open-source distributed database with an intuitive query language, parallelized queries, and much more! It’s called RethinkDB and you can fork it on GitHub. What’s not to like?

MapReduce related patterns

Here’s a list of possible MapReduce related patterns I’ve been thinking about:

  • map-update: update each mapped document and emit its updated or original version;
  • map-delete: delete each mapped document;
  • map-reduce-map: map results of a map-reduce;
  • map-map: map results of a map;
  • map-recurse: apply recursion to the mapping function until a stop condition occurs;
  • any combination of patterns, e.g., map-reduce-map-update.

Example usage:

  • get the e-mail address of every customer with a negative balance: map-reduce-map. First, map-reduce to get the aggregate balance for each customer, then map again to keep only the customers with a negative balance and emit their e-mail addresses;
  • delete all documents older than one month: map-delete. First, map to get all documents older than one month and then delete each one;
  • get a list of documents and mark them as read: map-update. First, map to get the list of documents that match the given criteria, then update each document, marking it as read;
  • and so on…
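The first example above (map-reduce-map) can be sketched in plain Python over an in-memory list. This is only an illustration under stated assumptions: the `map_reduce` helper, the document fields (`email`, `amount`) and the sample data are all invented here and do not come from any particular database.

```python
from collections import defaultdict


def map_reduce(documents, map_fn, reduce_fn):
    """Tiny in-memory map-reduce: group mapped values by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Hypothetical documents: one per transaction.
transactions = [
    {"email": "ann@example.com", "amount": 50},
    {"email": "ann@example.com", "amount": -80},
    {"email": "bob@example.com", "amount": 20},
    {"email": "cy@example.com", "amount": -5},
]


def emit_amount(doc):
    # First map: key each transaction by customer e-mail.
    yield doc["email"], doc["amount"]


def sum_amounts(key, values):
    # Reduce: aggregate balance per customer.
    return sum(values)


def emit_if_negative(item):
    # Second map: runs over the map-reduce results, not the original documents.
    email, balance = item
    if balance < 0:
        yield email


balances = map_reduce(transactions, emit_amount, sum_amounts)
negative = [email for item in balances.items() for email in emit_if_negative(item)]
print(negative)
```

The other patterns follow the same shape: map-delete and map-update replace the second map with a side effect (a delete or an update) applied to each mapped document.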

Blog posts firehoses

Three more firehoses to get all blog posts from the following platforms:

  1. Blogger posts, through their changes.xml;
  2. Tumblr posts, through Superfeedr’s track feature;
  3. WordPress.com posts, through their firehose feature.

While the first feed is free of charge, the other two have an attached price tag.

Also worth investigating is Paul Kinlan’s faux firehose for blogger.

Drinking from the firehose

Some firehoses to read from, other than Twitter’s:

  1. Google Buzz, through their activity firehose API methods;
  2. FriendFeed, using their real-time updates methods.

While the first works via PubSubHubbub, the second uses a combination of long polling and a cursor that helps you make subsequent calls.
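The long-polling-plus-cursor idea can be sketched generically. This is not FriendFeed's actual API: the `fetch` callable, the `entries` and `cursor` field names, and the `max_rounds` limit are all assumptions made for the sake of a runnable example.

```python
def poll_updates(fetch, handle, cursor=None, max_rounds=None):
    """Call fetch(cursor) repeatedly, passing each entry to handle().

    fetch is assumed to block until updates arrive (or a timeout expires) and to
    return a dict with hypothetical "entries" and "cursor" keys. The returned
    cursor is fed into the next call so no update is read twice.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        response = fetch(cursor)
        for entry in response["entries"]:
            handle(entry)
        cursor = response["cursor"]
        rounds += 1
    return cursor


def fake_fetch(cursor):
    # Stand-in for a real long-polling HTTP request: serves two entries per call.
    feed = ["a", "b", "c", "d", "e"]
    start = cursor or 0
    batch = feed[start:start + 2]
    return {"entries": batch, "cursor": start + len(batch)}


seen = []
final_cursor = poll_updates(fake_fetch, seen.append, max_rounds=3)
print(seen, final_cursor)
```

In a real client, `fetch` would be an HTTP request that the server holds open until there is something new, and the loop would run without a round limit.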