Welcome to Land of Tricks

Archive for August 17th, 2009

Almost lost in May’s whirlwind launches of Wolfram|Alpha and Microsoft’s Bing and the unveiling of Google Wave was a quieter announcement that may mark a seismic shift toward the realization of Web 3.0.

While some aspects of the next generation of the Web are already taking shape, there are major physical and cultural challenges to bringing it about. Google’s launch of Rich Snippets may well be a watershed moment in resolving these problems.

Before the term Web 2.0 came into common use, World Wide Web inventor Tim Berners-Lee outlined his vision of the next generation — what he called the Semantic Web. In a 2001 article in Scientific American, Berners-Lee described a global database of linked knowledge, in a markup format that could be understood and manipulated by computers. The World Wide Web Consortium (W3C), the international standards organization headed by Berners-Lee, has a longstanding group that has laid out the tools and protocols for the Semantic Web.

Web 3.0 is here (somewhat)
There are no hard borders between one generation and another, and parts of what is being described as Web 3.0 are already here.

Personalized home pages have been available for years. iGoogle, for example, steps into Web 3.0 territory by allowing users to create a home page with multiple tabs, built by inserting news headline feeds, weather forecasts, Twitter and Facebook feeds and hundreds of other content modules via widgets, and integrating e-mail, calendars and documents into mobile versions. Mobile “lifestream” features, which keep track of personal connections and activities, are widely used through Twitter and similar tools.

Google’s new Wave promises a breakthrough in collaboration, combining e-mail, instant messaging, chat and media sharing in a new communication model that has left reviewers grasping for words.

And some things that were seen as being enabled by the Semantic Web in 2001 are already here without it. For many Americans, persistent mobile connection is a reality: e-mail and SMS-capable phones are ubiquitous, and Web-enabled phones are common. But the full power of machine-understood data, linked across the entire body of information in one global Web, with “agents” focused on personal service to humans, is only in its infancy. According to the W3C’s Semantic Web group, the Semantic Web vision, which vertically integrates data from a diverse set of sources, is the other part of Web 3.0.

The challenges to the Semantic Web
The Web, as of July 2008, included one trillion distinct URLs, by Google’s count. The search giant is estimated to actually index fewer than 5 percent of those, which is still tens of billions of Web pages. The overwhelming majority of these pages are meant to be read and understood by humans, not computers. Search engines can index keywords, but without context.
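
To make the “without context” point concrete, here is a toy Python sketch, not a description of how any real search engine works: a plain keyword index over three invented pages. The names `PAGES`, `build_index` and `STRUCTURED` are all hypothetical. The index can report that “paris” occurs on a page, but only explicit type labels, standing in for Semantic Web markup, let a program tell a city from a person.

```python
from collections import defaultdict

# Hypothetical pages: plain text written for human readers, with nothing
# telling a machine what each word refers to.
PAGES = {
    "page1": "Paris is the capital of France",
    "page2": "Paris Hilton attended the premiere",
    "page3": "Flights from Chicago to Paris Texas",
}


def build_index(pages):
    """Toy inverted index: map each lowercase keyword to the ids of pages containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


index = build_index(PAGES)
# The keyword matches all three pages; the index records only that
# "paris" appears, not whether it names a city or a person.
keyword_hits = sorted(index["paris"])

# Hypothetical structured records, standing in for markup that states
# what each page is about.
STRUCTURED = {
    "page1": {"type": "City", "name": "Paris"},
    "page2": {"type": "Person", "name": "Paris Hilton"},
    "page3": {"type": "City", "name": "Paris, Texas"},
}
city_hits = sorted(p for p, d in STRUCTURED.items() if d["type"] == "City")

print(keyword_hits)
print(city_hits)
```

The keyword lookup returns every page, while the typed records narrow the answer to the two pages about places.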

Semantic Web experts have collected the toolkit of languages and metadata markup systems that will allow machines to understand keywords and the relationships between them. Such metadata is already being used in many places. A microformat called hResume, for example, allows LinkedIn.com to tag the appropriate resume fields of its public profiles so that the resume data can be understood and reused elsewhere.
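
For a rough idea of how a program can reuse microformat data, the Python sketch below uses only the standard library’s `html.parser` to pull a handful of hResume/hCard class names (`summary`, `fn`, `title`, `org`, `skill`) out of some invented markup. It is a simplified illustration that assumes well-formed HTML and covers only a small subset of the format. It is not LinkedIn’s markup and not a complete microformats parser; the class `HResumeExtractor` is hypothetical.

```python
from html.parser import HTMLParser

# A hypothetical subset of hResume/hCard class names, for illustration only.
PROPERTIES = {"summary", "fn", "title", "org", "skill"}
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # no closing tag


class HResumeExtractor(HTMLParser):
    """Collect text from elements whose class matches a known property,
    but only when they sit inside an element with class="hresume"."""

    def __init__(self):
        super().__init__()
        self.stack = []    # class sets of the currently open elements
        self.fields = {}   # property name -> list of extracted strings
        self._open = None  # (property, text parts, stack depth) being collected

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        self.stack.append(classes)
        inside_root = any("hresume" in c for c in self.stack)
        matches = sorted(classes & PROPERTIES)
        if inside_root and matches and self._open is None:
            self._open = (matches[0], [], len(self.stack))

    def handle_data(self, data):
        if self._open is not None:
            self._open[1].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        # Closing the element that started the current property: store its text.
        if self._open is not None and self._open[2] == len(self.stack):
            prop, parts, _ = self._open
            text = " ".join("".join(parts).split())
            self.fields.setdefault(prop, []).append(text)
            self._open = None
        self.stack.pop()


html_doc = """
<div class="hresume">
  <p class="summary">Web developer with a focus on markup.</p>
  <div class="contact vcard"><span class="fn">Jane Doe</span></div>
  <div class="experience vevent">
    <span class="title">Engineer</span> at <span class="org">Example Corp</span>
  </div>
  <ul><li class="skill">HTML</li><li class="skill">RDFa</li></ul>
</div>
<p class="summary">Outside the resume block, so it is ignored.</p>
"""

parser = HResumeExtractor()
parser.feed(html_doc)
parser.close()
print(parser.fields)
```

Because the class names carry the meaning, the same markup stays readable for people while a program can lift out the summary, name, job title and skills without guessing.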

The value of such machine-usable data is obvious. Since the infancy of the Web, finding valuable information amid the growing clutter has been a major challenge. Directories such as Yahoo! made their mark by pointing users to useful, hand-selected websites. This manual work could barely keep up with the scope of the Web of the mid-’90s. It also faced growing credibility issues because links were chosen, or excluded, by human editors. Full-text search engines such as WebCrawler and AltaVista gained popularity, but their results included large amounts of garbage. Today’s top search engines use sophisticated algorithms to raise the signal-to-noise ratio and increase the value of results. Microsoft’s Bing, for example, promises to give more relevant results and aid in decision-making.

The Wolfram|Alpha “computational knowledge engine” is being hailed as a prototype of what a global database in the Semantic Web could do to deliver high-value information, easily accessed in plain language. And Wolfram|Alpha itself appears to be claiming the role of the global database. With more than 10 trillion pieces of information, and plans to expand significantly, the site says:

“Wolfram|Alpha’s long-term goal is to make all systematic knowledge immediately computable and accessible to everyone. We aim to collect and curate all objective data; implement every known model, method and algorithm; and make it possible to compute whatever can be computed about anything. Our goal is to build on the achievements of science and other systematizations of knowledge to provide a single source that can be relied on by everyone for definitive answers to factual queries.”

This may resonate with some in the Semantic Web community; a number of them have found the task of retrofitting the current Web with machine-friendly markup so daunting that they suspect the global database might need to be built from scratch. But at face value, Wolfram|Alpha violates one of the cardinal precepts of the Semantic Web: that the proprietary hoarding of databases behind walls must end, and data must flow freely from and to all sources.

And the vision of W3C’s Semantic Web isn’t to replace the current Web, but to enhance it. The question is how to get the work done. There was no organized plan to build the Web. To be sure, there were plans to create the technology and the infrastructure. But most of those tens of billions of indexed Web pages were built by corporations, small businesses, non-profits and individuals, each for their own reasons. Persuading websites to recode Web pages to Semantic Web specifications — or even to do so going forward — will take a powerful motivator.

Google breaks the ice
Google may have provided such a motivator with its May 12 announcement of Rich Snippets. “Snippet” is the name Google uses for the short block of text that appears below a search result, giving more information about the Web page. Google announced in its Webmaster Central Blog (a bookmark for anyone interested in making his or her website more visible to the leading search engine) that it is now applying its algorithms to “highlight structured data embedded in web pages.” Translation: content marked up for the Semantic Web. The “rich snippets” will be generated from that structured data.
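
How a search engine might put such data to work can be sketched in a few lines. The Python function below is purely hypothetical: the field names `rating`, `review_count` and `reviewer` are invented, and this is not Google’s algorithm. When a page supplies structured review data, the snippet is built from those fields; otherwise the function falls back to a crude excerpt of the page text.

```python
def make_snippet(page_text, structured=None, max_len=80):
    """Build a result snippet, preferring embedded structured data when present.

    The field names used here are hypothetical placeholders.
    """
    if structured and "rating" in structured:
        parts = [f"Rating: {structured['rating']}/5"]
        if "review_count" in structured:
            parts.append(f"{structured['review_count']} reviews")
        if "reviewer" in structured:
            parts.append(f"Review by {structured['reviewer']}")
        return " - ".join(parts)
    # No usable markup: collapse whitespace and truncate the plain text.
    text = " ".join(page_text.split())
    if len(text) <= max_len:
        return text
    return text[: max_len - 3].rstrip() + "..."


rich = make_snippet(
    "Our pizza is great. Come visit us downtown!",
    {"rating": 4.5, "review_count": 12, "reviewer": "J. Smith"},
)
plain = make_snippet("Our pizza is great. Come visit us downtown!")
print(rich)   # Rating: 4.5/5 - 12 reviews - Review by J. Smith
print(plain)  # Our pizza is great. Come visit us downtown!
```

The same page yields a richer result line only when its author has supplied the markup, which is exactly the incentive the announcement creates for site owners.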

This is a major event for a couple of reasons. First, Google is the poster child for machine learning, which in Web terms means teaching machines to scan plain-language Web pages and cull meaning from them. This is the other end of the spectrum from the Semantic Web vision of coding pages in a special way so they have meaning to machines. Google’s announcement, which explicitly discussed plans to extend support for structured data in new ways as well as to recognize metadata coding developed elsewhere on the Web, puts the company on a course for a synergy between machine learning and Semantic Web practices.

Google isn’t the first major search company to focus on structured data. Yahoo’s SearchMonkey platform for Web developers supports a robust package of metadata formats and urges developers to have at it. But the reality is that Google is the one people pay attention to where it counts.

This brings us to the second reason this is a major step: self-interest. It’s important to harness the force that created those tens of billions of indexed Web pages in the first place. And Google’s announcement means money.

In the current Web economy, search engine status is a prime motivation, and Google ranking is the Holy Grail. What Google is offering (while explicitly not promising) is the chance for websites to catch the eye of the search engine’s algorithms, and even some measure of control over the vital couple of lines of text that tell a user “click me.” In an environment where every keystroke in a Web page’s meta tags, every word of a headline and every keyword-packed top paragraph is dictated by a search engine optimization guru, Web producers across the Net are learning, or are about to learn, metadata formats.

And that just may be the sound of a Semantic Web snowball starting down the hill.

On the back of the dotcom boom, Michael Simms ploughed £350,000 of his own money into a games company with the intention of bringing some of the most playable Windows titles to Linux.

Almost 10 years later, Linux Game Publishing, which specialises in porting Windows titles, is still going strong, releasing several titles every year. Linux Format magazine caught up with Michael on a recent trip and asked him about where the company will go from here.

Linux Format: What inspired you to start making games Linux-compatible?

Michael Simms: I started using Linux when I was at university, so I’ve been doing it for a long time. I did a few jobs in the Unix field and got to hear of Loki Software, who had just decided to make Civilisation: Call to Power. I got on to the beta for that, but I found it was hard to buy a copy of it when it came out. So I contacted Loki about becoming a reseller and that’s what started Tux Games.

When it became obvious Loki was going under, it was like: ‘crap, we’re going to have nothing to sell’. So we went to Creature Labs, a company we knew hadn’t been able to make a deal with Loki, and came to an agreement with them. We started off by publishing Creatures 3 and went from there.

LXF: What do you think Loki did wrong?

MS: Loki overestimated the market. It would spend a lot licensing a triple-A title and not generate enough sales, but carry on doing that again and again. A classic example was its Quake 3 special edition where it made 50,000 tin boxes and only sold a few thousand.

LXF: Didn’t you do a similar thing with X3?

MS: That was a limited edition of 500 rather than 50,000! We did it slightly differently. Just 500, and we won’t be making any more. We’ll carry on making the standard edition until whenever, but try to avoid making the same mistakes as Loki.

LXF: After deciding to port a game to Linux, what’s the next step for you at LGP?

MS: Once we’ve made the agreement, we get hold of the source code and then we just do whatever we need to do for the port. Usually, ports are fairly similar.

LXF: Do you choose games with a similar back-end?

MS: No, we choose games based on playability. I personally pick out a lot of the games because they’re what I like! But we’ve also got a few other people that we trust to give a balanced view of things.

LXF: Is there a massive difference between taking on something like X3 and a 2D puzzle game?

MS: We aim to do fifty-fifty top-end games to entry-level games so that we can pay equal attention to companies behind titles like Jets’n’Guns. We concentrate on both to make sure that we’re still seen to port big games, but we do small games so that smaller companies also have a route into the Linux market.

LXF: How long does a game like Jets take to port?

MS: Well, with Jets, we didn’t actually do most of the port. Rake In Games did it instead, we just added some polish and work at the end …

To do a port of something like Jets would take one developer a couple of months. Maybe a bit less. X3 – that’s more a team of four developers for five to six months.