Ad hoc text analytics

Twitter 2009

I found an old sentiment analysis application. It has very unglamorous packaging but a good algorithm under the hood. I ran it on the Twitter user IDs of the brightest people I know (well, know of) who are active Twitter users. The assessment of “bright” was my own and subjective, but all are acknowledged experts or advanced degree holders. Maybe half speak English as a second language, but they are sufficiently articulate that their “essence”, well, intelligence, shines through.

Guess what: It worked! I don’t know if anyone cares about this sort of thing, that really sharp, successful people score well on this sentiment analysis indicator. That doesn’t necessarily mean it would have any predictive value, and no one seems to care much about this anyway. But my point is that most of these people have only okay-ish Klout scores, e.g., in the 40s. Then again, they’re not trying to use Twitter for any particular social media purpose. Well, I don’t know that with certainty.
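The post doesn’t say how that application scores text, so what follows is not its algorithm. It is a minimal sketch of one common approach, lexicon-based scoring, assuming a tiny hand-made word list. `LEXICON`, `sentiment`, and `score_user` are hypothetical names chosen for illustration.

```python
import re

# Hypothetical toy lexicon: word -> polarity score. Real lexicons hold thousands of entries.
LEXICON = {"good": 1, "great": 2, "love": 2, "insightful": 2,
           "bad": -1, "awful": -2, "hate": -2, "boring": -1}

def sentiment(text):
    """Average score of the lexicon words found in text; 0.0 when none are found."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def score_user(tweets):
    """Mean per-tweet sentiment over one user's tweets (a hypothetical aggregation)."""
    return sum(sentiment(t) for t in tweets) / len(tweets) if tweets else 0.0

print(sentiment("Great talk, good slides"))                  # (2 + 1) / 2 = 1.5
print(score_user(["I love this paper", "boring meeting"]))   # (2.0 + -1.0) / 2 = 0.5
```

Averaging per matched word, rather than summing, keeps long and short tweets on the same scale. That is one design choice among several, and nothing here says the old application made the same one.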

Published on February 13, 2012 at 6:00 pm

Chart art

Edward Tufte’s first book on information design, The Visual Display of Quantitative Information, introduced standards for graphical representation. It is widely considered the definitive guide to the visual display of complex data.

UPDATE 4 September 2014

Mind map about Tufte data visualization


Visualizing Edward Tufte’s thought processes

I found this while surfing Flickr. The artist is Austin Kleon of Austin, Texas. The image, a mind map, represents the cognitive process by which Edward Tufte transformed raw data into digestible information while writing Envisioning Information, one of his many follow-on publications to The Visual Display of Quantitative Information.

Tuftese

IEEE Spectrum’s Innovation blog featured data visualization, profiling Edward Tufte as a practitioner, an emphasis that was unusual for IEEE. The post “Tufte-isms” explores how Tufte’s ideas have influenced language:

Tufte, it turns out, is not only a doyen of data visualization but also a neologist par excellence. His most famous term might be chartjunk, which refers to chart elements that not only serve no purpose but may in fact hinder understanding. In Tuftese, when chartjunk takes a cartoonish form…the result is a chartoon.

SAS* surprise


Published on February 4, 2012 at 10:25 pm

Taleb and the language of risk

Last night, I read about Nassim Nicholas Taleb on English Language and Usage Stack Exchange (EL&U). Professor Taleb wants to introduce a new word to the vocabulary of global financial collapse, antifragility:

So let us coin the appellation “antifragile” for anything that, on average, (i.e. in expectation) benefits from variability.

Consensus on EL&U was that this was a creative but unnecessary neologism. I echo the concerns of my EL&U comrades: antifragility might cause confusion (should it be “anti-fragility”?). There are many adequate, extant words that Taleb could use; however, antifragility is a term that will be uniquely associated with him.

I am not convinced that many entities actually thrive on uncertainty. A delta-hedged position that is long volatility is the only construct I can think of offhand. Perhaps that was what Taleb had in mind.
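Taleb’s definition, benefiting “on average (i.e. in expectation)” from variability, amounts to holding a convex payoff. That is why a long-volatility position fits it. Here is a minimal sketch under stated assumptions. It uses a hypothetical asset priced at 100 that ends up or down by the same amount with equal probability, and a call struck at 100 stands in for the option. Premium and rebalancing are ignored. All function names are illustrative.

```python
STRIKE = 100.0  # hypothetical at-the-money strike; the asset also starts at 100

def expected_payoff(payoff, center, move):
    """Expected payoff when the price ends at center - move or center + move, each with probability 1/2."""
    return 0.5 * payoff(center - move) + 0.5 * payoff(center + move)

def long_call(price):
    return max(price - STRIKE, 0.0)

def delta_hedged_long_call(price, delta=0.5):
    # Short `delta` units of the asset against the call. In this symmetric two-point model the
    # replicating delta is (up payoff - down payoff) / (up price - down price) = 0.5 (an assumption).
    return long_call(price) - delta * (price - STRIKE)

def short_call(price):
    return min(STRIKE - price, 0.0)  # concave: loses when prices move up, never gains

for move in (0.0, 10.0, 20.0):
    print(move,
          expected_payoff(long_call, STRIKE, move),
          expected_payoff(delta_hedged_long_call, STRIKE, move),
          expected_payoff(short_call, STRIKE, move))
```

The hedged payoff is identical whether the price goes up or down. Only the size of the move matters, which is the sense in which the position benefits from variability. With a real option, the premium is paid up front and the hedge must be rebalanced, so the net result depends on realized versus implied volatility.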

The original Black Swan


The Black Swan by Thomas Mann; 1954 UK First Edition

There was a slightly less contemporary black swan: the novella written by Nobel Prize winner Thomas Mann toward the end of his long and distinguished literary career.

The plot of that short work of fiction also turned on an anomalous event, one that could be considered a statistical outlier.

Published on February 1, 2012 at 6:28 am

US Mint ends production of one dollar coins

Last Tuesday, 13 December 2011, the U.S. Mint announced that production of $1 coins for circulation is ending. As required by law, the Mint will continue to produce a small number of $1 coins for collectors, but these will have numismatic value and will cost more than $1.00.

Instead of producing 70-80 million coins per president, the Mint will now produce only as many as collectors order.


2010 Native American $1 Coin reverse

Forty percent of $1 coins were returned, unwanted, to the Federal Reserve Bank each year.

Circulating demand for $1 coins will be met through the Federal Reserve’s existing stockpile, which will be drawn down over time.

My favorite $1 coin features Sacagawea, guide to Lewis and Clark. The image shows the reverse side of the 2010 Native American $1 coin. It is beautiful. Click through for full details from the U.S. Mint.

Published on December 16, 2011 at 12:23 pm

Idea for a very open ID

Be receptive! Be open to each and every type of user input for authentication.

Universal sign on

This very user-centric approach to identity resolution leverages the many open APIs now available for web services. Feel free to select your username of choice!

  • @username (Twitter)
  • Facebook.com/username
  • username@gmail.com
  • YouTube.com/username
  • username.wordpress.com, or the URL of a self-hosted WordPress.org blog
  • Flickr.com/username
  • username@yahoo.com
  • OpenID provider URL
  • more?

In his post on identity resolution, developer Luis Farzati emphasizes that:

the objective is to allow the user to input whatever wanted [in order] to login… If it exists as a valid username out here, we’ll find it and suggest it!
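This is not the widget’s code. It is a minimal, hypothetical sketch of the first step such a resolver might take: recognizing which of the listed services a raw identifier belongs to. The regular expressions, `PATTERNS`, and `resolve` are my own assumptions. The final step, confirming that an account actually exists, is left out because it would depend on each provider’s API.

```python
import re

# Hypothetical patterns for the identifier forms listed above (not taken from the widget).
PATTERNS = [
    ("twitter", re.compile(r"^@(?P<user>\w{1,15})$")),
    ("facebook", re.compile(r"^(?:https?://)?(?:www\.)?facebook\.com/(?P<user>[\w.]+)/?$", re.I)),
    ("youtube", re.compile(r"^(?:https?://)?(?:www\.)?youtube\.com/(?P<user>\w+)/?$", re.I)),
    ("flickr", re.compile(r"^(?:https?://)?(?:www\.)?flickr\.com/(?P<user>[\w@-]+)/?$", re.I)),
    ("gmail", re.compile(r"^(?P<user>[\w.+-]+)@gmail\.com$", re.I)),
    ("yahoo", re.compile(r"^(?P<user>[\w.+-]+)@yahoo\.com$", re.I)),
    ("wordpress", re.compile(r"^(?:https?://)?(?P<user>[\w-]+)\.wordpress\.com/?$", re.I)),
]

def resolve(raw):
    """Return (service, username) candidates for whatever the user typed."""
    text = raw.strip()
    matches = []
    for service, pattern in PATTERNS:
        m = pattern.match(text)
        if m:
            matches.append((service, m.group("user")))
    if matches:
        return matches
    if re.match(r"^https?://", text, re.I):
        # Any other URL: treat it as a possible OpenID provider URL.
        return [("openid", text)]
    if re.fullmatch(r"[\w.-]+", text):
        # A bare name: suggest it on every service. A real resolver would then
        # query each service to see which accounts actually exist.
        return [(service, text) for service, _ in PATTERNS]
    return []

print(resolve("@tufte"))               # [('twitter', 'tufte')]
print(resolve("alice.wordpress.com"))  # [('wordpress', 'alice')]
```

Returning a list of candidates rather than a single answer matches the stated goal of finding a username and suggesting it.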

Casual testing

Luis Farzati’s Smart Identity Resolver Widget is on GitHub, and a demo is included. I tried it.

Published on December 6, 2011 at 9:04 am

Economic Models for Turbulent Times Part 2

A new research study, The Network of Global Corporate Control [PDF], has already received unusual attention. It identifies a relatively small group of multinational companies with disproportionate influence over the global economy. The authors, a trio of complex systems theorists at the Swiss Federal Institute of Technology (ETH) in Zürich, are reportedly the first to identify such a network empirically.

The authors approach the problem with mathematical models designed to capture the behavior of complex natural systems. They apply this methodology to a large data set of corporate information to map ownership among the world’s transnational corporations (TNCs). Previous studies reported that a few TNCs drive much of the global economy, but they analyzed fewer companies. Because of limited data availability and computing resources, those studies also did not account for indirect ownership.
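Accounting for indirect ownership means following chains of shareholdings. If A owns part of B and B owns part of C, then A indirectly owns part of C. A standard way to total all such chains is the series W + W² + W³ + …, where W[i][j] is the fraction of firm j held directly by firm i. The sketch below is that generic calculation in plain Python. It is not the study’s own control measure, which is more elaborate. The function name and the three-firm data are hypothetical.

```python
def integrated_ownership(direct, tol=1e-12, max_iter=1000):
    """Total stake of firm i in firm j over all ownership chains.

    direct[i][j] is the fraction of firm j held directly by firm i. The result is
    W + W^2 + W^3 + ..., computed by repeating T <- W + W*T until it stops changing.
    The series converges when every firm has some stake held outside the network.
    """
    n = len(direct)
    total = [[0.0] * n for _ in range(n)]
    for _ in range(max_iter):
        new = [[direct[i][j] + sum(direct[i][k] * total[k][j] for k in range(n))
                for j in range(n)]
               for i in range(n)]
        change = max(abs(new[i][j] - total[i][j]) for i in range(n) for j in range(n))
        total = new
        if change < tol:
            return total
    raise ValueError("series did not converge; ownership cycles may sum to 100%")

# Hypothetical chain: A holds 50% of B, B holds 60% of C, A holds nothing in C directly.
W = [[0.0, 0.5, 0.0],
     [0.0, 0.0, 0.6],
     [0.0, 0.0, 0.0]]
T = integrated_ownership(W)
print(round(T[0][2], 3))  # 0.3, i.e. 0.5 * 0.6
```

Cross-holdings (A owns part of B and B owns part of A) are handled by the same loop. Each pass adds one more link to every chain, and the additions shrink geometrically.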

Methodology

The data source was Orbis 2007, a repository of over 30 million private and public companies published by Bureau van Dijk. The study sample used the 43,060 largest TNCs and derived their ownership linkages. The network structure was built from the relationships between shareholding interests, then weighted by each company’s operating revenue. This yielded a directed map of global economic power.
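The network-building step can be sketched in a few lines of Python. This sketch assumes a hypothetical input format of (shareholder, company, fraction) records plus a table of operating revenues. It weights only direct stakes, so it simplifies the study’s approach rather than reproducing it. The function names and figures are illustrative.

```python
from collections import defaultdict

def build_ownership_graph(holdings):
    """Directed graph as {shareholder: {company: fraction}}; repeated records are summed."""
    graph = defaultdict(dict)
    for owner, company, fraction in holdings:
        graph[owner][company] = graph[owner].get(company, 0.0) + fraction
    return dict(graph)

def revenue_weighted_stakes(graph, revenue):
    """For each shareholder, the sum of (direct stake x company operating revenue)."""
    return {owner: sum(frac * revenue.get(company, 0.0) for company, frac in stakes.items())
            for owner, stakes in graph.items()}

# Hypothetical records: A holds 50% of B and 10% of C; B holds 60% of C.
holdings = [("A", "B", 0.5), ("A", "C", 0.1), ("B", "C", 0.6)]
revenue = {"B": 200.0, "C": 1000.0}  # hypothetical operating revenues
graph = build_ownership_graph(holdings)
print(revenue_weighted_stakes(graph, revenue))
```

Edges point from shareholder to company, which is what makes the resulting map directional.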

Quantitative results

The model revealed a core group of companies with interlocking ownership (see image below). These 1,318 companies:

  • all had ties to at least two other companies in the core group
  • on average, were each connected to 20 other core group companies
  • represented 20% of all global operating revenues
  • collectively owned, through their shares, the majority of the world’s largest blue chip and manufacturing firms, which represent a further 60% of global revenues.


Published on November 20, 2011 at 4:28 am

Internet standards for HTML

The World Wide Web Consortium (W3C) is standardizing over 100 specifications for the open web, in at least 13 working groups. The CSS Working Group alone is in charge of 50 specifications. This does not include work on Unicode, HTTP and TLS.

http://tantek.com/2011/028/t5/standards-w3c-100-openweb-specs

New tag proposal. Not really.

The nice thing about standards is that there are so many to choose from

I was waiting to post this until the debate between the W3C and the WHATWG about the scope of HTML5 was resolved, but I have been waiting since February 2011. The consensus is that “HTML5” is being inappropriately used as a catch-all for every standard supported by modern browsers. Modern browsers actually support much more than HTML5: CSS3 styling, WOFF (web fonts), semantic web elements such as microformats, graphics formats such as SVG, and performance enhancements. HTML5 tags are merely one part of semantic web support. As a result, the WHATWG changed its terminology: HTML is the new HTML5.

Published on November 15, 2011 at 4:25 am

PDF history and something special from Adobe

Part One: PDF history 

PDF is a formal open standard, ISO 32000. It was invented by Adobe Systems, which introduced it in 1993.

PDF = Portable Document Format


History of the PDF by Adobe Systems

The image links to a pleasant interactive timeline of Adobe Systems and its role in the development of PDF. The chronology is in Flash and is thankfully free of any video or audio. Read more about Adobe Systems’ role in the history of PDF file development.

PDF files are more versatile than I realized. They:

  • are viewable and printable on Windows®, Mac OS, and mobile platforms, e.g., Android™
  • can be digitally signed
  • preserve source file information (text, drawings, video, 3D, maps, full-color graphics, photos) regardless of the application used to create them

Specialized variants also exist, including PDF/A for archiving and PDF/E for engineering documents, and PDF files can embed 3D content in the U3D format. All are supported by Adobe software.
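Because PDF is an open standard, a few lines of code can read basic facts from a file. In the minimal sketch below, the function pulls the version out of the header that ISO 32000 places on a file’s first line, such as `%PDF-1.7`. Searching the first 1,024 bytes rather than only the first line is an assumption borrowed from lenient readers. `pdf_header_version` is a hypothetical name.

```python
import re

def pdf_header_version(data: bytes):
    """Return the version from a PDF header such as b"%PDF-1.7", or None if no header is found."""
    # Lenient readers tolerate a little junk before the header, so look at the first 1 KB.
    m = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    return m.group(1).decode("ascii") if m else None

# Usage: read the opening bytes of a file and inspect them.
# with open("example.pdf", "rb") as f:
#     print(pdf_header_version(f.read(1024)))
print(pdf_header_version(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"))  # 1.7
```

The header is not the final word. From PDF 1.4 on, a /Version entry in the document catalog can declare a later version, and PDF/A conformance is recorded in XMP metadata. A real validator has to parse much more than the first line.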

Published on September 5, 2011 at 7:30 pm

Power law relationship in modern demographics

Cognition seems to be the driver behind a power law relationship, which would be odd indeed. It implies a fixed way of thinking about geography and places that can be modeled statistically. Human thought processes aren’t generally amenable to quantitative models.

Is this something new?

curious relationship

Toponyms

Giving a name to a place is an important act. It says a place has meaning, that it should be remembered. For thousands of years, the way we kept track of place names—or toponyms—was by using our memory. Today, we’re not nearly so limited, and the number of toponyms seems to have exploded. Yet oddly enough, the number of places we name in a given area follows a trend uncannily similar to one seen in hunter-gatherer societies.…

via Per Square Mile

Next steps?

  1. Confirm whether Eugene Hunn’s 1994 findings have been reproduced with current data
  2. Check whether the USPS ZIP code information used was correct
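For the first step, a quick screening check is to fit y = c·x^k by least squares on the logarithms of both variables. A power law appears as a straight line on a log-log plot, with slope k. This minimal Python sketch assumes `xs` and `ys` are hypothetical paired measurements of whichever two quantities Hunn related. `fit_power_law` is an illustrative name, and a good fit here suggests, but does not prove, a power law.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = log(c) + k*log(x); returns (c, k). All values must be positive."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    k = sxy / sxx                         # slope on the log-log plot = exponent
    c = math.exp(mean_y - k * mean_x)     # intercept, back-transformed to the original scale
    return c, k

# Hypothetical data generated from y = 3 * x**0.5, so the fit should recover c = 3, k = 0.5.
xs = [1.0, 10.0, 100.0, 1000.0]
ys = [3.0 * x ** 0.5 for x in xs]
c, k = fit_power_law(xs, ys)
print(round(c, 6), round(k, 6))
```

Comparing the exponent fitted to current data with the one Hunn reported is one concrete way to answer the question of reproduction.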

Basic data visualization


Internet users by country in 2010

This is the first of five graphics in a series, State of the Internet 2010. All are handmade graphics by Jose Duarte, who is exploring new and simple ways to represent information. With his handmade visualization tool-kit, he offers the means to rapidly create any kind of graphic, including:

abstract maps and diagrams, area graphs and charts, arrow diagrams, bar graphs, Venn diagrams, time line charts, bubble graphs, circle diagrams, proportional charts, organization charts, and really, whatever you want.

Do you want your own kit? It can be yours, free of charge, no strings attached. Follow the link embedded above and send an email to Jose Duarte as instructed in the text accompanying the “handmade visualization tool-kit” link.

Published on August 8, 2011 at 9:42 am