VVHOLE Booke of Psalmes sells for $14 million

The first book printed in the United States sold yesterday at a Sotheby’s auction for $14 million, according to the NY Times.  The book is a copy of the 1640 Puritan work known as the Bay Psalm Book.  It was one of two copies owned by the Old South Church and held by the Boston Public Library.  Only 11 copies of the book are known to exist today.

Libraries have a rich history with this item.  If you are interested in the content, here are some paths to follow to learn more.

The Library of Congress describes its copy in the following catalog entry: http://lccn.loc.gov/71002405  The LC-held copy has been scanned and is publicly available to view through the LC Digital Collections site.


The digital scan can be viewed in great detail at http://oc.lc/psalmbook   Use the “Next Image” link to browse through the pages.

I found the language of the time interesting reading. Here is an example from pages 94-95:  “Becaufe of his voyce that doth fcorne and fcoffingly defpight”   Doth thou know what meaning this is?

Libraries worldwide have microfilm copies of the book if you want to see a copy locally.  See WorldCat holdings at http://oc.lc/wcpsalmbook  for links to libraries that hold the item.

This all makes me wonder how our digital texts will be viewed 400 years from now.  Will there be collectors?  Without the scarcity that leads to this level of curation, what will motivate special attention to one work over another?  Would there be an event that would thrust a specific work into mainstream news for a day?

The NY Times article is here: http://www.nytimes.com/2013/11/27/nyregion/book-published-in-1640-makes-record-sale-at-auction.html

Micro Generations, Mobile and the Next Revolution

Micro-generations and

The term “generation gap” was popularized in the
60’s and generations have been categorized, studied and targeted by marketers
ever since.  The use of Baby Boomer, Gen-X, Gen-Y is well documented and
in common use.   Digital Natives is another term that has been used to
describe characteristics of youth, specifically around how they consume
content, manage social lives, and spend their free time.  OCLC Research
has been studying Digital Natives and related generational cohorts, in
particular to learn more about their use of libraries and how they conduct
research. See Lynn
Connaway’s work, for example.

Recognizing that use of technology varies by generations, I
often test my assumptions for new service ideas on my children, ages 9, 16 and
20.  I ask them about what is new, what is tired, and what they wish
existed.  Yes, I am a different kind of helicopter parent! After many
years of doing this, I have noticed some quite dramatic differences between my
children, their friends and their use of technology.  In casual work
conversation I have started using the term “micro generation” to
describe these differences.  My colleague in the Innovation Lab, Tip
House, suggested I write a blog post on these micro-generations.

Micro-generations describes the differences between
technology users in roughly four-year bands.  Their band tends to be
defined by their school mates and when they gain access to technologies. 
For example, if SMS became accessible and affordable when they were in middle
school (11-14 years old), it tends to define their use of that technology for a
period well beyond those years.  Depending on their adoption rates of
technological change, they may be a leader or follower within their band but
this is heavily influenced by those in the same school building, not just their
grade level.  I will not embarrass my children by using their real names
for the following examples… and I added a fictitious older cousin to describe
my view of micro-generations.

Jessica, a Mobile Immigrant –

  • Born 1986-1990
  • She didn’t get her own phone until at least High
    School but her parents have owned a mobile phone as long as she can remember.
  • She started using a computer just as the web
    emerged, but she has been quick to adopt mobile access such that mobile access
    is her primary means of connectivity now.
  •  Jessica still thinks of the phone as a separate,
    optional device to her laptop and still struggles a bit with device choices.
  • Jessica has graduated college but cannot find
    work in her degree area.

Christopher, an SMS Native –

  •  Born 1990-1994
  • He received his first cell phone on a family
    plan with SMS access in 7th grade.
  •  He fought with his parents over text message
    limits and laughed at his parents’ clumsy use of mobile devices.
  • He was capable of texting at rates of 10,000-12,000
    messages per month on a 12-key flip phone, with his eyes closed.
  • Mobile devices are Chris’ primary access to
    entertainment reading but increasingly, he is using them for textbooks.
  • Chris is still in college and has made a few low
    key attempts at starting a business.

Ashley, a Feature Phone Native

  • Born 1994-1998
  • Her first phone was a feature phone with QWERTY
    keyboard and unlimited text messaging
  • She fought with parents over “accidental”
    ringtone downloads and web KB data usage
  • She texts with friends but prefers to talk in
    person and she has been pulled toward the less-mobile web by Facebook and
  • Ashley is in high school, accumulating a lengthy
    resume of college application fodder.  She doesn’t think the best careers
    will be at large corporations.  

Jacob, a Smart Phone Native

  • Born 1998-2002
  •  Jacob’s first phone is an iPhone hand-me-down
    from his parents.  He doesn’t have a wireless contract but is connected
    solely via home or free WiFi.
  • Jacob loves Angry Birds and other games
    on his iPhone and easily installs games on his mom’s e-readers while at his
    sister’s school events.
  • Jacob easily picks up and uses any mobile device
    without an opinion on Apple-Google-Microsoft.  He uses what works and
    disregards anything that doesn’t.  If he is prompted to upgrade an OS on a
    device, he just puts it down and moves to something that works.  He rarely
    uses a computer to access the web for anything.
  • Jacob is trying to be big-man-on-campus in
    elementary school and wants to be a professional soccer player.

Why am I naming these generations in terms of mobile? 

I believe we are at the precipice of the next revolution in
technology.  I don’t see the current iterations of apps and web services
leading to a revolution… but rather negatively creating an environment with a
void to be filled.  The business and software architects of the services
we use today are likely building on a foundation of knowledge that is
pre-mobile. In the best case, they are pasting mobile access onto sites which
were born on the web.  More likely, they are building mobile access to
businesses that pre-date the web.

Consider modern information technology revolutions and their
widespread adoption:  Email, Internet, Web, and Social.  The
triggering events are spaced out about every 4-7 years.  Could it be that
the driver for each of these changes was the incoming micro-generation being
unhappy with the tools of their predecessor?  What micro-generation is
joining the workforce today?  The Christophers are about to enter
the workforce as the first micro-generation of mobile natives!   They
have been using mobile devices since middle school.  Let’s face it; the
job market is not exactly kind to graduates today.  I can imagine they have
some really good ideas and they aren’t going to wait around for permission to
operate in existing environments.

The next revolution is coming.  It is not search,
social, and e-content clumsily forced through a mobile pipe.  It will not
have its foundation in the web.  It is not Google, Amazon, Facebook,
Twitter, Pinterest, or Instagram and certainly not an ad supported app glued
inside those environments.  It will be something that the mobile natives
will invent to solve what they see as a big problem.  Let’s make sure we
pay attention to them. 

If you are under 25, no pressure… oh never mind, they are
not reading blogs!

PS: Some further reading on this topic from Forbes: 

Google and Facebook might completely disappear in the next 5 years

Did this 15-year-old from Maryland just change cancer treatment?


Also published at http://community.oclc.org/cooperative/2012/09/micro-generations-and-libraries.html


Press Release: OCLC provides downloadable linked data file for the 1 million most widely held works in WorldCat

DUBLIN, Ohio, USA, 14 August 2012–OCLC has published bibliographic linked data for the most widely held works in WorldCat. This downloadable file–representing nearly 1.2 million resources–contains approximately 80 million linked data “triples,” the term for the most granular relationship possible between discrete pieces of information.


OCLC Cooperative Blog: Interview with Mike Teets on the Website for Small Libraries Project

Andy Havens, OCLC Coop Blog Editor: We recently got together with Mike Teets, Vice President for Innovation, to discuss the recently announced release of a new project to come out of OCLC’s Innovation Labs: Website for Small Libraries. We wanted to get a bit more detail about the release, and about ongoing plans for the project…

OCLC Cooperative Blog: 4 million improved connections to WorldCat from Open Library

Bruce Washburn, a Consulting Software Engineer with OCLC Research, just put up a post on the Developer Network Blog about a project he’s just completed with Open Library. I won’t repeat the technical background information that both Bruce and George Oates from the Open Library have detailed, but I do want to take a moment and reflect on how a collaboration like this benefits libraries…

OCLC Cooperative Blog: The conclusion of the Ask4Stuff experiment

Back last June, the OCLC Innovation Lab announced the availability of a Twitter-based service called Ask4Stuff. The idea (in a nutshell) was to let people tweet a request for information on a particular subject to the service using the #Ask4Stuff hashtag. The service would then return a link to a WorldCat.org set of resources based on a search of that subject. Later, we added a more complex multistep analysis of the request matching to various classification and ranking schemes. It was an experiment in developing a more “social search”…

OCLC Cooperative Blog: A Web presence for every library

In April of 2010 OCLC started the Innovation Lab, a small team focused on the exploration of new technologies in uncharted spaces to enhance the products and services offered by the OCLC cooperative. Examples include the beta WorldCat mobile offering at http://worldcat.org/m and the social network integration Ask4Stuff

Transparent Transaction Redirection

Moving on from the previous post to the topic of transaction transparency.  The
previous post, which in hindsight seemed a bit dry, focused on the numbers of
being highly available.  Sure, I had a colored graphic and all, but it was just
math.  There were probably only one or two people who were inquisitive enough
to actually check the math.  Transaction level redirection is where meeting the
numeric goals gets interesting.  It may be the key distinguishing characteristic
between a hosted web application and a system that is designed for web scale.

To recap Transparent: Transaction level redirection without user
knowledge. Any service within the infrastructure should fail over quietly and
reliably to an alternate service with little to no disruption to the user.
There should be no degradation when services go down.

Sessions and State leading to Efficient Workflow

 Can you believe that
it was almost 15 years ago that we started down the path of trying to force our
session based, stateful systems into the “World Wide Web” and the Mosaic
browser?  We slapped web skins on
existing applications.  Many even
continued to architect this way because it was the knowledge that existed at
the time.  Unfortunately, some of that
legacy is still with us.  Admittedly,
OCLC has some remaining pockets of this model. 

The limitations led many to view statefulness and session
based designs as bad juju.  Over the following
years, the information industry meandered through various solutions to this
problem:  Cookies, bad cookies; Session
URL tags, bad session URL tags; etc.  Fast
forward to today and we find that most systems strive to implement good
stateful models without the legacy of doing so with sessions.   Think of “good stateful models” as those
that support efficient workflows.  Workflows
might be a good future topic… but I will just throw out that “efficient” is
usually not equal to “the way I always did it”.

What does that have to do with Transaction Redirects?

Efficient workflows require that your services maintain some
context for you.   Why are you here?  What did you do last?  What are you likely to do next?  These are relatively simple things to do in a
single host environment.  They become a
little harder across multiple machines. 
They become very difficult across data centers.   Now introduce failures into the system:
machines crashing, disks failing, applications failing, networks failing… all
things that will happen.  Your systems
must be prepared on every transaction to infer some context even if that system
has not previously seen your history. 
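
One way to let any machine infer context for a transaction it has never seen before is to have the request itself carry a small, signed summary of that context.  The following is only a minimal Python sketch of that idea, not a description of how OCLC does it: the shared secret, the function names and the fields in the context are all hypothetical.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical secret that every server in every data center would share.
SECRET = b"shared-secret-known-to-every-server"


def encode_context(context: dict) -> str:
    """Pack the workflow context into a signed token the client sends back."""
    payload = base64.urlsafe_b64encode(
        json.dumps(context, sort_keys=True).encode("utf-8")
    )
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + signature


def decode_context(token: str) -> dict:
    """Recover the context on any server, rejecting tokens that were altered."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode("ascii"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("context token was tampered with")
    return json.loads(base64.urlsafe_b64decode(payload))


# A server that has never seen this user can still answer
# "what did you do last?" from the token alone.
token = encode_context({"user": "u1", "last_action": "search", "query": "psalms"})
print(decode_context(token)["last_action"])
```

The trade-off is that the context travels on every transaction, so it has to stay small; anything larger still needs a replicated store behind it.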

Consistency and Availability

As you might imagine, transaction redirections introduce a
balance between consistency and availability… something that has been difficult
in the library industry.  Up to a point,
they both can increase together with well built software.  At some point however, higher consistency
across larger and larger stores of data leads to lower reliability (insert your
favorite metasearch story here). There is not a single right answer.  For some applications, you don’t want an
answer unless it is guaranteed right… at your doctor’s office for example.  In other cases, having the service available
is more important than getting the exact same answer on multiple attempts…
shopping on Amazon.  The key is finding
the right balance for your environment.

Again, what does this have to do with Transaction Redirects?

I used the above path to demonstrate the issues of
statefulness and sessions in human interaction with our systems.  These issues apply to service components in
service oriented architectures.  In
modern service oriented systems, the number of components can be onerous (70+
in worldcat.org).  We must build the
overall system anticipating failures. 
The value of SOA systems is that components can be scaled independently
and fail over independently.  For
example, if cover art is becoming slow, we can add cover art virtual servers within
minutes.  If one fails, the calling
applications can switch over silently to an alternative.  Each component of the system must be prepared
for taking transaction loads in growth situations as well as failures.  They must do this without forcing the user to
back up to the top of a workflow chain.
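
To make "switch over silently to an alternative" concrete, here is a minimal Python sketch of a caller that tries each replica of a component in turn.  The replica functions and the cover art example are hypothetical stand-ins; a real caller would also need timeouts and health checks.

```python
def call_with_failover(replicas, request):
    """Try each replica in order; the user only sees an error if all fail."""
    last_error = None
    for replica in replicas:
        try:
            return replica(request)
        except Exception as error:  # a crashed or unreachable replica
            last_error = error      # remember why, then quietly move on
    raise RuntimeError("no replica available") from last_error


# Hypothetical cover art replicas: the first is down, the second answers.
def cover_art_primary(isbn):
    raise ConnectionError("primary cover art server is down")


def cover_art_secondary(isbn):
    return "cover image for " + isbn


print(call_with_failover([cover_art_primary, cover_art_secondary], "isbn-123"))
```

The caller's workflow never restarts: the failed attempt is absorbed inside the call, which is the property the paragraph above asks of every component.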

But how do you do that?

The good news is that there are many blazing the trail and
pointing us to what works and what doesn’t work.  The not-so-good news is there is not an easy way
to take an existing application and add this stuff after the fact.  There is much written about the architectures
of the large internet services.   The
common point amongst all of them is that they are designing core
infrastructures that support models of scale and availability.  They are not simply hosting an application on
the web.

An example component at OCLC is our internally developed
text engine.  All of the data is in
memory for absurdly fast response time. 
It is spread across three clusters. 
Each transaction is sent to all three at the same time.  The first one to respond wins.  Each partition within each cluster is also
replicated.  A transaction failure,
outside of those we humans cause, is practically impossible.  We have even tested pulling plugs from the
wall and watching load tests continue to hum along without skipping a beat.
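
The "send to all three, first one to respond wins" pattern can be sketched in a few lines of Python.  This is an illustration of the pattern only, not the text engine's actual code; the cluster functions, their delays and the query are made up for the example.

```python
import concurrent.futures as cf
import time


def first_response(query, clusters):
    """Send the query to every cluster at once; return the first good answer."""
    pool = cf.ThreadPoolExecutor(max_workers=len(clusters))
    pending = {pool.submit(cluster, query) for cluster in clusters}
    try:
        while pending:
            done, pending = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
            for future in done:
                if future.exception() is None:
                    return future.result()
                # This cluster failed; keep waiting on the others.
        raise RuntimeError("all clusters failed")
    finally:
        # Do not wait for the slower clusters; their answers are discarded.
        pool.shutdown(wait=False)


def make_cluster(name, delay, fail=False):
    """Build a hypothetical cluster that answers after `delay` seconds."""
    def search(query):
        time.sleep(delay)
        if fail:
            raise ConnectionError(name + " is down")
        return name + ": results for " + query
    return search


clusters = [
    make_cluster("cluster-a", 0.20),
    make_cluster("cluster-b", 0.01, fail=True),  # plug pulled from the wall
    make_cluster("cluster-c", 0.05),
]
print(first_response("psalms", clusters))
```

The cost of this design is that every query does triple the work; the benefit is that a slow or dead cluster never shows up in the response time.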



Available and Reliable

This is the second post in the series on “What is Web-Scale”.  It has been a while since my first post so I had better get on with it.  I took an informal poll on twitter to select which area of the web-scale / cloud concepts to expand.   Transparency was popular but the most popular was “just do them in order”.  That is what I will do.  Transparency will be next.

Reiterating the bullet from the summary post:  Available and Reliable: 99.9 or 99.99% availability (24x7x365, not against an advertised availability) Always On: No down time, planned or otherwise. The site must always be available.

A common internet forum statement is “If there aren’t pictures, it didn’t happen”… so here is a picture.   The first thing about availability at scale is that you cannot depend on opinions or feelings about whether it is good enough.  You must measure it.  It must be measured every second of every day. 

The data must be logged over long periods to determine individual service frailty.  For massively scalable systems, this data must be reviewed daily, with alarms going off anytime a system falls out of specification.  The following is a high level dashboard of one of our system monitors at OCLC.  Failures must be evaluated for corrective action.  It’s just not optional.
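
As a rough illustration of "measure it every second and alarm when out of specification", here is a small Python sketch.  It assumes a hypothetical log of one true/false health probe per second for each service; the service names, failure counts and the 99.9% target are invented for the example.

```python
def measured_availability(probes):
    """Fraction of health probes that succeeded (True means the service answered)."""
    probes = list(probes)
    if not probes:
        raise ValueError("no probes recorded")
    return sum(probes) / len(probes)


def out_of_spec(probes_by_service, target=0.999):
    """Names of services whose measured availability fell below the target."""
    return sorted(
        name
        for name, probes in probes_by_service.items()
        if measured_availability(probes) < target
    )


# One day of per-second probes (86,400 of them) for two hypothetical services.
SECONDS_PER_DAY = 24 * 60 * 60
day_log = {
    "search": [True] * (SECONDS_PER_DAY - 120) + [False] * 120,    # 2 minutes down
    "cover_art": [True] * (SECONDS_PER_DAY - 30) + [False] * 30,   # 30 seconds down
}
for name in out_of_spec(day_log, target=0.999):
    print("ALARM:", name, "is below target")
```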



The numbers: What does it mean?  System managers’ slang is “two nines” or “four nines”.  99% available is “two nines”, 99.99% is four.  Simple enough, right?   While it seems mathematically simple, this area is often misunderstood.  We all tend to relate statements on reliability to personal devices and machines that are very local and singular in nature.  A single machine at 99.99% is down for about 52 minutes a year.


99% – 3.65 days outage per year

99.9% – 8.76 hours outage per year

99.99% – 52.56 minutes outage per year

99.999% – 5.256 minutes outage per year

99.9999% – 31.536 seconds outage per year

99.99999% – 3.1536 seconds outage per year
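
For anyone who wants to check the math, the list above comes from one line of arithmetic: the minutes in a 365-day year times the fraction of time the service is unavailable.  A small Python sketch (the function name is mine):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year


def downtime_minutes_per_year(availability):
    """Expected outage per year for an availability fraction (0.9999 is four nines)."""
    return MINUTES_PER_YEAR * (1.0 - availability)


for availability in (0.99, 0.999, 0.9999, 0.99999):
    minutes = downtime_minutes_per_year(availability)
    print(availability, round(minutes, 2), "minutes per year")
```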

Now the bad news: A service actually drops to 99.98% available when it is dependent on just two 99.99% lower level services (105 minutes per year).  This can be called series availability.  The more services you chain together the worse your reliability gets.

2 services in series: 365 * 24 * 60 * (1 - .9999 * .9999) = about 105 minutes annually.

3 services in series: 365 * 24 * 60 * (1 - .9999 * .9999 * .9999) = about 158 minutes annually.

As you might guess, our current Web 2.0 mashup world is generating reliability issues as services are very typically series based… metasearch -> webui  ->  SRU -> database -> data just as a common example.    
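
Series availability is just the product of the individual availabilities, and the outage is whatever is left over.  Here is a short Python sketch; treating every link in the mashup chain as a 99.99% service is my assumption for illustration, not a measured figure.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def series_availability(availabilities):
    """Every service in the chain must be up, so the availabilities multiply."""
    total = 1.0
    for availability in availabilities:
        total *= availability
    return total


def downtime_minutes_per_year(availability):
    """Outage minutes per year: the unavailable fraction of a 365-day year."""
    return MINUTES_PER_YEAR * (1.0 - availability)


# Hypothetical chain: metasearch -> webui -> SRU -> database -> data,
# assuming each of the five links is a 99.99% service.
chain = [0.9999] * 5
print(round(downtime_minutes_per_year(series_availability(chain))), "minutes per year")
```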

Don’t despair, there is good news!  This good news actually supports a service architecture environment rather than detracting from it.  There are ways to improve availability with a SOA model.  The first, and most expensive, way is to buy and manage very highly reliable individual systems.  This is the path the big iron of the 80’s took… and it got very, very expensive.   In modern highly available environments, this issue is addressed by parallelizing the workload.  Simply double up each service so that both copies must be down for the service to be down, and with 99.99% on each machine you are back under 52 minutes.

                Availability = 1 – (1-MachineAvail)**2 

Given that, we have even better news: if you double 99% available machines you get 99.99% availability; triple them and you get 99.9999%!  This is why massively scalable architectures can now use commodity hardware instead of paying for reliability in the individual machines.
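
The parallel formula generalizes from two copies to any number: the service is down only when every copy is down at the same time.  A minimal Python sketch, assuming the copies fail independently of each other (which real machines sharing a rack or a power feed may not):

```python
def parallel_availability(machine_avail, copies=2):
    """Availability of a service backed by `copies` independent machines.

    The service is unavailable only when all copies are down at once.
    """
    return 1.0 - (1.0 - machine_avail) ** copies


for copies in (1, 2, 3):
    print(copies, "x 99% machines:", round(parallel_availability(0.99, copies), 6))
```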

In real life examples however it is never “simply double…”   There are issues in software design, data integrity issues, transaction routing, load balancing, fail-over, etc.  These all contribute a significant cost to obtaining highly reliable systems.  In other words, we moved some expense from hardware to software.  This is good news again since software copies scale less expensively than hardware.

Planned versus Unplanned:  How many of our services have an outage notification page or warning page of pending outage or a current outage?  Historically we have struggled over the words to use as I am sure everyone has.  We carefully craft messages and explanations.  But realize this… NOBODY READS THEM!  We might feel a little better when we find the notice after we see a service has failed but the vast majority of users of our systems just see a failure and move on to an alternate. Another false comfort is that somehow planned and unplanned outages are different.  Outages for upgrades are really not tolerated by users. 

Major internet services figured this out from the beginning.  The service must be on at all times.  Software installs must be done on a rolling basis while user transactions are serviced. Hardware additions or replacements must be the same.  Always on is now the default end-user expectation.

A positive byproduct of the scaling across commodity hardware for reliability is that there are now many options for rolling installs across an environment.  It can be done in parallel data centers, farms within data centers, individual machines or even virtual machines on a single host.  Again, it takes software design and configuration management design, but it is quite practical in today’s environments.
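
A rolling install can be pictured as a loop that takes a small batch of machines out of rotation, upgrades them, and puts them back before touching the next batch.  The Python sketch below simulates only that bookkeeping; the server names and the upgrade callback are hypothetical, and a real rollout would also drain connections and verify health before returning a machine to service.

```python
def rolling_upgrade(servers, upgrade, batch_size=1):
    """Upgrade servers batch by batch; return the fewest left serving at any time."""
    if batch_size < 1 or batch_size >= len(servers):
        raise ValueError("batch size must leave at least one server in service")
    in_service = set(servers)
    min_serving = len(in_service)
    for start in range(0, len(servers), batch_size):
        batch = servers[start:start + batch_size]
        in_service.difference_update(batch)   # stop routing traffic to the batch
        min_serving = min(min_serving, len(in_service))
        for server in batch:
            upgrade(server)                   # install while the others keep serving
        in_service.update(batch)              # return the batch to rotation
    return min_serving


upgraded = []
fewest = rolling_upgrade(["app1", "app2", "app3", "app4"], upgraded.append, batch_size=2)
print("upgraded:", upgraded, "fewest serving at once:", fewest)
```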

OCLC:  Focusing on just one service platform, worldcat.org is comprised of 150 servers.  These servers are divided into farms by function… 65 database servers, 75 application servers, and 10 servers supporting harvesting and bots.  We continually add hardware and rebalance the environment with demand.  We have two data centers today and will likely have more in the future as we grow and balance load geographically.