Sunday, September 28, 2008

assignment 2 - photo link

here's a photobucket album with my pictures!

Friday, September 26, 2008

Muddiest Point

Will we actually be doing any XML coding for our term projects, or is this something we just need to understand conceptually?

Reading Notes - Week 5

Bryan:
XML is a subset of the Standard Generalized Markup Language (SGML).

XML allows users to:

  • bring multiple files together to form compound documents
  • identify where illustrations are to be incorporated into text files, and the format used to encode each illustration
  • provide processing control information to supporting programs, such as document validators and browsers
  • add editorial comments to a file
XML is a formal language that can be used to pass information about the component parts of a document to another computer system

provides a formal syntax for describing the relationships between the entities, elements and attributes that make up an XML document

to validate documents, users create a Document Type Definition (DTD) that formally identifies the relationships between the various elements that form their documents

Where elements can have variable forms, or need to be linked together, they can be given suitable attributes to specify the properties to be applied to them

An XML file normally consists of three types of markup, the first two of which are optional:

  1. An XML processing instruction identifying the version of XML being used, the way in which it is encoded, and whether it references other files or not (see the sketch after this list)
  2. A document type declaration that either contains the formal markup declarations in its internal subset (between square brackets) or references a file containing the relevant markup declarations (the external subset)
  3. A fully-tagged document instance which consists of a root element, whose element type name must match that assigned as the document type name in the document type declaration, within which all other markup is nested.
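As far as I can tell, a minimal file with all three parts looks something like this (element names invented): an XML declaration, a document type declaration with its internal subset in square brackets, and a document instance whose root element matches the doctype name.

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <!DOCTYPE memo [
      <!ELEMENT memo (to, from, body)>
      <!ELEMENT to (#PCDATA)>
      <!ELEMENT from (#PCDATA)>
      <!ELEMENT body (#PCDATA)>
    ]>
    <memo>
      <to>Staff</to>
      <from>Director</from>
      <body>The library closes early on Friday.</body>
    </memo>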
XML-coded files are, by their nature, ideal for storing in databases. Because XML files are both object-oriented and hierarchical in nature they can be adapted to virtually any type of database, though care sometimes needs to be taken to ensure that enough structural data is retained in the database to reconstruct the original file


Ogbuji:

XML is based on Standard Generalized Markup Language (SGML), defined in ISO 8879:1986 [ISO Standard]. It represents a significant simplification of SGML, and includes adjustments that make it better suited to the Web environment.

an entity catalog can be used to specify the location from which an XML processor loads a DTD, given the system and public identifiers for that DTD. System identifiers are usually given by Uniform Resource Identifiers (URIs)
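If I'm following this correctly, a catalog is just a mapping from public/system identifiers to local copies of a DTD. A sketch in the OASIS XML Catalog format (file paths invented):

    <?xml version="1.0"?>
    <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
      <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
              uri="file:///usr/local/dtds/xhtml1-strict.dtd"/>
      <system systemId="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
              uri="file:///usr/local/dtds/xhtml1-strict.dtd"/>
    </catalog>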

A URI is a generalization of the familiar URLs from use in Web browsers and the like: all URLs are URIs, but the URI category also includes URNs.

In XML namespaces each vocabulary is called a namespace and there is a special syntax for expressing vocabulary markers. Each element or attribute name can be connected to one namespace
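The syntax binds a prefix to a namespace URI on an element, then uses that prefix on names below it. A tiny sketch (using the real Dublin Core namespace, content invented):

    <record xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Digital Libraries</dc:title>
      <dc:creator>Witten &amp; Bainbridge</dc:creator>
    </record>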

XML Base [W3C Recommendation] provides a means of associating XML elements with URIs in order to more precisely specify how relative URIs are resolved in relevant XML processing actions.
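For example (names and URI invented), a relative reference like appendix.xml below would resolve against the xml:base value, i.e. to http://example.org/reports/appendix.xml:

    <citations xml:base="http://example.org/reports/">
      <citation href="appendix.xml"/>
    </citations>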

The XML Infoset defines an abstract way of describing an XML document as a series of objects, called information items, with specialized properties. This abstract data set incorporates aspects of XML documents defined in XML 1.0, XML Namespaces, and XML Base. The XML Infoset is used as the foundation of several other specifications that break XML documents down into their component parts

a physical representation of an XML document, called the canonical form, accounts for the variations allowed in XML syntax without changing meaning

XPointer is a language that can be used to refer to fragments of an XML document

XLink offers such links (simple links), as well as more complex links that can have multiple end-points (extended links), and even links that are not expressed in the linked documents, but rather in special hub documents (called linkbases).
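A simple link, as I understand it, is just an element carrying attributes from the XLink namespace (target URL invented):

    <reference xmlns:xlink="http://www.w3.org/1999/xlink"
               xlink:type="simple"
               xlink:href="http://example.org/glossary.xml">
      see the glossary
    </reference>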


XML Tutorial:

The purpose of an XML Schema is to define the legal building blocks of an XML document, just like a DTD.

An XML Schema:

  • defines elements that can appear in a document
  • defines attributes that can appear in a document
  • defines which elements are child elements
  • defines the order of child elements
  • defines the number of child elements
  • defines whether an element is empty or can include text
  • defines data types for elements and attributes
  • defines default and fixed values for elements and attributes
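A minimal sketch of a schema covering most of these points (element names invented): it declares a note element, fixes the order and number of its children, gives everything a data type, and adds an attribute.

    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="note">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="to" type="xs:string"/>
            <xs:element name="from" type="xs:string"/>
            <xs:element name="body" type="xs:string"/>
          </xs:sequence>
          <xs:attribute name="date" type="xs:date"/>
        </xs:complexType>
      </xs:element>
    </xs:schema>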
XML Schema became a W3C Recommendation on 2 May 2001.

One of the greatest strengths of XML Schemas is the support for data types.

With support for data types:

  • It is easier to describe allowable document content
  • It is easier to validate the correctness of data
  • It is easier to work with data from a database
  • It is easier to define data facets (restrictions on data)
  • It is easier to define data patterns (data formats)
  • It is easier to convert data between different data types
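For instance, a facet can restrict a simple type to a pattern; something like this should constrain a value to exactly five digits (element name invented):

    <xs:element name="zipcode">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:pattern value="[0-9]{5}"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>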
A simple element is an XML element that can contain only text

Simple elements cannot have attributes. If an element has attributes, it is considered to be of a complex type. But the attribute itself is always declared as a simple type.

A complex element is an XML element that contains other elements and/or attributes.

There are four kinds of complex elements:

  • empty elements
  • elements that contain only other elements
  • elements that contain only text
  • elements that contain both other elements and text
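One instance-document example of each kind (all names and content invented):

    <product pid="1345"/>                        <!-- empty -->
    <employee>                                   <!-- elements only -->
      <firstname>Jane</firstname>
      <lastname>Smith</lastname>
    </employee>
    <shoesize country="france">35</shoesize>     <!-- text only; still complex because of the attribute -->
    <letter>Dear <name>John</name>, see you soon.</letter>  <!-- mixed text & elements -->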

WSDL is an XML-based language for describing Web services and how to access them.

WSDL describes a web service, along with the message format and protocol details for the web service.
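A rough WSDL 1.1 skeleton (service name, namespace, and operation all invented; the binding and service sections with the protocol/endpoint details are omitted):

    <definitions name="HelloService"
        targetNamespace="http://example.org/hello"
        xmlns="http://schemas.xmlsoap.org/wsdl/"
        xmlns:tns="http://example.org/hello"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <message name="sayHelloRequest">
        <part name="firstName" type="xsd:string"/>
      </message>
      <message name="sayHelloResponse">
        <part name="greeting" type="xsd:string"/>
      </message>
      <portType name="HelloPortType">
        <operation name="sayHello">
          <input message="tns:sayHelloRequest"/>
          <output message="tns:sayHelloResponse"/>
        </operation>
      </portType>
      <!-- <binding> and <service> (protocol + endpoint URL) would go here -->
    </definitions>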


Bergholz:

Extensible Markup Language (XML) is a semantic language that lets you meaningfully annotate text. Meaningful annotation is, in essence, what XML is all about.

DTDs let users specify the set of tags, the order of tags, and the attributes associated with each tag

Elements can have zero or more attributes, which are declared using the !ATTLIST tag
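e.g., a sketch of an element with two attributes declared (names invented): one restricted to an enumerated list with a default value, one optional free text.

    <!ELEMENT address (#PCDATA)>
    <!ATTLIST address
              kind   (postal|email|ip) "postal"
              label  CDATA #IMPLIED>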

Using namespaces avoids name clashes (that is, situations where the same tag name is used in different contexts). For instance, a namespace can identify whether an address is a postal address, an e-mail address, or an IP address
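A sketch of how two namespaces keep the same tag name distinct (URIs invented):

    <contact xmlns:post="http://example.org/ns/postal"
             xmlns:net="http://example.org/ns/network">
      <post:address>135 N Bellefield Ave, Pittsburgh PA</post:address>
      <net:address>192.168.0.12</net:address>
    </contact>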

Unfortunately, namespaces and DTDs do not work well together

XML extends HTML's linking capabilities with three supporting languages:

  • XLink (http://www.w3.org/TR/xlink/), which describes how two documents can be linked;
  • XPointer, which enables addressing individual parts of an XML document; and
  • XPath, which is used by XPointer to describe location paths.
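Putting the last two together, an XPointer is an XPath-style location attached to a URI fragment, along the lines of (document and structure invented):

    http://example.org/thesis.xml#xpointer(/thesis/chapter[2]/section[1])

i.e., the first section of the second chapter of that document.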

The Extensible Stylesheet Language (XSL) is actually two languages: a transformation language (called XSL Transformations, or XSLT) and a formatting language (XSL Formatting Objects). A minimal XSLT sketch follows the list below.

Although DTDs were the first proposal to provide for standardized data exchange between users, they have disadvantages: their expressive power seems limited, and their syntax is not XML. Several approaches address these disadvantages by defining a schema language (rather than a grammar) for XML documents:

  • document definition markup language (DDML), formerly known as XSchema,
  • document content description (DCD),
  • schema for object-oriented XML (SOX), and
  • XML-Data (replaced by DCD).

The W3C's XML Schema activity takes these four proposals into consideration.
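The XSLT sketch promised above: a stylesheet that pulls every title element out of a document and emits an HTML list (document structure invented).

    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <ul>
          <xsl:for-each select="//title">
            <li><xsl:value-of select="."/></li>
          </xsl:for-each>
        </ul>
      </xsl:template>
    </xsl:stylesheet>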

Friday, September 19, 2008

Reading Notes

Setting the Stage
metadata: the sum total of what one can say about any information object at any level of aggregation
- content (intrinsic) - what is contained/about
- context (extrinsic) - who what when etc about creation
- structure (either int or ext) - associations b/t or among independent info objects

library metadata: includes indexes, abstracts & catalog records created according to cataloging rules (MARC, LCSH etc)

archival & manuscript metadata: accession records, finding aids, catalog records.
- MARC Archival and Manuscript Control (AMC), now part of the MARC format for bib control

not as much emph on structure for lib/arch, but always important even before digitization.
- growing as comp capabilities increase
- structure can be exploited for searching, etc but need specific metadata

metadata:
- certifies authenticity & degree completeness of content
- establishes & documents context of content
- identifies & exploits structural rel's that exist b/t & w/in info objects
- provides range of intell access points for increasingly diverse range of users
- provides some of info an info prof might have provided in a physical setting

repositories also metadata for admin, accession, preserving, use of coll's
- personal info mgmt, recordkeeping

Dublin Core Metadata Element Set

Table 1 - Types
1) Administrative (acqu info, rights & repro, documentation of legal access req's, location info, selection criteria, version control, audit trails)
2) Descriptive (cat records, finding aids, spec indexes, hyperlink rel's, annotations by users, md for recordkeeping systems)
3) Preservation (documentation of phys condition, actions taken to preserve)
4) Technical (hard/software documentation, digitization info, track sys resp time, auth & sec data)
5) Use (exhibit records, use & user tracking, re-use & multi-version info)

Attributes of metadata:
- Source (Int or Ext)
- Method of creation (auto/comp or manual/human)
- Nature (lay/nonspecialist v. expert - but often orig is lay)
- Status (static, dynamic, long- or short-term)
- Structure (structured or no)
- Semantics (Controlled or no)
- Level (collection or item)

Life-Cycle of Info Object:
1) Creation & Multi-Versioning
2) Organization
3) Searching & Retrieval
4) Utilization
5) Preservation & Disposition

Little-Known facts about metadata:
- doesn't have to be digital
- more than just description
- variety of sources
- continue to accrue during life of info object/sys
- one info obj's metadata can simultaneously be another's data

Why Important?
- Increased accessibility
- retention of context
- expanding use
- multi-versioning
- legal issues
- preservation
- system improvement & economics


Border Crossings

DC: simple, modular, extensible metadata

prob's w/ user-created metadata

structured md important in managing intellectual assets

md for images critical for searchability/discoverability

controlled v. uncontrolled vocab (folksonomy)
- still not sure of future role

"Railroad Gage Dilemma" - why no common formal model?

demand for international/multicultural approach


Witten 2.2

19th c goals for bib sys: finding, collocation, choice

Today: 5 goals
1) locate entities
2) identify
3) select
4) acquire
5) navigate rel's between

principal entities: documents, works, editions, authors, subjects.
- titles & sub classifications are attributes of works, not own entities

doc's: fundamental hierarchical structure that can more faithfully be reflected in DL than physical

"helpful DLs need to present users an image of stability & continuity" (p.49)

works: disembodied contents of a document

edition - also "version, release, revision"

authority control

subjects: extraction: phrases analyzed for gram & lexical structure
- key phrase assignment (auto class), easier for scientific than literary

LCSH controlled vocab rel's: equivalence, hierarchical, associative


Witten 5.4 - 5.7

MARC & Dublin Core, BibTeX & Refer

MARC: AACR2R guidelines
- authority files & bib records

Dublin Core: specifically for non-specialist use
- title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, rights
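In XML form (the usual binding I've seen), a Dublin Core record is just a handful of dc: elements; a sketch with invented values:

    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>How to Build a Digital Library</dc:title>
      <dc:creator>Witten, Ian H.</dc:creator>
      <dc:date>2003</dc:date>
      <dc:type>Text</dc:type>
      <dc:language>en</dc:language>
    </metadata>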

BibTeX: math/sci

Refer: basis of endnote

metadata even more imp for images & multimedia
- image files contain some info

TIFF: tags. integers & ASCII text. dozen or so mandatory.
- most DLs use for images

MPEG-7: multimedia content description interface
- still pics, graphics, 3D models, audio, speech, video, combinations
- stored or streamed
- complex/extensible: DDL (desc def lang) uses XML syntax
- temporal series, spectral vector
- some automatic, some by hand

text-mining: auto metadata extraction
- structured markup s/a XML

Greenstone includes lang ident, extracting acronyms & key phrases, generating phrase hierarchies.

some info easy to extract: url/email, money, date/time, names
- generic entity extraction

bib ref's / citation info can be located & parsed

n-grams procedure for lang ID, can assign language metadata

key-phrase: assignment/extraction
- assignment: compare doc to each entry in list (Y or N)
- extraction: phrases from doc listed & best chosen (easier)

IDing phrases: content word, phrase delimiters, maximal length

Wednesday, September 17, 2008

muddiest point

My "muddiest point" this week is installing and running greenstone. It appears that the only version available for mac is the web version, which requires me to download a web server called Apache...I've done so but am not sure what to do with Apache.

Maybe I am making things way more complicated than necessary, but some help would be appreciated!

Friday, September 12, 2008

Reading Notes week 3

I don't have all of the readings in front of me at the moment, so I'm going to just do a brief commentary on them as a whole.

I have to say that I'm still confused by the concept of DOI's. Unfortunately I read the Lynch article last...I feel like reading that one first would have perhaps made things a bit clearer. I understand the motivation behind developing a DOI system, in particular the concept of persistence. However, I do not necessarily understand how they would work. Lynch raised some good points that helped me to understand that this system is not yet fully fleshed out, and that was helpful. In particular the statement that "Today's standard browsers do not yet understand URNs and how to invoke resolvers to convert them to URLs, but hopefully this support will be forthcoming in the not too distant future."

I suppose one of the main things that confused me was the need for something similar to an ISBN for digital content. Perhaps (probably) I am over-simplifying things, but it seems to me that an ISBN or ISSN is necessary because large numbers of copies exist for printed works. For most digital content, only one or a few copies exist, and these can be accessed by multiple users concurrently. To be honest, I am not well versed in Intellectual Property or Copyright issues at all, and this is probably what is confusing me the most.

Another thing that stood out to me was the issue of not being able to learn an object's DOI unless the object carries it as a label. If I wanted to know the ISBN of a particular book, there are ways for me to find it by searching with other, known identifiers such as the title. It seems to me that there is a huge issue in general with the naming of web-based content. Digital files can be given filenames that could serve as some sort of searchable identifier, but in general web pages and sites that host digital content are haphazardly named and authoring information is inconsistently revealed, if at all.

Lesk and Arms Chapter 9 were straightforward and I do not really have much to say about them. In particular, Arms provided a good refresher for me on some concepts that I am somewhat familiar with, but could always learn more about!

I hope this was sufficient for this week's posting...

about me =)

It occurred to me that since I missed the first class session and had to run out right at the end of last week's class (to get to another), I haven't really met anyone in this class yet. And to top it off, comcast had my internet off all week for no good reason! So, here's a little bit more about me, in case anyone is looking for a potential group member...

I grew up in Louisville, KY and then attended college at Emory University in Atlanta, GA. I have a BA in Linguistics and Psychology with a minor in Russian (although I've pretty much lost all ability to speak it, oops). I'm hoping to get a second masters in the not-too-distant future, most likely in cognitive science. I'd like to work in an academic library and focus on services for science/math.

After finishing my BA I worked at an educational non-profit organization teaching kids from 3rd-8th grade, and when I couldn't make rent on that "salary" anymore (ha) I took a job working for an advertising consultant doing generic office grunt work. I also did a good amount of publication and presentation design. I was graphics editor and then editor-in-chief of my college's humor magazine, so I have lots of experience with that sort of thing.

I never really thought of myself as especially tech-savvy, but recently I've decided that maybe I'm selling myself a bit short. I don't have any formal training in anything, but I can generally figure out most things that are put in front of me. As I mentioned, I have lots of design experience and can certainly contribute that.

I've thrown a couple of ideas for the project around in my head, but I'm basically open to anything. At the moment I'm still confused about copyright issues for the materials we use, so I'm not trying to get too attached to any one idea.

I am a talker. In case you didn't notice, ha. I can go on at length on pretty much any topic. But some things that I am most interested in are music (especially independent rock, hip-hop, and electronic stuff), crafts, magazines, pets (especially cats), kitschy robot stuff, and language.

I'll be out of town this weekend (back for class Monday), but feel free to email me if you'd be interested in working together.

Muddiest Point - Week 3

I am still rather confused over the issue of using copyrighted materials for our term projects. It would be great if someone could go into detail about what types of materials are appropriate/ok to use and what are not...

Monday, September 8, 2008

An Architecture for Information in Digital Libraries

Main bldg blocks: digital objects, handles, repositories

Purpose of IA is to rep riches & variety of library info
-digital object: way of structuring info in dig form, some of which may be metadata & includes a unique identifier called a handle.
-DOs often in sets, structure depends on info
-material can be divided into cat's (SGML, WWW objects, comp prog's, digitized radio prog's etc)
-user interface: browser & client svcs
-repository: interface called Repository Access Protocol
-handle system: unique identifiers
-search system

Issues in structuring info:
-dig materials frequently related to others by relationships (part/whole etc)
-same item may be in different formats
-diff versions created often (mult copies, or time-based)
-obj's have diff rights & permissions
-users access from diff comp sys's & networks

key-metadata: info to store, replicate, transmit obj w/out providing access to the content. includes terms & cond's and handle

digital material: used to store DL materials

A Framework for Building Open Digital Libraries

"Open" DLs build directly on concepts/philosophies of Open Archives Initiative

Most existing systems classified as DLs resulted from custom-built software dev projects - each involves intensive design, implementation & testing cycles.
- why repeat effort?
- some software toolkits: Dienst, Repository-in-a-box

Most programming environments adopt a component model

Oct 1999 - OAI launched.
- focus on high-level communication among systems & simplicity of protocol
- OAI protocol for Metadata Harvesting (OAI-PMH): system of interconnected components
- OAI protocol can be thought of as glue that binds together components of a larger DL (or collaborative system of DLs)
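Concretely (if I've got this right), a harvest is just an HTTP GET with a "verb" parameter, answered in XML; base URL invented and the response heavily trimmed:

    http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc

    <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
      <ListRecords>
        <record>
          <header>
            <identifier>oai:example.org:etd-1234</identifier>
            <datestamp>2008-09-01</datestamp>
          </header>
          <metadata>
            <!-- an oai_dc Dublin Core record goes here -->
          </metadata>
        </record>
      </ListRecords>
    </OAI-PMH>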

DLs modeled as networks of extended OAs, with each OA being a source of data and/or provider of services.
- this approach closely resembles the way physical libraries work
- research & production DLs differ

ODLs guided by a set of design principles & operationalized with aid of OAI-PMH extensions. proven techniques from internet development
- simplicity of protocols, openness of standards, layering of semantics, independence of components, loose coupling of systems, purposeful orthogonality, reuse

Formal Principles:
1) All DL svcs should be encapsulated w/in components that are extensions of OAs
2) All access to DL svcs should be through their extended OAI interfaces
3) Semantics of OAI Protocol should be extended or overloaded as allowed by OAI protocol, but w/out contradicting essential meaning
4) All DL svcs should get access to other data sources using extended OAI protocol
5) DLs should be constructed as networks of extended OAs

OAI Harvester obtains datastream which creates indices for searching.

Components in prototype systems:
- Union: combine metadata from mult srcs
- Filter: reformat metadata from non-OAI data srcs
- Search: search-engine functionality
- Browse: category-driven browsing fx'lity
- Recent: sample of recently-added items

Tested on NDLTD system - good feedback

Designing by principles & implementing for real-world scenario

Interoperability for Digital Objects & Repositories

Cornell & CNRI: open architecture, confederated DLs, goal of interoperability & extensibility. Allows flexible interaction of existing services & augmentation of the infrastructure with new services.

Interoperability: broad problem domain. typically investigated w/in specific scope (community, classification of info, IT area, etc)
- creating a general framework for info access & integration across domains
- goal to enable communities w/ different info & tech to achieve general level of info sharing
Definition: ability of DL components or services to be functionally & logically interchangeable by virtue of their having been implemented in accordance with a set of well-defined, publicly known interfaces.

some approaches:
1) standardization
2) distributed object request architectures (e.g. CORBA)
3) remote procedure calls
4) mediation
5) mobile computing
Cornell/CNRI approach:
1) agreement on common abstractions
2) definition of open interfaces to services/components that implement the abstractions
3) creation of extensibility mechanism for introducing new functionality into arch w/out interfering w/ core interoperability

Principal abstractions:
1) repository: different content managed in uniform manner
2) Digital Object: datastreams/elements (MIME sequence in bytes)
3) Disseminator: extend behavior of DOs & enable interaction
4) AccessManager

Disseminator Types: set of op's that extends basic functionality of a DO. (book - "get next page" "translate text" etc)
- signatures
- DOs can be used to make diss types avail in interface

Servlets: executable program capable of performing the set of op's defined for specific diss types.
- equivalence achieved when diff servlets operate on diff types of underlying datastreams to produce equivalent results.
- stored & registered in infrastructure in uniquely named DO

Extensibility: key is clean separation of object structure, diss types, & mechanisms that implement extended functionality.
- can endow DO w/ additional functionality
- new interfaces can be added

Interoperability Experiments:
- IT0: Protocol & Syntactic Interoperability
- IT1: Functional & Semantic
  - 1.1: DO Access
  - 1.2: DO Creation
  - 1.3: Extensible Access
- IT2: Interoperability of Extensibility Mechanisms
  - 2.1: Ability to dynamically load signatures & servlets
  - 2.2: Demonstrate flexibility w/ which new diss types are dynamically added to infrastructure

Arms chapter 2

I really enjoyed this article as a refresher on some concepts that I'm familiar with, but hadn't known all the specifics about.


*****
Internet- collection of networks. LAN & Wide-Area Networks. Based on protocol of ARPAnet (TCP/IP)
*IP: Internet Protocol. joins together network segments that constitute internet. Four numbers, each 0-255 stored as 4 bytes. Connected by routers. Info in packets.
*TCP: Transport Control Protocol. Divides msg into packets, labels each w/ destination IP & Sequence #, sends them out on network. Receiving comp acknowledges receipt & reassembles.
- guarantees error-free delivery, but not prompt.

TCP/IP Suite of Programs:
*Terminal Emulation: telnet
*File Transfer: FTP
*Email: SMTP (single message)

Scientific publishing on the internet:
*RFC's (request for comment)
*IETF (Internet Engineering Task Force)
*Los Alamos E-print archives

HTML, HTTP, MIME, URLs
*MIME specifies data type.

Conventions:
*web sites
*home page
*buttons
*hierarchical organization

web is giant step to build DLs on...not just detour until real thing comes along

Thursday, September 4, 2008

Suleman & Fox: A Framework for Building Open Digital Libraries

-- "Open" DLs build directly on concepts/philosophies of Open Archives Initiative.

-- Most existing systems classified as DLs resulted from custom-built software dev projects - each involves intensive design, implementation & testing cycles. why repeat effort? some software toolkits: Dienst, Repository-in-a-box etc

-- Most programming enviro's adopt a component model.

-- OAI launched October 1999. Focus on high-level comm among systems & simplicity of protocol.

-- OAI protocol for Metadata Harvesting (OAI-PMH): system of interconnected components. OAI protocol can be thought of as glue that binds together components of a larger DL (or collaborative system of DLs)

** DLs modeled as networks of extended OAs, with each OA being a source of data and/or provider of services. This approach closely resembles the way physical libraries work.

-- research & production DLs differ

-- ODLs guided by a set of design principles & operationalized with aid of OAI-PMH extensions. proven techniques from internet development: simplicity of protocols, openness of standards, loose coupling of systems, purposeful orthogonality, reuse whenever possible.

-- Formal Principles:
1. All DL services should be encapsulated within components that are extensions of OAs
2. All access to DL services should be through their extended OAI interfaces
3. Semantics of OAI protocol should be extended or overloaded as allowed by OAI protocol, but without contradicting essential meaning.
4. All DL services should get access to other data sources using extended OAI protocol
5. DLs should be constructed as networks of extended OAs.

-- OAI Harvester obtains data stream which creates indices for searching.

-- Components in prototype systems:
1. Union: combine metadata from mult sources
2. Filter: reformat metadata from non-OAI data sources
3. Search: search-engine functionality
4. Browse: category-driven browsing functionality
5. Recent: sample of recently added items

-- Tested on NDLTD system w/ good feedback.

-- Designing by principles & implementing for real-world scenario.

ARMS ch. 2 - link broken?

I'm having trouble accessing the ARMS ch. 2 reading. The link brings up an error message on the site that the article cannot be found. Is anyone else having this problem?

(also posted to Bb discussion board)

Tuesday, September 2, 2008

SOPAC 2.0

found this article on a blog I read for my internship and thought it was interesting, relevant & exciting!

http://tametheweb.com/2008/09/02/on-sopac-change-and-mr-john-blyberg/

Monday, September 1, 2008

Paepcke et al: Dewey Meets Turing: Librarians, Computer Scientists, and the Digital Libraries Initiative

I appreciated that this reading was rather relaxed and approachable, especially after finishing the DELOS article which was quite technical. I have entered my written notes, hopefully this is not too informal.

-- 1994: NSF launches DLI
*3 interested parties: librarians, comp scientists, publishers
*spawned google & other developments
-- DLI has changed work & private activities for nearly everyone

-- For scientists: exciting new work informed by librarianship
*resolution of tension between novel research & valuable products
-- For librarians: increased funding opportunities, improved impact of services
*OPACs were all most lib's had. expertise was needed

-- Advent of web: changed plans & got in the way
*blurred distinction b/t consumers & producers
-- Problem for CS: bound by per-project agreements & copyright. Web removed restrictions.
* broadening of relevance
-- Web threatened pillar of lib'ship...left w/out connection to recognizable, traditional library fx's
* Both valued predictability & repeatability - web led to laissez faire attitude toward info retrieval (did not particularly upset public)

-- Librarians feel they haven't gotten adequate funding for coll. dev.
-- CS's couldn't understand vast amounts of time devoted to structures like metadata

-- Core fx of lib'ship remains despite new, technical infrastructure
*importance of collections re-emerging
-- Direct connection b/t lib's & scholarly authors
-- Broadened opportunities in lib sci as a result of DLI

Setting the Foundations of Digital Libraries: the DELOS Manifesto

This reading lacked the depth of explanation that the other readings have had, probably owing to its purpose as an overview to a larger work. Some of the terminology was a bit confusing to me, especially the distinction between a Digital Library System and a Digital Library Management System.

I don't have as many thoughts/reflections on this article, as it was fairly practical and informative without raising questions or topics that are up for debate. I have left below the notes that I cut/pasted from the reading, as well as a few of my own notes (designated with a "--"). I hope this is sufficient...

*****

"Generally accepted conceptions have shifted from a content-centric system that merely supports the organization and provision of access to particular collections of data and information, to a person-centric system that delivers innovative, evolving, and personalized services to users."

"expectations of the capabilities of Digital Libraries have evolved from handling mostly centrally located text to synthesizing distributed multimedia document collections, sensor data, mobile information, and pervasive computing services"

-- point out that definition and expectations of DL's change even as we try to define them.

"three types of relevant "systems" in this area: Digital Library, Digital Library System, and Digital Library Management System."

"Digital Library (DL)
A possibly virtual organization that comprehensively collects, manages, and preserves for the long term rich digital content, and offers to its user communities specialized functionality on that content, of measurable quality and according to codified policies.

"Digital Library System (DLS)
A software system that is based on a defined (possibly distributed) architecture and provides all functionality required by a particular Digital Library. Users interact with a Digital Library through the corresponding Digital Library System.

"Digital Library Management System (DLMS) A generic software system that provides the appropriate software infrastructure both (i) to produce and administer a Digital Library System incorporating the suite of functionality considered foundational for Digital Libraries and (ii) to integrate additional software offering more refined, specialized, or advanced functionality."

"Six core concepts provide a foundation for Digital Libraries. Five of them appear in the definition of Digital Library: Content, User, Functionality, Quality, and Policy; the sixth one emerges in the definition of Digital Library System: Architecture."

"We envisage actors interacting with Digital Library Systems playing four different and complementary roles: DL End-Users, DL Designers, DL System Administrators, and DL Application Developers. "

"Digital libraries need to obtain a corresponding Reference Model in order to consolidate the diversity of existing approaches into a cohesive and consistent whole, to offer a mechanism for enabling the comparison of different DLs, to provide a common basis for communication within the DL community, and to help focus further advancement"

"Reference Architecture is an architectural design pattern indicating an abstract solution to implementing the concepts and relationships identified in the Reference Model."

"Concrete Architecture - At this level, the Reference Architecture is actualised by replacing the mechanisms envisaged in the Reference Architecture with concrete standards and specifications."

LESK, ch. 1

Right from the start, I found this reading a bit more practical and informative than Borgman.

I found the extended definition on pp. 2-3 quite helpful, with the following four main points:
1. DL must have content
2. Content needs to be stored and retrieved.
3. Content must be made accessible.
4. Content must be delivered to user.
Followed by the introduction of the new costs and legal issues surrounding digital collections.

Section 1.2 seemed quite straightforward to me with some good points, especially that "For more than a decade, nearly every word printed and typed has been prepared on a computer. Paradoxically, until very recently most reading has been from paper." The focus on the interplay between technology, economics, and user-driven information usage felt like a good overview of the challenges facing digital libraries, and digital information sources generally.

I particularly enjoyed the in-depth (albeit lengthy) discussion in section 1.4 of changing prices and capacities for different info storage and retrieval technologies. The level of detail really drove home the challenges of storing and maintaining such vast quantities of information, even as our resources improve.

It feels like I have been hearing about Vannevar Bush in every single class I've had this week, so section 1.3 of the chapter was interesting more in the contrast depicted between Bush and Warren Weaver. Reading about the different emphases of their research reminded me of events in the history of Psychology, specifically the emphasis throughout much of the early twentieth century on behaviorist theories until the so-called "cognitive revolution" in the 60s. And in fact, the chapter later alluded to that same revolution on p.24 (section 1.5), with references to scientists such as Chomsky and Oettinger.

I have a BA in Linguistics (full disclosure, ha), so section 1.5 was highly interesting to me. However, I did feel that it understated the real challenges and shortcomings of attempts to capture the intricacies of natural human language with computers and other machines, perhaps for necessary reasons of length.

The background info on the history of the internet and the programs/interfaces that are commonly used today was quite informative and clear, in particular the discussion of Google's groundbreaking method of ranking search results to provide better information, and not just a lot of it. On a side note, I was talking to my mother on the phone last night and she told me that for the first time ever she successfully used google to find information that she was looking for (a substitution for self-rising flour in a recipe). That she has managed to do so only now, after nearly six years of regular computer usage (yes, she was a late adopter), really drove home to me the very real challenges involved in making digital information an effective tool for retrieving high-quality information.

As stated in section 1.7, I felt that the two most important questions lingering over the development of DL's are:
1. What do people want to do with available technologies?
2. What content should be provided? And specifically, which content can be provided entirely digitally, and which types will never be as effective/adequate in a digital form?