Feeding News


RSS

From: http://en.wikipedia.org/wiki/RSS
RSS (most commonly expanded as Really Simple Syndication) is a family of web feed formats used to publish frequently updated works—such as blog entries, news headlines, audio, and video—in a standardized format.[2] An RSS document (which is called a "feed", "web feed",[3] or "channel") includes full or summarized text, plus metadata such as publishing dates and authorship. Web feeds benefit publishers by letting them syndicate content automatically. They benefit readers who want to subscribe to timely updates from favored websites or to aggregate feeds from many sites into one place. RSS feeds can be read using software called an "RSS reader", "feed reader", or "aggregator", which can be web-based, desktop-based, or mobile-device-based. A standardized XML file format allows the information to be published once and viewed by many different programs. The user subscribes to a feed by entering into the reader the feed's URI or by clicking an RSS icon in a web browser that initiates the subscription process. The RSS reader checks the user's subscribed feeds regularly for new work, downloads any updates that it finds, and provides a user interface to monitor and read the feeds.
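The reader behaviour described above (check a subscribed feed, pick out the entries not seen before) can be sketched in a few lines of Python. This is a minimal illustration rather than how any particular feed reader works: the sample feed, its URLs, and the function names `parse_items` and `new_items` are assumptions made for the example, and it only handles the RSS 2.0 `channel`/`item` layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, for illustration only.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Mining News</title>
    <link>http://example.org/</link>
    <description>Sample channel</description>
    <item>
      <title>First story</title>
      <link>http://example.org/1</link>
      <guid>http://example.org/1</guid>
      <pubDate>Fri, 23 Apr 2010 01:12:10 GMT</pubDate>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.org/2</link>
      <guid>http://example.org/2</guid>
      <pubDate>Sun, 25 Apr 2010 22:17:25 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Return the feed's items as dictionaries of title, link, guid and date."""
    channel = ET.fromstring(feed_xml).find("channel")
    items = []
    for item in channel.findall("item"):
        link = item.findtext("link", default="")
        items.append({
            "title": item.findtext("title", default=""),
            "link": link,
            # Fall back to the link when an item carries no guid.
            "guid": item.findtext("guid") or link,
            "published": item.findtext("pubDate", default=""),
        })
    return items


def new_items(feed_xml, seen_guids):
    """Return items whose guid is not yet in seen_guids, then record them."""
    fresh = [it for it in parse_items(feed_xml) if it["guid"] not in seen_guids]
    seen_guids.update(it["guid"] for it in fresh)
    return fresh


# The reader has already shown the first story, so only the second is new.
seen = {"http://example.org/1"}
updates = new_items(SAMPLE_FEED, seen)
```

In a real aggregator the feed text would be downloaded from the feed's URI on a schedule, and the set of seen identifiers would be stored between runs.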
RSS formats are specified using XML, a generic specification for the creation of data formats. Although RSS formats have evolved from as early as March 1999,[4] it was between 2005 and 2006 when RSS gained widespread use, and the feed icon (Feed-icon.svg) was decided upon by several major Web browsers.[5]

Flow Mapping

Tobler's Flow Mapper

Flow Mapper is a software tool designed by Waldo Tobler:
http://www.csiss.org/clearinghouse/FlowMapper/
The tool uses a list of coordinates and an interaction matrix to generate a flow pattern.
For this project, we evaluated a Python version of the tool called Flowpy:
http://enj.com/
With the correct Python GDAL bindings in place, a network graph can be shown in ArcGIS and exported. See the Graphs and Maps section for an example of immigration flow mapping.
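
To make the two inputs concrete, here is a small Python sketch of how a list of coordinates and an interaction matrix can be turned into flow lines. It is not the code of Flow Mapper or Flowpy. It assumes that net flows are wanted (each pair of places is reduced to the difference between the two directions), and the function name `net_flows`, the coordinates, and the matrix values are hypothetical.

```python
def net_flows(coords, matrix):
    """Return (origin_xy, destination_xy, volume) for each pair with a net flow.

    coords[i] is the (x, y) position of place i.
    matrix[i][j] is the volume moving from place i to place j.
    """
    segments = []
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            net = matrix[i][j] - matrix[j][i]
            if net > 0:
                # More moves from i to j than back, so the line points i -> j.
                segments.append((coords[i], coords[j], net))
            elif net < 0:
                # The balance runs the other way, so the line points j -> i.
                segments.append((coords[j], coords[i], -net))
            # Pairs whose flows cancel out produce no line.
    return segments


# Hypothetical input: three places and the movements between them.
coords = [(0.0, 0.0), (10.0, 0.0), (0.0, 5.0)]
matrix = [
    [0, 30, 5],
    [10, 0, 0],
    [5, 20, 0],
]
flows = net_flows(coords, matrix)
```

Each resulting segment could then be drawn as an arrow whose width is scaled by the volume, which is the kind of geometry a GIS package needs in order to display the pattern.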

Mapping Platforms


Ushahidi

http://www.ushahidi.com/
http://github.com/ushahidi/Ushahidi_Web#readme
System Requirements:
- A server with Unicode support.
- MySQL version 5.0 or greater.
- PHP version 5.2.3 or greater.
- An HTTP Server. Kohana, which Ushahidi is built upon, is known to work with the following web servers: Apache 1.3+, Apache 2.0+, lighttpd, and MS IIS.

Google

Open Street Maps


Data Mining


Swift River

http://swift.ushahidi.com/
http://github.com/ushahidi/Swiftriver#readme

SwiftRiver is a free and open source software platform that uses a combination of algorithms and crowdsourced interaction to filter news and determine the veracity and accuracy of news related to an event. It is an open source initiative supported by many contributing people and organizations, including Meedan, Appfrica, GeoCommons and Ushahidi.

Google Books

Google Books mines the text of a book for place names and shows the locations on a map, with links to the respective passages. In the example below, the passage mentions Accra. This allows users to intuitively access data relevant to the location they are researching.
Screen_shot_2010-04-23_at_1.12.10_AM.png
Figure 1: Screenshot from Google Books page of Bonnie Campbell's "Mining in Africa: Regulation and Development"


Media Data Mining

Reuters - Professional products - Reuters NewsScope Archive. Available from: http://www2.reuters.com/productinfo/newsscopearchive/.
Dow Jones Insight. Available from: http://www.dowjones.com/product-djinsight.asp.

Allan, J., V. Lavrenko, and R. Swan, Explorations within topic tracking and detection, in Topic detection and tracking: event-based information organization. 2002, Kluwer Academic Publishers. p. 197-224
In finance, news-based programmatic trading strategies that employ topic detection and tracking on high-volume news feeds give investors and traders a competitive advantage. Reuters NewsScope Archive[9] offers a machine-readable archive of Reuters global news, with each release of information timestamped and tagged with an array of metadata fields for easy machine consumption. The NewsScope “Sentiment Engine” even measures positive, negative or neutral coverage of a particular story and tracks sentiment on a company over time, comparing it to portfolio, sector, and market scores. Users can filter and flag articles according to their specific research and trading interests. The Dow Jones Insight service offers media monitoring and reputation management across blogs and traditional media sources. The platform allows users to analyze media along with key performance indicators, tag news for messages, analyze the slant and opinion of coverage and send out newsletters.
Data mining can be used to detect textual similarities and to track the flow of information from the first appearance of a particular news story to later articles that draw on it. Techniques include measuring the frequency of occurrence of a particular “bag of words,” and plagiarism detection through the “fingerprinting” of meaningful chunks of text and the reuse of phrases. Combined with knowledge of the information flow network, this type of technology would allow for the detection of the flow of information and the links between articles as they appear. A possible scenario is one where a researcher is interested both in finding out how the story of the cyanide leak progressed from when it first appeared and in who was exposed to the information through the network. It could be found, for instance, that reports from international NGOs incorporate material from reports by regional NGOs, which in turn incorporate material from local journalists.
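
The two techniques named above can be illustrated with a short Python sketch. It assumes a very simple setup: word counts compared by cosine similarity for the "bag of words" measure, and hashed runs of four consecutive words as "fingerprints" for detecting reused phrases. The function names, the choice of four words per fingerprint, and the sample sentences are all hypothetical; production systems use far more careful tokenisation and fingerprint selection.

```python
import hashlib
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Bag-of-words similarity: cosine of the angle between word-count vectors."""
    counts_a, counts_b = Counter(tokens(text_a)), Counter(tokens(text_b))
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fingerprints(text, k=4):
    """Hash every run of k consecutive words into a set of fingerprints."""
    words = tokens(text)
    return {
        hashlib.sha256(" ".join(words[i:i + k]).encode("utf-8")).hexdigest()
        for i in range(len(words) - k + 1)
    }


def shared_fingerprint_ratio(source, later, k=4):
    """Fraction of the source's fingerprints that reappear in the later text."""
    src, lat = fingerprints(source, k), fingerprints(later, k)
    return len(src & lat) / len(src) if src else 0.0


# Hypothetical texts: a local report, an NGO report reusing it, unrelated news.
local_report = "Residents reported a cyanide spill near the village stream"
ngo_report = ("The regional report notes that residents reported a cyanide "
              "spill near the village stream last week")
unrelated = "Gold prices rose sharply on international markets"

reuse = shared_fingerprint_ratio(local_report, ngo_report)
overlap = cosine_similarity(local_report, ngo_report)
```

A high fingerprint ratio from an earlier article to a later one suggests that the later one incorporates the earlier text, which is the kind of link the scenario above would need to trace.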


Open Calais


Calais_01.gif
Figure 2: Open Calais Process Diagram

http://www.opencalais.com/about
"The OpenCalais Web Service automatically creates rich semantic metadata for the content you submit – in well under a second. Using natural language processing (NLP), machine learning and other methods, Calais analyzes your document and finds the entities within it. But, Calais goes well beyond classic entity identification and returns the facts and events hidden within your text as well.
The tags are delivered to you; you can then incorporate them into other applications - for search, news aggregation, blogs, catalogs, you name it.
If you're not familiar with OpenCalais, the Document Viewer is a way to get a quick peek at OpenCalais output. Paste a chunk of text into the Document Viewer window, submit it, and see the resulting OpenCalais tags. Note that this is not the OpenCalais web service; it is simply a demonstration of what OpenCalais can do.
You can also visit our Showcase for examples of how developers have implemented OpenCalais in a variety of ways."
Con: Mining locations from articles can be problematic. For example, the Tom Burgis 2010 article "Mining fails to produce golden era for Ghana" contains information specific to Newmont and Ghana and gives the location of the reporter in Accra, but it does not name the village discussed in the article, in this case the village where the cyanide spills occurred.

Linked GeoData

Geonames

http://www.geonames.org/about.html
The GeoNames geographical database is available for download free of charge under a creative commons attribution license. It contains over eight million geographical names and consists of 7 million unique features whereof 2.6 million populated places and 2.8 million alternate names. All features are categorized into one out of nine feature classes and further subcategorized into one out of 645 feature codes.
The data is accessible free of charge through a number of webservices and a daily database export. GeoNames is already serving up to over 11 million web service requests per day. GeoNames is integrating geographical data such as names of places in various languages, elevation, population and others from various sources. All lat/long coordinates are in WGS84 (World Geodetic System 1984). Users may manually edit, correct and add new names using a user friendly wiki interface. GeoNames has Ambassadors in many countries who assist with their help and expertise.
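
As a rough illustration of using the web services for place-name lookup, the Python sketch below builds a search request and reads place names and WGS84 coordinates out of a JSON reply. The `searchJSON` endpoint and the `q`, `country`, `maxRows` and `username` parameters reflect our understanding of the GeoNames search service and should be checked against its current documentation. The account name `demo`, the helper names, and the sample reply (including its coordinates) are illustrative assumptions, not retrieved data.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of the GeoNames JSON search service.
GEONAMES_SEARCH = "http://api.geonames.org/searchJSON"


def build_search_url(place, country_code, username, max_rows=5):
    """Build a search URL for a place name restricted to one country."""
    params = {
        "q": place,
        "country": country_code,   # ISO country code, e.g. GH for Ghana
        "maxRows": max_rows,
        "username": username,      # the service expects an account name
    }
    return GEONAMES_SEARCH + "?" + urlencode(params)


def extract_places(response_text):
    """Return (name, latitude, longitude) tuples from a JSON reply."""
    data = json.loads(response_text)
    return [
        (g["name"], float(g["lat"]), float(g["lng"]))
        for g in data.get("geonames", [])
    ]


# Hypothetical reply in the assumed response layout, not real service output.
SAMPLE_RESPONSE = """{
  "totalResultsCount": 1,
  "geonames": [
    {"name": "Accra", "lat": "5.55602", "lng": "-0.1969", "countryCode": "GH"}
  ]
}"""

url = build_search_url("Accra", "GH", "demo")
places = extract_places(SAMPLE_RESPONSE)
```

A place name found in an article could be passed through such a lookup to obtain coordinates for mapping; names that return no match are candidates for adding to GeoNames through its editing interface.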

It should be noted that Ghana has a GeoNames density of just over 0.5 place names per person, in contrast to 2.6 for Denmark and 6.2 for the US. These numbers depend on population density, but the low number for Ghana does suggest that GeoNames coverage there is comparatively sparse. The project would benefit from the addition of location names relevant to Newmont Mining's locations.

Publication Metadata

Citing News - Collecting References

Once researchers have identified relevant articles for their research, they need to gather reference data so they can cite the articles. Manual data entry is both time consuming and error-prone and should be avoided to optimize research time. In addition to recording reference metadata, it is beneficial to be able to directly access the content of the publication through URL links. In an idealized research workflow, one could collect references on the fly and later automatically import the articles needed for analysis.

Citation data can be exported from research databases like LexisNexis, ProQuest or Factiva, and imported into reference management software such as EndNote and Zotero. It should be noted that, although Google Scholar offers an option to import references into reference management software, the correct information is sometimes not properly contained in the exported data. For instance, while EndNote separates the publication year from the publication date, an "import to endnote" record from most articles on Factiva will have one field for both year and date. Upon import into EndNote, the date information is lost.

Reference management is an active field of development. Here are some innovative solutions:

COinS - Z3988 Spans

http://ocoins.info/

COinS (ContextObjects in Spans) is a simple, ad hoc community specification for publishing OpenURL references in HTML.
A COinS span specifies citation metadata for an article, such as the author name, the name of the publisher, the publication date and geographic location.
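
To show what such embedded metadata looks like, the Python sketch below generates a COinS span for a news article. It assumes the common convention of an empty `span` with class `Z3988` whose `title` attribute holds URL-encoded OpenURL key/value pairs. The helper name `coins_span` and the choice of the journal metadata format for a newspaper article are assumptions for the example. The article details are those of the Tom Burgis piece mentioned in this document.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(fields):
    """Return an HTML span carrying the given citation fields as COinS."""
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        # Assumption: describe the article using the journal metadata format.
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ]
    # Referent fields are prefixed with "rft." in the context object.
    pairs += [("rft." + key, value) for key, value in fields.items()]
    # URL-encode the pairs, then escape "&" so the attribute is valid HTML.
    title = escape(urlencode(pairs), quote=True)
    return '<span class="Z3988" title="' + title + '"></span>'


span = coins_span({
    "atitle": "Mining fails to produce golden era for Ghana",
    "aulast": "Burgis",
    "aufirst": "Tom",
    "date": "2010",
})
```

A publisher would place the resulting span in the article page, where tools that understand COinS can pick up the reference without any manual data entry.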

Cite-U-Like

http://www.citeulike.org
One increasingly popular way for people to keep track of the information they find online is through social bookmarking.
Several social bookmarking services for citation bookmarking have emerged; one of these is Cite-U-Like.

Screen_shot_2010-04-25_at_10.17.25_PM.png
Figure 3: CiteULike AddThis button on http://mineweb.com/mineweb/view/mineweb/en/page72068?oid=96302&sn=Detail


Cite-U-Like uses the ContextObjects in Spans (COinS) standard and works quite well when tagging scholarly publications. Here is an RSS feed of a Cite-U-Like library:
CiteULike: Klaartje's library
However, news sites that offer the Cite-U-Like button are not yet providing COinS with their articles, so Cite-U-Like cannot process the reference metadata. This results in an error:
Screen_shot_2010-04-25_at_10.14.33_PM.png
Figure 4: Screenshot of Error from CiteULike engine, sponsored by Springer

In order to make Cite-U-Like effective, publishers would have to agree on the metadata standards they use. Although native metadata formatting might be different for each publication provider, a simple adoption of COinS embedded in HTML would go a long way.

Other

Ajax

From: http://en.wikipedia.org/wiki/Ajax_%28programming%29
With Ajax, web applications can retrieve data from the server asynchronously in the background without interfering with the display and behavior of the existing page. The use of Ajax techniques has led to an increase in interactive or dynamic interfaces on web pages. Data are usually retrieved using the XMLHttpRequest object. Despite the name, the use of XML is not actually required, nor do the requests need to be asynchronous.[2]
Like DHTML and LAMP, Ajax is not a technology in itself, but a group of technologies. Ajax uses a combination of HTML and CSS to mark up and style information. The DOM is accessed with JavaScript to dynamically display, and to allow the user to interact with the information presented. JavaScript and the XMLHttpRequest object provide a method for exchanging data asynchronously between browser and server to avoid full page reloads.

XML
PDF-x