This week and last I have been preparing an introduction to the Map/Reduce algorithm. I have found a lot of excellent new references (see my previous post) and I have read some books that I did not know. As a result I have made a compilation (it is just a summary that I will continue updating) that can serve as a roadmap of what MapReduce is and what you can do with this programming model. As soon as I review my presentation and the examples I will upload them.
Week #10 and #11 in Thessaloniki
Hi! I have been a little bit lazy about writing in the blog, but it is now time to recover good habits. I am going to summarize my tasks during the last two weeks:
Reading (pi = personal interest)
- MapReduce Design Patterns
- Hadoop MapReduce Cookbook
- Hadoop Real-World Solutions Cookbook
- Hadoop and Pig Overview (tutorial slides)
- Introduction to Pig (tutorial slides)
- SIGMETRICS Tutorial: MapReduce – Research at Google
- MapReduce — algorithms, MapReduce — distributed functions and MapReduce — applications (slides by Hugh C. Lauer)
- Google’s MapReduce Programming Model — Revisited
- Programming Hadoop
- …and others wrt Map/Reduce and Hadoop
- Service Quality Management (SQM)
- Quality in the Cloud
- Architectural Perspectives on QoS Management in Distributed Multimedia Systems
- Service Quality Perspectives and Customer Satisfaction in Commercial Banks Working in Jordan (not about the cloud)
- Service Quality (not about the cloud)
- MapR design
Writing, reviewing and researching
- I am managing a Special Issue in the Journal of Computers in Industry, Elsevier.
- I finished the review of a book for Manning Publications.
- I have been included as Technical Development Editor in Manning Publications.
- I have been included as PC member in the workshop proposal “Data Mining on Linked Data (DMoLD’13) workshop with Linked Data Mining Challenge” thanks to my colleagues at the University of Economics in Prague.
- I am reviewing a paper for the journal “Expert Systems with Applications” (IF: 2.203)
- I am reviewing and finishing the paper with my colleague Alejandro Montes about his Master's final project.
Meetings
- I have had a meeting with my SEERC colleagues to talk about next actions.
- I have had a meeting with Michalis Vafopoulos to prepare the Linked Data Cup paper.
- I have had two meetings with Lum about his Bachelor Degree Project. I am supervising his work on sentiment analysis using Rapidminer, Lingpipe, Alchemy API and a custom solution.
Coding and Tools
- In my leisure time I have built a tool for unifying company names, called CORFU, using Python, NLTK and the APIs of Google Places, LinkedIn and Google Suggestions. It also includes other algorithms based on string similarity, etc.
- I have developed a simple sentiment analyzer using Alchemy API.
- I have adapted some examples of Map Reduce patterns
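To illustrate the string-similarity idea behind unifying company names, here is a minimal sketch. It is not the CORFU code: the suffix list, the threshold and the function names are assumptions made for the example, and it uses only `difflib` from the Python standard library instead of NLTK or any external API.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes; a real tool would need a longer one.
LEGAL_SUFFIXES = {"inc", "ltd", "llc", "corp", "corporation", "co", "sa", "gmbh", "plc"}


def normalize(name):
    """Lowercase the name, drop punctuation and strip trailing legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def unify(names, threshold=0.85):
    """Greedy grouping: a name joins the first group whose first member is similar enough."""
    groups = []
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


if __name__ == "__main__":
    print(unify(["Oracle Corp.", "ORACLE Corporation", "Oracle", "Microsoft Inc"]))
```

The greedy grouping depends on the order of the input and on the threshold (0.85 is an arbitrary choice here), so it should be read as a starting point rather than a robust matcher.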
Other things
- I continue my fight to learn Greek…I have to study a bit more!
Week #9 in Thessaloniki
Just a few comments for this week (to be completed)….
Reading (pi = personal interest)
- Tutorial of Cascalog and other features
- Incremental Map/Reduce
- Large-scale Incremental Processing Using Distributed Transactions and Notifications
- MapReduce Online
- Online Aggregation and Continuous Query support in MapReduce
- …
Writing, reviewing and researching
- I am finishing the book chapter about publishing statistical data in RDF
- I am managing a Special Issue in the Journal of Computers in Industry, Elsevier
- I just realized that Labra added me to the Acknowledgements section of his work on “Multilingual Open Data Patterns”. I am very proud of that! (To be honest, I just collaborated in the first presentation with some links and especially through some comments regarding SKOS-XL.) I also suggest reading the paper, in which each of the patterns is explained and discussed with excellent examples.
Meetings
- I have had a meeting with my SEERC colleagues to present my prototype and plan next actions in QoS, etc.
- I have had a meeting with Michalis Vafopoulos to prepare the Linked Data Cup paper.
Coding and Tools
- I have implemented a real-time architecture using the Lambda approach, following some hints from Pere Ferrera. It is not the same algorithm: I am just taking the approach to tackle the problem, not the source code. Next steps include using RDF as views for the batch and real-time layers through SPARQL federated queries (for instance FedX). The example just takes a Twitter stream using the Tweet4J API and counts words, presenting the results in an HTML page. Documentation is available here and also the source code (under development).
- I have linked the public procurement notices from the UK, USA and AUS to the CPV.
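The Lambda approach mentioned above can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the prototype itself: everything runs in memory, the names `batch_view`, `SpeedLayer` and `query` are made up for the example, and plain Python stands in for the Twitter stream, the batch jobs and the RDF views.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute word counts from scratch over the full dataset."""
    counts = Counter()
    for text in master_dataset:
        counts.update(text.lower().split())
    return counts


class SpeedLayer:
    """Speed layer: incrementally count words in items not yet covered by a batch run."""

    def __init__(self):
        self.counts = Counter()

    def on_item(self, text):
        self.counts.update(text.lower().split())

    def reset(self):
        # Called once a new batch view includes the items seen so far.
        self.counts.clear()


def query(word, batch_counts, speed_layer):
    """Serving step: merge the batch view and the real-time view."""
    return batch_counts[word] + speed_layer.counts[word]


if __name__ == "__main__":
    batch = batch_view(["big data", "big storm"])
    speed = SpeedLayer()
    speed.on_item("Big lambda")
    print(query("big", batch, speed))
```

The point of the split is that the batch view can always be rebuilt from the raw data, while the speed layer only has to cover the short window since the last batch run.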
Other things
- This week is Father’s Day!
- I have also registered for the International Conference on Performance Engineering (ICPE), so I will be in Prague from 16 to 21 April to attend the conference and the RELATE project plenary meeting.
Week #7 in Thessaloniki
Just a few comments for this week (to be completed)….
Reading (pi = personal interest)
- QoS-Aware Composition of Adaptive Service-Oriented Systems
- A Taxonomy of QoS-Aware, Adaptive Event-Dissemination Middleware
- SMICloud: A Framework for Comparing and Ranking Cloud Services
- WSExpress: A QoS-Aware Search Engine for Web Services
- An Efficient Qos-Based Ranking Model for Web Service Selection with Consideration of User’s Requirement
- I have followed this thread about real-time processing, Nathan Marz's post about the CAP theorem and the presentation about mixing Hadoop/MapR and Storm
- Ranking Algorithms for Named-Entity Extraction: Boosting and the Voted Perceptron
- DistanceRank: An intelligent ranking algorithm for web pages
- Re-Ranking Algorithms for Name Tagging
- High Quality Expertise Evidence for Expert Search (pi)
- Topic level expertise search over heterogeneous networks (pi)
- Knowledge on Demand: Knowledge and Expert Discovery (pi)
- Sharing Expertise: The Next Step for Knowledge Management (pi)
- Awareness of Organizational Expertise (pi)
- Facilitating the Online Search of Experts at NASA using Expert Seeker People-Finder (pi)
- Expert Finding Systems (pi)
- Graph-Based Ranking Algorithms for E-mail Expertise Analysis (pi)
- Telling Experts from Spammers (pi)
- A Survey of Ranking Algorithms
- Google Scholar’s Ranking Algorithm: An Introductory Overview
- Apache Drill (slides)
- Cloudera Impala (slides)
- Storm Getting Started by O’Reilly (re-reading)
- Big Data and the Web: Algorithms for Data Intensive Scalable Computing (starting to read)
Writing, reviewing and researching
- I am reviewing a paper for a Special Issue of a JCR journal
- I am finishing the book chapter about publishing statistical data in RDF
- I have also made the first review of WESOMENDER (we have to work hard to get a good contribution but the expectations are high)
- I have been invited to be part of the PC of the Special Session “Engineering Tool Integration for Industrial Automation System Development (ETAS 2013)” in conjunction with IECON 2013
- I have joined the research group “Comercio Electronico en Colombia – GICOECOL” thanks to Luz Andrea RODRIGUEZ ROJAS, with whom I will collaborate to empower the use of Open Data in e-Health.
Meetings
- I have had a meeting with my SEERC colleagues to plan the next actions in QoS, etc.
- I have had a meeting with Michail Vafopoulos to collaborate in some e-Procurement actions
- I have attended the first meeting (call) of the Spanish Open Data W3C Community Public Contracts Task Force
Coding and Tools
I have implemented a real-time word counter for Twitter statuses using different techniques:
- The classical Observer design pattern
- The Storm framework: I have reused some examples to implement my own spouts and bolts
- The Trident framework on top of Storm: I have also reused some examples from the storm-starter project, customizing the code to get a better understanding
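The first of these techniques, the Observer pattern, can be sketched as follows. This is an illustration only, not the code of the prototype: the class names are invented for the example and the statuses are pushed by hand instead of coming from the Twitter API.

```python
from collections import Counter


class StatusStream:
    """Subject: pushes each incoming status text to its registered observers."""

    def __init__(self):
        self._observers = []

    def subscribe(self, observer):
        self._observers.append(observer)

    def publish(self, text):
        for observer in self._observers:
            observer.update(text)


class WordCounter:
    """Observer: keeps a running count of the words seen so far."""

    def __init__(self):
        self.counts = Counter()

    def update(self, text):
        self.counts.update(text.lower().split())

    def top(self, n):
        return self.counts.most_common(n)


if __name__ == "__main__":
    stream = StatusStream()
    counter = WordCounter()
    stream.subscribe(counter)
    stream.publish("storm is fast")
    stream.publish("Storm bolts")
    print(counter.top(1))
```

In Storm terms, the subject plays roughly the role of a spout and the observer the role of a bolt, with the difference that Storm distributes the work across machines.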
Other things
This week I have started the 3-month Greek course and I am very happy because I can now understand some words and read a little bit 🙂 Besides, my classmates come from a lot of countries: Bulgaria, Germany, Bosnia, France, Serbia, New Zealand, Italy, Moldova and Russia. It is a GREAT experience.
Week #5 in Thessaloniki
Last week I focused on two main tasks: my presentation at the City College and the submission of a paper. Following the same structure as in previous weeks, I leave some links to the activities I am carrying out:
Reading
- I have read the book “Clean Code”
- I have also read part of the book “The Well-Grounded Java Developer”
Writing and reviewing
- I have continued with the structure and first contents of two papers and one special issue proposal.
- I have managed all the abstracts for the COMIND Special Issue.
- I have submitted a paper to “Computers in Human Behavior”
- I have given the presentation described in the next bullet to the Department of Computer Science at City College
- I have reviewed my previous presentation about MOLDEAS, and the new one is meant to be more didactic, like an “Intro” to Linked Open Data
Coding and Tools
I have not made any relevant progress in development tasks.
Week #3 in Thessaloniki
This week I will be updating this post because I am reading a lot of papers and I need a way to track them. Following the same structure as in previous weeks, I leave some links to the activities I am carrying out:
Reading
I have focused on some interesting subjects: Statistics (Bayesian networks), Data Streams, Feedback Control Loops, Autonomous Computing and e-Learning systems (this last one is just for personal interest). I have started, and finished, the following list of papers and books:
- Open-Loop Feedback Control of Nonlinear Stochastic Systems Based on Deterministic Dirac Mixture Densities
- Probabilistic framework for opportunistic spectrum management in cognitive ad hoc networks
- Probabilistic Framework for Sensor Management (Book)
- A Probabilistic Approach to Mixed Open-loop and Closed-loop Control, with Application to Extreme Autonomous Driving
- When cloud computing meets with Semantic Web: A new design for e-portfolio systems in the social media era
- Sharing innovative teaching experience in higher education on the Web. An interdisciplinary study on a contextualized Web 2.0 application for community building and teacher training
- Crash Course on Data Stream Algorithms (Slides)
- Data Streams: Models and Algorithms
- Data Streams: Algorithms and Applications
- Data Streams Algorithms
- A Survey of Graph Algorithms in Extensions to the Streaming Model of Computation
- Big Data versus the Crowd: Looking for Relationships in All the Right Places
- Elementary: Large-scale Knowledge-base Construction via Machine Learning and Statistical Inference
- Hazy: Making it Easier to Build and Maintain Big-data Analytics
- Expert finding systems (some articles from the following Google Search)
- Expertise ranking algorithms (some articles from the following Google Search); these last two activities are just for personal interest.
- Bayesian Networks
- Bayesian networks [1, 2] (in Spanish)
- Bayesian networks in R
- Machine Learning & Statistical Learning, R packages
- Causality: Models, Reasoning, and Inference
Writing and reviewing
- I have started with the structure and first contents of two papers and one special issue proposal.
- I have also completed a workshop proposal for the 9th ACM International Conference in Cloud and Autonomic Computing.
- I have finished the review of a paper for the Journal Current Topics in Medicinal Chemistry (impact factor 4.174)
- I have finished the review of a paper for the International Conference on Marketing – Challenging Environment (ICOM 2013)
- I have also reviewed two papers of our Special Issue (just an initial review)
Coding and Tools
I have not made any relevant progress in development tasks, but I have been refreshing my knowledge of R.
- Métodos Estadísticos con R y R Commander (Spanish guide)
- Estadística Básica con R y R-Commander (basic statistics guide in Spanish)
- R-intro (Spanish version)
- R for Beginners
- Test of Rome Java library (RSS and Atom Utilities for Java)
- Weka (Data mining Java library)
Teaching
I have finished the evaluation of the students in Health Information Systems and I am very proud of the marks and the work carried out by the students during the last months. I have some links to their work building mashups, but I prefer not to leave the links here due to privacy issues.
Week #2 in Thessaloniki
Hi all,
This week has gone by very fast and I have done a lot of things, which I list below:
Reading
My main research concern is how to address the automatic processing of data from a lot of sensors (applications, cloud management platforms, etc.), i.e. how can I monitor resources, and which variables should be taken into account? In this sense I have read some papers from my colleagues at SEERC and other authors:
- A survey of Autonomic Computing — degrees, models and applications
- An ontology-driven approach to self-management in cloud application platforms (Poster)
- An ontology-driven approach to self-management in cloud application platforms.
- Modelling Feedback Control Loops for Self-Adaptive Systems.
- MAPE-K adaptation control loop (presentation)
- Towards Quality of Service in the Cloud
- Hazy: Making it Easier to Build and Maintain Big-data Analytics (slides)
- …
- Review of one paper for the SI in e-Procurement
- …apart from research papers I have also started to read the book “Getting Things Done: The Art of Stress-Free Productivity”
Writing
I have made some progress on the article about the experience of publishing the “Webindex” as Linked Data, and I have also planned the potential articles for this year and their contents.
Coding
Sometimes you feel very motivated to test new tools and frameworks and I am now in this phase:
- Storm Starter and Storm+Node.js
- Cleo and Cleo-primer
- Improvements in the MOLDEAS prototype (fixing bad URIs)
Teaching
I have finished the evaluation of the Health Information Systems in Nursing and Physiotherapy course at the University of Oviedo. The students have produced very good work applying Web 2.0 concepts to build mashups in the Health sector. I am very proud of all of them.
Administrative Stuff
This is where I spent most of the time (and thanks to my colleague Fotis), but I finally got (almost) all the required documentation:
- My new apartment in Thessaloniki is located in Mitropoleos St.
- I have my Tax ID, Social Security ID and other “official” stuff.
- I have a Greek phone number.
- I have a new account in Alpha Bank.
- I have also sent my application to a three-month Modern Greek course at Aristotle University of Thessaloniki
Week #1 in Thessaloniki
As you may know, I am starting a new stage in a new country and institution. I am now a Marie Curie Experienced Researcher (Postdoc) working at SEERC in Thessaloniki, more specifically in the RELATE-ITN FP7 project. My research will address topics such as stream reasoning, cloud computing, big data, etc. to create a system for monitoring QoS in cloud computing environments and service-oriented architectures. As far as I know, the objective is to get information about applications in the cloud and verify that the current values of different variables are aligned with the SLAs. It is therefore necessary to continuously gather data from applications, promote it to an existing knowledge base, check restrictions through reasoning processes and finally make decisions such as new provisioning, etc.
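As a rough illustration of the "gather data, check restrictions, decide" loop, here is a minimal sketch. It is only an assumption of how such a check could look: a plain threshold comparison stands in for the reasoning over a knowledge base, and the metric names, the `SlaRule` class and the `check` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SlaRule:
    """Hypothetical SLA restriction: the metric must not exceed max_value."""
    metric: str
    max_value: float


def check(observations, rules):
    """Return the (metric, value) observations that violate one of the rules."""
    limits = {rule.metric: rule.max_value for rule in rules}
    return [
        (metric, value)
        for metric, value in observations
        if metric in limits and value > limits[metric]
    ]


if __name__ == "__main__":
    rules = [SlaRule("response_time_ms", 200.0)]
    gathered = [("response_time_ms", 150.0), ("response_time_ms", 250.0), ("cpu_load", 0.9)]
    for metric, value in check(gathered, rules):
        # A real system would trigger a decision here, such as new provisioning.
        print(f"SLA violated: {metric}={value}")
```

Metrics without a rule are simply ignored in this sketch; a semantic approach would instead try to infer which restrictions apply to them.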
This first week I was adapting to my new office, and I would like to thank all the administrative staff and colleagues from SEERC for their warm welcome! I am very motivated. My work during this first week was focused on reading papers, testing tools and coding some prototypes. Below is a summary of my activities:
Reading (specs, research papers and books)
- NIST Cloud Computing, documents related to Cloud Computing and QoS
- OASIS Cloud Computing Management for Platforms
- Drools Fusion (for Complex Event Processing)
- The SSN Ontology of the W3C Semantic Sensor Network Incubator Group
- Stream Reasoning papers and presentations
- Continuous Semantics to Analyze Real Time Data
- Semantic-Based Storage QoS Management Methodology — Case Study for Distributed Environments
- Drools Developers Cookbook, Grails and Groovy (starting reading)
- Big Data (first chapters)
- Lambda Architecture
- An example “lambda architecture” for real-time analysis of hashtags using Trident, Hadoop and Splout SQL
- I attended a seminar on Open Tosca
- JBoss Drools and Drools Fusion (CEP): Making Business Rules react to RTE
- … (others I cannot remember)
Writing
- I am finishing a technical report about our work in the Webindex project
- I am starting to write some papers on different topics that I will announce as soon as they are finished
Development and testing
- I have made a simple Python prototype to partially transform the Webindex into DSPL. The result can be found here. It is just a small example of consuming Linked Data.
- I have integrated Git, Heroku and Travis for creating a quality development environment as Labra suggested and I have to say it really works fine!
- I have made my first prototype with Drools Fusion processing events
- I have made some tests with C-SPARQL
- I have made my first prototype with Grails
- I have downloaded, compiled and deployed the current version of Apache Stanbol.
- I have started to use Cosm (an IoT platform)
- I have deployed the CommonCrawl prototype in Amazon EC2 Elastic Map/Reduce
- Some interesting presentations at
- Runaway complexity in Big Data… and a plan to stop it
- James Ward presentations and blog posts
- Play vs Grails Smackdown UberConf2012
- I have created accounts to monitor cloud computing applications on the following platforms (of course you need, in most cases, an Amazon EC2 account):
- I have also created my profile at W3C Community and joined the following groups