Friday, August 20, 2010

Interesting articles on Virtualization and Cloud Computing

In this post, I point you to some really interesting articles on Virtualization & Cloud Computing that I came across during the literature survey for my research. These articles are very stimulating and have been useful to my work in one way or another. The texts mentioned here are grouped under different areas: resources for beginners, Physical-to-Virtual (P2V) conversions, server consolidation, and the economics of cloud computing.

Stuff for beginners

P2V conversions/migrations

Economics of Cloud Computing

Saturday, July 3, 2010

Virtualization Lifecycle in the Context of Cloud Computing

Virtualization is a key technology for enabling cloud computing. As we have read before, virtualizing applications and consolidating hardware reduce IT infrastructure costs (which include purchasing hardware and maintaining it) and allow easier management of resources and on-demand provisioning in the data center. The Infrastructure as a Service (IaaS) model of the Cloud deals with provisioning hardware, storage and networking resources. Virtualizing applications by means of virtual machines is central in this model to deliver and manage IT services. The virtualization lifecycle comprises a set of technical assessment activities governed by business and operational decisions. Technical assessment of virtualization candidates revolves around meeting end-user Service Level Agreements (SLAs), reducing IT costs, and designing an optimized data center. Every phase in the virtualization lifecycle for cloud computing is highly challenging, with a wide variety of complex open problems currently being tackled.

Analysis & Discovery : For the process of moving from Physical environments to Virtualized environments (P2V), solid analysis of the virtualization candidates must be performed. This stage involves discovering the data center entities (servers, networks, storage devices), collecting utilization profile data of those entities along the different dimensions (CPU, memory, network i/o, disk i/o). The main theme of P2V is to move applications from an under-utilized bare metal environment to a virtualized / hypervisor environment to enable optimum utilization of hardware. In addition to discovering the heavy artillery in the data centers, it is important to assess the applications deployed on them. The OS characteristics (scheduling policies, caching strategies etc), application characteristics and configurations (Tomcat no. of starting threads etc). Once the performance and application characteristics are assessed, capacity management models for the need to be developed to host those applications in the virtual environments. For more information on the technical assessment for virtualization, see 'Conduct a Technical Assessment for Server Virtualization'.

Implementing Models: Developing capacity models for a virtual environment is a tricky task, since it is governed by other business and operational factors. Target SLAs (performance, availability) and power consumption levels must be kept in mind, along with the possible impacts of virtualization (hardware normalization, hypervisor overheads). The idea is to come up with a 'pre-VM-placement' strategy that describes the 'footprints' of the VMs.
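As a toy illustration of what such a 'footprint' might look like, here is a minimal Ruby sketch; the utilization samples and the headroom factor are made-up assumptions, not data from any real assessment. It sizes a CPU allocation from the 95th-percentile demand instead of the raw peak:

# Hypothetical CPU utilization samples (%) collected during the discovery phase.
cpu_samples = [12, 18, 25, 22, 60, 35, 28, 75, 30, 20]

# Nearest-rank percentile over the collected samples.
def percentile(samples, pct)
  sorted = samples.sort
  sorted[((pct / 100.0) * (sorted.size - 1)).round]
end

# Size the footprint on the 95th-percentile demand plus an assumed headroom
# factor (hypervisor overhead, growth), so rare spikes do not inflate the plan.
headroom  = 1.2
footprint = percentile(cpu_samples, 95) * headroom
puts "Planned CPU allocation: #{footprint.round(1)}% of a reference core"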

VM Placement & Management : Allocating Virtual Machines to Physical Machines dynamically and optimally would be the goal of IT Enterprises. VM's can be scaled out/up on demand. Provisioning of VM's goes hand-in-hand with capacity management monitoring to track the desired Service Levels of applications. Server consolidation is an important and inherent part of this phase, as optimal placement of VM's across physical architectures is the key to meet the business goals. Re-shaping and re-sizing the footprints of VM's dynamically in real time is also a hot research topic. Management of VM's also involve migration to other physical hosts, monitoring performance which can be done centrally. Many open issues also exists in VM Migration, such as synchronization problems and security issues.

Thus, the virtualization lifecycle poses many challenges, and many groups in industry and academia are grappling with these issues. If executed well, it has a promising future in the context of cloud computing. For more resources on Virtualization and Cloud Computing, see 'Virtualization / Cloud Computing Blogs, Websites, Resources, Articles'.


Thursday, April 29, 2010

Resources for Starting with Ruby on Rails 3

Ruby and Rails (which together form the well-known Ruby on Rails, or RoR, combination) have become a de facto industry standard for rapidly developing small and not-so-small Web applications (RAD).

The Rails framework has contributed directly to the increasing popularity of the Ruby programming language. Rails is based on the MVC design pattern and allows developers to efficiently implement Web applications that incorporate many of the recommendations and best practices of agilists. Rails 3 is the result of merging Merb, another famous MVC framework for Ruby, into the previous Rails version.

In this post I include some links to resources related to Rails 3, the forthcoming version of Rails currently in Beta:
  • This post includes a compendium of links to blog posts, tutorials, presentations and conference talks on Rails 3.
  • This tutorial allows the reader to learn how to use the Git version control system and deploy Rails applications on the Cloud using Heroku while learning Rails 3.
  • This, this and this resource will teach you how to implement and manage associations between the models of Rails applications (see the small sketch after this list).
  • This is a very useful site for learning Rails through screencasts, including lots of examples, tricks and useful libraries/add-ons/plugins (e.g. Formtastic, RSpec, etc.).
  • Finally, this book in the Pragmatic Programmers series (currently in beta) will be one of the main references for exploiting all the new features of Rails 3.
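As a taste of what those association resources cover, here is a minimal sketch of a one-to-many and a many-to-many (has_many :through) association in ActiveRecord; the model names are hypothetical and the corresponding tables/migrations are assumed to already exist:

class Author < ActiveRecord::Base
  has_many :posts                        # one author has many posts
end

class Post < ActiveRecord::Base
  belongs_to :author                     # posts table holds an author_id column
  has_many :taggings
  has_many :tags, :through => :taggings  # many-to-many via the join model
end

class Tagging < ActiveRecord::Base
  belongs_to :post
  belongs_to :tag
end

class Tag < ActiveRecord::Base
  has_many :taggings
  has_many :posts, :through => :taggings
end

# Usage (e.g. in the Rails console): author.posts, post.tags, tag.posts, etc.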

Hope you find these resources useful for getting your hands on Rails 3 :-)

Thursday, April 22, 2010

Two Upcoming Books on Programming Languages

Recently, I've noticed the following books on programming languages that are going to appear in the coming months:
The first one is by Bruce A. Tate, a well-known author in the Java and Ruby communities, and introduces the reader to the most important features of 7 different programming languages that are relevant today or will be in the next few years: Ruby, Io, Prolog, Scala, Erlang, Clojure and Haskell. The second one is being written by the omnipresent Martin Fowler and talks about DSLs, which are gaining a lot of attention in the development communities.

Tuesday, April 20, 2010

Useful Commands for PostgreSQL DBMS

The following commands are very useful for getting some meta-information about the data repository in PostgreSQL. In the psql command line, type...

# select datname from pg_database;

to get a list of databases in Postgres in the current repository (i.e. the one pointed to by the $PGDATA environment variable), ...

#\dt

to get a list of the tables of the current database, and...

#select pg_size_pretty(pg_database_size('DBNAME'));

in order to get the size of the database in a human-readable format.
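If you prefer to get the same information from code rather than from psql, here is a minimal sketch using the Ruby 'pg' gem; the connection parameters are assumptions for a local server with default credentials:

require 'pg'

# Connect to the local server (connection parameters are assumptions).
conn = PG.connect(dbname: 'postgres')

# List each database together with its size in a human-readable format,
# mirroring the psql commands shown above.
conn.exec('SELECT datname FROM pg_database').each do |row|
  size = conn.exec_params('SELECT pg_size_pretty(pg_database_size($1)) AS size',
                          [row['datname']]).first['size']
  puts "#{row['datname']}: #{size}"
end

conn.close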

Wednesday, April 7, 2010

Technical Bookshops on Computer Science

Today I want to write a post about technical bookshops (both physical and online) and book sites on the web.

Although everyone knows Amazon as a very good bookshop and source of information about computer science books, there is another online bookshop that looks very interesting:
With regard to free online technical books, FreeTechBooks.com is a very good option.

Moreover, I like to touch physical books (I like their format and appearance), so from time to time I tend to visit (some of the few) physical bookshops in Madrid. These are the ones I like to go to:
  • Cocodrilo Libros (Madrid): This is my favorite bookshop. They have a good and extensive catalogue of technical books in English and, most importantly, they give you very good personal attention.
  • Librería Diaz de Santos (Madrid): They also have a good catalogue of English books, but it is not close to the city center.
  • Casa del Libro (Madrid): They used to have books in English, but for the last three years they have only had books in Spanish.
  • FNAC (Madrid): Only books in Spanish, mainly for beginners.
Can you recommend any other bookshop/resource you know in your town or on the web?

Tuesday, March 23, 2010

To Err is Human

To err is inherent to human beings. Even the human being may be considered an error in itself. Unfortunately, these days nobody wants to recognize his/her errors. We just have to take a look at politicians making wrong decisions in government, managers losing money due to bad strategic policies, referees in sports, our relatives and friends in their personal decisions, etc. And of course, ourselves in our daily context. The reason is that errors are usually perceived by a majority of people as a weakness of the person who made them, and... everybody wants to be/appear flawless.

But, IMHO, that attitude is also an error, because in every role we play in our lives (as managers, students, engineers, doctors, programmers, architects, etc.) we are bound to err sometimes. No exceptions. So, to me it is very important to recognize the errors we make, and in this post I'm going to share some references to resources related to errors in software.

First of all, it is good to know which errors we most frequently make when developing software. This web page describes a catalogue of software weaknesses in source code and operational systems. It also offers the current list of the 25 most common errors in software, which lead to most of the vulnerabilities in programs.

Once we are aware of the errors we can make, the next step is to try to avoid them while developing. Test Driven Development (TDD) is a well-known technique to follow during the development phase of software construction. It is encouraged by agilists, but of course it can be applied to any software development process. Basically, it consists of repeating the following development cycle (a minimal sketch of one such cycle follows the list):
  1. Write the necessary test cases that define a new piece of functionality to be added to the software under development;
  2. Implement the required functionality;
  3. Run the test cases written in the first step and make them pass;
  4. Refactor the code to bring it in line with code quality standards.
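Here is a minimal Ruby sketch of one such cycle using Minitest; the ShoppingCart example is hypothetical and only meant to show the rhythm of write-a-failing-test, make-it-pass, then refactor:

require 'minitest/autorun'

# Step 1: a test case that defines the new functionality (it fails at first).
class TestShoppingCart < Minitest::Test
  def test_total_sums_item_prices
    cart = ShoppingCart.new
    cart.add(price: 3)
    cart.add(price: 7)
    assert_equal 10, cart.total
  end
end

# Steps 2 and 3: implement just enough code to make the test pass.
class ShoppingCart
  def initialize
    @items = []
  end

  def add(item)
    @items << item
  end

  def total
    @items.sum { |item| item[:price] }  # Step 4 would refactor as the cart grows
  end
end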

As we have seen in the fourth step of TDD, when trying to avoid errors in the development phase, it is a requirement to program in a professional way. Books such as Code Complete, The Pragmatic Programmer or Clean Code can teach you about how to program properly to produce quality code.

OK, after a lot of effort, our program compiles and all the test cases pass. So, it is ready to be deployed in (pre)production environments... However, after some use of the program, anomalies tend to arise in the form of unknown/unpredictable behaviors. Of course, these are our well-known "bugs". This book guides the practitioner in the art of debugging code, introducing techniques and tools used in academia and industry.

Finally, even when we think we have smashed all the bugs in our programs, we have to deal with other kinds of errors (e.g. hardware failures such as power outages, hard disk crashes, etc.). That's the reason why we design fault-tolerant systems. However, I'll talk about them in other posts. In the meantime, I'll try not to make so many errors... But, you know... that's life!

P.S. How many errors are in this post?

Friday, March 19, 2010

"We don't know how to program anymore"

Some weeks ago a co-worker was telling us that we (himself included) don't know how to program anymore. We have got used to having plenty of memory, hard disk space... we don't think about the resources anymore. He was saying how in the good old days people like Brian Kernighan, Ken Thompson and Dennis Ritchie really had to care about these issues. This made them think of efficient algorithms to handle the data to be processed.

I think my colleague was right. The hardware gets faster and faster, but the programs eat more and more resources, always giving users the same feeling: frustration. Imagine you want to edit a huge text file with your favorite text editor; it is very likely that it will become unresponsive.

We don't normally develop with performance in mind; we might not ask what happens if the data to be processed gets 1000 times bigger. We think our program runs pretty smoothly until the data exceeds what we had initially assumed; then our program will sweat and we will have a frustrated user.

We have to think about scalability and resource usage. These things are pretty difficult nowadays, as most of the time we are developing on top of many software layers, giving us the feeling that we are working on an abstract machine with unlimited resources.

Hardware is more powerful than yesterday, but the volume of data to be processed also increases, so it is always worth thinking twice about which data structures and algorithms to use, as the small benchmark below illustrates.
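As a tiny illustration of why the choice of data structure matters, this Ruby sketch (with made-up data sizes) compares membership tests against an Array (a linear scan) and a Set (a hash lookup):

require 'set'
require 'benchmark'

ids     = (1..100_000).to_a   # data as a plain Array
id_set  = ids.to_set          # the same data as a Set
lookups = Array.new(10_000) { rand(200_000) }

Benchmark.bm(14) do |bm|
  bm.report('Array#include?') { lookups.each { |id| ids.include?(id) } }    # O(n) per lookup
  bm.report('Set#include?')   { lookups.each { |id| id_set.include?(id) } } # O(1) per lookup
end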

P.S. [by Francisco Perez-Sorrosal] I've just found a recent blog entry that is somehow related to this one.

Monday, March 15, 2010

How To Apply for Research Internships

The following are some key steps for applying for Research Internships, organized around the what, why, how and where questions.

What is a Research Internship (RI)?
An RI is an opportunity to get your hands dirty by being involved in academic/industrial research projects. Typically, an RI lasts from a minimum of 3 months up to one year. It is a chance for students to investigate critical research problems arising in academia and industry (Mechanical Engineering, Computer Science, etc.). An RI in Computer Science offers a variety of things, depending on the level of experience and expertise in a particular area. A literature survey, research paper reviews, programming, analysis and design of software architectures & algorithms, and writing a research paper are a few of the things a candidate would end up doing during an RI. A research intern typically works in a team or individually, depending on the research assignment. In industrial research the intern may collaborate with other researchers, whereas in academic research he or she may work only with the advisor.

Why do a RI?
An RI gives you a chance to get involved in research activities early in your academic career. It introduces you to the world of research and broadens your horizons: you get to understand research methodologies, stay updated with the latest techniques and trends in science, and get exposed to a world beyond the stereotypical bookish one. Doing research during undergrad, or immediately after finishing the UG program, lays a solid foundation for getting a Master's/PhD degree. Prior research experience is always beneficial before starting a Master's or a PhD programme, since it helps the student explore a particular area and identify his/her research interests along the way. It's a wonderful opportunity to work with industrial research labs and to work closely with advisors (professors) at universities. Working with eminent professors helps in defining a certain way of thinking and attacking problems. Research work creates a whole new perspective and allows in-depth study of a particular area. Getting SOLID recommendations highlighting research experience from researchers is a boon when applying for advanced degrees.

How to apply for one?
Such applications are usually made via e-mail. Applying for RIs can get tricky at times; tricky in the sense that it is very important to be precise while applying. Keep the body of the email short and simple, highlighting your research interests, any prior work/assignments done in that particular area, your long-term plans (the next 2-3 years) and a list of technical reports. An effective strategy is to email professors at different universities in parallel, while being very careful when doing so (an email meant for Prof. ABC may end up being sent to Prof. XYZ, haha). Getting admitted for an internship can be a long shot in the dark, but if you get it right, it is one of the best things that can happen to your career. Foreign internships in particular are fascinating, in that they allow you to study (work) abroad, meet people from different cultures and collaborate with smart academics. It is very important to follow up with professors at the time of application. If they do not reply to your email (that happens most of the time; do not get disheartened, though), feel free to call them on their office telephone after a week and enquire about your application. Shortlisting professors is one of the most important parts of applying. Take a look at their web pages, which can be found in the university listings, and study them really well, since many professors have their own "rules" for prospective students. Make sure your research interests match the professor's interests and apply accordingly. Also make use of your current undergrad professors' contacts to find a suitable place for doing an internship.

Where to apply ?
Since I am a Computer Science professional, I will list a few places that look to hire Computer Science students. Some places to look at in Europe and India include:
- IITs (Bombay, Delhi, Madras, KGP, etc.)
- IISc Bangalore
- EPFL Switzerland
- ETH Zurich
- TU Dresden
- Karlsruhe University
- RWTH Aachen
- TU Darmstadt
- DERI Ireland
- INRIA France
- TU Madrid
- TU Delft / Eindhoven

I hope this information has been useful in some way for your application. Feel free to ask any questions. Thank you.

Wednesday, March 10, 2010

Two Interesting Events in Madrid for the Next Few Days

Next Saturday is the day selected to celebrate the RetroMadrid 2010 exhibition at the Facultad de Informática of the UCM in Madrid. This time it is for real, guys ;-) It is time again to revive the old days of the Spectrum, Amstrad, MSX and Commodore...

...and next week, on the 18th, there is the first Hackathon event ever held in Madrid. This one is related to Google Chrome. The event will also be held at the Facultad de Informática of UCM. In order to attend, early registration is required.

Enjoy!

Friday, February 26, 2010

More on NoSQL Storage

In the last few days I've been collecting some interesting links about NoSQL databases in some blogs:
  • This one and this other one, about the CAP theorem. The second one also includes a reference to the origins of punk rock :-)
  • This entry discusses the NoSQL storage ecosystem and some features, and...
  • ... this other goes deep into the main features of NoSQL storage.
  • Finally this other document also introduces the main features and systems in the NoSQL ecosystem.

Thursday, February 18, 2010

A New Book on Design

Some posts ago, I wrote about how design simplicity was applied to a concrete part of a software project, and I mentioned a book I was reading called Subject to Change. Today, exploring Pearson's higher education web page, I found that Fred Brooks is preparing a forthcoming book about design. I'm glad to know that, as this blog intends to do, other people are interested in how software can benefit from other disciplines such as aesthetics and design.

Wednesday, February 10, 2010

Software Architecture

I'm happy to have found this webpage on software architecture. I share with its author (Simon Brown) some of the same interests in software architectures, so I'll seize the opportunity to recover some old thoughts.

I've been searching my HD for a document I wrote some time ago about software architectures. In that document, I collected several definitions of what a software architecture is. Here are some formal definitions from well-known members of the software engineering community:
  • D. E. Perry and A. L. Wolf in "Foundations for the Study of Software Architecture", ACM SIGSOFT Software Engineering Notes, 17:4, 1992 "... software architecture is a set of architectural (or, if you will, design) elements that have a particular form. We distinguish three different classes of architectural element: processing elements; data elements; and connecting elements.”
  • D. Garlan and D. E. Perry in editorial of IEEE Transactions on Software Engineering, April 1995: "the structure of the components of a program/system, their interrelationships, and principles and guidelines governing their design and evolution over time.”
  • D. Garlan and M. Shaw in Software Architecture: Perspectives on an Emerging Discipline, Prentice Hall, 1996: "the description of the elements that comprise a system, the interactions and patterns of these elements, the principles that guide their composition, and the constraints on these elements".
  • L. Bass, P. Clements, R. Kazman in Software Architecture in Practice, Addison-Wesley, 1997: "is the structure or structures of the system, which comprise software components, the externally visible properties of those components, and the relationships among them".
  • IEEE Std 1471 (Maier, Emery, Hilliard): "The fundamental organization of a system embodied in its components, their relationships to each other, and to the environment, and the principles guiding its design and evolution".
Other prolific authors, such as Martin Fowler, prefer not to commit to a definition of software architecture.

Of course, I finished my document giving my own definition of a software architecture (why not? :-):

"a software architecture is some kind of document which describes a high level view of the evolution of the structure and behaviour of a particular software system or application in terms of software components and their relationships".

Reviewing all these definitions, I've realized that none of them includes a reference to the fulfilment of the requirements that must drive the design process. So, I would like to change my definition :-) into this new one:

"the software architecture of a system or application comprises the necessary means (e.g. documents, figures, diagrams, graphs etc.) that, taking into account the collected requirements (functional and not funtional), best describe its design process and the evolution of its structure and behaviour (in terms of components and their relationships) with regard to the fulfilment of the expected requirements".
 
To conclude this post, let me mention some sources of information on software architecture. First, three (more or less recent) books that have caught my attention:
Of course, it is also worth mentioning here the work of Grady Booch on software architecture. His web page is an infinite source of reliable information.

Finally, I want to leave an open question here: Are system architecture, enterprise architecture and technical architecture related to the concept of "software architecture"? That is, are they minor refinements of the concept, or do they deserve particular attention and radically different definitions?

NOTE: I've just found an entry in Grady Booch's blog related to this open question (October 29th, 2009), discussing enterprise and technical architectures: "Although the two share the noun "architecture" they are different things. EA attends to the architecture of a business that uses technology; TA attends to the architecture of the software-intensive systems that support the business". So, it seems that Booch understands technical architecture as a synonym for software architecture, separating the concept from that of enterprise architecture. See also my blog entry on Application Architecture and Enterprise Architecture.

Friday, January 29, 2010

Non-Relational Storage/Databases and Consistency

With the explosion of cloud computing, non-relational storage/databases (e.g. Google's BigTable) have gained attention as a way to scale systems that serve thousands of clients/customers (I've found this web page that collects projects related to non-relational databases). However, most of the frameworks/PaaS for developing applications on top of this new kind of logical storage force the developer to use a shared-nothing approach (e.g. Google App Engine and Microsoft's Azure). This means that no state can be stored in the application between two invocations from the same client.
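To make the shared-nothing restriction concrete, here is a minimal Ruby sketch contrasting in-process state, which breaks (or silently diverges) as soon as requests are spread over several application instances, with state kept in an external store; the key-value client used here is hypothetical:

# Anti-pattern under a shared-nothing PaaS: this counter lives in the memory of
# one application instance, so different requests may see different values.
class LocalVisitCounter
  @@count = 0

  def self.hit
    @@count += 1
  end
end

# Shared-nothing style: keep the state in an external store shared by every
# instance (e.g. a datastore or memcache; 'store' is a hypothetical client
# exposing an atomic incr operation).
class SharedVisitCounter
  def initialize(store)
    @store = store
  end

  def hit
    @store.incr('visits')
  end
end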

Part of the work done in my PhD thesis was related to consistently scaling stateful applications running on clusters of servers based on multi-tier architectures (in a LAN). We use transparent replication, in order to hide the complexity of the replicated architecture from the clients, and guarantee snapshot isolation for the data being accessed (stored in relational DBs). In his blog, Werner Vogels (Amazon's CTO) goes one step further and discusses consistency in the context of stateful applications and the requirements of cloud computing. Vogels' view on this topic has been applied in Amazon projects such as Dynamo. Whilst our approach ensures strong consistency, Amazon's approach is eventually consistent.

Wednesday, January 27, 2010

Software and Related Disciplines

I started this blog, among other things, in order to try to connect practices and processes used in other disciplines to software design and vice versa, i.e., to explore how software can impact and help other knowledge areas. Recently, I've found two examples related to this intention:
  1. This article discusses new ways of managing projects in civil architecture and their relationship to agile techniques used in the software process.
  2. This book is related to the use of data-intensive computing for scientific discovery in the fields of Earth and Environment, Health and Wellbeing, Scientific Infrastructure and Scholarly Communication.

Friday, January 22, 2010

Application Architecture and Enterprise Architecture

In this essay, Scott W. Ambler effectively compares these two concepts, both of which include the term architecture, as well as the roles of application and enterprise architects.

In his book Patterns of Enterprise Application Architecture, Martin Fowler compiles a set of patterns useful for application architects in charge of developing enterprise applications. This book does not include patterns suitable for enterprise architects. Whilst the work of application architects can be considered mostly technical (although they also need skills in other knowledge areas and virtues related, for example, to managing people), the work of enterprise architects is closer to the political/diplomatic branch of IT. This book can be a good source of information for enterprise architects.

Friday, January 15, 2010

How to do Research: Reading, Writing and Evaluating Technical Papers

I collect here some texts I found some time ago that are helpful when doing research on software systems. They are related to reading/writing technical papers from/for software conferences and to evaluating papers when involved in a program committee. The texts have been written by well-known and respected people in the area of software systems.
  1. Umesh Bellur: "How to Read a Research Paper", Date unknown
  2. Patrick Valduriez: "Some Hints to Improve Writing Technical Papers", 1994
  3. Timothy Roscoe: "Writing Reviews for Systems Conferences", 2007
Searching the web, I've also found this page at INRIA collecting some other papers/presentations with helpful hints on how to do research.

Update (08/02/2009): Another presentation, written by Vaide Narvaez, about reading technical papers effectively.