Grid Gurus

The Private Clouds

Mon, 12/08/2008 - 07:24
Last month I was invited to give a couple of talks about Cloud computing in the wonderful C3RS (Cisco Cloud Computing Research Symposium). The slides are available online, if you want to check. Although the audiences were quite heterogeneous, there... Ignacio Martin Llorente
Categories: Grid and Cluster

The Private Clouds

Tue, 12/02/2008 - 10:43

Last month I was invited to give a couple of talks about Cloud computing at the wonderful C3RS (Cisco Cloud Computing Research Symposium). The slides are available online, if you want to check them. Although the audiences were quite heterogeneous, there is a recurrent question among the participants of these events: How can I set up my private cloud? Let me briefly summarize the motivation of the people asking this:

  • Lease compute capacity from the local infrastructure. These people acknowledge the benefits of virtualizing their own infrastructure as a whole. However, they are not interested in selling this capacity over the internet, or at least it is not a priority for them. That is, they do not want to become an EC2 competitor, so they do not need to expose a cloud interface to the world.
  • Capacity in the cloud. They do not want to be the new EC2 but they want to use EC2. The ability to move some services, or part of the capacity of a service, to an external provider is very attractive to them.
  • Open Source. Current cloud solutions are proprietary and closed; they need an open source solution to play with. Also, they are using some virtualization technologies that they would like to see integrated in the final solution.

I said to these people: take a look at OpenNebula. OpenNebula is a distributed virtual machine manager that allows you to virtualize your infrastructure. It also features integral management of your virtual services, including networking and image management. Additionally, it is shipped with EC2 plug-ins that allow you to simultaneously deploy virtual machines in your local infrastructure and in Amazon EC2.

OpenNebula is modular by design to allow its integration with any other tool, like the Haizea lease manager, or Nimbus, which gives you an EC2-compatible interface in case you need one. It is healthy open source software that is being improved in several projects like RESERVOIR, and it has a growing community.

Go here if you want to set up your private cloud!

Ruben S. Montero

Reprinted from blog.dsa-research.org 

Categories: Grid and Cluster

Managing Resource Quotas in Grid Engine

Tue, 12/02/2008 - 10:37
It is often the case that cluster administrators must impose limits on the use of certain resources. A good example would be preventing a particular user (or a set of users) from occupying an entire queue (or cluster) at any point. If you’ve ever tried doing something like that for Grid Engine (SGE), then you know that it is not immediately obvious how to impose limits on resource usage.

SGE has a concept of “resource quota sets” (RQS), which can be used to limit maximum resource consumption by any job. The relevant qconf command line switches for manipulating resource quota sets are “-srqs” and “-srqsl” (show), “-arqs” (add), “-mrqs” (modify) and “-drqs” (delete).

Each RQS must have the following parameters: name, description, enabled and limit. The RQS name cannot contain spaces, but its description can be an arbitrary string. The boolean “enabled” flag specifies whether the RQS is enabled or not, while the “limit” field denotes a resource quota rule that consists of an optional name, filters for a specific job request and the resource quota limit. Note that one can have multiple “limit” fields associated with a given RQS. For example, the following RQS prevents user “ahogger” from occupying more than 1 job slot in general, and it also prevents the same user from running jobs in the headnodes.q queue:

$ qconf -srqs ahogger_job_limit
{
   name         ahogger_job_limit
   description  "limit ahogger jobs"
   enabled      TRUE
   limit        users ahogger to slots=1
   limit        users {ahogger} queues {headnodes.q} to slots=0
}

The exact format in which RQS have to be specified is, like everything else, well documented in the SGE man pages (“man sge_resource_quota”).
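As a rough illustration of the four RQS parameters described above, here is a minimal Python sketch that renders a quota set in the same layout that qconf prints. The function name render_rqs and its arguments are made up for this example and are not part of SGE; the text it produces would still need to be reviewed against “man sge_resource_quota” before being handed to qconf.

```python
def render_rqs(name, description, limits, enabled=True):
    """Return the text of a resource quota set in the layout shown above.

    Hypothetical helper for illustration only; it is not an SGE tool.
    """
    if " " in name:
        raise ValueError("RQS names cannot contain spaces")
    lines = [
        "{",
        f"   name         {name}",
        f'   description  "{description}"',
        f"   enabled      {'TRUE' if enabled else 'FALSE'}",
    ]
    # An RQS may carry several limit rules; emit one line per rule.
    lines += [f"   limit        {rule}" for rule in limits]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_rqs(
        "ahogger_job_limit",
        "limit ahogger jobs",
        ["users ahogger to slots=1",
         "users {ahogger} queues {headnodes.q} to slots=0"],
    ))
```

Keeping the rules in a list mirrors the fact that a single RQS can hold several “limit” fields.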
Posted by Sinisa Veseli
Categories: Grid and Cluster

Automating Grid Engine Monitoring

Mon, 11/17/2008 - 08:04
When visiting client sites I often notice various issues with the existing distributed resource management software installations. The problems usually vary from configuration issues to queues in an error state. While things like inadequate resources and queue structure usually require more analysis and better design, problems like queues in an error state are easily detectable. So, cluster administrators, who are often busy with many other duties, should try to automate monitoring tasks as much as they can. For example, if you are using Grid Engine, you can easily come up with scripts like the one below, which looks for several different kinds of problems in your SGE installation:

#!/bin/sh

. /usr/local/unicluster/unicluster-user-env.sh

explainProblem() {
    qHost=$1    # queue where the problem is found
    msg=`qstat -f -q $qHost -explain aAEc | tail -1 | sed 's?-??g' | sed '/^$/d'`
    echo $msg
}

checkProblem() {
    description=$1    # problem description
    signature=$2      # problem signature
    for q in `qconf -sql`; do
        cmd="qstat -f -q $q | grep $q | awk '{if(NF>5 && index(\$NF, \"$signature\")>0) print \$1}'"
        qHostList=`eval $cmd`
        if [ "$qHostList" != "" ]; then
            for qHost in $qHostList; do
                msg=`explainProblem $qHost`
                echo "$description on $qHost:"
                echo "    $msg"
                echo ""
            done
        fi
    done
}

echo "Grid Engine Issue Summary"
echo "========================="
echo ""
checkProblem Error E
checkProblem SuspendThreshold A
checkProblem Alarm a
checkProblem ConfigProblem c

Note that the above script should work with Unicluster Express 3.2 installed in the default (/usr/local/unicluster) location. It can be easily modified to, for example, send email to administrators in case problems are found that need attention. Although simple, such scripts usually go a long way towards ensuring that your Grid Engine installation operates smoothly.
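The heart of the script is the awk filter: a queue instance line has more than five columns only when a state is reported, and the last column then holds the state letters. The Python sketch below mirrors that test on captured text rather than calling qstat. The SAMPLE listing is a made-up approximation of “qstat -f” output, and the function name queues_in_state is hypothetical.

```python
def queues_in_state(qstat_output, queue, signature):
    """Return instances of `queue` whose state column contains `signature`."""
    hits = []
    for line in qstat_output.splitlines():
        fields = line.split()
        # Same test as the awk filter: a sixth column exists only when
        # the queue instance reports a state.
        if queue in line and len(fields) > 5 and signature in fields[-1]:
            hits.append(fields[0])
    return hits


# Made-up sample resembling "qstat -f" output (an assumption, not real data).
SAMPLE = """\
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@node01                   BIP   0/4       0.01     lx24-amd64
----------------------------------------------------------------------------
all.q@node02                   BIP   0/4       -NA-     lx24-amd64    au
----------------------------------------------------------------------------
all.q@node03                   BIP   0/4       0.02     lx24-amd64    E
"""

if __name__ == "__main__":
    for label, letter in [("Error", "E"), ("Alarm", "a")]:
        print(label, queues_in_state(SAMPLE, "all.q", letter))
```

Like the awk version, the match is case-sensitive, so “a” (alarm) and “A” (suspend threshold) are reported separately.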
Posted by Rich Wellner
Categories: Grid and Cluster

Who Cares What's inside a Cloud?

Thu, 11/06/2008 - 07:57

When I consider my microwave, telephone, or television I see fairly sophisticated applications that I simply plug into service providers and get useful results. If I choose to switch between individual service providers I can do so easily (assuming certain levels of deregulation of utility monopolies of course). Most importantly, while I understand how these appliances work, I would never want to build one myself. Yet I am not required to do so because the providers use standardized interfaces that appliance manufacturers can easily offer: I buy my appliances as I might any other tool. Consequently, I can switch out the manufacturer or models for each of the services I use without interacting with the provider. I use these tools in a way that makes my work and life more efficient.

Nobody listens in on my conversations, nor do they receive services at my expense. I can use these services how I wish and, because of competition, I can expect an outstanding quality of service. At the end of the month, I get a bill from my providers for the services I used. These monetary costs are far outweighed by the convenience these services offer.

It is this sort of operational simplicity that motivated the first call for computational power as a utility in 1965. Like the electrical grid, a consumer would simply plug in their favorite application and use the compute power offered by a provider. Beginning in the 1990s, this effort centered around the concept of Grid computing.

Just like the early days of electricity services, there were many issues with providing Grid computing. The very first offerings were proprietary or narrowly focused. The parallels with the electric industry are easily recognized. Some might provide street lighting whereas others would provide power for home lighting, still others for transportation, and yet another group for industrial applications. Moreover, each provider used different interfaces to get the power. Thus switching between providers, not a rare occurrence in a volatile industry, was no small undertaking. This clearly was very costly for the consumer.

It took an entrepreneur to come to the industry and unify electrical services for all applications while also creating a standardized product (see http://www.eei.org/industry_issues/industry_overview_and_statistics/history for a quick overview). Similarly several visionaries had to step in and define what a Grid computer needed to do in order to create a widely consumable product. While these goals were largely met and several offerings became very successful, Grid computing never really became the firmly rooted utility-like service that we hoped for. Rather, it seems to have become an offering for specialized high-performance computing users.

This market is not the realm of service that I started thinking about early in this post. Take television service: this level of service is neither for a single viewer nor a small-business who might want to repackage a set of programs to its customers (say a sports bar). Rather it is for large-scale industries whose service requirements are unimaginable by all but a few people. I cannot even draw a parallel to television service. In telecommunication it would be the realm of a CLEC.

Furthermore, unlike my microwave, I am expected to customize my application to work well on a grid. I cannot simply plug it in and get better service than I can from my own PC. It would be the equivalent of choosing to reheat my food on my stove or building my own microwave. You see, my microwave, television service, and phone services are not just basic offerings of food preparation, entertainment, and communication. Instead, these are sophisticated systems that make my work and life easier. Grid computing, while very useful, does not simplify program implementation.

So in steps cloud computing: an emerging technology that seems to have significant overlap with grid computing while also providing simplifying services (something as a service). I may still have to assemble a microwave from pre-built pieces but everything is ready for me to use. I only have to add my personal touches to assemble a meal. It really isn't relevant whether the microwave is central to the task or just one piece of many.

When I approach a task that I hope to solve using a program, how might I plug that in just as easily? Let's quickly consider how services are provided for television. When I plug my application (TV) into the electricity provider as well as a broadcaster of some sort, it just works. I can change the channel to the streams that I like. I can buy packages that provide me the best set of streams. In addition, some providers will offer me on-demand programming as well as internet and telephone services. If anything breaks, I call a number and they deal with it. None of this requires anything of me. I pay my bill and I get services.

Okay, how would that work for a computation? Say I want to find the inverse for a matrix. I would send out my data to the channel that inverted matrices the way I like them. The provider will worry about attaining the advertised performance, reliability, scalability, security, sustainability, device/location independence, tenancy, and capital expenditure: those characteristics of the cloud that I could not care less about. Additionally, the cloud properties that Rich Wellner assembled don't interest me much either. Certainly they may be differentiators, but the actual implementation is somebody else's problem in the same way that continuous electrical service provision is not my chief concern when I turn on the TV. What I want and will get is an inverse to the matrix I submitted in the time frame I requested deposited where I requested it to be put. I may use the inverted matrix to simultaneously solve for earthquake locations and earth properties or for material stresses and strains in a two-dimensional plate. That is my recipe and my problem.

After all, I should get services "without knowledge of, expertise with, or control over the technology infrastructure that supports them," as the cloud computing wiki page claims. Essentially the aforementioned cloud characteristics are directed towards service providers rather than to the non-expert consumer that the wiki definition highlights. Isn't the differentiator between the Cloud and the Grid the concealment of the complex infrastructure underneath? If the non-expert consumer is expected to worry about algorithm scalability, distributing data, starting and stopping resources and all of that, they certainly will need to gain some expertise quickly. Further, once they have that skill, why wouldn't they just use a mature Grid offering rather than deal with the non-standardized and chaotic clouds? Are these provider-specific characteristics not just a total rebranding of Grid?

As such, I suggest that several consumer-based characteristics should replace the rather inconsequential provider-internal ones that currently exist.

A cloud is characterized by services that:

  • use a specified algorithm to solve a particular problem;
  • can be purchased for one-time, infrequent use, or regular use;
  • state their peak, expected, and minimum performances;
  • state the expected response time;
  • can be queried for changes to expected response time;
  • support asynchronous messaging. A consumer must be able to discover when things are finished;
  • use standard, open, general-purpose protocols and interfaces (clearly);
  • have specified entry-points;
  • can interact with other cloud service providers. In particular, a service should be able to send output to long-term cloud-storage providers.

Now that sounds more like Computation-as-a-Service.
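To make a few of these characteristics concrete, here is a toy, in-process Python sketch of what a consumer-facing service might look like for the matrix inversion example. Everything in it is hypothetical: the class InversionService, its method names and the stated response time are invented for illustration, and a thread pool stands in for whatever the provider actually runs.

```python
from concurrent.futures import ThreadPoolExecutor


def invert_2x2(m):
    """Invert a 2x2 matrix with the closed-form adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


class InversionService:
    """Toy stand-in for a provider; every name here is hypothetical."""

    algorithm = "closed-form 2x2 adjugate"  # the specified algorithm
    expected_response_seconds = 1.0         # the stated response time

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._jobs = {}

    def submit(self, matrix):
        """Entry point: queue a request and return a ticket immediately."""
        ticket = len(self._jobs) + 1
        self._jobs[ticket] = self._pool.submit(invert_2x2, matrix)
        return ticket

    def is_finished(self, ticket):
        """Let the consumer discover when the work is done."""
        return self._jobs[ticket].done()

    def result(self, ticket):
        """Block until the answer is ready, then hand it over."""
        return self._jobs[ticket].result()


if __name__ == "__main__":
    service = InversionService()
    ticket = service.submit([[2, 0], [0, 4]])
    print(service.result(ticket))
```

The consumer sees only the algorithm name, the stated response time and a ticket; how the work is scheduled stays on the provider's side.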

Posted by Roderick Flores
Categories: Grid and Cluster

Cloud Computing: Commodity or Value Sale?

Mon, 11/03/2008 - 13:18

There is a controversy in the cloud community today about whether the market is going to be one based on value or price. Rephrased, will cloud computing be a commodity or an enablement technology?

A poster on one of the cloud computing lists asserted that electricity would be a key component of pricing. He was then jumped on by people saying that value would be the key.

It seems like folks are talking past one another.

His assertion is true if CC is a commodity.

Now that said, there are precious few commodities in IT. Maybe internet connectivity is one. Monitors might be another. Maybe there are a few more.

But very quickly you get past swappable components that do very nearly the same job and into the realm of 'stuff' that is not easily replaceable. Then the discussion turns to one of value.

Amazon recognized the commodity of books and won the war over people who were trying to sell value. They appear to be attempting to do the same with computer time, which makes the battle they will fight over the next few years with Microsoft (and the increasing number of smaller players) extra interesting.

There is also the problem of making sweeping statements like "the market will figure things out". There is no "the market". Even on Wall Street. The reason things happen is because different people and institutions have different investment goals. Those goals vary over time and create growing or shrinking windows of opportunity for other people and institutions.

I've made my bet on how "the market" for cloud computing will shake out in the short to medium term. Now I'm just hoping that there are enough of the people and institutions my bet is predicated on in existence.

Posted by Rich Wellner
Categories: Grid and Cluster

Elastic Management of Computing Clusters

Wed, 10/29/2008 - 10:26

Hype aside, clouds (i.e. a service for the on-demand provision of virtual machines; others would say IaaS) are making utility computing a reality; see, for example, the Amazon EC2 case studies. This new model, and virtualization technologies in general, is also being actively explored by the scientific community. There are quite a few initiatives that integrate virtualization with a range of computing platforms, from clusters to Grid infrastructures. Once this integration is achieved, the next step is natural: jump to the clouds and provision the VMs from an external site. For example, recent work from UNIVA UD has demonstrated the feasibility of supplementing a UNIVA Express cluster with EC2 resources (you can download the whitepaper to learn more).

This cloud provision model can be further integrated with the in-house physical infrastructure when it is combined with a virtual machine (VM) management system, like OpenNebula. A VM manager is responsible for the efficient management of the virtual infrastructure as a whole, by providing basic functionality for the deployment, control and monitoring of VMs on a distributed pool of resources. The use of this new virtualization layer decouples the computing cluster from the physical infrastructure, and so extends the classical benefits of VMs to the cluster level (i.e. cluster consolidation, cluster isolation, cluster partitioning and elastic cluster capacity).

Architecture of an Elastic Cluster
A computing cluster can be easily virtualized by putting the front-end and worker nodes into VMs. In our case, the virtual cluster front-end (SGE master host) is deployed in the local resources with Internet connectivity to be able to communicate with Amazon EC2 VMs. This cluster front-end acts also as NFS and NIS server for every worker node in the virtual cluster.

The virtual worker nodes communicate with the front-end through a private local area network. The local worker nodes are connected to this vLAN through a virtual bridge configured in every physical host.  The EC2 worker nodes are connected to the vLAN with an OpenVPN tunnel, which is established between each remote node (OpenVPN clients) and the cluster front-end (OpenVPN server). With this configuration, every worker node (either local or remote) can communicate with the front-end and can use the common network services transparently. The architecture of the cluster is shown in the following figure:


Figure courtesy of Prof. Rafael Moreno

Deploying a SGE cluster with OpenNebula and Amazon EC2
The latest release of OpenNebula includes a driver to deploy VMs in the EC2 cloud, and so it integrates the Amazon infrastructure with your local resources. OpenNebula manages EC2 just like another local resource with a configurable fixed size, which limits the cluster capacity (i.e. SGE worker nodes) that can be allocated in the cloud. In this set-up, your local resources would look as follows:

>onehost list
HID NAME     RVM      TCPU   FCPU   ACPU    TMEM    FMEM STAT
   0 ursa01     0       800    798    800 8387584 7663616  off
   1 ursa02     0       800    798    800 8387584 7663616  off
   2 ursa03     0       800    798    800 8387584 7663616  on
   3 ursa04     2       800    798    600 8387584 6290432  on
   4 ursa05     1       800    799    700 8387584 7339008  on
   5 ec2        0       500    500    500 8912896 8912896  on

The last line corresponds to EC2, currently configured to host up to 5 m1.small instances.
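The listing suggests that each VM with CPU = 1 takes 100 units of ACPU (ursa04 runs two VMs and shows 600 of 800 available), which is how the 500 units on the ec2 line translate into 5 instances. A small Python sketch of that arithmetic follows. The function name free_vm_slots is made up, and the 100-units-per-VM figure is an assumption read off the listing, not something taken from the OpenNebula documentation.

```python
HOSTS = """\
HID NAME     RVM      TCPU   FCPU   ACPU    TMEM    FMEM STAT
   0 ursa01     0       800    798    800 8387584 7663616  off
   1 ursa02     0       800    798    800 8387584 7663616  off
   2 ursa03     0       800    798    800 8387584 7663616  on
   3 ursa04     2       800    798    600 8387584 6290432  on
   4 ursa05     1       800    799    700 8387584 7339008  on
   5 ec2        0       500    500    500 8912896 8912896  on
"""


def free_vm_slots(listing, cpu_per_vm=100):
    """Map each enabled host to the number of extra one-CPU VMs it can take."""
    slots = {}
    for line in listing.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 9:
            continue  # ignore blank or malformed rows
        name, acpu, stat = fields[1], int(fields[5]), fields[8]
        if stat == "on":
            slots[name] = acpu // cpu_per_vm
    return slots


if __name__ == "__main__":
    for host, count in free_vm_slots(HOSTS).items():
        print(host, count)
```

Hosts marked “off” are skipped, so only ursa03, ursa04, ursa05 and ec2 contribute capacity here.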

The OpenNebula EC2 driver translates a general VM deployment file into an EC2 instance description. The driver assumes that a suitable Amazon machine image (AMI) has been previously packed and registered in the S3 storage service. So when a given VM is to be deployed in EC2, its AMI counterpart is instantiated. A typical SGE worker node VM template would look like this:

NAME   = sge_workernode
CPU    = 1
MEMORY = 128                                                            

#Xen or KVM template machine, used when deploying in the local resources
OS   = [kernel="/vmlinuz",initrd= "/initrd.img",root="sda1" ]
DISK = [source="/imges/sge/workernode.img",target="sda",readonly="no"]
DISK = [source="/imges/sge/workernode.swap",target="sdb",readonly="no"]
NIC  = [bridge="eth0"]

#EC2 template machine, this will be used when submitting this VM to EC2
EC2 = [ AMI="ami-d5c226bc",
        KEYPAIR="gsg-keypair",
        AUTHORIZED_PORTS="22",
        INSTANCETYPE=m1.small]

Once deployed, the cluster would look like this (SGE master, 2 local worker nodes and 2 EC2 worker nodes):

>onevm list
  ID      NAME STAT CPU     MEM        HOSTNAME        TIME
  27  sgemast runn 100 1232896          ursa05 00 00:41:57
  28  sgework runn 100 1232896          ursa04 00 00:31:45
  29  sgework runn 100 1232896          ursa04 00 00:32:33
  30  sgework runn   0       0             ec2 00 00:23:12
  31  sgework runn   0       0             ec2 00 00:21:02

You can get additional info from your EC2 VMs, like the IP, using the onevm show command.

So, it is easy to manage your virtual cluster with OpenNebula and EC2, but what about efficiency? Besides the inherent overhead induced by virtualization (around 10% for processing), the average deployment time of a remote EC2 worker node is 23.6s, while a local one takes only 3.3s. Moreover, when executing an HTC workload, the overhead induced by using EC2 (VPN and a slower network connection) can be neglected.

Ruben S. Montero

This is a joint work with Rafael Moreno and Ignacio M. Llorente

Reprinted from blog.dsa-research.org 

Posted by Ignacio Martin Llorente
Categories: Grid and Cluster

Auditing the Cloud

Mon, 10/20/2008 - 13:25

I've written here about the importance of SLAs for useful cloud computing platforms on a few occasions in the past. The idea behind clouds, that you can get access to resources on demand, is an appealing one. However, it is only part of the total picture. Without an ability to state what you want and go to bed, there isn't much value in the cloud.

Think about that for a minute. With the cloud computing offerings currently available there are no meaningful SLAs written down anywhere. Yet people, every day, run their production applications on an implicit SLA that is internalized something like "Amazon is going to give me N units of work for M price".

There are two problems with this.

  • Amazon doesn't scale your resources. Your demand may have spiked and you are still running on the resource you signed up for.
  • There is no audit capability on EC2.

In the Cloud Computing Bill of Rights we wrote about three important attributes that need to be available to do an audit.

  • Events -- The state changes and other factors that affected your system availability.
  • Logs -- Comprehensive information about your application and its runtime environment.
  • Monitoring -- Should not be intrusive and must be limited to what the cloud provider reasonably needs in order to run their facility.

The idea here is that rather than just accepting what your cloud provider sends you at the end of the month as a bill, the world of cloud computing is complex enough that a reasonable set of runtime information must be made available to substantiate the provider's claim for compensation.

This is particularly true in the world of SLAs. If my infrastructure is regularly scaling up, out, down or in to meet demands it is essential to be able to verify that the infrastructure is reacting the way that was contracted. Without that, it will be very hard to get people to trust the cloud.
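As a rough sketch of what such an audit could look like, the Python below rebuilds running time from a list of start and stop events and checks whether a billed figure is covered by it. The event format, the function names and the tolerance are all assumptions made for this example; no current provider is known to expose events in this form, and real billing rules (rounding, minimum charges) are not modeled.

```python
from datetime import datetime


def instance_hours(events):
    """Sum running time, in hours, from (timestamp, instance, action) events."""
    started = {}
    total = 0.0
    # ISO timestamps sort chronologically as plain strings.
    for stamp, instance, action in sorted(events):
        when = datetime.fromisoformat(stamp)
        if action == "start":
            started[instance] = when
        elif action == "stop" and instance in started:
            total += (when - started.pop(instance)).total_seconds() / 3600
    return total


def bill_is_substantiated(events, billed_hours, tolerance=0.01):
    """True when the billed hours do not exceed what the events account for."""
    return billed_hours <= instance_hours(events) + tolerance


if __name__ == "__main__":
    # Hypothetical event log for two instances.
    log = [
        ("2008-10-20T00:00:00", "i-1", "start"),
        ("2008-10-20T01:00:00", "i-2", "start"),
        ("2008-10-20T01:30:00", "i-2", "stop"),
        ("2008-10-20T02:00:00", "i-1", "stop"),
    ]
    print(instance_hours(log), bill_is_substantiated(log, 3.0))
```

The same event stream could be compared against contracted scaling behaviour, which is the SLA check described above.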

Posted by Rich Wellner
Categories: Grid and Cluster

Using GridWay With Unicluster Express - Part II

Mon, 10/20/2008 - 13:05
In my previous post I described how to build and install GridWay metascheduler on top of Unicluster Express 3.2 (UCE). However, before one can actually use the software, there are several configuration steps that have to be completed. For the notes below I will assume that GridWay is installed in the /opt/gw directory, and that UCE is installed in its default location (/usr/local/unicluster), with the default “ucluster” administrative account. Note that you will need root access on your GridWay machine.

1) Create a new UNIX group (e.g., gwusers). Your UCE administrative account (ucluster) and all users that will be submitting jobs to GridWay must be members of that group.

2) Make sure that your GridWay installation directory is owned by the UCE administrative account. Assuming GridWay is installed in /opt/gw, invoking something like “chown -R ucluster.gwusers /opt/gw” would do the trick.

3) Edit the /etc/sudoers file, and add the following entries:

...
# User alias specification
...
Runas_Alias GW_USERS = %gwusers
...
# Defaults specification
Defaults>GW_USERS env_keep="GW_LOCATION GLOBUS_LOCATION"
...
# GridWay entries.
ucluster ALL=(GW_USERS) NOPASSWD: /opt/gw/bin/gw_em_mad_ws *
ucluster ALL=(GW_USERS) NOPASSWD: /opt/gw/bin/gw_tm_mad_ftp *

4) Configure GridWay. At minimum you must edit the GridWay daemon configuration file /opt/gw/etc/gridway/gwd.conf in order to add the following entries appropriate for UCE:

IM_MAD = mds4:gw_im_mad_mds4:-s petruchio.univaud.com:gridftp:ws
EM_MAD = ws:gw_em_mad_ws::rsl2
TM_MAD = gridftp:gw_tm_mad_ftp:

The only entry that you will need to change in the above example is the host that is running your UCE container, which was set to petruchio.univaud.com in my case. Other files that you might want to inspect are /opt/gw/etc/gridway/sched.conf (contains GridWay scheduler configuration), /opt/gw/etc/gridway/job_template.default (default values for job templates), and /opt/gw/etc/gridway/gwrc (default environment variables for GridWay's so-called middleware access drivers, or MADs).

5) Edit the UCE configuration file /usr/local/unicluster/etc/globus_wsrf_mds_usefulrp/gluerp.xml and enable the ganglia information provider by un-commenting the following line:

<defaultProvider>java org.globus.mds.usefulrp.glue.GangliaElementProducer</defaultProvider>

6) Edit the UCE configuration file for the SGE GRAM service to use the ganglia information provider (/usr/local/unicluster/etc/gram-service-SGE/gluerp-config.xml) and add the following xml excerpt after the "<ns1:resourcePropertyImpl> org.globus.mds.usefulrp.rpprovider.GLUEResourceProperty</ns1:resourcePropertyImpl>" element:

<ns1:resourcePropertyElementProducers>
    <ns1:className>org.globus.mds.usefulrp.glue.GangliaElementProducer</ns1:className>
    <ns1:arguments>localhost</ns1:arguments>
    <ns1:arguments>8649</ns1:arguments>
    <ns1:period>300</ns1:period>
    <ns1:transformClass>org.globus.mds.usefulrp.rpprovider.transforms.GLUEComputeElementTransform</ns1:transformClass>
</ns1:resourcePropertyElementProducers>

7) Restart the UCE container (as root, run “/etc/rc.d/init.d/unicluster-container restart”).

8) Start the GridWay daemon (gwd) under the ucluster account:

source /usr/local/unicluster/unicluster-user-env.sh
export JAVA_HOME=/opt/jdk
export GW_LOCATION=/opt/gw
export PATH=$JAVA_HOME/bin:$GW_LOCATION/bin:$PATH
gwd

The GridWay daemon should now be able to get information from your UCE container and you should be able to see your available SGE resources using the gwhost command. Note that the gwd command requires the “-m” flag for the multi-user mode.

9) Create a simple job template file for your testing (sample job template files can be found in the /opt/gw/test/jt directory), acquire a grid proxy using unicluster-grid-logon, and submit your test job via the gwsubmit command.

Note that most of the configuration steps I outlined above are described in more detail in the GridWay System Administrator’s Guide. If anything goes wrong, GridWay log files located in the /opt/gw/var directory might help your troubleshooting efforts.
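Since step 4 changes only the container host, the three gwd.conf entries can be generated rather than edited by hand when several machines need the same set-up. The Python sketch below does just that. The function name uce_mad_entries is made up for this example, and the entries themselves are copied from step 4 with only the host substituted.

```python
def uce_mad_entries(container_host):
    """Return the gwd.conf MAD lines from step 4 for the given UCE host."""
    return [
        # Only the information MAD mentions the container host.
        f"IM_MAD = mds4:gw_im_mad_mds4:-s {container_host}:gridftp:ws",
        "EM_MAD = ws:gw_em_mad_ws::rsl2",
        "TM_MAD = gridftp:gw_tm_mad_ftp:",
    ]


if __name__ == "__main__":
    print("\n".join(uce_mad_entries("petruchio.univaud.com")))
```

The output is meant to be pasted into /opt/gw/etc/gridway/gwd.conf; the sketch does not touch the file itself.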
Posted by Sinisa Veseli
Categories: Grid and Cluster

Cloud and Grid are Complementary Technologies

Mon, 10/13/2008 - 07:34

There is a growing number of posts and articles trying to show how cloud computing is a new paradigm that supersedes Grid computing by extending its functionality and simplifying its exploitation, even announcing that Grid computing is dead. It seems that new technologies and paradigms always have the mission of replacing existing ones. Some of these contributions do not fully understand what Grid computing is, focusing their comparative analysis on simplicity of interfaces, implementation details or basic computing aspects. Other posts define Cloud in the same terms as Grid or create a taxonomy which includes Grid and cluster computing technologies.

Grid is an interoperability technology, enabling the integration and management of services and resources in a distributed, heterogeneous environment. The technology provides support for the deployment of different kinds of infrastructures joining resources which belong to different administrative domains. In the special case of a Compute Grid infrastructure, such as EGEE or TeraGrid, Grid technology is used to federate computing resources spanning multiple sites for job execution and data processing. There are many success cases demonstrating that Grid technology provides the support required to fulfill the demands of several collaborative scientific and business processes.

On the other hand, I do not think there is a single definition for cloud computing as it denotes multiples meanings for different communities (SaaS, PaaS, IaaS...). From my view, the only new feature offered by cloud systems is the provision of virtualized resources as a service, being virtualization the enabling technology. In other words, the relevant contribution of cloud computing is the Infrastructure as a Service (IaaS) model. Virtualization rather than other non significant issues, such as the interfaces, is the key advance. At this point, I should remark that virtualization has been used by the Grid community before the arrival of the "Cloud".

Now that I have clearly stated my position on Cloud and Grid, let me show how I see Cloud (with virtualization as its enabling technology) and Grid as complementary technologies that will coexist and cooperate at different levels of abstraction in future infrastructures.

There will be a Grid on top of the Cloud

Before explaining the role of cloud computing as a resource provider for Grid sites, we should understand the benefits of virtualizing the local infrastructure (Enterprise or Local Cloud?). How can I access a cloud provider on demand if I have not previously virtualized my local infrastructure?

Existing virtualization technologies allow a full separation of resource provisioning from service management. A new virtualization layer between the service and the infrastructure layers decouples a server not only from the underlying physical resource but also from its physical location, without requiring any modification within the service layers from either the service administrator or the end-user perspective. Such decoupling is the key to supporting the scale-out of an infrastructure, supplementing local resources with cloud resources to satisfy peak or fluctuating demands.
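As a rough illustration of that scale-out idea, the sketch below splits a request for virtual machines between local hosts and a cloud provider: local capacity is used first, and only the overflow goes to the cloud. The function name `split_demand` and the "fill local first" policy are assumptions made for this example, not the behaviour of any particular VM manager.

```shell
# Split a demand for VMs between the local infrastructure and a cloud provider.
# Hypothetical helper for illustration only.
# Usage: split_demand <vms_needed> <local_capacity>
# Prints: local=<n> cloud=<m>
split_demand() {
    needed=$1
    capacity=$2
    # Use local resources first, up to their capacity.
    if [ "$needed" -le "$capacity" ]; then
        local_vms=$needed
    else
        local_vms=$capacity
    fi
    # Whatever does not fit locally is the peak that goes to the cloud.
    cloud_vms=$((needed - local_vms))
    echo "local=$local_vms cloud=$cloud_vms"
}
```

For instance, `split_demand 12 8` prints `local=8 cloud=4`, while `split_demand 5 8` keeps everything local. A real scheduler would also weigh cost, data location and image availability.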

Getting back to the Grid computing case, the virtualization of a Grid site provides several benefits, which overcome many of the technical barriers for Grid adoption:

  • Easy support for VO-specific worker nodes
  • Reduced gridification cycles
  • Dynamic balance of resources between VOs
  • Fault tolerance of key infrastructure components
  • Easier deployment and testing of new middleware distributions
  • Distribution of pre-configured components
  • Cheaper development nodes
  • Simplified deployment of training machines
  • Performance partitioning between local and grid services
  • On-demand access to cloud providers

If you are interested in more details about how virtualization and cloud computing can support compute Grid infrastructures you can have a look at my presentation "An Introduction to Virtualization and Cloud Technologies to Support Grid Computing" (EGEE08). I also recommend the report "An EGEE Comparative study: Clouds and grids - evolution or revolution?".

Technology that supports the above use case already exists. The OpenNebula engine enables the dynamic deployment and re-allocation of virtual machines on a pool of physical resources, and supports on-demand access to Amazon EC2 resources. Globus Nimbus, for its part, provides a free, open source infrastructure for remote deployment and management of virtual machines, allowing you to create compute clouds.

There will be a Grid under the Cloud

There is a growing interest in the federation of cloud sites. Cloud providers are opening new infrastructure centers at different geographical locations (see IBM or Amazon Availability Zones), and it is clear that no single facility or provider can create a seemingly infinite infrastructure capable of serving massive numbers of users at all times, from all locations. David Wheeler once said, "Any problem in computer science can be solved with another layer of indirection… But that usually will create another problem". In the same vein, the federation of cloud sites involves many technological and research challenges, but the good news is that some of them are not new and have already been studied and solved by the Grid community.

As stated above, Grid is not only about computing. Grid is a technology for federation. In recent years, there has been a huge investment in research and development of technological components for sharing resources across sites. Several middleware components for file transfer, SLA negotiation, QoS, accounting, monitoring... are available, most of them open source. As Ian Foster also predicted in his post "There's Grid in them thar Clouds", those are the components that could enable the federation of cloud sites. On the other hand, other components have to be defined and developed from scratch, mainly those related to the efficient management of virtual machines and services within and across administrative domains. That is exactly the aim of the Reservoir project, the European initiative in Cloud Computing.

Conclusions

To conclude this post, let me venture some predictions about the coexistence of Grid and Cloud computing in future infrastructures:

  • Virtualization, cloud, grid and cluster are complementary technologies that will coexist and cooperate at different levels of abstraction
  • Although there are early adopters of virtualization in the Grid/cluster/HPC community, its full potential has not been exploited yet
  • In a few years, the separation of job management from resource management through a virtualized infrastructure will be common practice
  • Emerging open-source VM managers, such as OpenNebula, will help speed up adoption
  • Grid/cluster/HPC infrastructures will maintain a resource base scaled to meet the average workload demand and will transparently access cloud providers to meet peak demands
  • Grid technology will be used for the federation of clouds

In summary, let's try to forget about the hype and concentrate on the complementary functionality provided by both paradigms. My message to the user community: the relevant issue is to evaluate which technology meets your requirements. It is unlikely that a single technology will meet all needs. My message to the Grid community: please do not see Cloud as a threat. Virtualization and Cloud are needed to overcome many of the technical barriers to wider Grid adoption. My message to the Cloud community: please try to take advantage of the research and development performed by the Grid community in the last decade.

Ignacio Martín Llorente

Reprinted from blog.dsa-research.org

Categories: Grid and Cluster

Cloud and Grid are Complementary Technologies

Mon, 10/13/2008 - 07:34
There is a growing number of posts and articles trying to show how cloud computing is a new paradigm that supersedes Grid computing by extending its functionality and simplifying its exploitation, even announcing that Grid computing is dead. It seems... Ignacio Martin Llorente
Categories: Grid and Cluster

Using GridWay With Unicluster Express - Part I

Fri, 10/03/2008 - 18:55
Those folks who are closely following development of the Globus Toolkit software probably noticed that its most recent (4.2) release also includes the GridWay metascheduler as one of the Globus execution management components. This is very good news for the grid community (especially for organizations with more than one cluster), and the GridWay guys certainly deserve praise for their efforts. For any of the earlier (4.0.x) versions of the Toolkit, GridWay must be downloaded and installed separately from the standard GT install. In this article I describe the steps necessary to install GridWay on top of Unicluster Express 3.2 (UCE), which is based on GT 4.0.5. In the following, I will assume that UCE has been installed in its default location (/usr/local/unicluster). Before you start, make sure that you have a recent version of a (non-GNU) Java compiler installed on your system. You will also need a C compiler, which should be available from your Linux distribution.
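The prerequisites above can be checked before starting. The following is a minimal sketch, not part of GridWay or UCE: `missing_tools` and `check_gridway_prereqs` are hypothetical helper names, and the tool list (`javac`, `gcc`, `make`, `wget`, `tar`) is an assumption based on the commands used in the steps below. It only tests that the commands exist on the PATH; it does not tell a GNU Java compiler from a Sun JDK.

```shell
# Print the name of every listed command that is not found on PATH.
# Usage: missing_tools <command>...
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Report what the build would be missing; prints nothing when all is in place.
# The optional argument overrides the default location of the UCE setup file.
check_gridway_prereqs() {
    uce_env=${1:-/usr/local/unicluster/unicluster-user-env.sh}
    # Tool names are assumptions; adjust them to your compiler and platform.
    missing_tools javac gcc make wget tar
    # The UCE setup file is sourced in step 2, so it must be readable.
    [ -r "$uce_env" ] || echo "$uce_env"
}
```

Running `check_gridway_prereqs` with no arguments lists any missing command or the missing UCE setup file, one per line.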
  1. Download GridWay and unpack it in your scratch area. The current release of the software (5.4) is intended for GT 4.2 (which is not fully backwards compatible with 4.0.x releases), so I used the most recent stable release appropriate for the GT 4.0.x series.
       $ wget http://www.gridway.org/software/files/gw-5.2.3.tar.gz
       $ tar zxvf gw-5.2.3.tar.gz
  2. Configure your UCE build environment by sourcing the UCE setup file and regenerating the GT flavor information. On my 64-bit development machine, I used gcc64dbg:
       $ source /usr/local/unicluster/unicluster-user-env.sh
       $ gpt-build -nosrc gcc64dbg
  3. Make sure that the Java compiler is in your path. I used Sun's JDK 6.0, which was installed in the /opt directory:
       $ export JAVA_HOME=/opt/jdk
       $ export PATH=$JAVA_HOME/bin:$PATH
     Note that the Java part of the GridWay build does not use Ant.
  4. Configure GridWay.
       $ cd gw-5.2.3
       $ ./configure --prefix=/opt/gw --with-docs --with-tests
     The generated makefile will have the JAVA_HOME variable set to the JRE shipped with UniCluster Express. Since GridWay relies on some Java-specific header files (e.g. jni.h), and since we are using an external JDK, this needs to be corrected manually. One way to do this is to edit src/Makefile and point JAVA_HOME to your JDK directory:
       $ vi src/Makefile
       $ cat src/Makefile | grep "JAVA_HOME ="
       JAVA_HOME = /opt/jdk
  5. Build and install the software (make sure that your installation directory is writable).
       $ make
       $ make install
With any luck, your build and installation will complete without any issues. Before starting GridWay, however, there are a few configuration details that need to be taken care of. I'll discuss those in my next post.
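The manual edit of src/Makefile in step 4 can also be scripted, which helps when the build has to be repeated. The sketch below is an alternative to the `vi` step, not something shipped with GridWay: `set_makefile_java_home` is a hypothetical helper, and it assumes the makefile contains a line beginning with `JAVA_HOME =`, as the `grep` output in step 4 suggests. It also assumes the JDK path contains no `|` or `&` characters, since those are special in the `sed` expression.

```shell
# Point the JAVA_HOME assignment in a generated makefile at an external JDK.
# Hypothetical helper for illustration; not part of GridWay or UCE.
# Usage: set_makefile_java_home <makefile> <jdk_dir>
set_makefile_java_home() {
    makefile=$1
    jdk_dir=$2
    # jni.h lives under include/ in a JDK but is absent from a plain JRE.
    if [ ! -f "$jdk_dir/include/jni.h" ]; then
        echo "warning: $jdk_dir/include/jni.h not found; is this a JDK?" >&2
    fi
    tmp=$(mktemp) || return 1
    # Rewrite only the "JAVA_HOME = ..." line; all other lines pass through.
    # Writing back with cat keeps the makefile's original permissions.
    sed "s|^JAVA_HOME[[:space:]]*=.*|JAVA_HOME = $jdk_dir|" "$makefile" > "$tmp" &&
        cat "$tmp" > "$makefile"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Called as `set_makefile_java_home src/Makefile /opt/jdk` between `./configure` and `make`, it has the same effect as the manual edit, and the `grep` check from step 4 can still be used to confirm the result.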
Categories: Grid and Cluster

Using GridWay With Unicluster Express - Part I

Tue, 09/23/2008 - 07:05
Those folks who are closely following development of the Globus Toolkit software probably noticed that its most recent (4.2) release also includes GridWay metascheduler as one of Globus execution management components. This is very good news for the grid community... Sinisa Veseli
Categories: Grid and Cluster