
Fabrics 101: Essentials of Ethernet Fabrics



Hi, my name is Chip Copper, and I'd like to thank you for joining us for Ethernet Fabrics 101. It's important to understand exactly what we're going to do here today, and so in order to do that I'll show you my disclaimers here, but these are very important.

The first thing is, through the next few minutes I'm going to try to spend a lot of time explaining exactly what an Ethernet fabric is, but I'm going to try to do that from sort of a vendor-independent perspective. I don't want you to walk away from this seminar thinking, "Well, that was nothing but a great big commercial for the technologies that Brocade as a company has decided to adopt." Rather, what we're trying to do here is use an approach that we found to be terribly successful in the last decade, back at a time when people said, "There's absolutely no way we will ever consider using a network between our storage devices and the CPUs." Well, today we know that storage area networks are very successful, and as a matter of fact they've become sort of the de facto standard for deploying large storage arrays. The hope here is that by going through this process of Ethernet fabric education, the industry as a whole will benefit from what we've learned over those 10 years, and we'll be able to take that expertise and apply it to yet another area where we can see more benefit coming out of it.

The second piece I want to emphasize is that even as we're talking here today, all the standards that we need to fully implement Ethernet fabrics have not been completed. What that means is, as we're going through the slides, you may see something where you might say, "Well, didn't they change that?" or "Hasn't that been updated?" I'd encourage you, when we're done going through this material, to go out, look at the different reference sites, find out what the state of the different standards is, and make sure that you understand what is going to be current. And of course I'd encourage you to go back and check all of the vendors'
websites to find out if there have been any updates or any modifications that have been made that may change the contents of this just a little bit.

Well, this is what we're going to be covering in the next few minutes. I want to start off by spending some time on the area of the data center transformation. Now, this data center transformation isn't just because we feel like changing the data center. What's happened is that there's been a very real power shift over the last few years, where we've seen almost a new empowerment of the users to be able to dictate exactly how the data center and the IT infrastructures are going to be rolled out, because the users have gotten used to almost getting instant gratification when they find that there's a need that needs to be satisfied. For example, look at the devices that they're carrying around today: when someone wants to bring a new application up, they don't open a help ticket, they don't send a message in to their cellphone provider. Rather, they want to be able to go to a catalog, find that application, and deploy it very quickly.

That's exactly what we see happening now in the area of IT infrastructures, where the people who are used to immediate gratification are finding that when they have to move into traditional data centers, and they have to try to convince IT executives to do things to allow them to very quickly and efficiently do jobs in this new scenario, they're finding it very awkward. They're getting very frustrated, and we're almost seeing a drive out of our own corporate data centers, because people are finding out that there are other people in the industry who will very quickly accommodate their needs in a much simpler fashion. So what that means is, if the data centers that we have today are going to remain relevant, we have to look at these trends and we have to figure out what we're going to do to
change the direction of how we're deploying data centers. So let's talk about that for a few minutes.

I'd like to start by talking about the nineties. It's important to understand where we came from to understand where these data centers are going and how we can use the infrastructure that we have today. The big thing during the 90s was just trying to adapt to this brand new paradigm of computing called client-server. In client-server, instead of having one big computer, like say a mainframe, that's going to be handling all my terminal interactions, we looked at a fundamental restructuring of applications, so that now I can take what used to be one big application and divide it up onto a number of different platforms. Well, there were a number of people even back then who were naysayers, who said, "You're never going to be able to scale the way that we want to scale to get that business application rolled out." But what we found out is that if we took a number of servers and had those servers cooperate with each other, they could accomplish the business goals. So the real focus here was: how can we provide the connectivity we need among those different servers to provide that type of service?

Now, at the same time all this was happening, we figured out that consolidation was the way that we were going to do that. It was hard to consolidate applications, and so we realized that if we start with storage and consolidate the storage, then I can take these very large storage arrays, divide them up, and partition pieces of that large logical storage array out to a number of different users. So now I can have a large number of applications running on a large number of platforms cooperating with each other, and together that cooperative set of processes would be able to accomplish my business goal. So that means during the 90s connectivity was the thing, and so there were a lot of people who were trained in, let's call it, track A, a
traditional classical network design, where for example we expect to have a number of layers that are going to be built up to provide an access layer and an aggregation or distribution layer, and we expect to have a core layer. But on the storage side, because the storage side was different, we can see that there was already a fabric that had been introduced into the environment through storage area networks. Now, I'm not here to harp on the benefits of storage area networks, but rather to focus on what it is that we learned about a SAN environment. What we learned was that there were some real advantages to being able to use a fabric to connect together different pieces of the infrastructure, rather than having all of our storage directly connected. That's what we did during the 90s.

As we started moving in through the 2000s, we saw that there was unused capacity that existed on these servers, and so we found out that we wanted to take the current network and get more out of the infrastructure that we already had. Now, during the 90s we found out that it was very easy for us to run one application on one platform. Running two applications on the same platform got to be kind of difficult, because what we found out is that as we tried to install software patches and so forth onto these different environments, they started to conflict with each other, and so it was very easy for us to confuse the two applications and jumble things up. So how do we isolate these? Well, just increasing the number of servers doesn't seem to make a whole lot of sense, and so instead of that, what we decided to do is look at virtualization as a way of encapsulating each one of these applications. Once I encapsulate the applications in these virtual machines, then the virtual machines can live on the same platform without interfering with each other. Seems to be a good idea.

But even when we introduced the idea of virtual machines, we saw a problem come up. The problem was that now, if we're going to have multiple virtual
machines that are working on the same application, it's going to be necessary for these different virtual machines to start to talk to each other. How are they going to do that? Well, the networks that we deployed in the data center weren't really designed to have these different virtual machines talk to each other, and so we had to start finding workarounds, and we had to start implementing different types of things to accommodate what the networks couldn't do. For example, we started coming up with the idea of virtual soft switches, which were software switches that actually run in the hypervisors to allow these different virtual machines to speak to each other. The reason they came up with them is not because it was just a nice way to express things. The reason they had to do it is because the traditional form of networking wouldn't work with them. As a matter of fact, there were some real problems having two virtual machines talk with each other if they lived on the same platform, because if we were going to use the same networking rules that we'd always used before, I wasn't allowed to send a message out and receive that same message back on the same networking port. So it turns out there were some real problems, but we found some workarounds.

Instead of continuing this path of saying, "We've got a problem, workaround; problem, workaround," we're at the point now where, because of these changing business needs, we have to change our approach. We have to look at the different pieces of technology we already have and figure out if there isn't a better way to do things. Well, the assertion that I'll make is that we learned over a decade ago that if we deploy things in a fabric fashion over on the storage side, we know that we can quickly deploy it, we know that we can easily scale it, and we know that we can recapture it if, for example, some of the storage is no longer needed and we can assign that off to another application. The assertion that we're making here is that now, in the 2010s, if you would,
the fabric, if that concept is moved over into the Ethernet world, can become a way that we can take all the benefits that we learned over in the storage world and apply them on the client-server side.

And so this is the time to change the way we're thinking about architecting new data centers. It's a time to do away with things like the Spanning Tree Protocol. I'm sure if you've spent any time in networking shops at all, you know that Spanning Tree Protocol is just the bane of every architect. Spanning Tree Protocol basically says that there is a fixed set of rules that I'm going to use for designing my data center, for deploying my different hosts, and for how these are going to be hooked together. The problem is that the way those data centers are designed is going to conflict with the way that I want my virtual machines to be deployed throughout the rest of my infrastructure, and so we have to get around that conflict. Well, we've got two choices. We can either force the applications to change to accommodate our network, and that's not a very good idea, or we can change the way that we're doing our networks to accommodate the changing role of applications. That makes sense, and that's exactly why we want to consider Ethernet fabrics. That's the reason why it's time for us to go into a place that may be uncomfortable for people who've been designing networks now for literally decades using exactly the same design criteria, and now start to reconsider how those networks are going to be deployed in a fabric environment. That's where we are today.

The good news is these technologies aren't new. As I mentioned previously, if Ethernet fabrics were just a brand new thing that we had never seen before, we would have a lot to learn. But remember that we're using the intellectual property, we're using all the best practices that we learned back when we were deploying storage area networks, and so that means we know how these things are going to work, we know how they're going to be deployed, we know how
they're going to scale, and we know how they're going to fail, so we know how to repair them when something does go wrong. We've got that expertise down, and we have it down to the point that today we can successfully run enterprises built on fabrics on the storage side. On the storage side, if anything is lost, if anything is corrupted, if anything is duplicated, if anything is transmitted along different paths so it arrives out of order, if any of those things happen, the storage will go down. And yet today storage area networks are the de facto choice for deploying large-scale storage infrastructures. That means we understand these fabrics very well. So this is really just the transfer of what we've learned over on the storage side, the more stringent environment, over into the Ethernet environment.

Well, where do we go from there? As we start to get to about probably 2015 and beyond, this is the area where we've got an infrastructure in place that's really going to allow the user to have more of a say in exactly how the applications are being deployed. So for example, if we have a user that says, "I really need to have follow-the-sun applications," or "Because I've got a storm that's coming in to a particular location and I need continuity of operations, I need to be able to take these applications and move them to another location," or, from a business viewpoint, if I decide that the provider that I'm with today isn't providing the services that I want, but there's someone else in a different location who can accommodate my data center better, I can move my resources over to another data center. These all become options as I start to look at increasing the user experience by allowing connectivity over distance. We think this, once fabrics are adopted, is going to be the next big step: how do I take these virtual machines, how do I take these applications, and how do I easily move them from point to point in a way where this is actually going to be enhancing
the user environment? And again, we think that a fundamental building block of this says the way that you're going to be able to move outside of the data center is to be able to move inside of the data center, and the way to be able to move inside of the data center is by adopting a fabric as the fundamental connectivity unit to allow these different applications to talk to each other.

Well, the next piece after this is: if I really do have all these different pieces put together, and now I do have the ability to move virtual machines or applications anywhere I want to go, then the next step says, well, I don't just have to contain these to my own environment. Maybe there's someone else out there who's going to be able to provide for me the same type of services, or perhaps differentiated services, so that I can take these applications and move them around. So you can see that this actually says that, in a way, the role of the data center administrator is going to change. Instead of being someone who worries about how platforms are going to be deployed, and "How do I cable the access layer or the aggregation layer?" or something like that, we see this role changing to be the person who's going to be coordinating an infrastructure that's going to allow the applications to move around and to accommodate the business requirements that say, "I need to consider going to another data center, I need to consider going to another location." And so there's a very natural progression that's going to take place here.

Now, if you look at this, you may be saying, "That's not exactly the way that we're going to be doing it." We may, for example, be going to an external service provider before we look at our own long-distance solutions, or we may be looking at someone who's already able to migrate our applications back and forth. So the order in which you're doing things may be a little different than the order that I've laid out here, but the thing
that we do know is that ultimately the goal through all this is going to be to improve the user experience, so that when the user comes forward and says, "These are my business goals, these are my requirements, this is the time frame that I have to deploy this," and at the same time says, "I don't want to pay for overhead that I'm not using, I don't want to go to a particular provider because we're trapped into using them," we're going to give them all the flexibility that they need to be able to provide this continuity of operations. But you can see from this that a fundamental building block is going to be the Ethernet fabric, and the Ethernet fabric will be providing the ability to move around inside.

There's an interesting phenomenon that took place, actually several decades ago now, when we saw a movement that took us into the area of client-server computing. Many people ask, "What was the thing that convinced people to go from one large computer running all these applications into an environment that supported client-server computing?" Because if we can figure out what we did then, then we're going to understand what it's going to take to get people to move from where they are today with client-server computing into an area where they're more comfortable with doing, let's say, cloud computing, something that allows things to move around. Well, I think the answer to that is it's not going to be something that's going to be provided by IT. Just because we can do cloud computing doesn't mean we're going to do cloud computing. There has to be something that drives it there.

I think that what we've learned over the last couple of decades, and which can be reflected through a number of very interesting studies, is that the IT is going to directly reflect the needs of the applications that are being deployed. The way that we used to see applications deployed was that we saw these great big blocks: "This is my application." It was sort of like this great big monolithic entity, and so for
example you could walk into a data center and you could point to a particular box and say, "See that server? That is my email server," or "See that server? That is my database server." Under the new paradigm of cloud computing and the increase in user experience, that can't work anymore, because that means that if that big box right over there is my email server, I have no ability to scale up when I get more users, and I have no ability to scale down when I don't have that many people that necessarily need mail services at this location.

And so what the application vendors are doing is they're going into all of their applications and they're restructuring them, so that what used to be one great big application is now being broken down into a number of pieces. Now, the advantage of doing this, if you think about it from the software vendor side, is that instead of writing the small version of the application, the medium flavor of the application, and the large flavor of the application, they can come up with one flavor of the application, and then, based on your particular requirements, they can decide how many of each building block you need. So for example, as you see here in the diagram, if you only need us to support a certain number of users, I will have a smaller number of front-end servers, a smaller number of databases, and a smaller number of business rule processing agents. As I start to scale, I'll add more.

Now, this provides a lot of flexibility, not in the functionality of the application but rather in the scalability of the application, which means now I write that application once. Once I deploy it and a person determines how large they need it to be, they just add the right number of components. That changes everything, because now that means that I've given my user the flexibility to decide when I need more email boxes, when I need to have more sales operations, when I need to have more web fronts, and they can scale up or conversely they can
scale down. That means that licensing, and the way that I'm apportioning virtual machines, and the way that I'm distributing these workloads, is going to be extremely dynamic. In fact, now it can even be seasonal, where for example, as I start moving towards the end of the year and I start to see a lot of people who are doing a lot of gift buying or something like that, I can increase my sales presence. As I start to see things move into the times when perhaps people are watching their budgets, or the seasons are over, I can see a collapse of that and I can reuse that infrastructure. That's the elasticity that I need to have, and that's how vendors are re-architecting their applications.

Because of the re-architecting of these applications, it's going to be assumed that the infrastructure inside the data centers is going to be able to support this type of expandability and collapse. That's exactly what a fabric is going to do. It's going to give you the connectivity that you need, where you need it, when you need it, so you can scale up and you can scale down.

But there is a consequence that comes about as a result of having applications reorganized this way. Before, when I had the great big email machine, all my emails went out, my users composed or read their emails, and they sent them back to that machine. So I had an awful lot of what traditionally has been called north-south traffic. That is to say, I've got things coming in at layer three, they go through layer two, then the layer one, or I should say the access or aggregation layers, are going to see the traffic going through to the edge, and they go back out. Because of that restructuring of the application that we just saw in the previous slide, there's now going to be a number of applications that are going to be cooperating with each other to accomplish the same thing, and so that means that now I might have four or five different virtual machines. And so remember way back at the very beginning when we
were discussing how the flows of traffic were going to change, and how we have to do away with things like spanning tree and so forth because of the increased traffic among virtual machines? This is going to be exacerbated as we see the re-architecting of these applications, where now I might have 15 or 20 different virtual machines for the same application that need to talk to each other. I can't burden the core of my network with this. I want these applications to be able to talk easily with each other, and in fact they need to. In fact, Gartner's even saying now that we can expect, in the next few years, 80 percent of all the traffic in the data center to be east-west traffic rather than north-south traffic. What that means is that this is going to be traffic where I have virtual machines living inside the same data center that need to communicate with each other to accomplish exactly the same thing that I used to accomplish by having one great big application that lived on just one box. So that means I'm absolutely going to have to change the way I think about networking inside the data center.

The other thing that may seem to be a bit scary is the fact that this is not just the same old traffic that I used to have before, where I had client-server interactions happening out at the edge. Because these applications have been broken up into pieces, I'm now going to have new traffic that has to move back and forth. What that means is that re-architecting these applications is not only going to result in a change in the pattern of communication taking place between these applications, but also the amount of traffic that's moving back and forth to accomplish the same business goal is going to be much greater than it was before. So that means not only am I going to be seeing a change in the patterns, but now I'm going to have to scale my networks inside of the data center to accommodate this east-west traffic. So the thing that's clear here is that
what we've been doing for the last few decades... as a matter of fact, when I was first getting into networking way back in the 70s, we did things a particular way. The thing that's kind of interesting is we're still doing things exactly the same way, because the people who learned back in the 70s kind of learned what worked, and we went into the 80s and then we became architects and we became planners, and then we went into the 90s and we became official. And so here we find ourselves today, planning new data centers going forward using exactly the same rules that we've always used before. But it's not going to work. It's not going to work.

We are going to have to change data centers in an unprecedented fashion, almost exactly the same way that a while ago we saw a big shift in programming that went from procedural programming towards object-oriented programming. Well, today if you talk to most people, they understand exactly what an object is and they understand how to manipulate it. The crossover from procedural to object-oriented programming was kind of tough, but we did it. I would say that today, if you're a person who designs infrastructures or you design IT centers, now is the time to get a hold of the concept of fabrics. This is the time to recognize the fact that there is a new need that's out there. This is not just a matter of finding a new application of the same old infrastructures. This is the time we have to reconsider the way that we're deploying these data centers, and we have to get our minds around this new mindset or we're not going to be successful. And there is another generation that's coming up just behind, people who have grown up with the idea of a cloud environment, who are going to be very comfortable with this mode of operation and are going to be very comfortable adopting these new methodologies. So I encourage you, especially if you've been doing this for a long time, take the time to invest in knowledge in these fabric infrastructures, because it's going to become more and
more relevant, and that'll allow you to apply the expertise of what you knew back in the early days of networking and bring those same concepts forward so that we can be more successful with fabrics in the future.

So what we've seen here is that the users want to have cloud-like behaviors, and what they want to say is they don't want to know anything about IT. It's kind of an interesting story that I heard a while ago. I was in talking to a user, and this is not a Brocade story, I'll put a disclaimer around that. It turns out I was in talking to a customer, and they happened to not use Brocade products at that time, and I happened to be dealing with the application users. She came in and she said, "I just have one question for you." She said, "Because I'm not familiar with your networks: do your networks use subnets?" Now, kind of a strange question, but yeah, our networks do. She said, "Then you know what? If your networks use subnets, we have nothing more to talk about. You can provide nothing for me. Good day." I said, "Well, before you leave, can you explain that to me? Because I don't understand what the problem is here."

And realize that she'd grown up with the new generation of application deployments and so forth. She said, "I'm responsible for deploying this application, and there are certain ways that I want to see my application deployed. I want to have front ends in one place, I want to have business rules in another place, databases in another place, and I want to have these things distributed. But every time I go and talk to my IT people about how to do this, they keep saying, 'Well, we can't distribute the virtual machines that you need.' When I ask why not, they say, 'Oh, because of the subnets. You're in the wrong subnets. I can't put things into those subnets.'" So she said, "I have absolutely no idea what a subnet is. All I know is that the subnet prevents me from accomplishing my business goals. So if your networks use subnets, I want nothing to do with them." What a clear example of
where our past practices and our comfort zone, the way that we've always designed these networks in the past, are now interfering with what the user wants to do. The user wants to be able to deploy their business goals, because the network is there to serve the application, not the other way around.

At the same time, from a business viewpoint, we have to realize that if you have assets that are sitting there and they're not being utilized, you're hurting the bottom line. And in the same way, if you need more assets than you have to run a particular application, and you don't have them, and you don't have the ability to easily expand to accommodate that, you're hurting the business. So from both a user and a business side, it makes perfect sense that we need to start looking at a cloud-type infrastructure that's going to allow us to expand and contract and accomplish exactly what it is that we need to do.

We've figured out part of the equation. The first part of the equation is that going back to a hardware machine is limiting. We can't do that. We have to stick with virtualization. But virtualization doesn't stop with the machine, and it doesn't stop with the storage. If we're really going to complete this picture of virtualization, then we have to have an infrastructure underneath all of that to have it work together, and that's where I would assert that Ethernet fabrics are going to be key. We need to be able to provide, from the very beginning, an infrastructure so that when I have my user who comes to me and says, "This is the way I want to deploy my application," instead of saying, "Oh, you can't do that because we didn't design our networks that way," what we almost have to say is, "You go ahead and deploy the virtual machines. The network will figure out how you've got them deployed and we will automatically adjust to that." That's where we need to be. That's what the users expect. And the good news here again is that Ethernet fabrics can provide that, and the reason I say that with confidence,
once again, is not because that's what we think they're going to do. It's exactly what we've seen fabrics do for over 15 years now on the storage side. We know how these are going to behave, and this is exactly the kind of facility that we need in the data center on the Ethernet side, on the layer 2 side, to provide the connectivity that we need to accommodate today's changing business goals.

So why a fabric? I mean, there are a lot of different things that we could go with. In fact, you may even be reading about a lot of different competing technologies out there that say, "Well, instead of going with fabrics, why don't we try to keep spanning tree and add some more stuff to spanning tree and see what we can do there?" There are a couple of things that we've learned about fabrics, and it's important to spend a little bit of time on them, because again, we know these things to be true.

The first thing is that, because of the nature of storage traffic, we know that these networks need to be resilient. We know that these networks cannot fail, because while over on the traditional TCP/IP side a failure to deliver a message may result in a timeout or a retransmission, on the storage side we know that if anything goes wrong, because the operating system doesn't even know that there's a network in there, if anything goes wrong: blue screen of death, panic, things stop working and machines crash. In order to get around that, these fabrics had to be built with the infrastructure they need to be resilient. We know how to do that.

The second piece is that we need to be able to put together these networks so they're going to reflect what my business needs really are. If I have a small shop, I need to deploy a small fabric. If I have a large shop, I need to deploy a large fabric. If I want to go from a small shop to a large shop, I have to know that I can grow the fabric. Or if I decide to deploy some of my application someplace else and I don't need this infrastructure, I need to be able to scale back
exactly the same way I need to have that scalability but I also have to have some flexibility in the topology I can't come in and say this is exactly the size that I need because I guarantee you if you come in and say this is the size it needs to be then it's either going to be too large or it's going to be too small you need to have an environment where you can hook things together and also at the time that people are designing what these infrastructures are going to look like we don't know how many floors your data center is going to be or we don't know how close or how far away you're going to have different elements so what that means is that the topology has to be flexible we can't come in and say to you well it would sure be nice if we had this fabric running let's say between these three floors and those two buildings but unfortunately we didn't design it that way so that's not the way you can deploy it completely unacceptable you need to have the ability to put together a topology to put the right ports with the right capabilities in the right place and expect the infrastructure expect the fabric to be able to accommodate that and finally what we need to do is to accommodate this east-west traffic we need to have more of a flat architecture and now flat doesn't necessarily mean linear and we'll talk more about that at an upcoming session but we have to make sure that when we build this network because of this increase in east-west traffic we're not throwing everything up to layer three where we have routers that now have a much greater workload instead of that we have to use the facilities of the fabric so that now as I have different virtual machines in different places I can have these different machines communicate with each other without having to go all the way up through the stack of my networking and again we understand that and we have to figure out how to do it so how do we deploy these things how can I take these fabrics and how can I make my 
life easier because I've deployed them well it turns out there are several concepts in fabrics by the way if you happen to be a traditional networking person and you've never had an opportunity to sit down and discuss administration of a fabric with your SAN person take your SAN person to lunch just carve out some time and say we need to go because I need to understand exactly what a fabric does and I think what you're going to find out is the way that you administer a fabric is much different than the way that you administer the traditional network as it exists today for example we talked about object-oriented programming and procedural programming we talked about a fabric or cloud way of doing things rather than the traditional ways it turns out that from an administrative viewpoint of a fabric there's a difference between policy based management and procedural based management with the procedural based management mechanism I have to go around to every single one of the elements in my environment and I have to manage it now if you're a network administrator and you manage let's say Ethernet switches today you know that every time I bring in a brand new switch that's one more thing that needs to be managed on the fabric side on the other hand when you add another node to the fabric you don't have one more thing to manage because the policy that's already been set up for that particular fabric is going to determine how that particular node is going to behave and so I can almost literally take it out of the box plug it into the wall plug it into the rest of the fabric it will learn the policies and you're up and running all that happens without you having to bring up scripts and having to bring up command lines and figuring out did I touch all the nodes in the network and did I update them so what that means is that we've introduced this concept of a logical chassis the logical chassis can be thought of as being that boundary that is going to be 
controlled by the policy that I'm deploying for a particular fabric and so what that means is on the inside if I say this is my policy this host can talk to this host got it from the outside what it's going to look like is I've got a great big chassis I don't know how many nodes maybe there's one node maybe there are 50 nodes inside that chassis it's going to look like one chassis and to the outside world everything is going to look like a single node now later as we start getting to the deployment section you'll see that what that means is that I can take an Ethernet fabric and I can very easily roll that out into the environments that I have today this is not an all or nothing proposition you're not going to find yourself in a situation where for example if you want to go to an Ethernet fabric you throw out everything else that you have but rather you can start by saying over here in this corner instead of deploying two switches or three switches or four switches or instead of deploying this brand-new chassis I'm going to see what happens if I deploy those as a fabric instead and because of this concept of a virtual chassis it's going to be very well contained so the rest of your network won't even know that that happens to be a fabric that consists of a number of nodes there's also this concept that says because I'm now using policies if I have a virtual machine that moves from this particular server to this particular server if you're a user you're going to love this instead of having to fill out a help ticket and waiting two days hoping that the network people will get everything configured in a timely fashion in a correct fashion and so forth instead of that why not just move the virtual machine and expect the infrastructure to figure out that the virtual machine has moved and to automatically adjust to it that's what a fabric can do because now the fabric being managed by policy is going to be intelligent enough that if it sees something moved from one port to 
another port it can notice that thing it can recognize it it can authenticate it and once it's authenticated it it can give that particular port whatever characteristics you need to allow that application to get online in the way that it should be set up automatically the third piece of this is that now that I've got this fabric that lives out there if I build the fabric correctly then if I have multiple types of traffic that I want to flow across that fabric I'll be able to have that happen and all those different types of traffic will live together in a way where they don't interfere with each other or put differently instead of saying I have to put in a different network depending on which protocol or which type of storage I decide to use wouldn't it be better to just put in an infrastructure and expect that infrastructure to provide whatever type of service you need and so for example if you need NAS if you need iSCSI if you need Fibre Channel over Ethernet if you need TCP/IP if you need voice over IP if you need video wouldn't it be nice to know that no matter what type of service I get or what type of service that application needs I'm just going to get it from the fabric that is exactly what a fabric can provide for you because the fabric has the ability to go out look at all these different types of data flows figure out what it needs to do for that particular flow and to facilitate the right services on that flow while simultaneously providing the right services for the other flows for all the different applications that's why we need Ethernet fabrics for all the reasons I've just mentioned I also think that if you go back and if you've been writing some of these things down you'll find out that many of the things that I'm talking about today simply can't be accomplished with today's infrastructures as they've been built out for the last two or three decades using these traditional mechanisms it's a strong reason why we need to do Ethernet fabrics now once 
we've done this instead of the network being a barrier instead of the network being something that's going to slow everything down the network now becomes a way for me to speed things up as an example if you have a chance walk around your corporation or wherever you happen to find yourself today and find your network operation center my guess is that when you walk into your network operation center it's going to be beautiful many network operation centers that I walk into and by the way if I ever have a chance to tour your facility don't not let me see this because of what I'm about to say because I'd still love to see your network operation center but it turns out most of the network operation centers that I walk into are absolutely beautiful they have high-definition screens that are up on the wall when I walk into one of the management cubes they'll have one of the 24-hour comfortable chairs there might be four or five different displays that are sitting on the desk maybe one or two keyboards you know why all that stuff is there it's there because it needs to be there and every one of those high-definition screens and every one of those chairs and keyboards has to be paid for because if that stuff isn't there the network crashes and that overhead that expense is being spent to continue to deploy networks the way we've always done it in the past on the other hand I challenge you if you happen to use a storage area network today to go find your SAN administrator and the odds are instead of having this great big beautiful network operation center instead of that you're probably going to find out that oh we run it on a virtual desktop on Bob's laptop and the server's tucked back somewhere in a server room in a corner you know but remember that's storage traffic and if anything goes wrong with storage traffic servers crash now why is it that my best-effort network delivery systems need all this infrastructure to run and yet my storage 
infrastructure needs relatively little why is that and the answer to that is because we've changed the way that we conduct that management we've got to see the same thing happen over on the Ethernet side because at that point instead of being a business liability where I have to invest all this infrastructure time people expertise into just keeping my network running instead of that let's make the networks be smart enough that these networks can go out they can diagnose themselves they can heal themselves they can figure out what's going on and now it's going to become an asset so that when a business user comes and says I've got something coming up I need to quickly deploy this new application in this new location can you do it for me the networking team is going to say as soon as you move the virtual machine it's already there no help ticket service levels guaranteed you don't even need to tell me what kind of storage you're going to be using when you bring up that virtual machine or that application when it tries to see its storage my network is going to figure it out and my network is going to provide it to you look at what a jump that makes from the way that we do it now where today we give them a worksheet and what kind of network do you need what kind of storage do you need oh wait you can't get that kind of storage on that particular network we're going to have to move you to another network we just throw up barrier after barrier after barrier and Ethernet fabrics provide a way for us to make the jump over to where those business barriers aren't going to be there anymore well we've just now reviewed what some of the motivations are for us to consider a new technology like Ethernet fabrics what we're going to do now is we're going to start going into a little bit more detail on exactly what some of the technologies are that underlie these fabrics and so in order to do that I'm going to give you a few definitions right up front this is not the 
only time you'll hear these definitions it's just that sometimes I find myself using these expressions without telling you exactly what they mean and so all I want to do here for the next few minutes is just familiarize you with some of the vernacular that I may be using and then we're going to be going into more detail on the next few slides to talk about what each of these things are the first thing we have to do is we have to try to figure out what a fabric looks like how does a fabric behave well the way that a fabric is going to work as I've described it previously is that presumably if I have two hosts I have the ability to since I don't have spanning tree anymore I have the ability to use multiple paths those paths will go along a different number of hops I need to make sure that because I've got convergence I can accommodate any type of traffic it may be traffic which has to be absolutely lossless it could be traffic that says you know it's okay if you throw these frames away because we have found cases as I'll talk about in a little bit where TCP/IP networks actually work better in a lossy environment than in a lossless environment we have to make sure that we're going to provide the most robust services but we're not going to burden the applications that don't necessarily need those services and you're also going to hear me saying fabric a lot what do I mean by fabric in the case of fabric when people today talk about a fabric based infrastructure what they're basically referring to is an infrastructure that doesn't look like the same infrastructure that I've been using for the past two decades but is one that reflects the fabric nature that I described just a few minutes ago or in the previous section where we talked about how a fabric does provide multipathing multi hopping varying degrees of resiliency depending on what the application needs and so forth and so Gartner has this term called fabric based infrastructure fabric based 
infrastructure only means that this is the new way of architecting data flows and management to accommodate all those business things that we talked about a little bit previously if I'm going to be talking about the storage area network then I'll probably refer to that as just being a storage fabric and the reason that becomes a good example is because we already know what a fabric does we've had fabrics for over 15 years now we just never deployed them over in the Ethernet world and so that's why I'm also going to be talking about an Ethernet fabric basically an Ethernet fabric means we're going to take everything that we learned by using storage fabrics for providing Fibre Channel storage area networking infrastructure and we're going to take the same design principles and the same guidelines and we're going to move those from the storage side over to the Ethernet side so Ethernet fabric means the context in which we're going to be deploying these fabric concepts and then finally as I discussed briefly before this concept of a flat network a flat network basically means a network where things have been laid out in such a way that I can have different elements talking to each other inside of the data center or even across data centers in such a way that I don't always have to go up to layer 3 to do that that is to say that I can have layer 2 switching that's going to take place back and forth between these different elements without going all the way up to the core so that's what I mean by these different terms but don't worry if you didn't get them all now we're going to be revisiting these terms in just a few minutes that was just sort of an initial introduction to them so now let's go through some of the different standards that are out there some of the different terms and the technologies that are being used and we're going to highlight the technologies that you see here let's start with TRILL once again you'll find out that TRILL and 
shortest path bridging are sort of I'm going to say competing standards they're trying to accomplish the same thing the important thing here is that they are intended to accomplish the same thing which is to say that I want to provide multipathing and I want to provide multi-hop capabilities and later we'll start comparing them and I think it's a fair and unbiased comparison and I think that what you're going to see at the end of this is that they both pretty much do the same thing they both accomplish the same goals the biggest difference between the two quite honestly is which standards body was responsible for writing which version of it so sometimes the argument as to whether you're going to use TRILL versus shortest path bridging comes down to which of these two standards bodies do you have a greater investment in or which of the two do you tend to follow more closely in the case of TRILL TRILL basically says instead of having the same old Ethernet switch we had before or an Ethernet bridge you may recognize that instead of that we're going to introduce the idea of a routing bridge or an RBridge a routing bridge or an RBridge is something that is fundamentally different than the type of Ethernet bridge that we've seen before it's a bridge that has to have more smarts about how it's going to receive frames and how it's going to be sending frames out and how it's going to manage the paths that are going to be used for all these different frames for the data plane that is to say where I'm actually moving the information back and forth between the different nodes we're going to be using the TRILL protocol or put differently if you happen to put a wire sniffer onto a port that's running TRILL you're going to see the frame headers being decoded as TRILL frame headers because there's a certain amount of information that we need in there in order to route a frame from a source to the destination and we'll see that in this case the way that we route from a source to a 
destination actually depends on what each particular hop wants to do and that's part of the difference that now each RBridge has been given the intelligence to decide for this particular frame what's the best way to send it from point A to point B or if I've already got a path established and I need to make sure these things stay in line this is the path I've already selected that's the path I should continue to use so really you should think of TRILL as being a data plane protocol now how am I going to allow these different TRILL RBridges to decide what their topology should be and how they should communicate with each other well the answer to that is we're going to use a control plane protocol that does that and in this case we're going to be using something called IS-IS the reason that IS-IS was picked is because IS-IS is a layer 2 protocol it's a link state routing protocol but it doesn't need to have IP running in order to be able to carry out what it needs to get done so that means it's a very natural choice for being integrated with TRILL we're going to be talking more about IS-IS in just a few slides but now that we understand that TRILL basically uses a fundamentally different type of Ethernet switch called an RBridge and there's an encapsulation with the TRILL protocol let's now look at shortest path bridging one of the real advantages of shortest path bridging is that if you have a traditional Ethernet bridge today and it supports the standards that you see listed there on the page it can work inside of a shortest path bridging environment so if you're saying to yourself hey how do I go about building something that's a fabric something that's going to give me multipathing and multi hopping but I want to do it with what I already have then you should investigate whether or not the vendor that you purchased your Ethernet switches from is going to be able to support shortest path bridging now remember over on the TRILL side when we were talking about how I'm going 
to move frames through the network we talked about the TRILL protocol the TRILL protocol works by taking the Ethernet frame that's coming in from the edge and we're going to put a TRILL header on the front of it in the case of shortest path bridging we're going to use a different form of encapsulation in this case it's called MAC-in-MAC that is to say that I'm just going to take the MAC address and I'm going to put another MAC address header on the front of it and that new MAC address header is going to be used for figuring out exactly how the flow should go from the ingress to the egress through the rest of the network but it is interesting to notice here that once again at the control plane I am going to be using IS-IS once again for exactly the same reasons IS-IS was set up to be a nice layer 2 link state routing protocol no reason we shouldn't use it here because it's going to accomplish exactly the same goal and so what we see here is that whether you're using shortest path bridging or whether you're using TRILL both RBridges and SPB bridges are going to do exactly the same thing they're going to use their link state protocol to figure out who their neighbors are they're going to figure out the best way to go from point A to point B and then they're just going to encapsulate the frames from A to B inside of some kind of header and here you can see that in the case of TRILL it's just going to be a TRILL header if you're using SPB it's going to be a MAC-in-MAC header so they're still both using encapsulation and they're both going to use information that comes from this link state routing protocol called IS-IS so what does this link state routing protocol do if we've sort of separated whether we're using shortest path bridging or TRILL from this control plane functionality what are we going to do well or how are we going to accomplish that the role of a link state routing protocol is just to make sure that all 
the nodes that are running in this fabric have exchanged enough information that I know what the topology of the fabric is going to be so you can see here that the link state routing protocol that I use inside of a fabric can almost be isolated from the way that I'm moving data frames through the network which is to say that even if IS-IS happened to not be the link state routing protocol as long as I do have some sort of link state routing protocol in there then both TRILL and SPB would continue to work but the reason IS-IS is so important here is because we want to make sure that as we start to put together different networks from different vendors we want to make sure these things are going to play nicely with each other and so that's why it's very important here to find a common link state routing protocol that's going to be able to move these things together now in many cases what vendors today have done is because as you remember I mentioned way back on slide 2 because things are still in a state of flux today as we get all these pieces ironed out in order to try to deploy SPB and TRILL solutions quickly sometimes instead of using the link state routing protocol that was recommended in a later revision of the standard people have used other link state routing protocols but where we want to get to is a point where everybody is using the same link state routing protocol because we want to make sure that independently of what multi-hop multipath protocol you're using everyone can talk to each other so that means that in the not-too-distant future I'm sure what you're going to see is that everybody is going to be using the same variants of link state routing protocol based on IS-IS and from there it's just a matter of what you want to use for moving your data frames back and forth so here we can see a chart that really talks about the differences between TRILL and SPB you'll notice that the biggest difference in my opinion at least is highlighted at the very 
beginning which says that there were two different task forces that are going about trying to solve this problem you can see one is the IETF one is the IEEE and so if you want more detail on any one of these protocols I'd encourage you to go to each of these organizations' respective websites you get a lot more information there you can see that from a link state protocol they're both based on IS-IS it's interesting to notice here though that even though they're both based on IS-IS they have different variations of IS-IS that they have to deal with so in the case of TRILL we have some new protocol data units that are going to be used for exchanging information between these different RBridges in the case of IS-IS for shortest path bridging we're going to have some more TLVs or type length and value fields that are going to describe what things are going to look like so there's still some modifications I can't just take vanilla IS-IS for either of these two solutions but they're close enough that at least we have a framework where we can deploy some common infrastructure they both use encapsulation they both support multipath they both support multi hopping you notice that the way that we handle loop mitigation is a little bit different in the case of TRILL what we do is we have a time to live field and that time to live field basically is going to get decremented every time I go through a hop and that way when the hop count becomes zero if I happen to have a rogue frame that's running out there then I can get rid of that rogue frame as the time to live becomes zero in the case of SPB what's going to happen instead is that the network is going to be clever enough to know where particular frames should be coming from from a particular node and if for example I'm always expecting to hear from this particular host on this particular port but for some reason I'm hearing from that particular host on this port now and that's not right I shouldn't forward that frame 
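
The two loop mitigation ideas just described can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not how any real RBridge or SPB bridge is implemented: the `Frame` class, the function names, and the `expected_port` table are all hypothetical names made up for this sketch, and the hop count here simply stands in for the time to live field mentioned in the talk.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    source: str     # hypothetical identifier for the originating host or node
    hop_count: int  # remaining hops, standing in for the time to live field


def trill_style_forward(frame: Frame) -> bool:
    """TRILL-style check: drop the frame once its hop count is exhausted,
    otherwise decrement the count and forward it."""
    if frame.hop_count <= 0:
        return False  # a rogue frame caught in a loop is discarded here
    frame.hop_count -= 1
    return True


def spb_style_accept(frame: Frame, arrival_port: int,
                     expected_port: dict[str, int]) -> bool:
    """SPB-style check: forward only if the frame arrived on the port
    where frames from this source are expected (hypothetical table)."""
    return expected_port.get(frame.source) == arrival_port


# A frame stuck in a loop is forwarded only as many times as its hop count allows.
looping = Frame(source="hostA", hop_count=3)
forwards = 0
while trill_style_forward(looping):
    forwards += 1

# A frame from hostA is accepted on port 1 and rejected on port 2.
table = {"hostA": 1}
ok_on_1 = spb_style_accept(Frame("hostA", 5), 1, table)
ok_on_2 = spb_style_accept(Frame("hostA", 5), 2, table)
```

Both checks end up discarding frames that should not be in the network; the first does it by counting hops, the second by comparing the arrival port against what the topology says it should be.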
something's probably caused a loop condition or something like that somewhere else in the network so they're both going to accomplish exactly the same thing they're going to accomplish loop mitigation using different techniques but they're both going to be effective they're both going to result in getting rid of the frames that shouldn't be in the network for packet flows it's kind of interesting to notice the difference between the two in the case of SPB if I have a flow going from A to B then the flow from B to A is going to be following exactly the same path and it's going to be very symmetric it's going to be very deterministic in the case of TRILL I'm going to make that decision on a hop by hop basis which is why they're called routing bridges so I'm going to look at the frame I'm going to look at my link state routing tables and I'm going to make a decision at each point again they both accomplish exactly the same thing to get the frame from the ingress to the egress they just do it a little bit differently probably the biggest differences between the two from a deployment and management viewpoint come in their configuration complexity and their troubleshooting in the case of TRILL configuration complexity is actually pretty easy because you just hook them all together they all figure out what the link state tables need to be they figure it out they dynamically deploy and off you go in the case of SPB it's a little bit different because now I have to spend a little bit more time in planning exactly what my topology is going to be looking like I want to kind of build a little bit more of a symmetry into the infrastructure just to make sure that I've got more paths that are going to be available for equal flow but on the other hand when it comes time to troubleshoot these networks in the case of TRILL it's going to be a little bit more difficult because now each one of 
these nodes is going to be making decisions and so instead of just knowing exactly what a path is going to be I'll actually have to query all these different elements along the path whereas in the case of SPB it's going to be easy I've already got OAM frames that are going to be available to me to move traffic or to diagnose exactly what's going on and so you can see here in terms of configuration complexity and troubleshooting there's a little bit of a trade-off to decide which is going to be which but I would assert that they're not so large that it rules either one of the two out it still pretty much just depends on which of these two approaches you decide to adopt inside of your enterprise and once you decide to adopt them just considering the different vendors that have different feature sets that they are using to enhance or change their environment you may want to have those conversations upfront with the vendors to find out for example if a particular vendor has a preference for one of these over the others you may find out why don't take it for granted have the conversation with them you made a decision to deploy this particular multipath multi-hop infrastructure why did you make that so the next piece of this is that now that we've got this multi-hop multipath infrastructure we want to eliminate the need to have everything move up to layer three as we did before and that's why we introduced this concept of flat networking I described it briefly before but I want to emphasize it again because it's so important because of the amount of increased east-west traffic we're going to see moving back and forth between these different virtual machines getting these applications in a now distributed virtual machine environment to behave as quickly and as effectively as that single machine that was running my application before means that my networks are going to have to be very very efficient I want to have very low latency and I don't want to have to go all the 
way up through a layer 3 switch to have my packet frames move to where they're supposed to be that's why a flat network is so important inside of one of these fabrics so that I have the ability to distribute the virtual machines however I need to and I can still get very effective very efficient communication taking place between all these different nodes and so I would assert that if you're building out these fabrics the ability to support a flat network is absolutely essential and the good news is that pretty much all of the fabric providers today the people who are proposing solutions for Ethernet fabrics all agree with this and we're all making sure that our infrastructures for Ethernet fabrics do support large flat networking spaces the next piece has to do with how we're going to take all these different flows and combine them onto a single link one of the real advantages of convergence is that now instead of having multiple adapters multiple cables multiple networks the idea is that I do want to be able to take all these different types of storage and excuse me all of these different types of traffic including storage and voice and iSCSI and Fibre Channel over Ethernet and NAS and FCIP all these different types of traffic and I want to be able to combine them but I want each of them to go onto the network as though it were the only type of traffic that were on that network so it's very important for all these different types of traffic to be able to communicate but to co-exist as well without interfering with each other so that means that I have to decide now what changes I'm going to make to the Ethernet network to support all these different types of traffic well the current type of Ethernet that we have today traditional Ethernet supports what's called a pause primitive by deploying a pause primitive I know that if I have a flow and I need to make sure that that flow is going to be lossless I can stop the flow I can stop the frames I can have the sending 
nodes pause until I get everything that I need and then once the congestion is cleared up I can start the flow again we know from best practices in TCP/IP you don't want to do that for IP traffic you do not want to do that for IP traffic because that's actually going to slow the network down and so what we need to do is we needed to figure out some way to be able to combine all of these different traffic types in a way where I can differentiate the needs of each of the different traffic flows on the same network at the same time well the answer to this is something called data center bridging or DCB you may remember a while ago there was something called CEE which is Converged Enhanced Ethernet you should think of Converged Enhanced Ethernet as almost being a reference implementation of data center bridging CEE did something wonderful ordinarily when a standard comes out the standards writers do a wonderful job of specifying exactly how a protocol should work but we're all only human and what that means is that sometimes inside of a specification there are going to be pieces which if not fully unambiguously described are going to lead to ambiguity and that means we're going to have non-interoperability well I'm sure that many people watching this presentation are going to be too young to remember but it turns out that when TCP/IP was initially rolled out it was completely incompatible so that meant that if you had brand A TCP/IP and someone else had brand B TCP/IP they absolutely would not communicate with each other now both of them were standard compliant both were standard compliant and yet even though both of them were you could go line by line and protocol frame by protocol frame and show that both of them were in compliance but yet they couldn't talk to each other because they had interpreted what the standard said differently the thing that was nice about CEE was that by providing a reference implementation of lossless Ethernet now as data 
center bridging is coming out it became very easy for all of the vendors to come up with a data center bridging solution which not only was standards compliant but which also was interoperable with each other that's huge that's absolutely huge because now that means that I can go from lossy Ethernet to lossless Ethernet in a very standardized fashion but also in a way that guarantees me interoperability with all the other vendors very important and that's why if you look at the rollout time frame for lossless Ethernet data center bridging versus other protocols in the past you'll find out why CEE was so important well the reason I'm telling you all that is because here you can see what the different components of data center bridging are but even as we're here presenting this today not all of these are done that is to say that not all of these have been voted on now many of them are far enough down the line that we have enough confidence that we know how to deploy these pieces but there are some pieces that aren't quite cooked so let me talk about two of these protocols they're here on this page and we'll finish them up on this page and then we'll go on and we'll discuss the other two in a little bit more detail because they are a little bit more complicated the first thing that you have to be able to do inside of a lossless Ethernet environment and remember now we're talking about lossless Ethernet not multi-hop multipath that was TRILL and shortest path bridging when we talk about lossless Ethernet all I'm talking about is the ability to have network information go from node A to node B its neighbor in a way that's going to guarantee me that I'm not going to lose anything and that nothing is going to be corrupted okay so the other two dealt with multi-hop multipath this deals with the lossless behavior on a link and notice that by putting those two together multi-hop multipath and lossless I build out a lossless multipath multi-hop fabric but I do it based on standards instead so
the first piece of this is that I have to be able to identify whether or not the node that's next to me knows anything about data center bridging because lossless Ethernet is different than lossy Ethernet as we start to go through these other protocols that you'll see in just a few seconds so that means the first thing I need to do is I need to ask my neighbor whether or not they can even do lossy or lossless Ethernet and the way that you do that typically is there's a protocol called LLDP Link Layer Discovery Protocol where I broadcast every once in a while across each one of my ports here's what I am and here's what I can do that's what every Ethernet switch expects well as they started to deploy the lossless Ethernet standard they realized that they don't want to introduce a brand new data type or a brand new frame format that's going to confuse neighboring switches that may have been installed a decade ago that are still out functioning and so as they were looking at a way to deploy the DCB exchange protocol between these two nodes they did it by taking advantage of LLDP and they added in a new TLV type-length-value field that basically says here is what I am and by the way I speak data center bridging well if I'm doing this on all my ports and the other port is listening to my LLDP transmissions as soon as it hears me saying here's all the different things I can do and by the way I can speak lossless Ethernet if the switch beside it can also speak lossless Ethernet it's going to send out a frame saying well here's all the things that I can do hey I speak lossless Ethernet too all of a sudden the rules change for that particular link between those two switches because now instead of just broadcasting this information out they can start to talk hey you speak lossless Ethernet I speak lossless Ethernet hey I do too we can now form lossless Ethernet links between us now this has a very subtle implication to it that ordinarily if you come from the world of
layer 2 bridge configuration you know that ordinarily before bridges will do anything you typically have to log in you do config t you bring it up you start to change the configuration of the ports isn't it interesting that with the introduction of this particular exchange I am now giving Ethernet switches the ability to determine who their neighbors are and what protocol I'm going to be using to talk to my neighbors as a policy based on the exchange of information rather than as a static configuration that's huge that absolutely is not the way we've always done it before but it very clearly illustrates the fact that if we're going to start building some of this elastic behavior that I spoke about earlier if we're going to do that then I have to start making the devices smarter and one way to do that is to have the devices be able to start to figure out what their neighbors are well let's think about that concept for a sec if an Ethernet switch can talk to another Ethernet switch and they can determine that this is a lossless link and this node can talk to this other node and they determine that that's a lossless link what I actually see is a lossless Ethernet cloud forming I can actually start to see boundaries form as these lossless Ethernet switches start communicating with neighbors that say you know I have no idea what that TLV is I'm just a regular Ethernet lossy switch ok then this is where that edge of the lossless cloud ends it's absolutely interesting because now as I bring these switches up they will form lossless clouds very interesting absolutely not the way we've always done things before but it definitely shows this tendency that we're seeing now even as we move forward with configuring individual ports that says we're moving now towards automatic management intelligence in the network rather than static configurations well this exchange is called the Data Center Bridging Exchange DCBX and this is the first standard that you can see on this
slide and so that means that this is the discovery protocol that I'm using for figuring out whether or not my neighbor can speak lossless Ethernet well I only want to touch on the bottom protocol now we'll talk about the middle two in the next few slides but I want to talk about this quantized congestion notification for just a few minutes because this is one of the ones that's still kind of up in the air we know that there's supposed to be an event a vote on this I should say in the not-too-distant future but for right now it's just kind of important to understand where things are it turns out that the reason we need to have this is because on a single Ethernet link I can determine whether or not my neighbor can send me more information depending on whether it's going to be lossy information or lossless information I can turn it on and turn it off but what happens if now three nodes ahead something starts to back up what I want to be able to do is I want to be able to go three nodes ahead and go all the way through the other nodes that are sending me traffic and I want to be able to tell them hey stop stop sending me stuff for a little while I got a little bit of congestion here I have to deal with think of this sort of in the same way that at rush hour if you happen to live in a cosmopolitan area where you've got a lot of interstates you know when you come onto an on-ramp a lot of times there's no metering you can drive right onto the ramp and off you go but even sometimes depending on which direction you're going you'll see stoplights that are at the end of the ramp they don't hold you for a lot of time what they basically say is come up just hang on for a few seconds and off you go and now the next car can go in the idea is that by throttling back traffic just a little bit they're going to avoid massive congestion somewhere down the line that's exactly what this protocol does it figures out how to manage that end-to-end congestion but again as I
mentioned earlier it's not done there's a little bit of disagreement as to exactly how you should go about figuring out who should throttle back how you're going to throttle back and what algorithms are going to be used in a standardized fashion to get predictable behavior so I'd encourage you to continue to check back to the IEEE site find out what's going on with this QCN with this congestion notification and we still desperately need a solution there it's just not done yet I hope to have an update in the not-too-distant future telling you that this has been solved that it's been done in a standardized fashion and we can guarantee end-to-end congestion notification in a reasonable fashion so earlier I mentioned that we want to be able to have convergence we want to be able to take all these different traffic types and I want to be able to put them together on exactly the same link how do I do that without having them interfere with each other the biggest differentiation between all these different traffic classes primarily has to do with whether or not the flow is going to be lossless or lossy now I have to put a disclaimer in here because sometimes I'll have Ethernet people who come up and they get a little upset about the fact that I may portray Ethernet networks as being lossy or for example someone will say to me well if I'm using the same transceivers and I'm using the same cables how come one is lossy and one is lossless it's important to understand what lossy versus lossless means in this case lossy has to do with how an individual node is going to deal with a particular frame that it's received suppose that I'm a regular Ethernet switch that's sitting out there and I receive a particular frame and my buffers are starting to get filled up I can't do anything with this frame I can't move it I've got more incoming frames maybe of a higher quality of service it's okay for an individual node to say you know I'm going to throw this frame away
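The lossy behavior just described can be pictured as a fixed-size buffer that simply discards whatever arrives once it is full. The sketch below is only an illustration under stated assumptions: the class name `LossyPort`, its methods, and the two-frame capacity are hypothetical and are not taken from any switch implementation or standard.

```python
from collections import deque


class LossyPort:
    """Hypothetical switch port with a fixed-size frame buffer (illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity  # how many frames the buffer can hold
        self.buffer = deque()     # frames waiting to be forwarded
        self.dropped = 0          # frames discarded because the buffer was full

    def receive(self, frame):
        """Queue the frame if there is room; otherwise throw it away."""
        if len(self.buffer) >= self.capacity:
            self.dropped += 1     # the node is allowed to discard the frame
            return False
        self.buffer.append(frame)
        return True

    def forward(self):
        """Send the oldest buffered frame onward, freeing one buffer slot."""
        return self.buffer.popleft() if self.buffer else None


port = LossyPort(capacity=2)
results = [port.receive(f) for f in ("f1", "f2", "f3")]
print(results, port.dropped)  # [True, True, False] 1
```

Nothing in this sketch tells the sender that the third frame was lost; the port just counts it and moves on.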
and the reason I'm going to throw that frame away is because I'm depending on a higher level protocol to retransmit that if it's necessary and so for example in the case of TCP if I'm expecting a TCP frame or a TCP stream and I happen to throw a frame away a little while ago I know that somewhere down the line a host is going to say hey I'm missing this chunk of my stream right here can you retransmit oh that's right let me retransmit and now hopefully that missing piece will come in again and if it doesn't come in again we're again going to request the retransmission and so the idea is that's going to be delivered in a best-effort fashion so lossy here is not a derogatory term about links it has nothing to do with ports it all has to do with an individual Ethernet switch's ability to decide to get rid of some of the frames sitting in its buffers if it decides it's in the best interest of the throughput of the network to get rid of that particular frame on the other hand lossless behavior which is what we see in the storage side of a storage area network or storage fabric lossless behavior says once you get that frame it's yours you are responsible for that frame you can't get rid of it you can't throw it away until you are guaranteed that frame has been moved to your neighbor and your neighbor acknowledges hey I got the frame I'm going to be taking care of it you can let that buffer space go notice what a difference that is in paradigm because now I'm absolutely guaranteeing that under no circumstances is that frame going to be thrown away well if I've got convergence going on if I've got a lot of different traffic types that are coming together on exactly the same network then I may want to stop to figure out well I don't want TCP/IP to be lossless I want it to be lossy and by the same token I dare not have lossy behavior go on for storage traffic because if I'm using this Ethernet fabric for storage traffic and anything goes wrong
screens turn blue systems panic applications and platforms go down the answer to this was something called priority-based flow control priority-based flow control says I'm going to be using three bits in a header for a lossless Ethernet frame and that is going to tell me which class of service this particular frame belongs to for each class of service I can tell the network this class of service is either going to be lossless or this class of service is going to be lossy and so for example if you look at the diagram here you'll see that in this case class five has been designated as being lossless what that means is if I start to run out of buffer space somewhere down the line and I know I don't have room for the next frame that's going to be coming in I have to tell my upstream stop don't give me any more frames I can't handle it and I'm not going to be able to accept any more frames coming in from that particular node for that particular channel until this congestion is cleared up and I get some more buffer space and I can accept some more data on the other hand for the other types of traffic that don't need to be lossless because they're either UDP traffic or maybe they're depending on a higher level protocol like TCP if that's the case then they can be in a different class of service and you'll notice that in this case I don't have lossless behavior turned on and so what priority flow control does on a lossless link is it allows you to tell for each particular type of traffic what class of service that's going to be in understanding that that class of service is going to dictate among other things whether or not this is going to be a lossy or lossless stream of information that allows things to run much more effectively and now my TCP/IP traffic and my storage traffic can run on exactly the same link and if storage traffic gets blocked up for a little while no worries my IP traffic continues to flow back and forth so now I've got all these different converged traffic
types on the same link how am I going to control who gets what bandwidth that's the job of enhanced transmission selection with enhanced transmission selection what I do is I take all the different classes that I talked about before and I'm going to put these different classes together in groups and then what I'm going to do is I'm going to do some traffic shaping to tell how much bandwidth this particular type of traffic is going to be guaranteed and so for example in this diagram here you can see that I've got three different groups of traffic the bottom group has three different classes assigned to it they're guaranteed 30% now the important thing to notice here is if nothing else is going on on the network they can take more than 30% so let's say that there is no other traffic going back and forth and I need let's say 35% temporarily or 40% of the network I can take that the only thing that I can't do is if there's already other traffic that's going across at that particular bandwidth if somebody is already occupying that bandwidth I can't crowd anybody out because this guarantees everybody that in this case with three different groups the bottom group gets thirty percent guaranteed the middle sixty percent guaranteed the top ten percent guaranteed and if they all transmit simultaneously this is what you get but it is opportunistic which is to say if one of these is not going at a particular time then the other groups can come in and sort of steal a little bit of their bandwidth on a temporary basis giving it back if more traffic starts to be generated so that means through priority flow control and enhanced transmission selection I allow you to have a number of different classes of service and I can guarantee you bandwidth for each of the different groups for these classes of service so that now I can take all these different traffic loads and I can put them on the same link this is also why
we're seeing data center bridging that is to say lossless Ethernet being deployed primarily on ten gig networks because if you were to think of this a little bit differently in many cases when I go in and I see a hypervisor platform it's not unusual for me to see maybe eight or nine different adapters on the back of this thing for example I'm going to have management ports I'll have virtual machine movement ports I'll have backup ports I'll have storage ports I'll have client server traffic ports of all these different ports that are out there the idea is that if I can combine these types of traffic safely and I can guarantee you bandwidth by combining them then I can take all those one gig links and I can safely deploy them as a smaller number of ten gig links that means that my server form factor can be much smaller that means I have a drastic reduction in the amount of infrastructure cabling that I need to deploy and the number of adapters goes way down so those are the four elements of lossless Ethernet and again lossless Ethernet has to do with the lossless behavior between two particular nodes and as we combine those with other technologies like TRILL and shortest path bridging I now take lossy or lossless depending on the traffic path combining that with multi-hop multipath in a flat network that's what's going to give me an Ethernet fabric so it's actually the combination of all of these different technologies or in the case of TRILL and SPB picking one of those to accomplish multi-hop multipath these pieces put together are going to be what I use to build out an Ethernet fabric so in the previous sections we talked about what the transition is going to be through the data center to take us from where we are to how we're going to move to an environment to support a cloud-like infrastructure and then we went through some of the technologies that we're going to be using to accomplish that next question is suppose you have already got a traditional network and you
decide to move to an Ethernet fabric how do you do it well of course the easiest way is to just rip everything out and throw it away and deploy brand new equipment and there you go but strangely enough most people don't want to do that it turns out that most people say gee we kind of need to keep the equipment that we have around for a little bit longer and so what they're looking for is some way to sort of ease into Ethernet fabrics the idea being that as we discussed earlier we talked about this idea of this logical chassis that surrounds an Ethernet fabric if I can really contain a fabric like that is there a reasonable way for me to introduce an Ethernet fabric into my environment and then depending on what I find out what my experience is with Ethernet fabrics either expand it or perhaps restrict it to a particular application and so the idea here is that we don't want to just go in and rip everything out far from it we actually believe that you should preserve the investment in what you already have and I'm sure that you feel exactly the same way and so the idea is that we're going to migrate in stages we're going to come up with a very natural way for you to move from where you are today by again deploying the point fabric and then moving on and so rather than just sort of looking at an arbitrary situation there's some very well understood ways of doing that and let's explore some of those in this section the first use case that you can see here is that now what we're going to do is we're going to start looking at the top of rack environment now typically if we think of a top of rack environment we think of that as being let's say a single access layer switch but the idea here is that now let's suppose it's time for you to deploy the next three racks as you deploy those next three racks if you use the traditional way of doing things that's going to mean three more top of rack switches that's going to be three more things that you need to manage
well instead of doing that I would offer that a natural way to do it is to say for those three racks I'm going to deploy an Ethernet fabric consisting of maybe two or three switches where those switches are just going to live in the top of that rack now the thing that's nice about that is once I move to an Ethernet fabric approach now instead of having three new switches I need to manage I only have one if I add a fourth rack to that particular Ethernet fabric no more switches to manage because I'm just going to add that new node to the Ethernet fabric it still looks like one switch and so what this does is it gives me a very easy way to deploy an Ethernet fabric so that I can get a feeling for how fabrics behave how they're configured and what they look like now the other thing that's nice about this as you can see in the diagram is that even though this Ethernet fabric may consist of two three or more switches more different devices you can see that upstream doesn't know that this is multiple switches it sees this entire fabric as being a single chassis and so what that means in this case is you notice that I've got LAGs going up to an aggregation layer let's suppose you're using some sort of either a vPC or VSS or multi-chassis trunking or some other technology that's out there so that at the aggregation layer I can take two separate chassis and make them appear as one look at how resilient we can build out your infrastructure because now the Ethernet fabric has multipath multi-hopping internally but to the outside world it's a single chassis I can now use those technologies I just mentioned to allow you to have multiple LAGs going up to separate chassis and now what I'm doing is I'm combining the best of both worlds where I'm taking traditional resiliency mechanisms like multi-chassis trunking and I'm going to be combining that with new paradigms such as the Ethernet fabric and so now instead of having let's say five switches talking to two switches it's going
to look like one logical switch talking to one logical switch very nice way of rolling out a scalable solution so that I can increase the number of elements that are being deployed while not increasing the management complexity this is a very nice way for you to get into Ethernet fabrics and so again perhaps maybe for a one or two rack solution consider an Ethernet fabric top-of-rack solution with the comfort of knowing that I'm going to be able to take that solution and build it into whatever my infrastructure is today still taking advantage of any other technologies that are already deployed in that environment well suppose you already have a particular technology that you like using at the top of the rack so for example here in use case two you can see that suppose you've already decided that you like stacking you like the ability for stacking to combine the management of several switches into one by the way I'm going to take a little bit of a side trip here for just a few seconds and that is a lot of times people will say well isn't stacking an Ethernet fabric no it turns out that stacking actually shares some of the characteristics of Ethernet fabrics in that you do get that single management interface but it has very rigid topologies it has very rigid distance limitations often stacking uses proprietary interconnect technologies and so forth so although stacking does have some of the elements of Ethernet fabrics stacking by itself is not a fabric so that's why in this solution what we illustrate is that stacking can be combined with an Ethernet fabric so that now let's say it's time for you to deploy some new aggregation layer ports instead of having to go out and buy a new chassis solution let's say we know that a chassis is going to be this tall and this wide which means I have to find space for something that's this tall and this wide instead of that let's try a different approach let's go to my five different racks and all I'm
going to do now is I'm going to take a one or two U unit and I'm going to deploy that in each of those different racks and I'm going to hook those new units together as an Ethernet fabric if that's the case what they look like from the bottom and what they look like from the top is going to be a large single aggregation layer switch so now instead of having a great big chassis that I have to deal with I've actually been able to distribute my form factor my HVAC my power my cooling I've been able to deploy that across a number of racks and yet it's still all going to be managed in exactly the same way I would have managed one of those chassis earlier so I get the best of both worlds so maybe if you decide that you're going to be deploying a new aggregation layer switch maybe it makes sense for you to consider Ethernet fabrics there where I can deploy that fabric and maybe I only start out with one of these standalone switches if that's going to be a sufficient number of ports and then just continue to add ports in exactly the same way that I would get a chassis and then populate the chassis with blades as I need those particular ports but again in this case the big difference is I don't have to make the investment in that great big chassis housing I can get standalone units which are going to be distributed the form factor will be distributed across a number of racks I'll put them together as I need them this is another very attractive application for Ethernet fabrics and it can go into the environment you currently have the third use case says that well we've looked at the case where I'm using an Ethernet fabric at the access layer and where I'm using an Ethernet fabric at the aggregation layer why not combine those two because it turns out that because of the policies and because of the way that I've got distributed intelligence through an Ethernet fabric let's just do away with the aggregation layer and the access layer and let's just go with an edge let's take
both pieces of this infrastructure and so now I'm going to have a combination of Ethernet fabric switches working together to provide an edge solution that edge is then going to go up into the core you may have a question here you know that I can take Ethernet switches and I can put them together in different fashions the question is what does the network look like inside of one of those fabrics I'm going to give you two answers the first one you won't like the first one is I don't care and the reason I say I don't care is because no matter how you have that topology laid out it's still going to look like a single chassis to the outside world I gave you a first answer it wasn't very good let me give you a second answer the second answer is a much better answer and the answer is that the topology inside of that Ethernet fabric is going to match whatever the business needs were that mandated how you would otherwise roll out that fabric and so for example if you decide I don't want to have a large stack of switches I want to distribute the fabric then the switches may be distributed out across a number of top-of-rack solutions they may be put together in a core edge design they may be a mesh they may be a hyper ring they might be a hypercube the answer there is that they're going to be whatever you need to accomplish your business goals now this is an opportunity because as we now come into this area where we're redesigning networks based on these new business goals here's what I'd like to offer to you as a great way to advance your career the way to advance your career is that people are going to want to know how to architect these fabrics if you happen to be a storage architect you already know that there are many different configurations that you can use for putting together these fabrics those same configurations will be just as valid in an Ethernet fabric space and so if you're a storage fabric expert there's huge opportunity for you here as people are trying to figure out what is
the right deployment for my fabric if you're an Ethernet architect then I would argue that you need to know how fabrics work and how fabrics are put together and what the best practices are because now you're going to be drawn into situations where your customers are going to be asking you how should I deploy my next generation data center if you know fabrics and traditional network architecture then you're going to be able to say for this section traditional architecture here's why here's how it's going to be deployed for this new section a fabric here's how the fabric should be laid out here's how it's going to be deployed but the thing that is absolutely clear whether you come from the Ethernet world or the storage area networking world is that there's now a huge demand for people who understand what these fabrics are how they're put together and how they're going to be deployed and managed and if you have that expertise it's going to be extremely attractive for you so I'd encourage you even if in your particular situation today you're not considering deploying an Ethernet fabric maybe because you've already got your plans for the next generation laid out I'd encourage you to at least read up on it find out what's going on with Ethernet fabrics it's going to be extremely valuable to your career later on down the line as the concept of a fabric in the Ethernet space gets to be more and more commonplace so use case four talks about the fact that we really need to start looking at these Ethernet fabrics as being part of more of a strategic infrastructure we already know that the majority of businesses out there already have an existing Fibre Channel storage area network doesn't it make sense to try to take this Ethernet fabric and tie that back into that massive storage network on the other side and the answer to that is absolutely and so in this case if you find out that you've got a lot of your intellectual property a lot of data a lot of assets already living out
on the existing storage area networking infrastructure instead of moving it instead of porting it instead of trying to figure out how to pull it off of the host that it's already connected to just hook up that storage area networking infrastructure and allow that infrastructure to work for you to say if I need something off of the SAN for example I need something that's hooked up on Fibre Channel use Fibre Channel over Ethernet to get to it if on the other hand I need something that's iSCSI use the Ethernet fabric to connect to iSCSI storage if you want to be Fibre Channel over Ethernet but you want to be connected to Fibre Channel over Ethernet storage use that in short what we're saying is that if you're deploying an Ethernet fabric as being the core of your layer 2 network you don't have to make the decisions about what you're going to be hooking up to later on down the line the Ethernet fabric is going to give you the flexibility that you need so that when your business needs mandate that this particular application of this infrastructure has a particular type of service you're going to be able to use exactly the same infrastructure it's a wire once environment knowing that that infrastructure has been wired to take care of all these different needs so what we've done so far is we've talked about what the technologies are for Ethernet fabrics how we see Ethernet fabrics being rolled out over the next few years and how you can take these Ethernet fabrics and integrate them into your environment now it's often good to have a touchstone or to have some way of evaluating exactly where you are along this path and so there is a very nice set of metrics that was rolled out a while ago by Forrester and they call it the stages of virtualization maturity what this has to do with is it's sort of a way of figuring out where you are with virtualization and how much you've been able to exploit virtualization in your particular enterprise and I'd like to show that as
Forrester finds there's basically four different stages to this now I think what you're going to find out is as we start going through these stages you may think you're pretty far down the line on virtualization but the good news here is that probably you're not which means there's a lot more that you can get out of your existing virtualized environment as you become more comfortable with it and as you start exploiting that virtualization the first step here is acclimation acclimation basically says we're just going to take the world that says I'm going to be running on a single platform a single hardware solution and I'm now going to virtualize it I'm now going to take that application I'm going to contain it and we're going to roll it out that's acclimation I think we're pretty good with that a lot of people understand what virtualization means the second piece though is consolidation consolidation is the practical application of what we did once we became acclimated to what virtualization can do and so for example this is where we say hey I've got these five virtual machines I can combine those virtual machines on a single platform now the important thing about this is notice that through consolidation I still haven't changed fundamentally what I'm doing all I'm doing is I'm replacing physical hardware with logical hardware that's all I've done so far and if I stop there the assertion is that I've lost a big piece of what virtualization can do for me let's go on to the third step to find out if we're there the third step is process improvement what can I do better now because I moved to a virtualized environment now that's a big step that means that now the business owners the application owners are going to have to start thinking in terms of well if I can virtualize and I can spin up and I can tear down then suddenly what I can do is I can say if I've got a big sale coming up this weekend or I've got some sort of promotion that's going to be temporarily
out on the market, or if I'm coming up to the end of quarter or end of year and I need more resources, there's going to be something that I can do in terms of deploying brand new applications, maybe trying them, and if they don't work out, getting rid of them, and if they do work out, expanding them. I can suddenly start to exploit what I'm doing in this environment, because now I understand what virtualization can do for me, and I'm actually depending on that virtualization being there as something that's going to differentiate me.

And then finally, in the fourth step, there's this idea of pooling and automation. That is to say, now that I understand how I can get my improvements out of my business process, I want to have automatic management, and I want to have the automatic deployment and recovery of resources throughout the enterprise, so that now, in an almost automatic, seamless fashion, the business owners and the application owners are going to be going out making decisions, and there's going to be enough intelligence there so that, rather than process improvement being a very deliberate thing that I do, the pooling and automation aspects are just going to say that if I deploy in this particular fashion, all that process improvement is just going to happen, because I already put the infrastructure and the tools in place to allow me to do it.

So let's look at a couple of situations here where you may be with virtualization in general, and find out how far you are along this four-step process. A lot of people are here today, and this is where we basically say: with an Ethernet fabric, I understand that I get this layer 2 connectivity, I understand how I can hook together all these different edge nodes, but really that's all I'm going to do with it. All I'm doing here is I understand what virtualization is, and so I'm just going to go through the consolidation, and I stop there. If you stop here, then the assertion I would make is that there's a lot more you could do with
this environment, because all you've done is replace physical with logical. You haven't really changed the way you're doing business. You have to look for opportunities here to figure out: what can I do now that I can virtualize this? How can I take this to the next level? How can I improve the way that I'm doing things from a user or a business bottom-line viewpoint, because I've got this virtualized environment? And it's got to be more than just saying, well, I'm not spending as much on electricity and my HVAC costs are lower. Now, those are all important things, but all those come about as a result of consolidation. They really don't have anything to do with how I'm making the process better because I'm deploying in this fashion.

In exactly the same way, if you've got storage: let's say that we have decided to go with Ethernet, but now we're looking at different mechanisms for storage. For example, instead of automatically using NAS or iSCSI or FCoE or Fibre Channel, now I'm going to be using all of them, and by deploying a particular server in one of these Ethernet fabrics I can pick whichever one I want. Well, you can do that, but you can't stop there. You have to start looking at this environment, trying to figure out: how do I make things better? How do I make the process better? For example, as an application starts to consume more storage bandwidth, how can I automatically take it off a slower form of storage and put it on a faster form of storage? Or, for example, if I have a particular application that's coming off of production, how do I automatically migrate that down to, let's say, my tier 2 or tier 3 virtualized server environment? And then how do I deploy or redeploy the storage that was associated with that particular application, and how do I do that to improve the process? So again, if all you're doing is hooking things up, you're certainly getting the first two steps, but there's a lot more that you could be getting out. The third step of
this, which now starts to get the process improvement, is when we recognize that, in this case with Fibre Channel, hey, there are assets that are already deployed out there. If I can take those existing assets, that existing information or data store, and I can make that available to a new set of hosts, and I can do that through this fabric that lives right here, then that means that now I've improved the way that I can do business, because I'm providing new information in a faster mechanism to a lot more services than would have otherwise gotten it. So in this case I'm actually saying I want to make my environment better, not just changing media but actually making the environment better, because I've decided to adopt this Fibre Channel storage.

And then the last case here, when we move beyond three and into four, is I really completely change the paradigm. In this case, this now says I can have a virtual machine anywhere I need it to go, and when the business comes along and says I need more virtual machines, the infrastructure is going to improve that process by automatically going out, perhaps to something that's updated either by the day or by the hour, and asking: where is the best place for me to migrate these virtual machines so that I get the greatest efficiency out of it?

In one case I happened to hear of a customer who's examining where there were different service providers, and during different times of the day electricity costs a different amount. So, for example, during the peak days, when they need a lot of electricity for air conditioning and so forth, the kilowatt of electricity is more expensive. On the other hand, at night, when it's cooler, when there's not as much demand for it, it doesn't cost as much. How can I automatically build that into an extended private cloud, so that now my virtual machines, or my business functions, are going to the place that says, aha, back on the East Coast they just went to cheap power rates, move some of my applications out there? I get my application up and
running. The user doesn't recognize what's happened. I've got process improvement, I'm increasing my bottom line, but because of the automation and the pooling the user doesn't have to care about it. The user just sees the result of having deployed intelligent policies that recognize: these are the parameters that you're going to be using, this is how you're going to be deploying these networks; the situation changes, adjust to it automatically, and do it in a way that the user isn't going to be aware of.

So when we get to this extended private cloud environment, if we're really using that type of pooling and automation to take advantage of it, we're going to say business has changed. Everything is going to change, as I can now move whatever I need to wherever I need it, and I'm going to get whatever service I want, and I'll be able to continue to think in terms of dollars and cents rather than, oh, my network doesn't go there, or gee, I'm in the wrong subnet, or something like that.

So what have we been discussing here over the last few minutes? I think at this point we now know that this whole adoption of a cloud mentality is going to change everything, because now there's a new generation of users coming along who are not going to accept, well, it's going to take me two or three days to adjust my network or to build out that configuration, or worse, you can't deploy that way because that's not the way that we've deployed our networks. The new generation of users is not going to accept that. And so what that means is we have to start planning now, especially as we're rolling out new data centers, new ways of providing infrastructure, so that as they start to adopt the cloud infrastructure as the typical way of doing business, you are going to be able to provide something that's going to accommodate these new, changing needs. And not only are you providing the ability to do this change, but it turns out at the same time you may be
actually increasing the efficiency of your organization, where now you can actually build out the topologies and deploy networks and contract and expand in a way that makes sense, that reflects the current business environment, rather than saying, well, this is the way we've been doing it for the last thirty years. We really need to change the way we're doing things.

Now, we've talked about an awful lot of things during this session, and there are probably a lot of things that you'd like to talk about too. I'd like to encourage you at this point to go out and find out more about Ethernet fabrics, but more especially I'd like to make sure that you start getting in touch with your peers, who are probably going through exactly the same thought process that you are right now. They're probably trying to decide something about Ethernet fabrics, or they've heard something that they're not too sure of and they just want to talk it over. Or maybe you have something you'd like to talk over, something you can contribute, or perhaps you'd even like to engage in the debate to decide which of the multi-hop multipath protocols we talked about is the better protocol. The important thing here is to get connected, and I'd like to offer to you these two different forums for doing something like that.

Thank you so much for spending the time to go through this introduction to Ethernet fabrics. We're not done: later on down the line we're going to be coming out with Ethernet Fabrics 201, which is where we're going to drill down to yet another level and expose, in a vendor-neutral fashion, what Ethernet fabrics are with a little bit more technical detail. I hope you can join us for that. Thank you.
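
The time-of-day electricity example from the pooling and automation discussion can be sketched as a small placement policy. This is only an illustration under stated assumptions, not anything the talk or a particular vendor prescribes: the `Site` fields, the rates, the peak windows, and the function names `cheapest_site` and `plan_migrations` are all hypothetical, and a real policy would also weigh latency, capacity, and migration cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """A data center with a simple two-tier electricity tariff (hypothetical)."""
    name: str
    peak_rate: float      # cost per unit of energy during peak hours
    off_peak_rate: float  # cost per unit of energy outside peak hours
    peak_start: int       # local hour the peak window opens (inclusive)
    peak_end: int         # local hour the peak window closes (exclusive)
    utc_offset: int       # whole-hour offset of local time from UTC

    def rate_at(self, utc_hour: int) -> float:
        """Return the tariff in effect at this site for a given UTC hour."""
        local_hour = (utc_hour + self.utc_offset) % 24
        in_peak = self.peak_start <= local_hour < self.peak_end
        return self.peak_rate if in_peak else self.off_peak_rate


def cheapest_site(sites, utc_hour):
    """Pick the site with the lowest tariff at the given UTC hour."""
    return min(sites, key=lambda site: site.rate_at(utc_hour))


def plan_migrations(placement, sites, utc_hour):
    """Map each VM that is not already at the cheapest site to that site.

    `placement` maps a VM name to the name of the site it currently runs at.
    """
    target = cheapest_site(sites, utc_hour)
    return {vm: target.name
            for vm, current in placement.items()
            if current != target.name}


# Assumed example data: two sites with an afternoon peak window.
SITES = [
    Site("east", peak_rate=0.20, off_peak_rate=0.08,
         peak_start=12, peak_end=20, utc_offset=-5),
    Site("west", peak_rate=0.18, off_peak_rate=0.10,
         peak_start=12, peak_end=20, utc_offset=-8),
]
```

With this data, calling `plan_migrations({"web": "west", "db": "east"}, SITES, utc_hour=2)` proposes moving only `web`, since at that hour the east site is off-peak while the west site is still in its peak window. The point of the sketch is that the decision lives in a policy the infrastructure evaluates on a schedule, so the user never has to think about it.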

3 Comments

  1. Garegin

    The real question is why you need a SAN in the first place. As far as I see it, you need better file sharing. That means file-sharing protocols that expose the native filesystem better (NFS still blows baby chunks when compared to raw ZFS).
