ONOS: Towards an Open, Distributed SDN OS (HotSDN 2014)
Abstract: We present our experiences to date building ONOS (Open Network Operating System), an experimental distributed SDN control platform motivated by the performance, scalability, and availability requirements of large operator networks. We describe and evaluate two ONOS prototypes. The first version implemented core features: a distributed, but logically centralized, global network view; scale-out; and fault tolerance. The second version focused on improving performance. Based on experience with these prototypes, we identify additional steps that will be required for ONOS to support use cases such as core network traffic engineering and scheduling, and to become a usable open source, distributed network OS platform that the SDN community can build upon.
We added instrumentation to Titan and Cassandra to record the sequence and timing of operations.

Data Model Issues. To implement the network view on top of Titan, we had modeled all data objects (including ports, flow entries, etc.) as vertices. This required indexing vertices by type to enable queries such as enumerating all the switches in the network. Index maintenance became a bottleneck when concurrently adding a large number of objects, including common cases such as creating switches and links during initial topology discovery or installing a large number of flows. Storing and maintaining references between many small objects also meant that dozens of graph database update operations could be required for relatively simple operations such as adding or removing a switch, or clearing a flow table. Additionally, users of the network view were exposed to a data representation that was overly complicated and did not match the mental model of network devices connected together by network links.

Excessive Data Store Operations. The mapping from Titan's graph data model to Cassandra's key-value data model resulted in an excessive number of data store operations, some of which were remote. Simple ONOS operations such as adding a switch or flow table entry were slowed down by several network round trips to read or write multiple objects in the data store. In addition, storing vertices, edges, and metadata in a shared table and index introduced unnecessary contention between seemingly independent operations.

Polling. In the initial prototype, we did not have time to implement notifications and messaging across ONOS instances. ONOS modules and applications had to poll the database periodically to detect changes in network state. This had the pernicious effect of increasing CPU load due to unnecessary polling while simultaneously increasing the delay for reacting to events and communicating information across instances.

Lessons Learned. Our evaluation of our first ONOS prototype indicated that we needed to design a more efficient data model, reduce the number of expensive data store operations, and provide fast notifications and messaging across nodes. Additionally, the API needed to be simplified to represent the network view abstraction more clearly without exposing unnecessary implementation details.

3. PROTOTYPE 2: IMPROVING PERFORMANCE

Our next prototype focused on improving the performance of ONOS. This resulted in changes to our network view architecture and the addition of an event notification framework, as shown in Figure 3.

One of the biggest performance bottlenecks of the first prototype was remote data operations. In the second prototype, we addressed this issue through two different approaches: (1) making remote operations as fast as possible, and (2) reducing the number of remote operations that ONOS has to perform.

Figure 3: Prototype 2 Architecture. (The figure shows the network view API and ONOS graph abstraction on top of a distributed key-value store (RAMCloud), with OF managers (Floodlight) below.)

RAMCloud Data Store. We began the effort focusing on improving the speed of our remote database operations. To better understand how data was being stored, we replaced the first prototype's graph database stack (Titan and Cassandra) with a Blueprints graph implementation [9] on top of RAMCloud [10]. This allowed us to use our existing code, which used the Blueprints API for data storage, with RAMCloud. RAMCloud is a low-latency, distributed key-value store which offers remote read/write latency in the 15-30 us range. A simpler software stack combined with extensive instrumentation provided many insights into the bottlenecks introduced by the network view's data model.

Optimized Data Model. To address the inefficiency of the generic graph data model, we designed a new data model optimized for our specific use case. A table for each type of network object (switch, link, flow entry, etc.) was introduced to reduce unnecessary contention between independent updates. The data structures have been further optimized to minimize the number of references between elements, and we no longer maintain the integrity of the data at the data store level. This results in far fewer read/write operations for each update. Indeed, most element types can be written in a single operation, because we do not have to update multiple objects to maintain references as generic property graphs do.

Topology Cache. Topology information is updated infrequently but is read-heavy, so we can optimize for read performance by keeping it in memory on each instance. In our second prototype, we implemented a caching layer to reduce the number of remote database reads for commonly read data. Furthermore, the in-memory topology view implements a set of indices on top of the data to allow faster lookups. These indices are not stored in the data store, but can be reconstructed from the data store at any time. This process requires reading an entire snapshot of the topology data from the data store; however, it only occurs when a new ONOS node joins the cluster, so it is not time-critical.

The in-memory topology data is maintained using a notification-based replication scheme. The replication scheme is eventually consistent: updates can be received at different times and in different orders on different instances. However, to ensure that the system maintains the integrity of the topology schema observed by the applications, updates to the topology are applied atomically using schema integrity constraints on each instance. This means an application cannot read the topology state in the middle of an update or observe, for example, a link that does not have a port on each end.
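The atomic-update behavior described above can be illustrated with a small sketch. This is not ONOS code; the `TopologyCache` class and its constraint check are hypothetical, showing only the idea that a replicated update batch is validated against a schema integrity constraint (every link must terminate on known ports) and applied all-or-nothing under a lock, so a reader never observes a half-applied update.

```python
import threading

class TopologyCache:
    """In-memory topology view that applies update batches atomically.

    Hypothetical sketch: names do not correspond to ONOS's real API.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self.ports = set()    # (switch_id, port_no)
        self.links = set()    # ((sw, port), (sw, port))

    def apply_update(self, add_ports=(), add_links=()):
        """Apply a whole batch or nothing (schema integrity)."""
        with self._lock:
            new_ports = self.ports | set(add_ports)
            # Integrity constraint: every link endpoint must be a known port.
            for link in add_links:
                if not all(end in new_ports for end in link):
                    return False  # reject the batch; retry on later events
            self.ports = new_ports
            self.links |= set(add_links)
            return True

    def snapshot(self):
        """Readers see only fully applied updates."""
        with self._lock:
            return set(self.ports), set(self.links)

cache = TopologyCache()
# A link event that arrives before its port events is rejected...
cache.apply_update(add_links=[((1, 1), (2, 1))])
# ...and a consistent batch is applied in one step.
cache.apply_update(add_ports=[(1, 1), (2, 1)],
                   add_links=[((1, 1), (2, 1))])
```

An out-of-order link event is simply rejected and can be retried once the corresponding port events arrive, which is how an eventually consistent event stream can still yield a schema-consistent view on every instance.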
Event Notifications. We addressed the polling issue by building an inter-instance publish-subscribe event notification and communication system based on Hazelcast [11]. We created several event channels among all ONOS instances based on notification type, such as topology change, flow installation, and packet-out.

Figure 4: Network View: Connectivity requests cause flow paths to be created using flow entries. (The figure depicts a switch, a flow path, and its flow entries.)

Table 1: Latency for Adding a Switch (Ser. = Serialization, Des. = Deserialization; unit: ms)

                               Read   Write   Ser.   Des.   Other   Total
  1. Generic graph data model  10.1   3.5     7.2    0.93   0.56    22.2
  2. New data model            0.28   0.89    -      -      0.017   1.19
  3. (2) + Proto. Buf.         -      -       -      -      0.006   0.244
  4. (3) + Infiniband          0.08   0.01    -      -      -       0.099

We also evaluated the latency to add a link, and the results were similar. With the new data model and serialization optimization, we reduced that latency from 0.722 ms (generic data model) to 0.150 ms; using Infiniband with kernel bypass, it is further reduced to 0.075 ms. With the combined optimizations, including optimized network I/O and 10 Gb/s Infiniband hardware on a RAMCloud cluster, adding a switch took 0.099 ms. In our current design, network state is written to RAMCloud sequentially, so the throughput is simply the inverse of the latency.

Network View API. We took the opportunity to replace the generic Blueprints graph API with an API designed specifically for network applications. Figure 4 shows the application's view of the network. Our API consisted of three main areas:
- A Topology abstraction representing the underlying data plane network, including an intuitive graph representation for graph calculations.

- Events occurring in the network or the system that an application may wish to act on.

- A Path Installation system to enable applications to set up flows in the network.

3.1 Evaluation

Our second prototype has brought us much closer to meeting our performance goals, in particular our internal system latency requirements. In the following sections, we show the evaluated performance of three categories: (1) basic network state changes, (2) reaction to network events, and (3) path installation.

3.1.1 Basic Network State Changes

The first performance metrics are latency and throughput for network state changes in the network view. As modifying the network state in a network view is a basic building block of ONOS operations, its performance has a significant impact on the overall system's performance. We connected software switches, arranged in a typical WAN topology, to a 3-node ONOS cluster and measured the latency of adding switches and links to the network view. The switches were Open vSwitches [12] and had an average of 4 active ports.

Table 1 shows the latency for adding a switch, and its breakdown. With RAMCloud using the generic graph data model of our first prototype, it requires 10 read and 8 write RAMCloud operations to add a single switch, which takes 22.2 ms on our ONOS cluster connected with 10 Gb/s Ethernet. With the new data model, the latency of adding a switch and a port is significantly reduced to 1.19 ms. To improve serialization time, we switched serialization frameworks from Kryo [13] (which is schema-less) to Google Protocol Buffers [14] (which uses a fixed schema); this change reduced the operation's latency to 0.244 ms. We also explored how performance is improved using optimized network I/O (e.g., kernel bypass).

3.1.2 Reaction to Network Events

The second performance metric is the end-to-end latency of the system for updating network state in response to events; examples include rerouting traffic in response to link failure, moving traffic in response to congestion, and supporting VM migration or host mobility. This metric is relevant because it is most directly related to SLAs guaranteed by network operators.

For this experiment, we connected a 6-node ONOS cluster to an emulated Mininet [15] network of 206 software switches [12] and 416 links. We added 16,000 flows across the network, and then disabled one of the switch interfaces, causing 1,000 flows to be rerouted. All affected flows had a 6-hop path before and a 7-hop path after rerouting.

Table 2 shows the median and 99th percentile of the latencies of the rerouting experiment. The latency is presented as two values: (1) the time from the point where the network event is detected by ONOS (via an OpenFlow port status message in this case) to the point where ONOS sends the first FlowMod (OFPT_FLOW_MOD) OpenFlow message to reprogram the network, and (2) the total time taken by ONOS to send all FlowMods to reprogram the network, including (1). The latency to the first FlowMod is more representative of the system's performance, while the total latency is more indicative of the effect on traffic in the data plane. Note that we do not consider the effects of propagation delay or switch dataplane programming delay in our total latency measurement, as these can be highly variable depending on topology, controller placement, and hardware.

Table 2: Latency for Rerouting 1,000 Flows

                                Median    99th %ile
  Latency to the 1st FlowMod    45.2 ms   75.8 ms
  Total latency                 71.2 ms   116 ms

Figure 5: Internet2 Demo Topology and Configuration. (The demo used hardware OpenFlow switches (NEC PF5820) at sites including Chicago, Los Angeles, and Washington, D.C., emulated access networks of Open vSwitch software OpenFlow switches, and an ONOS cluster in the NOC at Indiana University.)

Figure 6: The ONOS GUI showing the correctly discovered topology of 205 switches and 414 links. The links between Los Angeles and Chicago and between Chicago and Washington, D.C. are virtual links added by OpenVirteX.

3.1.3 Path Installation

The third performance metric measures how frequently the system can process application requests to update network state, and how quickly they are reflected to the physical network. Application requests include establishing connectivity between hosts, setting up tunnels, updating traffic policy, or scheduling resources. Specifically, we measured the latency and throughput for path installation requests from an application.
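As a back-of-the-envelope illustration of how the latency and throughput figures in this subsection relate (a hypothetical harness, not the actual test setup): when requests are processed serially, throughput is just the number of paths divided by the batch latency, which is how a 53.1 ms median total latency for 1,000 paths corresponds to roughly 18,832 paths/sec.

```python
import statistics

def path_install_stats(batch_latencies_ms, paths_per_batch):
    """Median and 99th-percentile batch latency, plus the throughput
    implied by serialized processing: paths_per_batch / batch latency."""
    ordered = sorted(batch_latencies_ms)
    median = statistics.median(ordered)
    # Nearest-rank 99th percentile (sufficient for a sketch).
    idx = max(0, min(len(ordered) - 1, round(0.99 * len(ordered)) - 1))
    p99 = ordered[idx]
    throughput = paths_per_batch / (median / 1000.0)  # paths per second
    return median, p99, throughput

# One measured batch: 1,000 paths installed in 53.1 ms of total latency
# gives about 18,832 paths/sec.
median_ms, p99_ms, rate = path_install_stats([53.1], paths_per_batch=1000)
```

The same inverse relationship explains why distributing path computation across instances, rather than reducing per-request latency alone, is the lever for reaching a much higher aggregate throughput target.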
We started with the same network topology used in Section 3.1.2. With 15,000 pre-installed static flows, we added 1,000 6-hop flows and measured the throughput and latency to add the new flows.

Table 3 shows the latency performance. The latency is computed in the same way as in Section 3.1.2, except that an application event (i.e., a path setup request), rather than a network event, starts the timer. Throughput is inversely related to latency due to serialization of processing in this prototype. For example, the median throughput was 18,832 paths/sec (derived from the median total latency, 53.1 ms).

Table 3: Path Installation Latency

                                Median    99th %ile
  Latency to the 1st FlowMod    34.1 ms   68.2 ms
  Total latency                 53.1 ms   97.9 ms

With Prototype 2, we approach our target latency for system response to network events (10-100 ms, as stated in Section 1), but we still do not meet our target for path setup throughput (1M paths/sec). The current design does not fully utilize parallelism (e.g., all path computation is done by a single ONOS instance), and we think we can increase the throughput as we distribute the path computation load among multiple ONOS instances.

3.2 Demonstration on Internet2

At the Open Networking Summit in March 2014, we deployed our second prototype on the Internet2 [16] network, demonstrating (1) the ONOS network view, scale-out, and fault tolerance, (2) operation on a real WAN, (3) operation using virtualized hardware and software switches, and (4) fast ONOS and link failover. Figure 5 illustrates the system configuration: a geographically distributed backbone network of five hardware OpenFlow switches, each connected to an emulated access network of software switches. We used OpenVirteX [17] to create a virtual network of 205 switches and 414 links on this physical infrastructure, and this virtual network was controlled by a single 8-node ONOS cluster.

4. RELATED WORK

The development of ONOS continues to be influenced and informed by earlier research work [18, 19], and by many existing SDN controller platforms. Unlike other efforts, ONOS has focused primarily on use cases outside the data center, such as service provider networks.

Onix [2] was the first distributed SDN controller to implement a global network view, and it influenced the original development of ONOS. Onix originally targeted network virtualization in data centers, but it has remained closed source and further development has not been described in subsequent publications.

Floodlight [3] was the first open source SDN controller to gain traction in research and industry. It is evolving to support high availability in the manner of its proprietary sibling, Big Switch's Big Network Controller [20], via a hot standby, but it does not support a distributed architecture for scale-out performance.

OpenDaylight [21] is an open source SDN controller project, backed by a large consortium of networking companies, that implements a number of vendor-driven features. Similarly to ONOS, OpenDaylight runs on a cluster of servers for high availability, uses a distributed data store, and employs leader election. At this time the OpenDaylight clustering architecture is evolving, and we do not have sufficient information to provide a more detailed comparison.

5. DISCUSSION: TOWARDS AN OPEN DISTRIBUTED NETWORK OS

In this paper, we have described some of our experiences and lessons learned while building the first two prototype versions of ONOS, a distributed SDN control platform which we hope to develop into a more complete network OS that meets the performance and reliability requirements of large production networks, while preserving the convenience of a global network view.

We are currently working with a small set of partner organizations, including carriers and vendors, to create the next version of ONOS. We intend to prototype several use cases that will help drive improvements to the system's APIs, abstractions, resource isolation, and scheduling. Additionally, we will need to continue work on meeting performance requirements and on developing a usable open source release of the system.

Use Cases. The promise of any OS platform, network or otherwise, is to enable applications. To date, we have implemented a limited set of applications on ONOS: simple proactive route maintenance, and BGP interfacing (SDN-IP [22]). Moving forward, we plan on exploring three broad use cases: traffic engineering and scheduling of packet optical core networks; SDN control of next generation service provider central offices and points of presence (PoPs) comprising network, compute, storage, and customer management functionalities; and remote network management, including virtualization of customer networks.

Abstractions. We expect ONOS to allow applications to examine the global network view and create flow paths that specify full or partial routes, along with traffic that should flow over that route and other actions that should be taken, or to use a global match-action (or match-instruction) abstraction which provides the full power of OpenFlow to enable an application to program any switch from a single vantage point.

For applications that do not depend on specific paths through the network, ONOS provides a simple connectivity abstraction. In this case, ONOS modules may handle the mechanics of installing a path and maintaining it as the network topology, host location, or usage changes. Our experience building and deploying our SDN-IP peering application [22] suggests that this abstraction is enough to implement the majority of the required behavior.

Isolation and Security. We hope to improve the isolation and security of ONOS applications. We would like to detect and resolve conflicting policies, routes, and flow entries to allow applications to coexist without undesired interference. Additionally, we would like to have a mechanism to manage what applications can see, what they are permitted to do, and what resources they can use. An improved module framework may help to enforce isolation while supporting dynamic module loading and reloading for on-line reconfiguration and software upgrades.

Performance. We are close to achieving low-latency end-to-end event processing in ONOS, but have not yet met our throughput goals. We hope to improve the system's throughput by exploring new ways to parallelize and distribute large workloads.

Open Source Release. We are currently working with our partners to improve the system's reliability and robustness, to implement missing features (e.g., OpenFlow 1.3 [1]) required for experimental deployments, and to prepare a usable code base, a development environment, and documentation. We are also working on infrastructure and processes that will help to support a community of ONOS core and application developers. Our goal, by the end of 2014, is the open source release of a usable ONOS system that the SDN community will be able to examine, use, and build upon.

6. ACKNOWLEDGMENTS

We would like to thank the following people for their contributions to the design and implementation of ONOS: Ali Al-Shabibi, Nick Karanatsios, Umesh Krishnaswamy, Pingping Lin, Nick McKeown, Yoshitomo Muroi, Larry Peterson, Scott Shenker, Naoki Shiota, and Terutaka Uchida. Thanks also to John Ousterhout and Jonathan Ellithorpe for their assistance with RAMCloud, and to our anonymous reviewers for their comments.

7. REFERENCES

[1] Open Networking Foundation. OpenFlow specification. https://www.opennetworking.org/sdn-resources/onf-specifications/openflow/.
[2] T. Koponen, M. Casado, N. Gude, J. Stribling, et al. Onix: A distributed control platform for large-scale production networks. In OSDI '10, volume 10. USENIX, 2010.
[3] Floodlight Project. http://www.projectfloodlight.org/.
[4] Titan Distributed Graph Database. http://thinkaurelius.github.io/titan/.
[5] A. Lakshman and P. Malik. Cassandra: A decentralized structured storage system. ACM SIGOPS Operating Systems Review, 44(2), 2010.
[6] TinkerPop. Blueprints. http://blueprints.tinkerpop.com/.
[7] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. ZooKeeper: Wait-free coordination for internet-scale systems. In USENIX Annual Technical Conference, 2010.
[8] Open Networking Summit. http://www.opennetsummit.org/.
[9] J. Ellithorpe. TinkerPop Blueprints implementation for RAMCloud. https://github.com/ellitron/blueprints-ramcloud-graph/.
[10] J. Ousterhout, P. Agrawal, D. Erickson, C. Kozyrakis, J. Leverich, D. Mazieres, et al. The case for RAMClouds: Scalable high-performance storage entirely in DRAM. SIGOPS Operating Systems Review, 43(4), Jan 2010.
[11] Hazelcast Project. http://www.hazelcast.org/.
[12] B. Pfaff, J. Pettit, K. Amidon, M. Casado, T. Koponen, and S. Shenker. Extending networking into the virtualization layer. In HotNets '09. ACM, 2009.
[13] Esoteric Software. Kryo. https://github.com/EsotericSoftware/kryo/.
[14] Google Protocol Buffers. https://developers.google.com/protocol-buffers/.
[15] B. Lantz, B. Heller, and N. McKeown. A network in a laptop: Rapid prototyping for software-defined networks. In HotNets '10. ACM, 2010.
[16] Internet2. http://www.internet2.edu/.
[17] A. Al-Shabibi, M. De Leenheer, M. Gerola, A. Koshibe, W. Snow, and G. Parulkar. OpenVirteX: A network hypervisor. In ONS '14, Santa Clara, CA, 2014. USENIX.
[18] S. Schmid and J. Suomela. Exploiting locality in distributed SDN control. In HotSDN '13. ACM, 2013.
[19] A. Dixit, F. Hao, S. Mukherjee, T. V. Lakshman, and R. Kompella. Towards an elastic distributed SDN controller. In HotSDN '13. ACM, 2013.
[20] Big Switch Networks. Big Network Controller. http://www.bigswitch.com/products/sdn-controller/.
[21] OpenDaylight Project. http://www.opendaylight.org/.
[22] P. Lin, J. Hart, U. Krishnaswamy, T. Murakami, M. Kobayashi, A. Al-Shabibi, K.-C. Wang, and J. Bi. Seamless interworking of SDN and IP. In SIGCOMM '13. ACM, 2013.