ZTE Communications, 2018, Vol. 16, Issue 2: 42-54. DOI: 10.3969/j.issn.1673-5188.2018.02.008
LI Dan, LIN Du, JIANG Changlin, WANG Lingqiang
Received: 2017-03-22
Online: 2018-06-25
Published: 2019-12-12
About the authors:
LI Dan (tolidan@tsinghua.edu.cn) received the M.E. and Ph.D. degrees from Tsinghua University, China in 2005 and 2007 respectively, both in computer science. Before that, he spent four undergraduate years at Beijing Normal University, China, where he received a B.S. degree in computer science in 2003. He joined Microsoft Research Asia in Jan. 2008 and worked there as an associate researcher in the Wireless and Networking Group until Feb. 2010. He joined the faculty of Tsinghua University in Mar. 2010, where he is now an associate professor in the Computer Science Department. His research interests include Internet architecture and protocol design, data center networks, and software defined networking.
LIN Du (lindu1992@foxmail.com) received the B.S. degree from Tsinghua University, China in 2015. He is now a master's candidate at the Department of Computer Science and Technology, Tsinghua University. His research interests include Internet architecture, data center networks, and high-performance network systems.
JIANG Changlin (jiangchanglin@csnet1.cs.tsinghua.edu.cn) received the B.S. and M.S. degrees from the Institute of Communication Engineering, PLA University of Science and Technology, China in 2001 and 2004 respectively. He is now a Ph.D. candidate at the Department of Computer Science and Technology, Tsinghua University, China. His research interests include Internet architecture, data center networks, and network routing.
WANG Lingqiang (wang.lingqiang@zte.com.cn) received the B.S. degree from the Department of Industrial Automation, Zhengzhou University, China in 1999. He is a system architect at ZTE Corporation, where he focuses on technical planning and pre-research work in the IP field. His research interests include smart pipes, next generation broadband technology, and programmable networks.
LI Dan, LIN Du, JIANG Changlin, WANG Lingqiang. SOPA: Source Routing Based Packet-Level Multi-Path Routing in Data Center Networks[J]. ZTE Communications, 2018, 16(2): 42-54.
Figure 2. An example illustrating that random packet splitting may cause packet reordering and fast retransmit (FR). (Each box denotes a packet, and the number is the sequence number of the packet. Although random packet splitting allocates 4 packets to each path over the whole period, the instantaneous loads of the paths differ, which leads to different queuing delays on the paths. The arrival order of the first 7 packets can be 1, 5, 6, 7, 8, 2, and 3, which triggers an FR and degrades the throughput of the flow.)
Figure 3. Arrival sequence of the first 100 packets with an oversubscription ratio of 4:1. (Random packet splitting causes many reordered packets.)
Figure 4. Effect of increasing the FR threshold. (Throughput improves as the threshold increases. However, once the FR threshold exceeds 10, the performance improvement is marginal.)
Figure 5. Packet allocation in random packet splitting. (This figure shows how the first 2000 packets are allocated to 4 equal-cost paths. Each group of columns represents the allocation of 500 packets. Even though almost the same amount of traffic is allocated to each path over the whole transmission period, the instantaneous allocations to the paths differ. The maximum deviation from the average allocation is 13.6%.)
Figure 6. Performance comparison between random packet splitting and SOPA. (The number in the parentheses denotes the FR threshold. Both random packet splitting and SOPA improve the performance as the threshold increases, and SOPA outperforms random packet splitting in all settings.)
Figure 7. An example showing the negative effect of a failure on packet-level multi-path routing. (There are two flows, flow 0→4 and flow 8→5. If the link between E1 and A1 fails, flow 0→4 can take only the two remaining paths, while flow 8→5 can still use all four candidate paths. This may cause load imbalance across the paths of flow 8→5 and degrade its performance.)
Figure 8. End-to-end delay of the packets from flow 8→5. (Because of the failure, flow 0→4 can take only the two remaining paths, which overlap with two candidate paths of flow 8→5. The figure shows that packets on the overlapping paths experience much longer delays than packets allocated to the non-overlapping paths.)
Figure 9. Throughput of 4 flows. (SOPA allocates traffic evenly, so each flow obtains a fair share of the bandwidth, and the throughput of each flow is about 475 Mbit/s. RPS, however, fails to achieve balanced traffic allocation, and the average throughput of the four flows is only 378.30 Mbit/s.)
Figure 11. CDF of the flows’ throughputs for the five multi-path routing schemes under permutation workload. (Both SOPA and DRB outperform the other three routing schemes, and SOPA also achieves more balanced traffic splitting than DRB.)
Figure 12. The performance comparison between SOPA and RPS under production workload when failures occur. (In order to show the effect of failure, the performance without failure is also plotted. “NF” means no failure, while “F” denotes that failure has occurred.)
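The uneven instantaneous allocation described in the caption of Figure 5 can be reproduced with a small simulation. The following is a minimal Python sketch, not the authors' simulator: it assigns 2000 packets to 4 equal-cost paths uniformly at random, then reports the largest relative deviation from the average per-path share within each 500-packet window. The function names, the fixed seed, and the round-robin baseline are illustrative assumptions. The round-robin allocator is included only as a contrast and is not claimed to be how SOPA splits traffic.

```python
import random
from collections import Counter


def allocate_random(num_packets, num_paths, rng):
    """Pick a path uniformly at random for every packet (hypothetical helper)."""
    return [rng.randrange(num_paths) for _ in range(num_packets)]


def allocate_round_robin(num_packets, num_paths):
    """Deterministic baseline for contrast only: cycle through the paths."""
    return [i % num_paths for i in range(num_packets)]


def max_window_deviation(assignment, num_paths, window):
    """Largest relative deviation of any path's packet count from the
    average share (window / num_paths), taken over consecutive windows.
    Assumes len(assignment) is a multiple of window."""
    expected = window / num_paths
    worst = 0.0
    for start in range(0, len(assignment), window):
        counts = Counter(assignment[start:start + window])
        for path in range(num_paths):
            deviation = abs(counts.get(path, 0) - expected) / expected
            worst = max(worst, deviation)
    return worst


if __name__ == "__main__":
    rng = random.Random(2018)  # fixed seed (an assumption) for repeatability
    random_assignment = allocate_random(2000, 4, rng)
    rr_assignment = allocate_round_robin(2000, 4)
    print("random:", max_window_deviation(random_assignment, 4, 500))
    print("round-robin:", max_window_deviation(rr_assignment, 4, 500))
```

The value printed for the random allocator depends on the seed and will not necessarily match the 13.6% reported in the caption. The point is only that per-window shares drift from the average even when the long-run totals are nearly equal.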