ZTE Communications, 2018, Vol. 16, Issue (1): 52-60. DOI: 10.3969/j.issn.1673-5188.2018.01.009
ZHANG Lifeng, ZHANG Chunhong, HU Zheng, TANG Xiaosheng
Received: 2017-08-05
Online: 2018-02-25
Published: 2020-03-16
About author:
ZHANG Lifeng (zhanglifeng@bupt.edu.cn) is a postgraduate student at Beijing University of Posts and Telecommunications, China. His research interests include data mining and massively parallel data processing.
ZHANG Chunhong (zhangch@bupt.edu.cn) is a lecturer at the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, China. She received her B.Eng. degree in telecommunication engineering, M.Eng. degree in information technology, and Ph.D. degree in computer science in 1993, 1996, and 2013, respectively. She was a visiting scholar at Illinois Institute of Technology, USA, in 2015. Her research interests include data mining, natural language processing, and ubiquitous computing.
HU Zheng (huzheng@bupt.edu.cn) received his Ph.D. degree from Beijing University of Posts and Telecommunications, China, in 2008. He works at the State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications. His current research interests are user behavior modeling and analysis in the mobile Internet and social networks. He has published more than 30 papers and been granted more than 10 patents in related areas.
TANG Xiaosheng (txs@bupt.edu.cn) received his Ph.D. degree from Beijing University of Posts and Telecommunications, China. He works at Beijing University of Posts and Telecommunications. His current research interests include user behavior modeling and analysis in the mobile Internet and social networks. He has published more than 20 papers and been granted more than 10 patents in related areas.
ZHANG Lifeng, ZHANG Chunhong, HU Zheng, TANG Xiaosheng. Behavior Targeting Based on Hierarchical Taxonomy Aggregation for Heterogeneous Online Shopping Applications[J]. ZTE Communications, 2018, 16(1): 52-60.
Serial number | Data field |
---|---|
1 | IMSI |
2 | Start time |
3 | End time |
4 | URL |
5 | Domain name |
Table 1 Format of deep packet inspection (DPI) data
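The five fields in Table 1 can be modeled as a small record type. The following is a minimal sketch, not the authors' implementation: the class name `DpiRecord`, the function `parse_dpi_line`, and the tab separator are all assumptions, since the source lists only the field names and not the on-disk layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DpiRecord:
    """One DPI log entry with the five fields listed in Table 1."""
    imsi: str
    start_time: str
    end_time: str
    url: str
    domain: str


def parse_dpi_line(line: str, sep: str = "\t") -> DpiRecord:
    """Split one log line into a DpiRecord.

    The separator is an assumption; the source does not state it.
    """
    fields = line.rstrip("\n").split(sep)
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    return DpiRecord(*fields)
```

For example, `parse_dpi_line("imsi\tt0\tt1\thttp://item.jd.com/1\titem.jd.com")` yields a record whose `domain` is `item.jd.com`; the time values here are placeholders because the source does not give a timestamp format.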
Online store | Hierarchical taxonomy |
---|---|
Jingdong | 手机→手机通讯→手机 (Mobile Phones → Mobile Communication → Mobile Phones → iPhone 7) |
Amazon | 电子→手机通讯→手机 (Electronics → Mobile Communication → Mobile Phones → iPhone 7) |
Suning | 手机&数码→手机通讯→手机 (Phones & Digital → Mobile Communication → Mobile Phones → iPhone 7) |
Gome | 手机→手机通讯→手机 (Phones → Mobile Communication → Phones → iPhone 7) |
Table 2 Different labels of iPhone 7 from four online stores
Label ID | Standard label |
---|---|
001001001 | 手机数码→手机通讯→手机 (Phone & Digital → Mobile Communication → Mobile Phone) |
001001002 | 手机数码→手机通讯→对讲机 (Phone & Digital → Mobile Communication → Interphone) |
001002001 | 手机数码→手机配件→移动电源 (Phone & Digital → Phone Accessories → Mobile Power) |
001002002 | 手机数码→手机配件→蓝牙耳机 (Phone & Digital → Phone Accessories → Bluetooth Earphone) |
002001001 | 电脑→电脑整机→笔记本 (Computer → Computer machine → Laptop) |
002001002 | 电脑→电脑整机→台式主机 (Computer → Computer machine → Desktop) |
002002001 | 电脑→办公设备→打印机 (Computer → Office Equipment → Printer) |
Table 3 Samples from the standard label system
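The sample label IDs in Table 3 appear to encode the hierarchy positionally, with three digits per level (for example, 001002001 reads as 001 → 002 → 001). The sketch below assumes that reading, which is inferred from the samples only and is not stated in the source; the function names `split_label_id` and `ancestor_ids` are hypothetical.

```python
from typing import List


def split_label_id(label_id: str, width: int = 3) -> List[str]:
    """Split a label ID into per-level codes (assumed 3 digits per level)."""
    if not label_id.isdigit() or len(label_id) % width != 0:
        raise ValueError(f"malformed label ID: {label_id!r}")
    return [label_id[i:i + width] for i in range(0, len(label_id), width)]


def ancestor_ids(label_id: str, width: int = 3) -> List[str]:
    """Return the ID prefixes for each level, from root to the label itself."""
    split_label_id(label_id, width)  # validates the ID
    return [label_id[:i] for i in range(width, len(label_id) + 1, width)]
```

Under this assumption, two labels share a parent category exactly when their IDs share a prefix: 001002001 (Mobile Power) and 001002002 (Bluetooth Earphone) both fall under the prefix 001002 (Phone Accessories).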
Serial number | Regex |
---|---|
1 | ware[iI]d|sku(=|%3D)(\d+) |
2 | order.*ware[iI]d(=|%3D)(\d+) |
3 | jd\.com/(product/)?(\d+)\.html |
4 | productIds=(\d+) |
5 | orderComment/(\d+) |
6 | item\.jd\.com/(\d+) |
Table 4 Regular expressions for item IDs in Jingdong URLs
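The patterns in Table 4 can be applied to a URL in order, returning the first captured item ID. This is a minimal sketch under stated assumptions rather than the authors' code: the function name `extract_jd_item_id` is hypothetical, the capture groups are rewritten as a named group `id`, and pattern 1 is grouped as `(?:ware[iI]d|sku)` on the assumption that the alternation was meant to cover only the parameter name.

```python
import re
from typing import Optional

# Patterns follow Table 4; grouping in the first one is an assumption.
JD_ITEM_ID_PATTERNS = [re.compile(p) for p in (
    r"(?:ware[iI]d|sku)(?:=|%3D)(?P<id>\d+)",
    r"order.*ware[iI]d(?:=|%3D)(?P<id>\d+)",
    r"jd\.com/(?:product/)?(?P<id>\d+)\.html",
    r"productIds=(?P<id>\d+)",
    r"orderComment/(?P<id>\d+)",
    r"item\.jd\.com/(?P<id>\d+)",
)]


def extract_jd_item_id(url: str) -> Optional[str]:
    """Return the first item ID matched by any pattern, or None."""
    for pattern in JD_ITEM_ID_PATTERNS:
        match = pattern.search(url)
        if match:
            return match.group("id")
    return None
```

Applied to `http://item.m.jd.com/product/3368118.html`, the third pattern matches and the function returns `"3368118"`. A URL with no item ID, such as a store home page, returns `None`.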
Application | Number |
---|---|
Jingdong | 6 |
Suning | 5 |
Amazon | 4 |
Gome | 4 |
Dangdang | 2 |
Taobao & Tmall | 4 |
Table 5 Number of regular expressions for each of the six applications
Rowkey | Column family: qualifier | Timestamp | Cell value |
---|---|---|---|
Item ID | “label:standard” | default | Standard label |
4005363 | “label:standard” | default | (Computer → Computer → Laptop) |
3726830 | “label:standard” | default | (Phone & Digital → Mobile Communication → Mobile Phone) |
Table 6 Storage format of item information, with samples
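The layout in Table 6 maps an item ID (the row key) to a standard label stored under the `label:standard` column. The sketch below mimics that lookup with an in-memory dictionary standing in for the real store. The names `ITEM_TABLE` and `lookup_standard_label` are hypothetical, and no particular database client API is implied.

```python
from typing import Dict, Optional

# In-memory stand-in for the item table; rows copied from Table 6.
ITEM_TABLE: Dict[str, Dict[str, str]] = {
    "4005363": {"label:standard": "Computer → Computer → Laptop"},
    "3726830": {
        "label:standard": "Phone & Digital → Mobile Communication → Mobile Phone"
    },
}


def lookup_standard_label(
    item_id: str, table: Dict[str, Dict[str, str]] = ITEM_TABLE
) -> Optional[str]:
    """Return the standard label stored for an item ID, or None if absent."""
    row = table.get(item_id)
    if row is None:
        return None
    return row.get("label:standard")
```

Keying rows by item ID means that an ID extracted from a URL can be resolved to its label with a single point lookup, without scanning the table.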
DPI data size | Time cost in map (global match) | Time cost in map (partial match) |
---|---|---|
2.7 Tb | 1 h 13 min | 46 min |
470 Gb | 17 min | 10 min |
Table 7 Experimental results
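The relative saving implied by Table 7 can be checked with simple arithmetic. The helper below is an illustrative sketch (the name `time_reduction` is hypothetical); it uses only the map-phase times reported in the table, converted to minutes.

```python
def time_reduction(global_min: float, partial_min: float) -> float:
    """Fraction of map time saved by partial match relative to global match."""
    if global_min <= 0:
        raise ValueError("global match time must be positive")
    return (global_min - partial_min) / global_min


# Times from Table 7, in minutes (1 h 13 min = 73 min).
large = time_reduction(73, 46)   # 2.7 Tb data set
small = time_reduction(17, 10)   # 470 Gb data set
print(f"{large:.1%}, {small:.1%}")
```

On these figures, partial matching cuts the map time by roughly 37% on the larger data set and roughly 41% on the smaller one.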
IMSI | URL | Label |
---|---|---|
898A93AA58976A49456222D58420B6B1 | http://item.m.jd.com/product/3368118.html | Online shopping → Appliances → Health appliance |
4B0DA828BDF06C1C21BC8926456402CA | http://cd.jd.com/img/channel?callback=jQuery6841693&skuId=10483088139&_=1503617852363 | Online shopping → Home building → Home textile cloth → Sheets |
45DCF28D2D947447F3C9B87E24491131 | http://product.dangdang.com/23215376.html | Online shopping → Books → Political/Military |
DF6D96FC9B386DABF2E5C0AADB508BC1 | http://item.m.gome.com.cn/product-A0004771496-pop8003858741.html?cmpid=seo_baidu_kapian | Online shopping → Home appliance → Personal care → Shaver |
Table 8 Final results of the proposed scheme