I spent the summer of 2017 as an R&D intern in VMware's Networking and Security group. Working in a group of three, I developed Project Aegis, a machine-learning-powered approach to firewall policy. We filed two patents for our work and presented our paper at RADIO, the company's exclusive internal research conference.
Motivation
Grouping together nodes that perform similar functions within a datacenter is a hard problem. It is also an important part of security policy enforcement, and bad clustering leads to a whole range of security problems. Today, even for large-scale deployments, network administrators create these groups manually.
A modified version of Latent Dirichlet Allocation (LDA) topic modelling formed the basis of our approach. Not only is it unsupervised, but it also runs in real time on distributed infrastructure, which made it a good fit for this problem.
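
To make the idea concrete, here is a minimal sketch of how topic modelling can group nodes. It uses plain LDA with collapsed Gibbs sampling, not the modified version we built, which this post does not describe. It also assumes one possible encoding: each node is a "document" and each protocol/port it is seen using is a "word". The flow records, node names, and function names (`flows_to_documents`, `lda_gibbs`, `group_nodes`) are all hypothetical.

```python
import random
from collections import defaultdict


def flows_to_documents(flows):
    """Assumed encoding: one 'document' per node, one 'word' per observed protocol/port."""
    docs = defaultdict(list)
    for node, port, proto in flows:
        docs[node].append(f"{proto}/{port}")
    return dict(docs)


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Plain LDA via collapsed Gibbs sampling; returns a topic mixture per node."""
    rng = random.Random(seed)
    vocab = sorted({w for words in docs.values() for w in words})
    word_id = {w: i for i, w in enumerate(vocab)}
    nodes = list(docs)
    vocab_size = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in nodes]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every word occurrence.
    assignments = []
    for d, node in enumerate(nodes):
        z_d = []
        for w in docs[node]:
            z = rng.randrange(num_topics)
            z_d.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, node in enumerate(nodes):
            for i, w in enumerate(docs[node]):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove the current assignment before resampling it.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    # Smoothed per-node topic mixture; each list sums to 1.
    theta = {}
    for d, node in enumerate(nodes):
        total = len(docs[node]) + num_topics * alpha
        theta[node] = [(doc_topic[d][k] + alpha) / total for k in range(num_topics)]
    return theta


def group_nodes(theta):
    """Put each node in the group of its most probable topic."""
    groups = defaultdict(list)
    for node, dist in theta.items():
        best = max(range(len(dist)), key=dist.__getitem__)
        groups[best].append(node)
    return dict(groups)


# Hypothetical flow records: (node, port, protocol).
flows = [
    ("web-1", 443, "tcp"), ("web-1", 80, "tcp"), ("web-1", 443, "tcp"),
    ("web-2", 443, "tcp"), ("web-2", 80, "tcp"),
    ("db-1", 3306, "tcp"), ("db-1", 3306, "tcp"),
    ("db-2", 3306, "tcp"), ("db-2", 53, "udp"),
]
docs = flows_to_documents(flows)
theta = lda_gibbs(docs, num_topics=2)
groups = group_nodes(theta)
print(groups)
```

The sampler needs no labelled examples, which is the sense in which the approach is unsupervised. A toy like this runs on one machine; at datacenter scale the same counts would have to be computed on a distributed framework.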
The platform that ran our algorithm was designed to be as general as possible, so that future ML projects could benefit from it as well. An IPFIX collector received data in real time from the NSX Manager. We used Spark, Kafka, Hadoop, and Airflow for the analysis pipeline.