The world is awash with data, and much more is on the way, creating a tidal wave of Big Data. Data Engineers develop the infrastructure to store, manage, and analyse this wave of data, bridging the gap between Data Science and Computer Science. This unique course will give you the skills you’ll need to succeed as a Data Engineer.
Why study Data Engineering at Dundee?
The role of “Data Scientist” has been described as the “sexiest job of the 21st Century”. However, a new role is emerging, that of the Data Engineer, as more companies realise they need employees with specific skills to handle the amount of data being generated and the coming tidal wave from the Internet of Things.
This MSc has been created with industry input to prepare its students with the skills to handle this wave of data and to be at the forefront of its exploitation. Students on the sister programmes (“Data Science” and “Business Intelligence”) have gone on to work for some of the biggest companies in the industry and we are confident that graduates from this MSc will have the same success.
The School of Computing at the University of Dundee has been successfully offering related MSc programmes such as Business Intelligence and Data Science since 2010. These innovative programmes attract around 40 students per year, drawn from across Europe and overseas.
What's so good about Data Engineering at Dundee?
You will have 24-hour access to our award-winning and purpose-built Queen Mother Building. It has an unusual mixture of lab space and breakout areas, with a range of conventional and special equipment for you to use. It's also easy to work on your own laptop as there is wireless access throughout the building. Our close ties to industry allow us access to facilities such as Windows Azure and Teradata, and to university- and industry-standard software such as Tableau for you to evaluate and use.
The University of Dundee has close ties with the Big Data industry, including Teradata, Datastax and Microsoft. We have worked with SAS, Outplay, Tag, GFI Max, BrightSolid and BIPB, and our students have enjoyed guest lectures from Big Data users such as O2, Sainsbury’s, M&S and IBM.
You will be able to work with a range of leading researchers and tutors, including top vision and imaging researchers and BI experts. Our honorary staff include legal experts, entrepreneurs and renowned industry experts such as John Richards of the newly formed IBM Watson Group.
How you will be taught
The course will be taught by staff of the School of Computing. Depending on the modules you take this will include Andy Cobley, Professor Mark Whitehorn, and Professor Stephen McKenna.
What you will study
The course will be taught in 20 credit modules with a 60 credit dissertation. Students are required to complete 180 credits for the award of the MSc (including 60 credits for the dissertation). Students completing 120 credits (without the dissertation) will be eligible for a Postgraduate Diploma.
Each module on the course is designed to give the student the skills and understanding they need to succeed in the Data Engineering/Science field. Content on the course includes (but is not limited to):
Cassandra, Neo4j and other NoSQL databases
The Storm distributed real-time computation system
Hadoop, HDFS, MapReduce, and other Hadoop/SQL technologies
Spark and Shark frameworks
Data Engineering languages such as Python, Erlang, R and MATLAB
Vision systems, which are becoming increasingly important in data engineering for extracting features from large quantities of images, such as traffic, medical and industrial imagery
RDBMS systems, which will continue to play an important role in data handling and storage. You will be expected to research the history of RDBMSs and delve into the internals of modern systems
OLAP cubes and Business Intelligence systems, which can be the best and quickest way to extract information from data stores
Goals of machine learning and data mining
Clustering: K-means, mixture models, hierarchical
Dimensionality reduction and visualisation
Inference: Bayes, MCMC
Perceptrons, logistic regression, neural networks
Max-margin methods (SVMs)
Mining association rules
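As a flavour of the clustering material listed above, the classic K-means algorithm can be sketched in a few lines of Python. This is a minimal illustration using NumPy, not course material; a naive initialisation (the first k points) is assumed for determinism.

```python
import numpy as np

def kmeans(points, k, iterations=10):
    """Minimal K-means sketch: repeatedly assign each point to its
    nearest centroid, then move each centroid to its cluster mean."""
    centroids = points[:k].copy()  # naive init: first k points (assumption)
    for _ in range(iterations):
        # Distance from every point to every centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each non-empty cluster's centroid
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated blobs; K-means should recover them as two clusters.
data = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                 [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, centroids = kmeans(data, k=2)
```

In practice the course's other listed tools (e.g. R or MATLAB, or a library such as scikit-learn in Python) provide production-quality implementations with better initialisation schemes such as k-means++.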
How you will be assessed
The course is assessed through a combination of examinations, coursework, presentations and interviews. Each module is different: for instance, the Big Data module has 40% coursework, consisting of Erlang programming and a presentation on NoSQL databases, along with an examination worth 60%.
Our experience suggests that graduates of this course will have most impact in the following areas:
Cloud and web-based industries that handle large volumes of fast-moving data that need to be stored, analysed and maintained. Examples include the publishing industry (paper, TV and internet), messaging services, data aggregators and advertising services.
Internet of Things. A large amount of data is being generated by devices (robotic assembly lines, home power management, sensors etc.) all of which needs to be stored and analysed.
Health. The NHS (and others) are starting to store and analyse patient data on an unprecedented scale. The healthcare industry is also combining data sources from a large number of databases to improve patient well-being and health outcomes.
Games industry. The games industry records an extraordinary amount of data about its customers' play activities, all of which needs to be stored and analysed. This course will equip students with the knowledge and skills to engage with the industry.