Astro4Dev Hackathon Requirements & Guidelines

This page is under development – if you have suggestions to make this resource more useful please email us at

Read this information in Spanish

Unlike a conventional hackathon, which brings together advanced programming/data science practitioners to innovate and solve complex problems, the Astro4Dev Hackathon aims to provide exposure to data science and machine learning and some of the techniques involved. Participants at the hackathon work on interesting, data-driven, real-world problems. The hackathons are aimed at postgraduate students and young professionals in Science, Technology, Engineering & Mathematics (STEM) fields who have a keen interest in data science and would benefit from further exposure and hands-on experience. The hackathon can be planned as an in-person or virtual event.

The following guidelines and requirements are relevant to skills development hackathons such as those run as part of the DARA Big Data project, which aim to provide exposure to data science and machine learning techniques. Other hackathon events may take different formats – these are simply guidelines which can be adapted accordingly.



  1. Determine whether the event will be in-person, virtual or hybrid.
  2. In the case of in-person/hybrid events, a venue (e.g. a computer lab) with sufficient desktops/laptops such that there is at least one computer per team.
  3. In the case of hackathon projects which implement machine learning / deep learning, an available computing platform such as a server or cloud should be identified for participants to access during the event, as most projects of this type are computationally intensive. Participants could also make use of free cloud computing services such as Google Colab or Kaggle.
  4. In the case of virtual hackathons, or in-person hackathons where the code will be run on a cloud computing platform, participants should have access to a reliable internet connection with speeds of at least 4 Mbps.
  5. A sufficient number of tutors such that there is at least one tutor for every two teams (each team consisting of 4-5 participants). Tutors should meet the requirements set out in the Tutor Guidelines document.
  6. A selection of one or more DARA Big Data hackathon projects (see projects section below). The projects available span a range of commercial, scientific and development-related topics.
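
As a rough planning aid, the ratios in the list above (teams of 4-5 participants, at least one computer per team, one tutor for every two teams) can be turned into a quick estimate. The sketch below is only illustrative: the function name `plan_resources` and its parameters are hypothetical, and it assumes teams are filled up to the chosen maximum size, so for some participant counts one team may end up smaller than four.

```python
import math


def plan_resources(n_participants, team_size=5):
    """Estimate teams, tutors and computers for a hackathon.

    Assumptions (not prescribed by the guidelines): teams are filled
    up to `team_size`, and any remainder forms one extra team.
    """
    if n_participants < 1 or team_size < 1:
        raise ValueError("n_participants and team_size must be positive")

    # Number of teams needed so that no team exceeds team_size.
    n_teams = math.ceil(n_participants / team_size)
    # At least one tutor for every two teams.
    n_tutors = math.ceil(n_teams / 2)
    # At least one computer per team.
    return {"teams": n_teams, "tutors": n_tutors, "min_computers": n_teams}


print(plan_resources(23))
```

For 23 participants this suggests 5 teams, 3 tutors and at least 5 computers; organisers would still adjust the numbers by hand to suit the venue and the chosen project.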




  • Determine the individuals who would benefit from the planned event, i.e. would the school/workshop/hackathon be targeted at students (if so, at what level), educators, working individuals who would benefit from upskilling, or a combination thereof? In the case of virtual events, participants require their own computers as well as a reliable internet connection.

  • Participants should have some level of prior programming experience (Python).


  • Hackathons can be run over a minimum of two days (three days is preferable). However, the hackathon may also be held as part of a larger event, such as a workshop or school, at which lessons and/or talks related to data science and machine learning are delivered. Hackathons are intense by nature, in that participants cover a lot of content and problem-solving in a short space of time. It is therefore important to follow a well-structured programme.
  • At the hackathon, participants would work through DARA Big Data tutorials before tackling the ‘hackathon task’, with the assistance of tutors who are familiar with the material and concepts.

Hackathon Projects

The tutorials relevant to each of the projects below are in the form of Jupyter Notebooks and, in addition to Python code, contain information, explanations and links to ensure that they are more accessible to the participants. The tutorials also introduce the techniques necessary to complete the hackathon task.  Click on any of the following projects to access the GitHub repository with the relevant resources.

Flood Detection


NLP – Sentiment Analysis


Image Classification


Pulsars vs RFI


Movie Recommendation App


Music Classification


Tutor Training

In order to run a successful big data hackathon, experienced tutors are required to assist participants with the hackathon tutorials and the hackathon task. A ratio of one tutor for every two hackathon teams (each team consisting of 4-5 participants) is recommended to ensure that tutors are not overloaded and that participants receive assistance timeously. In preparation for a hackathon, tutors should go through these tutor guidelines and make use of one of the training videos below relevant to the chosen hackathon project.

Hackathon Task

  • During the hackathon, participants work in teams of 4-5 individuals in order to enable peer learning and the development of soft skills such as teamwork, communication and collaboration. It is recommended that each team is diverse with regard to skill level, academic background/working profession and gender. 
  • Participants may present their solutions near the end of the event, showing their understanding of the techniques used, or demonstrating the accuracy of their solutions. Certificates and prizes may also be awarded where possible.
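
One simple way to mix skill levels across teams, as recommended above, is a "snake draft": rank participants by a skill score and deal them out to teams in alternating order. The sketch below is an assumption for illustration only, not a method prescribed by these guidelines. The function `form_teams` and the self-rated skill scores are hypothetical, and the sketch balances skill level only; academic background and gender would still need to be checked by hand.

```python
def form_teams(participants, n_teams):
    """Split (name, skill) pairs into n_teams using a snake draft.

    `skill` is a hypothetical self-rated score, e.g. taken from a
    registration form; higher means more experienced.
    """
    # Strongest first; ties keep their original order.
    ranked = sorted(participants, key=lambda p: p[1], reverse=True)
    teams = [[] for _ in range(n_teams)]
    for i, (name, _skill) in enumerate(ranked):
        round_no, pos = divmod(i, n_teams)
        # Reverse direction on every other round so that no single
        # team collects all of the top-ranked participants.
        idx = pos if round_no % 2 == 0 else n_teams - 1 - pos
        teams[idx].append(name)
    return teams


applicants = [("A", 5), ("B", 4), ("C", 3), ("D", 2), ("E", 1), ("F", 1)]
print(form_teams(applicants, 2))
```

With the six example applicants and two teams, the strongest and weakest participants are spread across both teams rather than grouped together.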


Should the chosen hackathon project indicate that additional computing power is recommended for that project, participants may make use of the Google Colab computing platform.

Organisers may also contact the Inter-university Institute for Data Intensive Astronomy (IDIA) for possible compute support through the use of their research cloud facility.


  • Resources that may aid participants in their preparation ahead of the hackathon may be circulated to all participants approximately two weeks prior to the event. These resources cover Python programming, as well as a basic introduction to certain machine learning concepts.
  • Recorded talks by members of DARA Big Data and IDIA on topics in data science and machine learning are also available and may be played as part of the event for the benefit of the participants. It is also recommended that invited talks be given by academic or industry experts in the field, in order to make participants aware of the many applications of data science and machine learning. 


  • If the event is not targeting a pre-selected group of individuals, e.g. all Honours and Masters students in the University’s Physics department, a poster/advert (see example poster) containing all relevant information can be created and disseminated via mailing lists and social media in order to attract suitable applicants.
  • We have found that including a link to a Google Form where individuals can apply is an effective method for obtaining registrations for the event. 
  • If a selection process is required, participants may be selected based on their existing Python experience or their motivation for attendance.


  • The success or impact of the event may be assessed using feedback from participants and tutors, and/or the quality of the solutions produced. Example pre- and post-event surveys can be found here.
  • Organisers may create networks through which they and the participants can keep in touch, in order to assess the long-term impact of the event or to carry out participant assessments post-event. These can be in the form of mailing lists, Facebook groups, WhatsApp groups etc.