Today, I am happy to share my delightful interview with Emilia Gómez about her ISMIR presidency, the challenges the MIR community has faced over the last few years, the potential challenges for ISMIR in the future, and the importance of evaluation in MIR research.
Emilia is one of the most significant figures in MIR research, and she is involved in many different projects. At the moment, she is the Lead Scientist of the HUMAINT project at the Centre for Advanced Studies, Joint Research Centre, European Commission. She is also an Associate Professor in the Department of Information and Communication Technologies (DTIC), Universitat Pompeu Fabra (UPF), where she leads the MIR (Music Information Research) lab of the Music Technology Group (MTG).
Thank you very much for agreeing to do this interview. With many significant works over the years, you have become one of the most influential figures in MIR. I would like to start by asking about the series of events that led you to get involved in this community.
I got to know the ISMIR community when I started my master’s program at IRCAM. After my engineering studies, I joined ATIAM where I learned many things about music acoustics, signal processing and computing. I think it was around the moment the ISMIR community was starting. While doing this master’s program, I met colleagues that are well known in the MIR community, e.g. Fabien Gouyon, Olivier Lartillot, Geoffroy Peeters.
After my master’s, I started my Ph.D. at UPF, in the context of a European project called CUIDADO, which was also in collaboration with IRCAM. We started doing research on sound and music description, and music information retrieval. This was my first contact with the MIR field. Later, my Ph.D. turned to the area of pitch content analysis, mostly on melody and tonality description of music audio signals. That’s how I started publishing at ISMIR. My first ISMIR conference was in Paris in 2002. From then on, I started to regularly attend and follow ISMIR conferences and publish there, and it became one of the major events that I attended. I got familiar with the different research topics and the community. Later, I got more and more involved at ISMIR. For instance, I was part of the local committee of ISMIR 2004 in Barcelona, and I took part in the first WiMIR gatherings and got involved in the initiative. I also participated in MIREX in several tasks, e.g. melody extraction. In 2013, I applied and got elected as a member (member-at-large) of the ISMIR board, starting as a board member in 2014. I later became president-elect after Malaga in 2016, and president in 2018. Finally, I stepped down at the end of December 2019.
You just finished your presidency on the ISMIR board, after 2 years as a member-at-large, 2 years as the president-elect, and 2 years as the president. In terms of the number of conference participants, we see a huge increase from about 280 participants in 2015 to about 500 participants in 2019. Can you compare the status of the ISMIR community today with that of 4 years ago? What challenges have you been dealing with in this period of time?
We have indeed seen an increase in the number of participants at the conference. We have tried to keep ISMIR as a single-track conference with a stable scientific program. We try to have a conference where attendees can follow the whole program, with no parallel sessions. As you mentioned, the number of participants has increased a lot, and so has the number of submitted papers. As a consequence, the conference has become more selective and its scientific quality has improved. Of course, there were high quality papers at early ISMIRs, but I think ISMIR has become a conference that is of very high quality in terms of scientific excellence. The papers at ISMIR are widely cited, and our publications have an impact on other conferences and scientific forums in the field. I am very happy that ISMIR has become a leading conference in our field.
As for the increase in the number of participants, I think that it is closely related to the impact that our research is having on industry. We have seen a higher presence of the industry in the community in recent years, especially among attendees, and an increase in sponsorships that has allowed us to provide financial support for students to attend the conference. I think the success of ISMIR is related to the fact that it has become a leading place to get to know about the latest advances in the MIR field. We have been able to put ISMIR in an applied research environment with good technology transfer. So, in my opinion, this combination of scientific excellence and real-world application has made this increase in participation happen. That is the balance we have now between industry and academia.
The main challenge was and is to keep the diversity in the field. For instance, ISMIR has always been an interdisciplinary community. We are always proud to say that we have many disciplines involved, e.g. signal processing, machine learning, musicology, music cognition, library sciences. Most funding and efforts come from the more applied side of ISMIR like machine learning, and it’s always a challenge to make our conference attractive for researchers from other disciplines, for example, music cognition. The format then also has to accommodate scientific practices in the humanities, music theory or musicology. I think the main challenge is to keep this diversity in the future, and this is very much what I have promoted during my presidency.
In addition to the diversity of disciplines, we need a diversity of cultures represented, as music is different in different parts of the world. That’s why we have promoted having ISMIR in Europe, America, and Asia in alternating editions. This is something that we achieved in the last years with two very successful ISMIR conferences in Suzhou and Taipei, and it will happen again with ISMIR India in 2021. This also shows the diversity of cultures and the need to adapt MIR research to them. Moreover, we need to keep diversity in terms of industry vs academia, aspects such as gender (as WiMIR is trying to promote), and seniority. This is the main challenge we need to address: protecting diversity is, of course, very challenging, but diversity is in my opinion the most powerful characteristic of ISMIR. You can see some diversity indicators of ISMIR 2019 here.
This year’s conference, of which you were a co-general chair, was the one that had the most participants in ISMIR’s history. You talked about the challenges you faced in these last 4 years. What challenges do you see ahead for the ISMIR community and MIR research in the upcoming years?
I think the challenge in the future, as mentioned before, is to keep this identity with a very large number of participants. As an example, a very practical challenge that ISMIR organizers face nowadays is to accommodate a large number of participants. In many cases, you cannot use university facilities or auditoriums because of the capacity. This year, we were very fortunate to have TU Delft and its facilities to organize the conference. It’s not common for universities to have such large auditoriums. Because of this, we have been considering at the ISMIR board that we might need to give organizers more time in advance to book such large auditoriums. So I think one of the challenges will be organizational. The larger the conference, the more complicated it is to organize.
In the near future, we also need to consider the impact of ISMIR on climate change, and how we can make ISMIR as sustainable as possible. There are more and more concerns about the carbon footprint of scientific conferences, and how it can be minimized. I think that this is another aspect that our community needs to address in the future. How can we make ISMIR more accessible for people that cannot or do not want to travel? Other conferences try to implement remote attendance facilities, for instance.
In terms of scientific excellence, I think the step forward is to make sure that papers do have major scientific value and novelty. This is something that has been expressed by the last Program Committee (PC) chairs at ISMIR. In the paper submission process, authors were asked to explain what the point of their paper really was, and we asked the reviewers to also check for scientific contributions on that basis.
And I also think it’s important to consider the social and ethical impact of ISMIR. In this context, in the last edition, I co-organized a tutorial on Fairness, Accountability, and Transparency (FAT) in MIR. There were also some conference sessions and a keynote talk. I see this, the ethical and social impact of MIR, as an area that the community might need to explore more.
You talked about bringing the industry closer to academia but the scientific contribution of the academic papers may not be related to the goals of the industry. At the same time, something that works better and faster may not provide any novel scientific contribution. What can be done to keep the balance and to integrate the two worlds?
I personally think that it’s great that we have so much presence from the industry. Music was traditionally a research field that was very difficult to get funding for. Now that the industry is interested, more funding opportunities emerge, and that is good. For instance, at ISMIR, there are sponsors that support students or researchers who cannot fund their participation. So, this synergy is good. I also think that to keep the quality and interest of the ISMIR scientific program, it’s good to have different kinds of papers and scientific contributions: on the one hand, there is fundamental research, or basic research, and on the other hand, there is research that is exploited and evaluated in real-world scenarios. I think this is one of the challenges of the MIP-Frontiers project but also of the ISMIR community as a whole. I think both sides are important but we need to balance them. That’s why it’s important to have PC chairs, a program committee and a set of reviewers that are diverse. These different perspectives are needed to establish what a good ISMIR paper should be. So, it is a challenge, but I think we need to keep both parts in mind and have both kinds of contributions present when we build the ISMIR program. This should be an effort of the whole community: not only the board members or the PC chairs, but all reviewers must be aware of the need for diversity.
HUMAINT, one of your main projects, investigates the impact of machine intelligence on human behavior. What kind of connections can you draw between this project and your research in MIR?
Part of the HUMAINT project is also related to music. During my career, I’ve always been developing MIR algorithms, and with my team, we develop algorithms that are later applied in different application domains, e.g. music recommendation, generation or identification. With HUMAINT, I became more and more interested in understanding what impact our algorithms have on listeners and creators. In addition, I think we need to make sure that the systems we create are designed with people in mind first. So it’s more about awareness, not only of the algorithms themselves but of the impact they have. For instance, a related idea is that we shouldn’t evaluate MIR systems only by their accuracy, but we need to incorporate other aspects such as fairness, transparency and diversity. We had a paper on this topic at this ISMIR (Porcaro and Gómez, 2019) related to the diversity of music playlists. Aspects such as transparency (e.g. how do we explain our models?) and fairness should also be addressed from the engineering perspective, not only when the systems are applied, but while we are building them.
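To make the idea of evaluating beyond accuracy concrete, here is a minimal sketch in Python. It is an illustration only, not the method of the paper mentioned above: the toy labels, the listener groups "A" and "B", and the choice of Shannon entropy as a diversity measure are all assumptions made for this example.

```python
from collections import Counter
from math import log2


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_by_group(y_true, y_pred, groups):
    """Accuracy computed separately for each group label."""
    per_group = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        per_group[g] = accuracy([y_true[i] for i in idx],
                                [y_pred[i] for i in idx])
    return per_group


def shannon_diversity(items):
    """Shannon entropy (in bits) of a label distribution, e.g. playlist genres."""
    counts = Counter(items)
    total = len(items)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Hypothetical toy data: genre predictions for tracks from two groups.
y_true = ["rock", "rock", "jazz", "jazz", "rock", "jazz"]
y_pred = ["rock", "rock", "jazz", "rock", "rock", "rock"]
groups = ["A", "A", "A", "B", "B", "B"]

overall = accuracy(y_true, y_pred)
per_group = accuracy_by_group(y_true, y_pred, groups)
# A large gap means the single overall number hides unequal performance.
gap = max(per_group.values()) - min(per_group.values())

# Diversity of a hypothetical four-track playlist, measured over genres.
playlist_diversity = shannon_diversity(["rock", "rock", "jazz", "pop"])

print(overall, per_group, gap, playlist_diversity)
```

In this toy case the overall accuracy looks moderate, while the per-group breakdown shows that almost all errors fall on one group, which is the kind of effect a single accuracy figure cannot reveal.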
Like many other fields, MIR has witnessed a significant increase in the number of studies using machine learning (ML) techniques. I would like to ask for your brief opinion on this trend. Do you think we can use ML algorithms for music recommendation? On the one hand, due to their black-box nature, we have problems with fairness, accountability, and transparency. On the other hand, sometimes the technological developments come before there is any theory or explanation for them. How should ML research in MIR balance the need to create something that works and the need to explain the reason behind a working model?
In the early days, we worked with small datasets, and we also designed our own features. Then, as an engineer, you had a clearer sense of what kind of music you were working with because we used to listen to these datasets. At the same time, you had a clear understanding of the features, because you were designing all the steps. So, the knowledge was already implicitly within the algorithm, because you had this knowledge while you were developing it. Of course, when we work with large datasets, one hardly listens to the music. It’s very rare that you will listen to all those million songs you work with. Also, as we often don’t develop the algorithms ourselves but rely on existing libraries, we sometimes miss the detailed picture of what we are really developing. The knowledge is a bit separated from the engineering aspect. You could even do research on MIR without any knowledge of the music domain. Some ML researchers from other domains can take a huge dataset of music and just apply their methods there. There is no need to have domain knowledge as there used to be. But in my opinion, having this knowledge about the music domain, and incorporating it into deep learning systems, makes them better.
Of course, if something works, maybe you have solved the problem, but if you don’t evaluate well and if you don’t understand the limitations of the system, it will probably be less robust. If you don’t evaluate for fairness, your system may discriminate, and you may not be aware of that. I think this knowledge may really make systems better.
I truly believe that the knowledge gap will be closed in the future. Of course, when we try to develop a system architecture, we first take the existing architectures that were applied to image processing and apply them to spectrograms. Later on, we say “okay, now we have to create our own architectures because the spectrograms are not the same as images”. Later on, some people introduced the idea of “okay, we can take advantage of pitch estimation algorithms, chroma features, MFCCs or vocoder features; why don’t we incorporate them into our systems”. So, I think the knowledge is coming, and the fact is that we cannot explain a very complex architecture but we can understand its limitations. You can test, you can evaluate differently. I think a good evaluation of deep learning systems will bring up knowledge, and also will make systems behave better.
Understanding why a certain model gives a certain prediction is something that many researchers work on. On the other hand, when we spend more time on evaluation as you suggested, we can at least understand the limitations. Maybe for the next ISMIR conferences there should be some required error analysis section.
Yes, usually people don’t include the error analysis or robustness tests anymore. As a reviewer, I always check for that. I always check whether there is some kind of analysis of the results. I understand that, for instance, with the deep learning models, you cannot evaluate all possible configurations and architectures, but you should at least do some good analyses. I think this is something we should improve to have a better scientific understanding. It’s also very common that if you have a paper on evaluation at ISMIR, reviewers do not appreciate it. This is a message to our reviewers: when reviewing papers, you should also be aware that a good methodology is as important as a novel idea for an algorithm. Reviewers might tend to value the novelty of the algorithm more than the methodological approach.
I don’t know if it would be possible to force people to do error analysis in papers. For example, in ML communities, you have the common practice of performing ablation studies, where you change some aspects of your model and try to prove that your design choices are robust, but you still just report some accuracy on some dataset. In an error analysis, instead, you look at the failed cases on these datasets. Even with the same architecture, it would not take a lot of time to analyze them, and it can be a very valuable contribution.
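As a rough sketch of the kind of error analysis described here, the following Python snippet collects the failed cases of a classifier and tabulates them, instead of reporting a single accuracy figure. The key-estimation task, the track ids, and the "recording" attribute are hypothetical and chosen only for illustration.

```python
from collections import Counter


def failed_cases(examples, y_true, y_pred):
    """Return (example, reference, prediction) for every wrong prediction."""
    return [(ex, t, p) for ex, t, p in zip(examples, y_true, y_pred) if t != p]


def confusion_pairs(failures):
    """Count how often each (reference, prediction) confusion occurs."""
    return Counter((t, p) for _, t, p in failures)


def failures_by_attribute(failures, key):
    """Count failures per value of a metadata attribute of the examples."""
    return Counter(ex[key] for ex, _, _ in failures)


# Hypothetical toy data: key estimation on five tracks with one metadata field.
examples = [
    {"id": "t1", "recording": "studio"},
    {"id": "t2", "recording": "live"},
    {"id": "t3", "recording": "live"},
    {"id": "t4", "recording": "studio"},
    {"id": "t5", "recording": "live"},
]
y_true = ["C major", "A minor", "G major", "A minor", "C major"]
y_pred = ["C major", "C major", "G major", "A minor", "A minor"]

failures = failed_cases(examples, y_true, y_pred)
pairs = confusion_pairs(failures)
by_recording = failures_by_attribute(failures, "recording")

for (ref, pred), n in pairs.most_common():
    print(f"{ref} -> {pred}: {n}")
print(dict(by_recording))
```

In this toy example both errors are confusions between relative keys and both occur on live recordings, which is the kind of pattern an aggregate score would hide and a short failed-case table makes visible.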
I’m sure that when there are standard evaluation methodologies coming from the ML community, they will propagate to ISMIR but we need to keep in mind also that our community is diverse. Each discipline has a different evaluation methodology. For instance, at ISMIR, we also use qualitative methods such as user listening experiments in addition to machine learning methodologies. As a field, I think we have a big potential to combine both.
Whatever the discipline is and whatever evaluation methodology is used, the failed cases can be analyzed, based on the evaluation used in the paper. This can be the common part that may connect all these different communities within MIR research.
For instance, one effort of the PC chairs this year was promoting submissions that “explicitly discuss reusable insights that go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the community”. So at least, in the review process, attention was paid to novelty, scientific quality, reusable insight, stimulation potential, importance, and readability. So I think the community is also moving toward defining some aspects we want to see in ISMIR papers. One of these would be reproducibility. Already in Porto, it was explicitly asked for or evaluated during the review process. It has become a de facto practice to share code or data. Of course, not all the papers can do that, but a great number of them have supplementary code or material. I would also add reusable insights. So, I think our community is in the process of incorporating these aspects into the review process.