"crime fiction" . "Stylometry and pastiche. A case study from French crime fiction" . "


Ars\u00E8ne Lupin. Source: http://commons.wikimedia.org/wiki/File:Lupin01.jpg. Image in the public domain.

\n

When I enthusiastically present literary scholars with the surprising accuracy stylometric methods display in many cases of authorship attribution (given appropriate conditions, such as sufficient material, a certain homogeneity in the genre of the texts, and state-of-the-art distance measures), some come up with a clever question: What if one author parodies the style of another author? Will stylometric methods be fooled?

\n

What if… indeed! Up to now, all I was able to reply to this was to acknowledge the interest of the question, refer to one or two relevant papers[1] and make an educated guess: If the author is very good, he or she can pull it off, but in most cases, I suspect that the author signal remains strong even in a pastiche, i.e. the imitation of the style or character of a work of art.[2] But I had never come across a good test-case from French literature for an actual stylometric experiment.

\n

The other day, when expanding my collection of French crime fiction novels, I finally came across a suitable test case. The authors Pierre Boileau and Thomas Narcejac, better known as Boileau-Narcejac[3], jointly wrote a large number of suspense-oriented crime fiction novels from the 1950s to the late 1980s, and their collaboration is an interesting case for stylometry in itself. As admirers of early twentieth-century proponents of the genre, they also produced, in the course of the 1970s, five pastiches of Maurice Leblanc’s novels featuring Ars\u00E8ne Lupin. The five novels are: Le Secret d’Eunerville (1973), La Poudri\u00E8re (1974), Le Second visage d’Ars\u00E8ne Lupin (1975), La Justice d’Ars\u00E8ne Lupin (1977) and Le Serment d’Ars\u00E8ne Lupin (1979). These novels feature Maurice Leblanc’s famous “gentleman-cambrioleur” Ars\u00E8ne Lupin[4] and lead him to new adventures in which he can be bold, save several people’s lives, meet beautiful women, and possibly steal (and/or return) some valuable jewelry or works of art in the process. In their brief prefatory note to La Poudri\u00E8re, the authors explicitly designate their novels as “pastiches” and state the following (my translation): “There is a Leblanc style [\u00E9criture], whose movement it is not all that difficult to reproduce; but there is, in Leblanc, a creativity, a way of approaching the absence of verisimilitude with a natural elegance, which intimidated us a lot.”[5] Basically, they say: it was easy to imitate the style, but hard to imitate the plot.

\n

My reading experience would rather suggest the opposite: it seems easier to imitate the characters and events of a type of novel than its style. So, to approach the question of style, what if we prepared a collection of texts to test whether these pastiches of Leblanc by Boileau-Narcejac (four of which I have) are more similar to Leblanc’s originals or to Boileau-Narcejac’s other crime fiction novels? Besides the contestants, such a collection should also contain crime fiction contemporary to Leblanc’s originals (such as novels by Gaston Leroux) as well as contemporary to Boileau-Narcejac’s pastiches and their other novels (such as novels by Georges Simenon, L\u00E9o Malet, Fr\u00E9d\u00E9ric Dard, and Jean-Patrick Manchette). This makes for a collection of 123 crime fiction novels, written by seven different authors (Boileau-Narcejac counted as one) during a period of around seventy years (1908-1977). While I have all these texts at my disposal, unfortunately, I do not have any ‘normal’ crime fiction by Boileau-Narcejac exactly contemporary to the pastiches, only a number of slightly earlier ones, so if their Leblanc pastiches turn out to be different from their ‘normal’ novels, this could be due to chronology. Also, narrative perspective and its consequences for the frequencies of pronouns and verb forms may get in our way (French being a highly inflected language). But let’s see.

\n

First things first, I wanted to see what could be done with a well-established distance measure such as Eder’s Delta (a variant of Burrows’ Delta which has served me well on French texts), using the stylo for R package.[6] In order to minimize the influence of narrative perspective on the word frequency profiles of the texts, I used a custom list of stopwords including personal pronouns, possessive pronouns and some very common verbs in their first and third person form.[7] In addition, I excluded the names “Ars\u00E8ne” and “Lupin”, in order not to let the simple mention of the hero’s name influence the assessment of stylistic similarity.[8] The pastiches by Boileau-Narcejac have their own “author” label (BoilNarcP), so they will be set apart visually in the results.
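
The stopword step described above can be illustrated outside of stylo as well. The following is a minimal sketch in plain Python, not the stylo implementation: it assumes a table of relative word frequencies per text and simply drops the unwanted words before any distance is computed. The text labels, the frequency values and the function name drop_stopwords are hypothetical and only serve the illustration.

```python
# Hypothetical toy table: relative frequencies (per 100 words) for two texts.
frequencies = {
    'Leblanc_813': {'de': 4.1, 'il': 2.9, 'la': 3.0, 'lupin': 0.6},
    'BoilNarcP_Poudriere': {'de': 3.8, 'il': 1.7, 'la': 3.2, 'lupin': 0.5},
}

# A tiny excerpt of a stoplist: one pronoun and the name of the hero.
stopwords = {'il', 'Lupin'}


def drop_stopwords(table, stopwords):
    '''Return a copy of the frequency table without the stopwords.

    Matching is case-insensitive, so 'Lupin' also removes 'lupin'.
    '''
    lowered = {word.lower() for word in stopwords}
    return {
        text: {w: f for w, f in row.items() if w.lower() not in lowered}
        for text, row in table.items()
    }


filtered = drop_stopwords(frequencies, stopwords)
print(sorted(filtered['Leblanc_813']))
```

Filtering the table before the analysis, rather than the texts themselves, keeps the remaining relative frequencies unchanged, which is also what the procedure in the notes does.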

\n

As always, the possibilities for parameter setting are endless, and adding a custom-built list of stopwords to the mix does not help. But let’s start with a carefully cherry-picked if conservative set of parameters: the 400 most frequent words only, without the above-mentioned stopwords, culling of 20% applied to somewhat level the playing field, and using Eder’s Delta. The result is the following dendrogram:

\n

Figure 1: Dendrogram of stylistic similarities (stylo, Eder’s Delta, 400 MFW, custom stoplist, 20% culling).

\n

As you can easily see from figure 1, the results are almost flawless: the 123 texts cluster into perfectly coherent, strongly-separated author-based groups (only exception: one novel by Malet which clusters with Dard). These combine into larger groups (from top to bottom): one made up of Boileau-Narcejac relatively close to Fr\u00E9d\u00E9ric Dard as well as Jean-Patrick Manchette, with L\u00E9o Malet a bit more removed; a second one with Maurice Leblanc and Gaston Leroux, clearly based on chronology and shared narrative perspective. And finally, Georges Simenon on his own.

\n

The four pastiches of Maurice Leblanc’s Ars\u00E8ne Lupin, written by Boileau-Narcejac, which are shown in green, are right among the other texts by Boileau-Narcejac. Nothing indicates their special status. On the level of analysis used here, and with the parameters used, any potentially existing “Maurice Leblanc” style in these four novels remains invisible. It is not entirely clear to me whether this is a success or a failure, but it is certainly surprising!
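
For readers who wonder what the parameters behind figure 1 amount to, here is a rough sketch in plain Python of the three steps involved: culling, selecting the most frequent words, and computing a Delta distance on z-scores. Please note the assumptions: this is classic Burrows’ Delta (the mean absolute difference of z-scores), not Eder’s variant used above, which in addition gives less weight to less frequent words; the toy table and all function names are hypothetical, and the code is not taken from stylo.

```python
from statistics import mean, stdev


def cull(table, percent):
    '''Keep only words that occur in at least `percent` percent of the texts.'''
    n_texts = len(table)
    words = {w for row in table.values() for w in row}
    return {
        w for w in words
        if 100 * sum(1 for row in table.values() if row.get(w, 0) > 0) / n_texts >= percent
    }


def most_frequent(table, words, n):
    '''Rank words by their mean relative frequency and keep the top n.'''
    avg = {w: mean(row.get(w, 0.0) for row in table.values()) for w in words}
    return sorted(avg, key=lambda w: (-avg[w], w))[:n]


def z_scores(table, words):
    '''Standardize each word across all texts (sample standard deviation).'''
    z = {text: {} for text in table}
    for w in words:
        values = [row.get(w, 0.0) for row in table.values()]
        mu, sigma = mean(values), stdev(values)
        for text, row in table.items():
            z[text][w] = (row.get(w, 0.0) - mu) / sigma if sigma else 0.0
    return z


def burrows_delta(z, a, b, words):
    '''Classic Burrows Delta: mean absolute difference of z-scores.'''
    return mean(abs(z[a][w] - z[b][w]) for w in words)


# Hypothetical toy table: two texts by author A, one by author B.
table = {
    'A1': {'de': 4.0, 'la': 3.0, 'et': 2.0, 'rare': 0.1},
    'A2': {'de': 4.2, 'la': 3.1, 'et': 1.9},
    'B1': {'de': 3.0, 'la': 3.9, 'et': 2.6},
}

kept = cull(table, 50)             # 'rare' occurs in 1 of 3 texts and is dropped
mfw = most_frequent(table, kept, 3)
z = z_scores(table, mfw)
print(burrows_delta(z, 'A1', 'A2', mfw) < burrows_delta(z, 'A1', 'B1', mfw))
```

In this toy table, the two texts by the same author end up closer to each other than to the third text, which is the kind of relation a dendrogram then turns into clusters.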

\n

Figure 2: Principal Component Analysis (same settings as above).

\n

A Principal Component Analysis with analogous settings reveals the same relationships, adding some nuance (see figure 2): Simenon is the only author removed from the others in the first dimension. Also, there is not just closeness, but overlap between Leroux and Leblanc. Manchette seems to have a significantly larger stylistic range compared to what the dendrogram suggests. And again, the pastiches are right in the middle of the other Boileau-Narcejac texts. Interestingly, the one text by Boileau-Narcejac approaching the Maurice Leblanc novels, further to the upper right, is not one of the pastiches, but the 1957 novel Les Magiciennes.

\n

Now, things actually get a little bit more mixed-up when using a longer wordlist: at some point, three novels by Malet join the Dard-cluster; and at some other point, three novels by Leblanc also join the Dard-cluster. The reasons for this, and the type of stylistic similarity involved, remain to be investigated. However, the two types of Boileau-Narcejac texts always cluster solidly together, without any intrusions from texts by other authors.

\n

Figure 3: Results from classification tasks (several algorithms).

\n

The Boileau-Narcejac pastiches’ solid stylistic identification as authored by Boileau-Narcejac is also confirmed by various classification tasks performed using the classify() function in stylo for R, this time using a lot more features (the 5000 most frequent words) and no custom list of stopwords (the idea is that the algorithms will sort it all out). Again, while not perfectly unanimous in all cases, the results are rock solid for the pastiches (see figure 3).

\n

So, what exactly is going on? It is possible to approach these distant-reading results from some more angles. For example, when trying to understand how exactly the normal novels and the pastiches by Boileau-Narcejac are similar, one could look at their word frequencies in comparison. Just for illustration, the figure below shows the word frequencies for the first 50 most frequent words, not for each of the 123 novels, but as averages across the novels by one “author”: Boileau-Narcejac, their pastiches, Leblanc (the target of their pastiche) and Fr\u00E9d\u00E9ric Dard.

\n

Average word frequencies for four “authors” (first 50 most frequent words).

\n

Of course, this is only a small portion of the 400 most frequent words used for the cluster analysis and principal component analysis above. But the issue becomes clear, I think: Sure, one can perfectly well find words for which the ‘normal’ novels and the pastiches by Boileau-Narcejac seem to “stick together” while Leblanc and Dard deviate from them and/or from each other (e.g. for “pas”, “en”, “mais”, “bien”). But one can just as well find words for which the ‘normal’ novels deviate from the pastiches, which seem closer to Leblanc (e.g. “\u00E0”, “dans”, “comme”). Each of these words merits an investigation into its stylistic and content-wise contribution to the four groups of novels. However, it is clear that only quantitative methods, namely distance measures, can add up all these subtle differences and similarities into a score.

\n

Distance measures, yes, but correlation tests could also be interesting to assess the similarity between these word frequency distributions. The following is a table of Pearson’s correlation tests on the average word frequencies for each of the 400 most frequent words across all novels by a given author (Boileau-Narcejac and their pastiches being, again, treated as two separate authors).

\n

Correlation table between authors (average word frequencies across all novels by one author).

\n

The table shows how much correlation there is between authors. Generally speaking, correlations are very strong, and differences are subtle (even with the confidence level set to 0.99). You could think of it as a “poor man’s distance matrix”. In any case, it shows, without surprise by now, that Boileau-Narcejac’s Ars\u00E8ne Lupin pastiches are more similar to these authors’ other novels (correlation score of 0.9938) than to the original Ars\u00E8ne Lupin novels by Maurice Leblanc (0.9871).
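
To make the idea of such a correlation score more concrete, here is a small sketch in plain Python. It assumes that each author is represented by the average relative frequency of each word across his or her novels, and then computes Pearson’s r between two such profiles. The word list, the frequency values and the function names are hypothetical; the scores reported in the table come from the full corpus and the 400 most frequent words, not from this toy data.

```python
from math import sqrt
from statistics import mean


def author_profile(novels, words):
    '''Average the frequency of each word over all novels by one author.'''
    return [mean(novel.get(w, 0.0) for novel in novels) for w in words]


def pearson(x, y):
    '''Pearson correlation coefficient between two equally long lists.'''
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: two 'normal' novels and one pastiche.
words = ['de', 'la', 'et', 'pas']
normal = [
    {'de': 4.0, 'la': 3.0, 'et': 2.0, 'pas': 1.2},
    {'de': 4.2, 'la': 3.2, 'et': 1.8, 'pas': 1.0},
]
pastiche = [{'de': 4.1, 'la': 3.0, 'et': 2.1, 'pas': 1.1}]

r = pearson(author_profile(normal, words), author_profile(pastiche, words))
print(round(r, 4))
```

Because the most frequent words have broadly similar frequencies in any French prose text, such correlations are bound to be very high for all pairs of authors; what matters is the ranking of the scores, not their absolute values.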

\n

So, what if an author tries to imitate the style of another author? Will stylometric tools be fooled, or not? In this test case, they have clearly not been fooled. In terms of quantitative stylistics, the Leblanc pastiches by Boileau-Narcejac are clearly written in the style of Boileau-Narcejac. As for the plot, which is unmistakably Leblanc’esque, that is another story.

\nNotes

\n
  1. One on an Alice in Wonderland pastiche: Harold Somers, Fiona Tweedie, “Authorship Attribution and Pastiche”, in: Computers and the Humanities, 37.4, 2003, 407-429, paywalled at: http://link.springer.com/article/10.1023%2FA%3A1025786724466; another one on detecting fraudulent authorship in online media: Sadia Afroz et al., “Detecting Hoaxes, Frauds, and Deception in Writing Style”, Proceedings of the 2012 IEEE Symposium on Security and Privacy, 2012, p. 461-475, available at http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6234430; [edit: and a third one on Raymond Chandler imitations: Lee Sigelman and William Jacoby, “The Not-So-Simple Art of Imitation: Pastiche, Literary Style, and Raymond Chandler”, Computers and the Humanities 30.1, 1996, 11-28].
  2. See art. “Pastiche”, Wikipedia, http://en.wikipedia.org/wiki/Pastiche.
  3. See also: Claude Mespl\u00E8de, “Boileau-Narcejac”, Dictionnaire des litt\u00E9ratures polici\u00E8res, Nantes: Joseph K, 2007, vol. 2, p. 410-411.
  4. A legal agreement with the heir, Claude Leblanc, was necessary to allow Boileau-Narcejac to feature the character; O tempora, o mores!
  5. French original: “Il y a une \u00E9criture Leblanc, dont il n’est pas trop difficile de reproduire le mouvement ; mais il y a, chez Leblanc, une invention, une mani\u00E8re de c\u00F4toyer l’invraisemblance avec naturel et \u00E9l\u00E9gance, qui nous intimidait beaucoup.”
  6. Maciej Eder, Mike Kestemont, and Jan Rybicki. (2013). “Stylometry with R: a suite of tools”. Digital Humanities 2013: Conference Abstracts. Lincoln: University of Nebraska-Lincoln, pp. 487-89. [pre-print].
  7. Here is how to do this (explanation courtesy of Maciej):
    \n1. Run stylo normally:
    \n> results1 = stylo()
    \n2. Assign the frequencies to a variable:
    \n> frequencies1 = results1$frequencies.0.culling
    \n3. Define a new variable with frequencies without stopwords:
    \n> frequencies2 = delete.stop.words(frequencies1, stop.words=c(\"your\",\"stop\",\"words\"))
    \n4. Run stylo again with the filtered table of frequencies:
    \n> stylo(frequencies = frequencies2)
    \nThat’s it.
  8. Here is the list of stopwords: “je”, “j”, “tu”, “il”, “nous”, “vous”, “elle”, “ils”, “elles”, “moi”, “lui”, “me”, “m”, “se”, “te”, “t”, “ma”, “sa”, “mes”, “son”, “mon”, “ses”, “votre”, “ai”, “a”, “suis”, “est”, “sont”, “ont”, “avais”, “avait”, “avaient”, “avez”, “\u00E9tais”, “\u00E9tait”, “\u00E9taient”, “fus”, “fut”, “fais”, “fait”, “font”, “fis”, “fit”, “sais”, “sait”, “\u00E7a”, “c”, “dis”, “dit”, “disent”, “Ars\u00E8ne”, “Lupin”.
\n" . "Boileau-Narcejac" . "
" . "parody" . "Ars\u00E8ne Lupin" . "similarity of texts" . . . "style" . . "2015-01-04T16:35:03Z" . "My research" . "When I enthusiastically present literary scholars with the surprising accuracy stylometric methods display in many cases of authorship attribution (given appropriate conditions, such as sufficient material, a certain homogeneity in the genre of the texts, and state-of-the-art distance measures), some come up with a clever question: What if one author parodies the style of another author? Will stylometric methods..." . "https://dragonfly.hypotheses.org/745" . "stylometry" . "pastiche" . "Christof Sch\u00F6ch" . "Maurice Leblanc" .