Réseau Quetelet

Question bank user guide

Introduction

The question database of the Réseau Quetelet lets you search the data archived and distributed by the members of the Réseau (CDSP, CMH-ADISP, INED). Searches can use question texts, response category codes and labels, and variable names and labels. The question database complements the data catalogue search, which works on the titles and abstracts of surveys: it lets you explore surveys through the questions asked to the interviewees. The question database has three objectives:

  • to assist in questionnaire design,
  • to identify temporal series of questions,
  • to promote secondary analysis of existing data sets.

Home page

Queries on the question database are made with words. Type your request in the search field, then click on the "Search" button or press the "Enter" key. Autocomplete is enabled by default and suggests terms that are indexed in the database.

By default, the search is carried out in the question texts of French studies. It is extended to words sharing the same root and ignores stop words.

The top menu bar gives access to the advanced search, the user settings, the study list, the user guide and the "About" section, which contains a description of the search engine.

The data query language

Any query carried out in English gives access to the survey data documented in English. Note that neither the studies nor the queries are translated, so it is not possible to search French and English studies at the same time.

Search field

By default, only the "question" field is searched, that is, the literal wording of the question as asked by the interviewer.

Additionally, you can extend the search scope to the two other fields by clicking on "categories" and/or "variable". The "categories" field covers both the code and the label. The "variable" field refers to the variable's name and label.

For example:

  • "What is your marital status?" is the text of the question.
  • The response categories (code - label) are: 1 - Single, 2 - Married, 3 - Widowed, 4 - Divorced.
  • "MATRI" is the variable name and "marital status of the interviewee" is the variable label.

Simple Search

The question database of the Réseau Quetelet is a search engine. Its features are similar to those of Google or Yahoo.

The keywords that you enter in the search bar are searched for in the text of the question as it is asked by the interviewer and, optionally, in the response categories and the variable name and label.

Typed words format

The question database does not take into account the following items:

  • letter case
  • accented letters
  • word order

The search is not case-sensitive: a search on "english" will yield the same result as "English".
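The effect of ignoring case, accents and word order can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the engine's actual code; the function name `normalize` is hypothetical.

```python
import unicodedata

def normalize(query):
    """Lowercase a query, strip accents, and discard word order."""
    decomposed = unicodedata.normalize("NFD", query.lower())
    # Drop combining marks (category "Mn"), e.g. the accent of "é".
    unaccented = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # A set of words has no order, so "a b" and "b a" compare equal.
    return frozenset(unaccented.split())

same_case = normalize("English") == normalize("english")
same_accents_and_order = normalize("Réseau Quetelet") == normalize("quetelet reseau")
```

Both comparisons evaluate to True: the two spellings of each query reduce to the same set of words.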

The question database also ignores the gender and number of the typed terms. By default, the query is extended to words with the same root, as determined by a suffix-removal algorithm. For example, "political" will find "politics", but also "polite".

NB: It is possible to disable the reduction of words to their root forms in the user settings.
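To see why "political" can also match "polite", here is a deliberately simplified suffix stripper. It is a toy, written for this example only: the suffix list and the name `toy_stem` are assumptions, and the algorithm actually used by the question database is not specified in this guide.

```python
# Toy suffix list, longest suffixes first (an assumption for illustration).
SUFFIXES = ("ically", "ical", "ics", "ic", "es", "e", "s")

def toy_stem(word):
    """Remove the first matching suffix, keeping at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

roots = {w: toy_stem(w) for w in ("political", "politics", "polite", "police")}
```

In this sketch "political", "politics" and "polite" all reduce to the root "polit", which is not a real word, while "police" reduces to "polic" and therefore does not match.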

Stop words

The search engine ignores stop words (articles, prepositions, relative pronouns, etc.) belonging to the following lists: English, Français. These common terms carry little meaning and would reduce the relevance of search results.

NB: You can nevertheless include stop words in your search by selecting the corresponding option in the user settings.
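As a rough sketch of stop-word handling, the Python snippet below removes stop words from a query unless an option asks to keep them. The word list is a small hypothetical sample, not the engine's real list, and `search_terms` is an illustrative name.

```python
# Hypothetical sample; the real English and French lists are longer.
STOP_WORDS = frozenset(
    {"a", "an", "the", "of", "to", "in", "for", "that", "which", "who"}
)

def search_terms(query, include_stop_words=False):
    """Return the words of the query that take part in the search."""
    words = query.lower().split()
    if include_stop_words:
        return words
    return [w for w in words if w not in STOP_WORDS]

default_terms = search_terms("the right to vote")
all_terms = search_terms("the right to vote", include_stop_words=True)
```

With the default setting only "right" and "vote" are searched; with the option enabled, all four words are kept.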

Autocomplete

As you type, autocomplete suggests ways to complete the character string, based on the terms indexed in the question database. The suggestions are sorted by the number of search results for each suggested item.

Moreover, autocomplete can provide a set of words enclosed in quotation marks. Quotation marks belong to the query language of the question database and allow users to search for contiguous terms or terms which are separated by stop words.

The query language

Combining words with boolean operators

Boolean operators (AND, OR, NOT) help you refine a search by combining terms. They must be typed in uppercase letters. By default, the search engine uses the AND operator, so the results contain all of the words you have entered.

Examples:

  • If you enter right vote strike, the search will default to the logical operator AND, that is, right AND vote AND strike. Only results with these three terms will be displayed in the window.
  • If you enter strike OR demonstration, the search engine will generate results that contain at least one of these two terms: strike, demonstration, or both.
  • If you enter strike NOT right, the search engine will only generate results that contain strike but not right.

NB: The question database does not take into account synonyms, acronyms or abbreviations. The OR operator can help you cover them (for example: party OR movement, television OR TV...).
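The three operators behave like set operations on the questions that contain each word. The sketch below uses four invented question texts and a hypothetical helper `matching`; it illustrates the logic only, not how the engine is implemented.

```python
# Invented sample questions, keyed by an arbitrary identifier.
QUESTIONS = {
    1: "do you support the right to vote",
    2: "did you take part in a strike or a demonstration",
    3: "should the right to strike be limited",
    4: "have you joined a demonstration",
}

def matching(term):
    """Identifiers of the questions whose text contains the word."""
    return {qid for qid, text in QUESTIONS.items() if term in text.split()}

both = matching("right") & matching("strike")            # right AND strike
either = matching("strike") | matching("demonstration")  # strike OR demonstration
without = matching("strike") - matching("right")         # strike NOT right
```

Here `both` contains only question 3, `either` contains questions 2, 3 and 4, and `without` contains only question 2.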

Phrase search

If you wish to search for a phrase (or an exact combination of words), enter it within quotation marks. Be aware that the search is, by default, extended to words with the same root and ignores stop words.

For example, "voting right" within quotation marks will find all items that contain voting and right in that order. However, each word can be replaced by another word with the same root (vote, etc.), and the two words may be separated by one or several stop words. By contrast, voting right (without quotation marks) will find all items that contain both right and voting (or any word with the same root), whatever their position in the sentence. To search for items containing the exact phrase "voting right", change the textual parameters in the user settings (excluding words with the same root) and enter the phrase within quotation marks.

Wildcard characters

It is possible to use wildcard characters in order to replace 0, 1 or several letters in a character string.

  • The question mark (?) replaces a single character. For example, str?ke will find strike, stroke or strake.
  • When used within or at the end of a word, the asterisk (*) matches zero or more characters. For example, politic* will return all items containing politic, politics, political, politically.

NB: To use this feature, you have to disable the reduction of words to their root forms in the user settings.
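The two wildcards follow the same conventions as shell-style patterns, so Python's standard `fnmatch` module can be used to preview which words a pattern would match. This is an analogy only; the word list is invented and the search engine may differ in details.

```python
from fnmatch import fnmatchcase

# Invented word list standing in for indexed terms.
words = ["strike", "stroke", "strake", "streak",
         "politic", "politics", "political", "policy"]

# "?" stands for exactly one character.
single = [w for w in words if fnmatchcase(w, "str?ke")]
# "*" stands for zero or more characters.
multi = [w for w in words if fnmatchcase(w, "politic*")]
```

`single` keeps strike, stroke and strake but not streak; `multi` keeps politic, politics and political but not policy.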

Searching words with similar spelling

You can search for words with a similar spelling by adding the tilde symbol (~), as found on the ñ of España, at the end of a word (press the "Alt Gr" and "2" keys simultaneously), followed by a similarity index from 0.5 to 0.9. The similarity index between two character strings depends on the number of transformations (insertion, deletion and substitution of characters) needed to make the two strings identical.

For example, politic~0.5 will return items containing politics, political but also policy, police, etc.

NB: To use this feature, you have to disable the reduction of words to their root forms in the user settings.
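The number of transformations mentioned above is commonly called the edit (Levenshtein) distance. The sketch below computes it and turns it into an index between 0 and 1. The normalization used here (one minus the distance divided by the longer length) is an assumption chosen for illustration; this guide does not state the exact formula used by the engine.

```python
def edit_distance(a, b):
    """Minimum number of character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

def similarity(a, b):
    """Assumed index: 1.0 for identical strings, lower as they diverge."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest

distances = {w: edit_distance("politic", w)
             for w in ("politics", "political", "police")}
```

Under this assumed formula, "politics" (1 edit), "political" (2 edits) and "police" (3 edits) all stay above a 0.5 threshold, which is consistent with the example politic~0.5 returning them.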

Proximity between keywords

You can find words within a specific distance from each other by using the tilde (~) symbol at the end of a phrase, followed by an integer specifying the maximum number of words between the terms. With this option, the words must be enclosed in quotation marks. For example, to search for party and sympathetic within 3 words of each other, enter: "party sympathetic"~3.

You have to pay attention to the following points:

  • Stop words are not counted in proximity operations (unless you check the corresponding option in the user settings).
  • Changing the order in which terms are entered into the search does not affect the results.
  • The text used to evaluate the proximity value is set up in the following order: the question text > response categories > variable label.
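A proximity test can be sketched as follows. The function `within` is hypothetical and makes two simplifying assumptions: the distance is counted as the number of words lying between the two terms, and stop words are not removed beforehand.

```python
def within(text, first, second, max_gap):
    """True if the two words occur with at most max_gap words between them.

    The order of the two words does not matter.
    """
    words = text.lower().split()
    pos_first = [i for i, w in enumerate(words) if w == first]
    pos_second = [i for i, w in enumerate(words) if w == second]
    return any(
        i != j and abs(i - j) - 1 <= max_gap
        for i in pos_first
        for j in pos_second
    )

text = "are you sympathetic to any political party"
close_enough = within(text, "party", "sympathetic", 3)
too_far = within(text, "party", "sympathetic", 2)
```

Three words separate "sympathetic" and "party" in this sentence, so the test succeeds with a distance of 3 and fails with a distance of 2, whichever word is given first.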

Boosting a term

A word or a phrase can be "weighted" more heavily by following it with a caret (^) and a boost factor (an integer). For example, entering diploma father^2 gives "father" more weight than "diploma".

Boosting does not affect the nature or number of search results. It only modifies the relevance score and thus determines the display order of search results.
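The following sketch shows how a boost can change the display order without changing which results are found. The scoring rule (weight multiplied by the number of occurrences) and the sample texts are assumptions made for this example; the engine's real score is more elaborate.

```python
def score(text, weights):
    """Sum of (boost factor x number of occurrences) over the query terms."""
    words = text.lower().split()
    return sum(weight * words.count(term) for term, weight in weights.items())

def rank(texts, weights):
    """Keep texts containing every term, best score first."""
    hits = [t for t in texts if all(term in t.lower().split() for term in weights)]
    return sorted(hits, key=lambda t: score(t, weights), reverse=True)

text1 = "diploma of the father or diploma of the mother or diploma of the spouse"
text2 = "highest diploma of your father and occupation of your father"

plain = rank([text1, text2], {"diploma": 1, "father": 1})    # diploma father
boosted = rank([text1, text2], {"diploma": 1, "father": 3})  # diploma father^3
```

Without a boost, `text1` comes first (score 4 against 3). With father^3, `text2` comes first (score 7 against 6). Both lists contain the same two results.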

Creating a complex query

When you create a complex query that combines several operators, you can use parentheses to group terms and determine the scope of each operator. For example, if you are searching for questions combining Extreme-Right and immigration, use the query: ("extreme right" OR "nationalism") AND (immigr* OR foreign*)
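If you assemble such queries programmatically, for instance to paste them into the search field, a few small helpers keep the parentheses balanced. The helper names below are hypothetical; only the resulting query string follows the syntax described in this guide.

```python
def phrase(text):
    """Wrap words in quotation marks for a phrase search."""
    return f'"{text}"'

def any_of(*terms):
    """Group alternatives with OR inside parentheses."""
    return "(" + " OR ".join(terms) + ")"

def all_of(*groups):
    """Require every group with AND."""
    return " AND ".join(groups)

query = all_of(
    any_of(phrase("extreme right"), phrase("nationalism")),
    any_of("immigr*", "foreign*"),
)
```

The resulting `query` is the string ("extreme right" OR "nationalism") AND (immigr* OR foreign*), identical to the example above.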

Results display

Results sorting

By default, results are sorted by relevance score. This score depends on the number of occurrences of the search terms within the target field and on how well their combination matches the query. The higher the number of occurrences of a query term, the higher the relevance score. The score also takes into account the frequency of each term in the question database: the presence of rare keywords increases the relevance score. Words as typed receive a higher weight than other words with the same root.
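The two main ingredients described above, occurrence counts and term rarity, are the basis of the classic TF-IDF weighting. The sketch below uses a common smoothed TF-IDF formula as a stand-in; it is an assumption for illustration, since the exact formula used by the question database is not given here, and the corpus is invented.

```python
import math

def relevance(query_terms, text, corpus):
    """Toy TF-IDF score: frequent in the text, rare in the corpus scores high."""
    words = text.split()
    total = 0.0
    for term in query_terms:
        occurrences = words.count(term)
        containing = sum(1 for other in corpus if term in other.split())
        # Smoothed inverse document frequency: rare terms get a larger factor.
        rarity = math.log((1 + len(corpus)) / (1 + containing)) + 1
        total += occurrences * rarity
    return total

corpus = ["vote vote party", "vote party", "party abstention", "party"]

more_occurrences = relevance(["vote"], corpus[0], corpus) > relevance(["vote"], corpus[1], corpus)
rare_term_wins = relevance(["abstention"], corpus[2], corpus) > relevance(["party"], corpus[2], corpus)
```

Both comparisons are True: a text with two occurrences of "vote" outranks a text with one, and the rare word "abstention" contributes more than the ubiquitous word "party".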

Other sorting criteria are available. The search results can be sorted by:

  • data producer
  • study date (ascending or descending order)
  • position of the question in the data file (ascending or descending order)

Question description

The default search results are the following:

  • the question text
  • all the items and the associated variables in the case of a battery of questions,
  • the response categories
  • the variable's name and label, together with a link to frequencies and summary statistics,
  • the study with a link to its description.

By default, the search engine hides the question items and response categories to reduce the amount of information displayed on the results page.

Click on the magnifying glass icon to see the complete list of items and categories.

It is possible to modify the results display, for example to display 5 categories per result.

NB: In the user settings, you can add more detail to the question description. For example, you can choose to display the interviewer instructions, the filter question, the position of the question in the data file, etc. You can also put a question in context by adding the link to the questionnaire and the navigation to the previous and following questions.

Filtering results

Using the left menu, you can refine the results by:

  • data producer
  • survey series
  • study
  • decade
  • searching within results
  • concept

NB 1: By clicking on "Searching within results", you can use the query language.

NB 2: To refine the results by concept, check the corresponding option in the user settings. The concept associated with a question is not available unless the survey has been thematized by the producer. Only the pre-electoral and post-electoral surveys produced by the CEVIPOF are thematized.

Click on "Filter results" to apply the checked filters or the additional keywords.

Click on "Reset filters" in the refining menu to return to your initial results.

Customization of the filter menu

It is possible to customize the order of the refining units. For example, you can move the "Studies" unit by clicking on its title, dragging it up or down and dropping it in the place of your choice.

The menu layout is saved automatically after each change.

The refining units can be sorted by:

  • checked/unchecked refinings
  • alphabetical order (for producers, survey series, single surveys) and chronological order (for decades)
  • number of search results (ascending or descending order)

Results selection and export

To save results in the basket, click on "Add to selection".

Clicking on the basket displays the selected results in a summary table showing the study, the producer, the distributor, the variable's name and label, the question text and the response categories.

Delete: removes these results from the selection.

Delete all: removes all the results from the selection.

CSV: exports the selected results in CSV format.

XLS: exports the selected results in Excel format.

User settings

The user settings let you personalize searches (textual analysis and question description) and configure the interface.

Once you have made your changes, save the settings so that they apply to subsequent searches.

Textual analysis parameters

  • Extension of searches to words with the same root

    This option is selected by default so that the query is not limited to the exact word. It makes searching more flexible with respect to gender, number, nouns, adjectives, conjugated verb forms, etc.

    The algorithm used to determine the root of a word is based on the grammatical and syntactic rules of natural language. A root is not necessarily a real word.

    NB 1: For this reason, some results may appear irrelevant.

    When this option is unchecked, the engine searches for the terms exactly as they were entered.

    NB 2: Wildcard characters make a search on exact words more flexible without extending it to words with the same root.

  • Stop words

    When this option is checked, stop words (articles, prepositions, relative pronouns) are taken into account.

Interface set up

It is possible to modify the search form and the results display parameters:

  • disable autocomplete,
  • display between 5 and 25 results per page,
  • choose a default sorting criterion other than relevance score,
  • highlight matches in yellow.

Results description

By default, the question description is limited to the question text, the response categories, the associated variable and the study that contains the question. It can be supplemented with:

  • the concept
    NB: The concept associated with a question is not available unless the survey has been thematized by the data producer. This option makes it possible to refine results by concept. Only the pre-electoral and post-electoral surveys produced by the CEVIPOF are thematized.
  • the interviewer instructions
  • the texts following and preceding the question
  • the question universe (if different from the study universe) or the question filter
  • the question position in the data file (which provides, in general, an approximation of the question position in the questionnaire)
  • additional information about variable coding, mode of data collection specific to this question, etc.

It is possible to contextualize the question by adding:

  • the link to the questionnaire(s) if files are available
  • the navigation to the previous and following questions

Advanced search

Query help

The advanced search mode allows you to build complex queries without needing to know the query language.

Once you have selected the search language (French by default), you have to define search filters, that is:

  • Search field (question text, response categories, variable's name and label or all of them at once)
  • Combination of an inclusion/exclusion rule ("must contain", "may contain", "must not contain") with a matching mode: "all of these words" (AND operator), "one or more of these words" (OR operator), "the exact phrase" (quotation marks), "these two words are distant from one another of at most..." (tilde symbol).

Search in several fields

The main advantage of the advanced search mode is the ability to combine more than one search field.

Example: To search for questions about France while excluding those about the country of birth, you can proceed as follows:

  • Filter 1 = all | must contain | all of these words | France
  • Filter 2 = question | must not contain | all of these words | birth country
  • Filter 3 = question | must not contain | all of these words | place birth

Other search modes

The main use of the question database is to search all the questions and variables available in the Réseau Quetelet and then filter the results. Nevertheless, it can be adapted to more specific needs:

  • From a survey series
    It is possible to limit a search to a particular series. From the series description (accessible from the study list), click on "All of the questions and variables from this serie" in order to download the results for that series.
  • From a study
    It is possible to explore a particular survey. From the study description (accessible from the study list), click on "All of the questions and variables" in order to download the results for that study.
  • From a concept
    It is possible to list all of the questions associated with a concept. From the concept list (accessible from the study list), click on a given concept and you will obtain the whole set of questions associated with it.
    NB: Only the pre-electoral and post-electoral surveys produced by the CEVIPOF are thematized.

To refine the search results, click on "Searching within results" and enter keywords or more complex queries.