Clem's bookmarks

Point process models for presence-only analysis-a review - Renner - Methods in Ecology and Evolution - Wiley Online Library

In the vein of the papers currently coming out on the topic, this one looks essential! The author list is the cream of the field. Must download and read.

alire · ecologie · stats

February 23, 2015 at 08:59:44 GMT+1 · permalink

·

http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12352/abstract;jsessionid=2320DFED551DDF8E48C6E89D50CCB400.f01t01

Du krigeage et du libre et/ou gratuit: inventaire des solutions dans le domaine spatial (logiciels SIGs, applications, librairies, mais sans doute non exhaustif...) | PortailSIG

Interesting...

stats

February 20, 2015 at 09:16:43 GMT+1 · permalink

·

http://www.portailsig.org/content/du-krigeage-et-du-libre-etou-gratuit-inventaire-des-solutions-dans-le-domaine-spatial-logici

hierarchical models are not Bayesian models | Xi'an's Og

A quick clarification from Christian Robert:
(1) Lele et al. did not invent data cloning. The approach dates back to the early 1990s, and more efficient versions based on simulated annealing have since been developed;
(2) When the likelihood is multimodal, data cloning is quite likely to end up on the wrong mode;
(3) When the likelihood is multimodal, choosing one mode rather than another for inference is not obvious, and as Christian Robert puts it: "In which sense is the MLE more objective than a Bayes estimate, then?"
(4) And on the main criticism, namely the influence of the prior on the results: "the impact of a prior on some aspects of the posterior distribution can be tested by re-running a Bayesian analysis with different priors, including empirical Bayes versions or, why not?!, data cloning, in order to understand where and why huge discrepancies occur. This is part of model building, in the end."

In the end, point 4 strikes me as plain common sense. I know the Bayesian approach raises philosophical problems for some people, but one should not throw the baby out with the bathwater. Granted, you can fit hierarchical models without going Bayesian. But that is no less subjective.
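To make points (1) and (4) concrete, here is a minimal sketch of data cloning on a toy conjugate model. It is not the setup used by Lele et al. or discussed by Christian Robert: it assumes hypothetical binomial data (3 successes in 10 trials) and two hypothetical Beta priors, and the function names are made up for illustration. Cloning the data k times raises the likelihood to the power k, so the Beta posterior is available in closed form.

```python
def cloned_posterior(y, n, a, b, k):
    """Beta posterior parameters when binomial data (y out of n) are cloned k times
    under a Beta(a, b) prior: the likelihood is raised to the power k."""
    return a + k * y, b + k * (n - y)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def beta_var(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    s = alpha + beta
    return alpha * beta / (s * s * (s + 1))


# Hypothetical data and priors, chosen only for illustration.
y, n = 3, 10
mle = y / n
priors = {"flat Beta(1, 1)": (1.0, 1.0), "informative Beta(20, 5)": (20.0, 5.0)}

for label, (a, b) in priors.items():
    for k in (1, 10, 1000):
        alpha, beta = cloned_posterior(y, n, a, b, k)
        mean = beta_mean(alpha, beta)
        scaled_var = k * beta_var(alpha, beta)
        print(f"{label:24s} k={k:5d}  mean={mean:.4f}  k*var={scaled_var:.5f}")

print(f"MLE = {mle:.4f}, asymptotic variance = {mle * (1 - mle) / n:.5f}")
```

With k = 1 the two priors give visibly different posterior means, which is the prior-sensitivity check of point 4. As k grows, both means approach the MLE y/n and k times the posterior variance approaches the usual asymptotic variance of the MLE. The likelihood here is unimodal, so the multimodality concern of point 2 does not show up in this toy case.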

stats

February 18, 2015 at 08:52:30 GMT+1 · permalink

·

https://xianblog.wordpress.com/2015/02/18/hierarchical-models-are-not-bayesian-models/

Ecologists need robust survey designs, sampling and analytical methods - Hayward - Journal of Applied Ecology - Wiley Online Library

To read.

alire · ecologie · stats

February 13, 2015 at 11:22:13 GMT+1 · permalink

·

http://onlinelibrary.wiley.com/doi/10.1111/1365-2664.12408/abstract;jsessionid=AF14A166B0E0B3388F7B8A50D10ACA01.f01t03

Is non-informative Bayesian analysis dangerous for wildlife??? | Xi'an's Og

Oh? A reply from Christian Robert to Subhash Lele's paper criticising the use of non-informative priors. Right. I really need to read both the paper and the reply, the debate looks interesting! Christian Robert's conclusion:

I find it rather surprising that a paper can be dedicated to the comparison of two arbitrary prior distributions on two fairly simplistic models towards the global conclusion that “non-informative priors neither ‘let the data speak’ nor do they correspond (even roughly) to likelihood analysis.”

ecologie · stats

February 12, 2015 at 09:26:54 GMT+1 · permalink

·

https://xianblog.wordpress.com/2015/02/12/is-non-informative-bayesian-analysis-dangerous-for-wildlife/

In praise of exploratory statistics | Dynamic Ecology

Via Mathieu. Looks really interesting!

ecologie · statistique · stats

February 12, 2015 at 09:03:19 GMT+1 · permalink

·

https://dynamicecology.wordpress.com/2013/10/16/in-praise-of-exploratory-statistics/

Miscellaneous math resources | John D. Cook

Assorted interesting math resources. I am recording here the titles of the documents hosted on John D. Cook's site.

Probability and statistics:

How to test a random number generator
Predictive probabilities for normal outcomes
One-arm binary predictive probability
Relating two definitions of expectation
Illustrating the error in the delta method
Relating the error function erf and Φ
Inverse gamma distribution
Negative binomial distribution
Upper and lower bounds for the normal distribution function
Canonical example of Bayes’ theorem in detail
Functions of regular variation
Student-t as a mixture of normals

Other math:

Chebyshev polynomials
Richard Stanley’s twelvefold way (combinatorics)
Hypergeometric functions
Outline of Laplace transforms
Navier-Stokes equations
Picking the step size for numerical ODEs
Orthogonal polynomials
Multi-index notation
The pqr theorem for seminorms

math · stats

February 6, 2015 at 14:46:32 GMT+1 · permalink

·

http://www.johndcook.com/blog/2015/02/04/miscellaneous-math-resources/

Some big news about MAXENT | methods.blog

Quite a few interesting references.

stats · écologie

February 5, 2015 at 16:54:44 GMT+1 · permalink

·

https://methodsblog.wordpress.com/2013/02/20/some-big-news-about-maxent/

[1502.00725] Cheaper and Better: Selecting Good Workers for Crowdsourcing

Oh, this is interesting. I have only read the abstract, but they develop a combinatorial optimisation algorithm showing that it is better to hire a small number of highly skilled people to collect the data than a large number of average ones. That has some amusing implications.

stats

February 4, 2015 at 08:53:30 GMT+1 · permalink

·

http://arxiv.org/abs/1502.00725

[1502.00483] Is non-informative Bayesian analysis appropriate for wildlife management: survival of San Joaquin Kit Fox and declines in amphibian populations

Yet another full-blown attack on Bayesian methods in ecology.

alire · stats · écologie

February 3, 2015 at 09:48:13 GMT+1 · permalink

·

http://arxiv.org/abs/1502.00483

[1502.00318] Setting the stage for data science: integration of data management skills in introductory and second courses in statistics

Well, there really are interesting papers coming out today! In short, according to the authors, there are 5 key elements to develop:
1. Creative thinking about data: being able to change the shape of the data. What I would call skills in constructing the data.
2. The ability to handle data of different sizes (database concepts and the associated computing concepts)
3. Proficiency in a statistical programming language (R, Python, Julia)
4. Learning to work with large, really messy data sets for which there is no specific goal or statistical method
5. An ethos of reproducibility.

The paper revolves around R, markdown, and the dplyr package, which is described in detail (I really need to get into dplyr).
There is a fair amount of discussion about the notion of data management, and about the importance of mastering SQL.
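As a rough illustration of the data-management skills discussed above, here is a minimal sketch of a filter / group / summarise step written in SQL, run through Python's standard `sqlite3` module rather than R. The table name `counts`, its columns, and the survey values are all hypothetical and are not taken from the paper.

```python
import sqlite3

# Hypothetical survey records: (site, species, number of animals counted).
# One count is missing (None), as often happens in messy field data.
rows = [
    ("site_a", "fox", 3),
    ("site_a", "hare", 5),
    ("site_b", "fox", 1),
    ("site_b", "fox", 4),
    ("site_b", "hare", None),
]

con = sqlite3.connect(":memory:")  # throwaway in-memory database
con.execute("CREATE TABLE counts (site TEXT, species TEXT, n INTEGER)")
con.executemany("INSERT INTO counts VALUES (?, ?, ?)", rows)

# Drop missing counts, then summarise per species.
summary = con.execute(
    "SELECT species, COUNT(*) AS surveys, SUM(n) AS total "
    "FROM counts WHERE n IS NOT NULL "
    "GROUP BY species ORDER BY species"
).fetchall()
con.close()

for species, surveys, total in summary:
    print(species, surveys, total)
```

The same pipeline maps onto the dplyr verbs `filter()`, `group_by()` and `summarise()`, which is presumably part of why the two skills sit side by side in the paper.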

data · science · stats

February 3, 2015 at 09:07:08 GMT+1 · permalink

·

http://arxiv.org/abs/1502.00318

ESA Online Journals - A Guide to Bayesian Model Selection for Ecologists

Likewise, to download and read.

alire · ecologie · stats

February 3, 2015 at 09:05:54 GMT+1 · permalink

·

http://www.esajournals.org/doi/abs/10.1890/14-0661.1

ESA Online Journals - Using spatio-temporal statistical models to estimate animal abundance and infer ecological dynamics from survey counts

Looks interesting. To download and read.

alire · ecologie · stats

February 3, 2015 at 09:05:19 GMT+1 · permalink

·

http://www.esajournals.org/doi/abs/10.1890/14-0959.1

Crowdsourcing data analysis: Do soccer referees give more red cards to dark skin toned players? - Statistical Modeling, Causal Inference, and Social Science

WOW!
29 teams of data analysts (61 analysts from 13 countries, working in fields as varied as psychology, statistics, economics, sociology, linguistics and management, with or without a PhD, at different career levels) analyse the same data set to determine whether referees tend to give more red cards to black players. The variability in the results is enormous. For such a simple question, the odds ratios range from 0.89 to 2.93!!! The estimated effect varies by roughly a factor of 3!!!
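For readers who want a reminder of what an odds ratio measures, here is a minimal sketch computing one from a 2x2 table, together with an approximate 95% Wald confidence interval on the log scale. The counts are invented for illustration and have nothing to do with the study's data; the function name is hypothetical, and this is the crudest possible analysis, not any of the 29 teams' methods.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and approximate Wald confidence interval for a 2x2 table.

    a, b: events / non-events in group 1
    c, d: events / non-events in group 2
    z:    normal quantile (1.96 for an approximate 95% interval)
    """
    or_hat = (a / b) / (c / d)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper


# Invented counts: red cards vs. no red card in two groups of players.
or_hat, lower, upper = odds_ratio_wald(30, 970, 20, 980)
print(f"OR = {or_hat:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Even this toy table gives an interval that includes 1, which hints at why different, equally defensible modelling choices can land on either side of "no effect".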

The methods range from frequentist or Bayesian logistic regression, binomial or multinomial, to ZIP, multilevel Poisson, logistic or Poisson mixed models, hierarchical models, and Poisson regression.

From the abstract: "Crowdsourcing data analysis highlights the contingency of results on choices of analytic strategy, and increases identification of bias and error in data and analysis. Crowdsourcing analytics represents a new way of doing science; a data set is made publicly available and scientists at first analyze separately and then work together to reach a conclusion while making subjectivity and ambiguity transparent".

It is a stunning study.
Via Andrew Gelman's blog (http://andrewgelman.com/2015/01/27/crowdsourcing-data-analysis-soccer-referees-give-red-cards-dark-skin-toned-players/)

stats

January 28, 2015 at 20:59:37 GMT+1 · permalink

·

https://osf.io/j5v8f/

Brian Neelon R Programs

Quite a few R programs illustrating MCMC implementation for various models.

bayésienne · stats

January 28, 2015 at 16:19:26 GMT+1 · permalink

·

http://people.duke.edu/~neelo003/r/

xkcd: P-Values

Funny

marrant · stats · webcomics

January 26, 2015 at 09:41:59 GMT+1 · permalink

·

http://xkcd.com/1478/

Advice for going solo | John D. Cook

Some advice for those who want to go into independent statistical consulting. In essence:
* Drop everything and devote yourself to it exclusively
* Things move slowly. Even if you have a lead on day one, it can take months before it materialises
* The bigger the client, the more slowly things move. Invoice payment included.
* Have money set aside at the start, lots of it.

divers · stats

January 22, 2015 at 21:02:06 GMT+1 · permalink

·

http://www.johndcook.com/blog/2015/01/22/advice-for-people-going-solo/

Extended Kalman filter example in R

To read once I have read the first part.

alire · stats

January 13, 2015 at 12:14:49 GMT+1 · permalink

·

http://www.magesblog.com/2015/01/extended-kalman-filter-example-in-r.html

A Data Science Rant | Inside Analysis

A guy ranting about the arrival of "data science" in the business world. And about the fad around it.
I can understand the irritation of people who see a new fad go by every year, and I broadly agree with his list of points, but some of his arguments are fallacious, such as:

"if there were a particular activity devoted to studying data, then there might be some virtue in the term “data science.” And indeed there is such an activity, and it already has a name: it is a branch of mathematics called statistics. It doesn’t need a name upgrade, or if it does, we should call it Statistics 2.0."

Granted, statistics is a branch of mathematics, but data analysis is not!!!

statistique · stats

December 20, 2014 at 20:51:25 GMT+1 · permalink

·

http://insideanalysis.com/2013/08/a-data-science-rant/

Don't, don't, don't, don't . . . We're brothers of the same mind, unblind - Statistical Modeling, Causal Inference, and Social Science

Another very interesting post from Gelman. I realise I had misunderstood this concept of data science. Until now, I thought it was more or less the job of a biometrician (mainly because of posts like this one: http://learnitdaily.com/what-is-a-data-scientist/), and I now realise that this is very, very far from the dominant view.
In fact, this concept of data science is best understood alongside the debate around big data. As Gelman notes:

It’s been said that the most important thing in statistics is not what you do with the data, but, rather, what data you use.

The concept of data science is closer to the first point. So, if I understand correctly, data science starts from the data and not from the problem. We have a data set: what can we do with it?

"the point of data science (as I see it) is to be able to grab the damn data."

In a sense, the question is the same as the one raised by the concept of big data: the data exist, and we cannot ignore them. That said, the "forget statistics, statisticians are dinosaurs we need not worry about" attitude is a bit troubling; it opens the door to all sorts of swindles (opportunistic data collection introduces bias into the inference, that is unavoidable). But apparently it is mainly the blogger discussed here who behaves this way, and it does not necessarily reflect data scientists as a whole.

Incidentally, a very good remark:

" So I think it’s important to keep these two things separate: (a) reactions (positive or negative) to the hype, and (b) attitudes about the subject of the hype."

A topic being fashionable does not make it good. But it does not make it bad either.

OK, fine, I am setting up a watch on this topic.

data · science · stats

December 16, 2014 at 14:16:58 GMT+1 · permalink

·

http://andrewgelman.com/2014/12/13/dont-dont-dont-dont-brothers-mind-unblind/