Clem's bookmarks

Interesting: a very intuitive explanation of what a normal Q-Q plot is. I always struggle to explain it to colleagues, and the explanation here is crystal clear. You fill two vases of given shapes, one of them Gaussian-shaped, with water poured at a steady rate, and you plot the water level in the second vase against the water level in the Gaussian vase. There is even an R application, but as the author says himself, it is not really needed: people get it quickly without it...
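A minimal R sketch of the vase picture, using a hypothetical exponential sample. For each fill fraction p, the level in the Gaussian vase is `qnorm(p)` and the level in the second vase is the sample quantile at p. Plotting one against the other is what `qqnorm()` draws, with the same `ppoints()` plotting positions. A straight line means the two vases have the same shape up to location and scale.

```r
# Normal Q-Q plot built by hand with the "two vases" picture.
set.seed(1)
y <- rexp(200)            # hypothetical, deliberately non-normal sample
n <- length(y)
p <- ppoints(n)           # fill fractions (same plotting positions as qqnorm())
gauss_level  <- qnorm(p)  # water level in the Gaussian vase at each fill fraction
sample_level <- sort(y)   # water level in the second vase: the sample quantiles
plot(gauss_level, sample_level,
     xlab = "Level in the Gaussian vase (theoretical quantiles)",
     ylab = "Level in the second vase (sample quantiles)")
qqline(y, lty = 2)        # reference line through the quartiles
```

With this skewed sample the points bend away from the dashed line. With `y <- rnorm(200)` they would follow it.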

base · stats

September 13, 2017 at 09:23:47 GMT+2

http://tandfonline.com/doi/full/10.1080/00031305.2016.1200488

Multivariate Chebyshev Inequality With Estimated Mean and Variance: The American Statistician: Vol 71, No 2

A VERY interesting article: it extends the Bienaymé-Chebyshev inequality to the multivariate case. In short, the paper bounds the probability that the Mahalanobis distance of a new draw exceeds a given threshold, when the distance is computed with the mean and covariance estimated from a sample drawn from any distribution. I can see a fun application to the use of the Mahalanobis distance for measuring habitat suitability (Clark et al. 1993). If λ is the distance computed between a given available point in a study area and the species' niche in that area, the inequality gives an upper bound on P(D² ≥ λ), equivalently a lower bound on P(D² < λ), whatever the true shape of the niche (multinormal or not). It would be interesting to see whether this can be used to build useful suitability maps... To be continued...
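A minimal R sketch of the classical version of the inequality, assuming the true mean μ and covariance Σ are known. The paper's contribution is precisely the case where they are estimated, and this sketch does not reproduce that bound. For a d-dimensional vector, E[D²] = d, so Markov's inequality gives P(D² ≥ t) ≤ d/t for any distribution. The exponential "niche" and the thresholds are hypothetical. R's `mahalanobis()` returns the squared distance D².

```r
# Classical multivariate Chebyshev bound, with the TRUE mean and covariance:
# E[D^2] = d, so Markov's inequality gives P(D^2 >= thr) <= d / thr
# for any distribution. (Not the paper's estimated-parameter bound.)
set.seed(42)
d <- 3
n <- 1e5
# hypothetical non-Gaussian "niche": d independent Exp(1) variables
x <- matrix(rexp(n * d), ncol = d)
mu <- rep(1, d)       # true mean of Exp(1)
sigma <- diag(d)      # true covariance (variance 1, independent components)
D2 <- mahalanobis(x, center = mu, cov = sigma)  # squared Mahalanobis distances
thr <- c(4, 9, 16)
empirical <- sapply(thr, function(t0) mean(D2 >= t0))
cheb <- pmin(1, d / thr)
rbind(threshold = thr, empirical = empirical, chebyshev_bound = cheb)
```

The empirical tail probabilities stay below d/t, but they are usually far below it. This is the price of a bound that holds for every distribution.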

habitat · mahalanobis · niche · selection · stats

September 13, 2017 at 09:02:38 GMT+2

http://tandfonline.com/doi/full/10.1080/00031305.2016.1186559

A Simple Parametric Model Selection Test: Journal of the American Statistical Association: Vol 0, No 0

Also worth reading. Clearly I had fallen behind on my literature watch, because there was some interesting stuff.

model · selection · stats

September 13, 2017 at 08:30:43 GMT+2

http://www.tandfonline.com/doi/full/10.1080/01621459.2016.1224716

Basis Function Models for Animal Movement: Journal of the American Statistical Association: Vol 112, No 518

Also to read. Hooten is prolific...

alire · stats

September 13, 2017 at 08:28:30 GMT+2

http://amstat.tandfonline.com/doi/full/10.1080/01621459.2016.1246250

Bayesian Synthetic Likelihood: Journal of Computational and Graphical Statistics: Vol 0, No 0

To read. Looks fun.

alire · stats

September 13, 2017 at 08:26:24 GMT+2

http://amstat.tandfonline.com/doi/full/10.1080/10618600.2017.1302882

Multivariate Chebyshev Inequality With Estimated Mean and Variance: The American Statistician: Vol 71, No 2

Looks interesting. To read.

alire · stats

July 18, 2017 at 08:06:01 GMT+2

http://www.tandfonline.com/doi/full/10.1080/00031305.2016.1186559

A farewell to the sum of Akaike weights: the benefits of alternative metrics for variable importance estimations in model selection - Galipaud - 2017 - Methods in Ecology and Evolution - Wiley Online Library

A debate worth reading.

AIC · statistiques · stats

June 23, 2017 at 21:53:10 GMT+2

http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12835/abstract

PCA and binary data

From Jolliffe (2002), p. 339: "It is true that variances, covariances and correlations have especial relevance for multivariate normal x, and that linear functions of binary variables are less readily interpretable than linear functions of continuous variables. However, the basic objective of PCA -- to summarize most of the 'variation' that is present in the original set of p variables using a smaller number of derived variables -- can be achieved regardless of the nature of the original variables.

For data in which all variables are binary, Gower (1966) points out that using PCA *does* provide a plausible low-dimensional representation. This follows because PCA is equivalent to a principal coordinate analysis based on the commonly used definition of similarity between two individuals (observations) as the proportion of the p variables for which the two individuals take the same value."

So yes, PCA on binary data is legitimate and meaningful. Filing this here.
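A small R check of Gower's point, on hypothetical random binary data. For binary rows, the squared Euclidean distance counts the mismatches, so ||x_i - x_j||² = p(1 - s_ij), where s_ij is the simple-matching similarity. Gower's transform d_ij = sqrt(2(1 - s_ij)) is therefore sqrt(2/p) times the Euclidean distance. Principal coordinate analysis on it (`cmdscale()`) should then return the centred PCA scores (`prcomp()`) up to axis signs and that constant factor.

```r
# Gower (1966): PCA on binary data = principal coordinate analysis
# on the simple-matching similarity.
set.seed(7)
n <- 30; p <- 8
X <- matrix(rbinom(n * p, 1, 0.4), nrow = n)   # hypothetical binary data
# simple matching: share of the p variables on which rows i and j agree
S <- (tcrossprod(X) + tcrossprod(1 - X)) / p
D <- as.dist(sqrt(2 * (1 - S)))   # Gower's similarity-to-distance transform
pco <- cmdscale(D, k = 2)         # principal coordinate analysis
pca <- prcomp(X)$x[, 1:2]         # PCA on centred, unscaled data
# same configuration up to axis signs and the factor sqrt(2 / p)
all.equal(abs(pco), abs(pca) * sqrt(2 / p), check.attributes = FALSE)
```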

ACP · PCA · statistique · stats

June 21, 2017 at 16:20:06 GMT+2

http://caloine.ouvaton.org/shaarli/?vMLL8Q

xkcd: Machine Learning

An xkcd I am keeping handy. A very nice illustration of what some people think machine learning is.

marrant · stats

May 17, 2017 at 09:19:03 GMT+2

https://xkcd.com/1838/

Saturday Morning Breakfast Cereal - Blood of the Bayesian

Another funny SMBC.

bayesien · divers · marrant · stats

May 11, 2017 at 12:30:23 GMT+2

http://www.smbc-comics.com/comic/blood-of-the-bayesian

Piecewise regression - Cross Validated

How splines connect with piecewise regression. I had never thought of it, but a piecewise (segmented) linear regression with fixed breakpoints and continuous segments is indeed a spline fit: a linear spline with knots at the breakpoints.
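A minimal R sketch of that equivalence, on hypothetical data with a single breakpoint assumed known at x = 4. The broken-stick model y ~ x + (x - 4)+ and a degree-1 B-spline basis with one knot at 4 (`splines::bs()`) span the same space, so `lm()` should give identical fitted values.

```r
library(splines)  # ships with R
set.seed(3)
x <- sort(runif(100, 0, 10))
# hypothetical continuous broken-stick signal with a break at x = 4
y <- ifelse(x < 4, 1 + 0.5 * x, 3 - 0.8 * (x - 4)) + rnorm(100, sd = 0.3)
k <- 4  # breakpoint, assumed known
fit_hinge <- lm(y ~ x + pmax(x - k, 0))             # segmented regression via a hinge term
fit_bs    <- lm(y ~ bs(x, knots = k, degree = 1))   # linear spline with one knot
all.equal(fitted(fit_hinge), fitted(fit_bs))
```

With an unknown breakpoint the problem is no longer linear in the parameters, and this is where dedicated segmented-regression tools come in.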

regression · splines · stats

March 31, 2017 at 13:54:33 GMT+2

https://stats.stackexchange.com/questions/93633/piecewise-regression

Logistic distribution - Wikipedia

Yes, I should have guessed:
If $X \sim U(0, 1)$ then $\log(X/(1-X)) \sim \mathrm{Logistic}(0, 1)$
In other words, if X is uniform on (0, 1), then logit(X) follows a Logistic(0, 1) distribution.

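Example in R:
oo <- rlogis(10000)              # 10,000 draws from Logistic(0, 1)
hist(exp(oo) / (1 + exp(oo)))    # inverse logit, i.e. plogis(oo): the histogram is flat on (0, 1)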

This last histogram is indeed uniform. It comes in handy in a Bayesian model: putting a Logistic(0, 1) prior on Y = logit(X) guarantees that the prior on X is uniform on (0, 1).
This is useful whenever it is more convenient to use Y as the parameter of interest, e.g. in a Metropolis sampler with a Gaussian proposal, when bounds are a pain and you do not want to spend your time juggling between logit and inverse logit.
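A minimal random-walk Metropolis sketch of that trick, on a hypothetical binomial example with 7 successes out of 20 trials. The sampler works on the unbounded Y with a Gaussian proposal (sd = 1, an arbitrary choice), and the prior on Y is `dlogis()`. Back-transforming with `plogis()` should recover the posterior of X under a uniform prior, which is Beta(k + 1, n - k + 1).

```r
# Random-walk Metropolis on Y = logit(X) with a Logistic(0, 1) prior on Y,
# i.e. a Uniform(0, 1) prior on X = plogis(Y).
set.seed(11)
k <- 7; n <- 20                          # hypothetical data: 7 successes in 20 trials
log_post <- function(y) {
  dbinom(k, n, plogis(y), log = TRUE) +  # likelihood, written in terms of X
    dlogis(y, log = TRUE)                # prior on Y: Logistic(0, 1)
}
n_iter <- 20000
chain <- numeric(n_iter)
y <- 0                                   # start at X = 0.5
for (i in seq_len(n_iter)) {
  prop <- rnorm(1, y, 1)                 # Gaussian proposal, no bounds to handle
  if (log(runif(1)) < log_post(prop) - log_post(y)) y <- prop
  chain[i] <- y
}
x <- plogis(chain)                       # back to the (0, 1) scale
# with a uniform prior, the exact posterior of X is Beta(k + 1, n - k + 1)
c(mcmc_mean = mean(x), exact_mean = (k + 1) / (n + 2))
```

No Jacobian term is needed here, because the prior is written directly on Y: the density of X = plogis(Y) is dlogis(y) / dlogis(y) = 1.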

maths · stats

March 20, 2017 at 13:35:34 GMT+1

https://en.wikipedia.org/wiki/Logistic_distribution

Factoextra R Package: Easy Multivariate Data Analyses and Elegant Visualization - Easy Guides - Wiki - STHDA

Via Mathieu. A couple of the plots in there are stunning... worth a closer look.

multivarié · R · statistiques · stats

February 21, 2017 at 15:28:21 GMT+1

http://www.sthda.com/english/wiki/factoextra-r-package-easy-multivariate-data-analyses-and-elegant-visualization

The big data degree craze (La fièvre des diplômes de big data)

To read.

bigdata · informatique · stats

February 9, 2017 at 11:32:54 GMT+1

http://mobile.lemonde.fr/campus/article/2017/02/08/la-fievre-des-diplomes-de-big-data_5076480_4401467.html?xtref=

How to find the right answer when the 'wisdom of the crowd' fails : Nature News

An algorithm that tries to identify the correct answer to a factual question put to a crowd: the correct answer is not necessarily the majority answer.
Worth digging into.
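Assuming the algorithm described is the "surprisingly popular" rule, here is a sketch of a yes/no version in R. Under that rule, each respondent gives an answer and also predicts how the crowd will answer. The answer kept is the one whose actual share most exceeds its predicted share. The function name and the vote counts below are hypothetical.

```r
# Hypothetical sketch of the "surprisingly popular" rule for a yes/no question.
# answers: each respondent's own answer ("yes" or "no")
# predicted_yes: each respondent's guess of the share of the crowd answering "yes"
surprisingly_popular <- function(answers, predicted_yes) {
  actual_yes <- mean(answers == "yes")
  expected_yes <- mean(predicted_yes)
  # "yes" wins only if it is more popular than the crowd expected
  if (actual_yes > expected_yes) "yes" else "no"
}

# hypothetical crowd: 65 answer "yes" and expect 80% "yes";
# 35 answer "no" but, aware of the common misconception, expect 85% "yes"
answers <- c(rep("yes", 65), rep("no", 35))
predicted_yes <- c(rep(0.80, 65), rep(0.85, 35))
majority <- names(which.max(table(answers)))
sp <- surprisingly_popular(answers, predicted_yes)
c(majority = majority, surprisingly_popular = sp)
```

Here the majority says "yes" (65%), but the crowd expected about 82% "yes". As a result, "no" is more popular than predicted and is the answer kept.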

expert · informatique · stats

January 27, 2017 at 09:25:33 GMT+1

http://www.nature.com/news/how-to-find-the-right-answer-when-the-wisdom-of-the-crowd-fails-1.21370?WT.mc_id=TWT_NatureNews